[xwiki-dev] Re: Trying to understand I8N...

Vincent Massol vincent at massol.net
Sat Apr 7 22:04:02 CEST 2007


Hi Zeljko,

On Apr 7, 2007, at 7:49 PM, Zeljko Trogrlic wrote:

[snip]

>> 3) I see that in our standalone installation we use
>> -Dfile.encoding=iso-8859-1. Now that I've read Joel's tutorial it
>> seems to me this is not going to work for everyone and that we
>> should rather use -Dfile.encoding=UTF-8 by default. WDYT?
>
> That is a problem if it's not your default encoding. You have two
> options:
>
>  * use platform default encoding and don't use non-ASCII characters  
> in default configuration
>  * use UTF-8
>
> Although UTF-8 sounds better, note that you:
>  * need an editor that supports it, otherwise local encoding will  
> creep in
>  * encoding must be set manually because encoding can't be detected  
> for plain text files
>  * you have to communicate this very clearly to users
>  * text will look funny in non-UTF-8 editor and it will be hard to  
> change it

Let's look at the files xwiki manipulates:

- config files. These should contain only ASCII characters, with
Unicode escapes where non-ASCII text is needed (as in resource bundles,
for example). Thus any encoding will work there.
- XAR files. If these are created with XWiki (with an export) they'll
use the file.encoding specified, so if it's UTF-8 they'll be saved in
UTF-8. In addition, I propose that in our build we run native2ascii
on all our data files (including the XAR files). This can easily be
automated with Maven. So all XAR files the XWiki team provides should
work with any encoding.
- Java files: these should use only ASCII characters.

That's about it I think.
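
To illustrate the point about ASCII-only files with Unicode escapes, here
is a minimal sketch using only the JDK. The class name, the helper names
and the "save.label" key are made up for the example, and the escaping
helper is only a rough stand-in for what native2ascii does, not a
replacement for it. It assumes the file is written in an
ASCII-compatible encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Properties;

// Hypothetical helper class, for illustration only.
final class AsciiEscapeSketch {

    /** Replaces every non-ASCII char with a backslash-u escape (4 hex digits). */
    static String toAsciiEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    /** Writes a one-line properties file in the given charset and loads it back. */
    static String roundTrip(String key, String value, String charsetName) {
        String line = key + "=" + toAsciiEscapes(value) + "\n";
        byte[] bytes = line.getBytes(Charset.forName(charsetName));
        Properties props = new Properties();
        try {
            // Properties.load(InputStream) decodes the escapes itself.
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String label = "Enregistr\u00e9"; // source stays ASCII-only
        for (String cs : new String[] {"US-ASCII", "ISO-8859-1", "UTF-8"}) {
            boolean same = label.equals(roundTrip("save.label", label, cs));
            System.out.println(cs + " -> " + same);
        }
    }
}
```

Since the escaped file contains only ASCII bytes, the three charsets
produce identical bytes and the loaded value is the same in each case.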

[snip]

>> However, I would rather use
>> http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream)
>> than code it ourselves... Sounds safer, shorter, less maintenance,
>> etc to me... :)
>
> If it adds value. I think that XWiki is plagued with different  
> libraries doing the same thing or adding small amount of  
> functionality. This makes it harder to analyse.

I'm not sure I would have used the word "plagued" which has a negative  
connotation... I would rather have said: "thanks to the effort of  
others in OSS we have been able to develop XWiki to a level we  
wouldn't have been able to reach otherwise... This allows us to  
reduce our maintenance efforts, our documentation efforts and our  
testing efforts..." :-)

Now if you notice 2 libraries used in XWiki that do the same thing  
let us know so that we can all decide if we want to remove one and  
only use one. I'd be in favor of that wherever possible.

I've noticed a few places myself where I think the wrong library was
chosen (like when we use Jakarta ECS for something completely
unrelated). There are also places where the choice was historical, like
using ORO when regex support has been in the JDK since 1.4 (this has
already been identified).
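
For reference, here is a small sketch of what the JDK's own
java.util.regex package covers without any extra library. The class
name, the pattern and the sample options string are invented for the
example (they are not taken from XWiki code), and it is written against
a recent JDK rather than 1.4.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; only java.util.regex is needed.
final class JdkRegexSketch {

    // Matches JVM system property options of the form -Dname=value.
    private static final Pattern SYSTEM_PROPERTY =
        Pattern.compile("-D([\\w.]+)=(\\S+)");

    /** Extracts all -Dname=value pairs from a JVM options string, in order. */
    static Map<String, String> parseSystemProperties(String jvmOptions) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = SYSTEM_PROPERTY.matcher(jvmOptions);
        while (matcher.find()) {
            result.put(matcher.group(1), matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String options = "-Xmx300m -Dfile.encoding=UTF-8 -Djetty.port=8080";
        System.out.println(parseSystemProperties(options));
    }
}
```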

> Another place to avoid local encoding: some source code files
> contain French characters, which are messed up on non-8859-1
> platforms.

Ah we need to track these down. Could you please let us know which  
files?
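
One possible way to track them down is to scan the source tree for
bytes outside the ASCII range. The sketch below is only an illustration:
the class and method names are made up, it relies on java.nio.file from
a recent JDK, and it reports any non-ASCII byte, not only French
characters.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only.
final class NonAsciiFinderSketch {

    /** Returns the index of the first byte outside 0..127, or -1 if none. */
    static int firstNonAscii(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            // Java bytes are signed, so values 0x80..0xFF show up as negative.
            if (bytes[i] < 0) {
                return i;
            }
        }
        return -1;
    }

    /** Lists the .java files under root containing at least one non-ASCII byte. */
    static List<Path> javaFilesWithNonAscii(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".java"))
                .filter(p -> firstNonAscii(readAll(p)) >= 0)
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static byte[] readAll(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The root directory is an assumption: pass the source tree as argument.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : javaFilesWithNonAscii(root)) {
            System.out.println(file);
        }
    }
}
```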

Thanks
-Vincent




