Re: [xwiki-dev] Trying to understand I8N...

7 Apr 2007

      On Apr 7, 2007, at 10:49 AM, Gilles Serasset wrote:
...
Hi,
On 6 avr. 07, at 22:28, Vincent Massol wrote:
...
...
I did not do any extended testing to see if this parameter was useful.  
I can only say that there are many places in the code where you  
have:
InputStreamReader ir = new InputStreamReader(is);
This is one of the numerous examples of badly written code where  
the encoding is not specified... Hence the platform falls back  
to the file.encoding property value.
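
For contrast, a minimal sketch of the same read with the encoding named explicitly. The class and method names are made up for illustration, and StandardCharsets and try-with-resources assume Java 7 or later:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of XWiki.
class ExplicitCharsetRead {

    /** Reads the whole stream as text, decoding with the charset the caller names. */
    static String readAll(InputStream is, Charset charset) {
        StringBuilder sb = new StringBuilder();
        // The charset is passed explicitly, so file.encoding plays no part here.
        try (Reader reader = new InputStreamReader(is, charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute, written as an escape to keep this source ASCII-only.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println(text.length());
    }
}
```

The result no longer depends on which platform the server happens to run on, only on the charset argument.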
yep. I'm not sure it's bad. Or at least I'm curious to understand  
why it's bad.
It may be good, but then you'll need your input (skin) files to be  
delivered in that encoding... Well, for now they are, as they only  
contain ASCII.
... and it should remain like this and we should use native2ascii or  
something like that in our build to ensure it remains like this I think.
...
...
...
However, there are currently no accented chars in the skin  
files (that are read from disk), hence it somehow works, because  
all platform encodings share the same encoding of ASCII chars. It  
would not be the same if you were to use UTF-16, for instance...
IMO the encoding to use should be left to the user and be a  
configuration option (as it is now) but we should configure  
everything to use UTF8 by default.
...
...
3) I see that in our standalone installation we use  
-Dfile.encoding=iso-8859-1. Now that I've read Joel's tutorial it  
seems to me this is not going to work for everyone and that we  
should rather use -Dfile.encoding=UTF-8 by default. WDYT?
This will mean that all files that are read by the server, will  
have to be encoded in UTF-8...
Or any compatible encoding like ISO 8859-1, etc. This is the case  
now I think.
ISO latin 1 IS NOT compatible with UTF-8... only ASCII (7bits) is...
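
A small sketch of why that is, assuming Java 7+ for StandardCharsets. The class name is made up for illustration. Text is encoded as ISO 8859-1 and the resulting bytes are then decoded as UTF-8:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of XWiki.
class Latin1VersusUtf8 {

    /** Encodes text with ISO 8859-1, then decodes those bytes as UTF-8. */
    static String latin1BytesReadAsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Pure ASCII survives: both encodings use the same single byte per char.
        System.out.println(latin1BytesReadAsUtf8("skin").equals("skin"));

        // e-acute is the single byte 0xE9 in ISO 8859-1. On its own that byte is
        // not valid UTF-8, so the decoder substitutes the replacement char U+FFFD.
        String damaged = latin1BytesReadAsUtf8("caf\u00e9");
        System.out.println(damaged.indexOf('\uFFFD') >= 0);
    }
}
```

Only the 7-bit range round-trips; anything above it is damaged as soon as the two encodings are mixed.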
...
...
(NOTE: resource files (such as the ones used in XWiki I18N) are  
special, as they should be encoded in ASCII with \uXXXX to represent  
non-ASCII chars.)
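
To illustrate the escape convention, here is a minimal sketch using java.util.Properties. The key "welcome" and the class name are invented for the example, and Properties.load(Reader) assumes Java 6 or later:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class, not part of XWiki.
class EscapedResourceDemo {

    /** Loads properties text and returns the value stored under the given key. */
    static String lookup(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The doubled backslash puts a literal backslash-u escape into the text,
        // which is the form native2ascii produces for an e-acute.
        String text = "welcome=Caf\\u00e9\n";
        // Properties decodes the escape, so the value has 4 chars, not 9.
        System.out.println(lookup(text, "welcome").length());
    }
}
```

Because the file itself contains only ASCII bytes, it reads the same way whatever the platform encoding is.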
...
4) Should we use the platform encoding or default to using UTF-8  
all the time? (this question is related to 1)). I think we  
should use the platform encoding but I'm curious to know what  
others think.
We should NOT use the platform encoding. The reason is that all  
files read by the server (skin files mainly) will be read using  
the platform encoding rather than their actual encoding. As they  
only contain ASCII chars up to now, it worked, but if you add  
accents to them and write them in encoding X (at edit  
time), you are not guaranteed that the platform encoding will  
be X at run time. Hence you should specify the file encoding  
whenever you read a file.
Exactly, which is why this is best left to the user to decide which  
encoding they need to use... I don't think we should force our  
encoding. However I'm proposing that we do:  
System.setProperty("file.encoding", getParam("xwiki.encoding"))  
in XWiki initialization to set the platform encoding to be the  
encoding specified in xwiki.cfg.
That's a good idea...
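
A related sketch, under stated assumptions: it resolves the configured name into a Charset that can be handed explicitly to readers and writers. The class name is invented, a plain Properties object stands in for xwiki.cfg, and the UTF-8 fallback follows the default proposed earlier in the thread. One caveat worth checking: the Charset.defaultCharset() documentation says the default is determined during VM startup, so setting file.encoding later may not affect code that relies on the default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper; Properties is a stand-in for the real xwiki.cfg access.
class ConfiguredCharset {

    /** Resolves "xwiki.encoding", falling back to UTF-8 when absent or unknown. */
    static Charset resolve(Properties config) {
        String name = config.getProperty("xwiki.encoding");
        if (name == null || name.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalArgumentException e) {
            // Covers both IllegalCharsetNameException and UnsupportedCharsetException.
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("xwiki.encoding", "ISO-8859-1");
        System.out.println(resolve(config).name());
    }
}
```

The resolved Charset can then be passed to each InputStreamReader, so the user's choice is honoured without touching the JVM-wide default.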
[snip]
...
... applied!
good
...
...
...
However, I would rather use  
http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream)  
than code it ourselves... Sounds safer, shorter, less maintenance,  
etc. to me... :)
This method has exactly the same problem: it'll use the platform  
encoding, even if the input stream is not encoded in the  
platform encoding and even if it correctly declares its own  
encoding... Hence it will be buggy.
Sure but that's ok if the encoding is specified (file.encoding),  
right? That said I agree that no conversion is better.
Well, not here, as the package file is a file that has been  
produced by somebody else, on another platform. Hence either we  
decide that all files are always UTF-8, or the file is encoded in  
the producer's platform encoding, not the one that is used to read  
it... That's why we have to delegate encoding detection to the XML  
parser.
Good point. I agree.
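
A minimal sketch of that delegation, using the JDK's own JAXP parser. The class name and the sample document are invented for illustration. The parser is given raw bytes rather than a Reader, so it picks the decoder from the XML declaration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of XWiki.
class XmlEncodingDetection {

    /** Parses raw bytes and returns the text content of the root element. */
    static String rootText(InputStream rawBytes) {
        try {
            // No Reader is involved: the parser reads the declaration itself.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(rawBytes);
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The document declares ISO-8859-1 and is really encoded that way,
        // whatever the platform encoding of the machine reading it.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(new ByteArrayInputStream(bytes)).length());
    }
}
```

Converting the stream to a String first would throw away the declared encoding before the parser ever sees it.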

[snip]

Thanks
-Vincent

PS: Thanks for everyone's help in bringing me up to date on I18N. I'm  
slowly starting to understand how that works... ;-)