Trying to understand I8N...
Zeljko Trogrlic
zeljko_t at post.htnet.hr
Sat Apr 7 19:49:58 CEST 2007
As someone from a non-8859-1 country, I have some experience with such
problems.
Vincent Massol wrote:
> 1) Is UTF8 supported on all platforms? Is it supported on mobile
> platforms for example?
All MIDP 2 devices I worked with supported it.
> 3) I see that in our standalone installation we use
> -Dfile.encoding=iso-8859-1. Now that I've read Joel's tutorial it seems
> to me this is not going to work for everyone and that we should rather
> use -Dfile.encoding=UTF-8 by default. WDYT?
That is a problem if it's not your default encoding. You have two options:
* use the platform default encoding and don't use non-ASCII characters in
the default configuration
* use UTF-8
Although UTF-8 sounds better, note that:
* you need an editor that supports it, otherwise the local encoding will
creep in
* the encoding must be set manually, because it can't be reliably detected
for plain text files
* you have to communicate this very clearly to users
* the text will look garbled in a non-UTF-8 editor and will be hard to
edit there
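To illustrate the last point, here is a minimal sketch of what happens when UTF-8 bytes are decoded with ISO-8859-1. The class and method names are made up for this example, and `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    /** Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1 (the wrong charset). */
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // e-acute, t, e-acute: three characters, two of them non-ASCII
        String original = "\u00e9t\u00e9";
        String garbled = misread(original);

        // Each e-acute is two bytes in UTF-8, and ISO-8859-1 turns every byte
        // into its own character, so the text grows from 3 to 5 characters.
        System.out.println(original.length()); // 3
        System.out.println(garbled.length());  // 5
    }
}
```

Each non-ASCII character ends up as two unrelated characters, which is the "funny" text a non-UTF-8 editor would show.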
> 4) Should we use the platform encoding or default to using UTF-8 all the
> time? (this question is related to 1)). I think we should use the
> platform encoding but I'm curious to know what others think.
See the previous answer. You should stick to either UTF-8 or the platform
encoding.
> 5) Jackson Wang is proposing in a patch to modify readPackage like this:
>
> private Document readPackage(InputStream is) throws IOException,
> DocumentException
> {
> - byte[] data = new byte[4096];
> + //UTF-8 characters could cause encoding as continued bytes over 4096 boundary,
> + // so change byte to char. ---Jackson
> + char[] data = new char[4096];
> + BufferedReader in= new BufferedReader(new InputStreamReader(is));
> StringBuffer XmlFile = new StringBuffer();
> int Cnt;
> - while ((Cnt = is.read(data, 0, 4096)) != -1) {
> + while ((Cnt = in.read(data, 0, 4096)) != -1) {
> XmlFile.append(new String(data, 0, Cnt));
> - }
> + }
> return fromXml(XmlFile.toString());
> }
>
> However with my new understanding I'm not sure this would help as char
> are stored on 2 bytes in Java and UTF-8 encoding can store on up to 4
> bytes. Am I correct?
I don't know what you are reading there, but Java can handle the encoding
for you if you tell it which one to use.
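As a rough sketch of that idea (not the actual XWiki code; the class and method names are invented here, and `StandardCharsets` assumes Java 7 or later), the reader can be given the charset explicitly. Note that `new InputStreamReader(is)` without a charset, as in the quoted patch, falls back to the platform default encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf8ReadDemo {

    /**
     * Reads the whole stream as UTF-8 text. The reader keeps its decoding
     * state between calls, so a multi-byte sequence that straddles a buffer
     * boundary is still decoded as one character.
     */
    static String readUtf8(InputStream is) throws IOException {
        Reader in = new InputStreamReader(is, StandardCharsets.UTF_8);
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int count;
        while ((count = in.read(buffer, 0, buffer.length)) != -1) {
            text.append(buffer, 0, count);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // 4095 ASCII bytes followed by a two-byte character, so the
        // character's bytes sit on both sides of byte offset 4096.
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 4095; i++) {
            sample.append('a');
        }
        sample.append('\u00e9');
        String original = sample.toString();

        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        String decoded = readUtf8(new ByteArrayInputStream(bytes));
        System.out.println(decoded.equals(original)); // true
    }
}
```

On the 2-byte versus 4-byte question: a Java `char` is a UTF-16 code unit, so a character that needs 4 bytes in UTF-8 comes out of the reader as two `char` values (a surrogate pair). The decoding itself is still done by the reader.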
> However, I would rather use
> http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream)
> than code it ourselves... Sounds safer, shorter, less maintenance, etc
> to me... :)
Only if it adds value. I think XWiki is plagued with different libraries
that do the same thing or add only a small amount of functionality. This
makes the code harder to analyse.
Another place to avoid the local encoding: some source code files contain
French characters, which get corrupted on non-8859-1 platforms.
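One possible way around this, sketched below with an invented class and message, is to write non-ASCII characters in Java sources as Unicode escapes, so the file stays pure ASCII and compiles the same way on every platform. The alternative is to pass the source encoding to the compiler explicitly with javac's `-encoding` option.

```java
public class SourceEscapeDemo {

    // The e-acute is written as an escape, so this file contains only ASCII
    // bytes and does not depend on the platform's default encoding.
    static final String MESSAGE = "Cr\u00e9er une page";

    public static void main(String[] args) {
        // The compiled string holds the real character, not the escape text.
        System.out.println(MESSAGE.length());            // 14
        System.out.println((int) MESSAGE.charAt(2));     // 233
    }
}
```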