As a person from a non-8859-1 country, I have some experience with such
problems.
Vincent Massol wrote:
1) Is UTF8 supported on all platforms? Is it supported on mobile
platforms for example?
All MIDP 2 devices I worked with supported it.
3) I see that in our standalone installation we use
-Dfile.encoding=iso-8859-1. Now that I've read Joel's tutorial it seems
to me this is not going to work for everyone and that we should rather
use -Dfile.encoding=UTF-8 by default. WDYT?
That is a problem if it's not your default encoding. You have two options:
* use the platform default encoding and don't use non-ASCII characters
in the default configuration
* use UTF-8
Although UTF-8 sounds better, note that:
* you need an editor that supports it, otherwise the local encoding will
creep in
* the encoding must be set manually, because it can't be detected for
plain text files
* you have to communicate this very clearly to users
* text will look funny in a non-UTF-8 editor and will be hard to change
there
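To keep the platform default from creeping in at all, every encode and decode can name the charset explicitly. A minimal sketch (the `EncodingDemo`/`encode` names are just for illustration):

```java
import java.io.*;
import java.nio.charset.Charset;

public class EncodingDemo {
    // Encode with an explicit charset so the byte output is the
    // same on every platform, whatever -Dfile.encoding says.
    static byte[] encode(String text, String charset) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(buf, charset);
        w.write(text);
        w.close();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The platform default is what -Dfile.encoding selects.
        System.out.println("Default charset: " + Charset.defaultCharset());
        // "café": the 'é' (U+00E9) becomes two bytes, 0xC3 0xA9, in UTF-8.
        System.out.println(encode("caf\u00e9", "UTF-8").length);  // 5
    }
}
```

The same string is 4 bytes in ISO-8859-1 and 5 in UTF-8, which is exactly why the two sides must agree on which one is in use.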
4) Should we use the platform encoding or default to using UTF-8 all the
time? (this question is related to 1)). I think we should use the
platform encoding but I'm curious to know what others think.
See above; you should either stick to UTF-8 or to the platform encoding.
5) Jackson Wang is proposing in a patch to modify
readPackage like this:
 private Document readPackage(InputStream is) throws IOException,
     DocumentException
 {
-    byte[] data = new byte[4096];
+    // UTF-8 characters could cause encoding as continued bytes over
+    // the 4096 boundary, so change byte to char. ---Jackson
+    char[] data = new char[4096];
+    BufferedReader in = new BufferedReader(new InputStreamReader(is));
     StringBuffer XmlFile = new StringBuffer();
     int Cnt;
-    while ((Cnt = is.read(data, 0, 4096)) != -1) {
+    while ((Cnt = in.read(data, 0, 4096)) != -1) {
         XmlFile.append(new String(data, 0, Cnt));
-    }
+    }
     return fromXml(XmlFile.toString());
 }
However, with my new understanding I'm not sure this would help, as chars
are stored on 2 bytes in Java and UTF-8 can take up to 4 bytes per
character. Am I correct?
I don't know where you read that, but Java can handle the encoding for
you if you tell it which one to use.
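Concretely, the patch still decodes with whatever the platform default happens to be, because `new InputStreamReader(is)` names no charset. Telling Java the encoding is one argument; a sketch (the `ReadUtf8`/`readAll` names are mine, not from the patch):

```java
import java.io.*;

public class ReadUtf8 {
    static String readAll(InputStream is) throws IOException {
        // Name the charset here: without it, InputStreamReader
        // silently falls back to the platform default encoding.
        BufferedReader in =
            new BufferedReader(new InputStreamReader(is, "UTF-8"));
        StringBuffer sb = new StringBuffer();
        char[] data = new char[4096];
        int cnt;
        // Reading chars (not bytes) means a multi-byte UTF-8 sequence
        // split across the 4096 boundary is reassembled for us.
        while ((cnt = in.read(data, 0, 4096)) != -1) {
            sb.append(data, 0, cnt);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u00e9l\u00e9ment".getBytes("UTF-8");
        System.out.println(readAll(new ByteArrayInputStream(utf8)));
    }
}
```

As for the 2-vs-4-byte worry: Java chars are UTF-16 code units, and the reader does the UTF-8 decoding; characters outside the basic plane simply come back as a surrogate pair of two chars, so nothing is lost.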
However, I would rather use
http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/IOUt…
than code it ourselves... Sounds safer, shorter, less maintenance, etc
to me... :)
If it adds value. I think that XWiki is plagued with different libraries
doing the same thing or adding a small amount of functionality. This
makes it harder to analyse.
Another place to avoid the local encoding: some source code files
contain French characters, which get messed up on non-8859-1 platforms.
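One way to keep such source files portable is to replace raw accented characters with Unicode escapes, so the .java file stays pure ASCII and no editor or compiler `-encoding` setting can mangle it (the JDK ships the native2ascii tool for exactly this conversion). A sketch with a made-up `Messages` class:

```java
public class Messages {
    // A raw "é" in the source depends on the compiler's -encoding flag;
    // the \u00e9 escape below is plain ASCII and always means U+00E9.
    static final String CREATED = "Document cr\u00e9\u00e9";  // "Document créé"

    public static void main(String[] args) {
        System.out.println(CREATED);
    }
}
```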