Re: [xwiki-dev] Trying to understand I8N...

6 Apr 2007

Hi Vincent,
On Apr 06, Vincent Massol wrote :
...
  1) Is UTF8 supported on all platforms? Is it supported on mobile
 platforms for example? 
I've had a quick look at mobile platforms. There is no simple answer.
In the Java world, J2ME supports Unicode and UTF-8. But if
Unicode-aware fonts are not present on the device, there is not much
you can do. Still, I believe most modern PDAs have some form
of UTF-8 support.
As for mobile phones, some of them have UTF-8 support and some
do not. I have not found any comprehensive list.
The Nokia 770 on which I'm doing my mobile xwiki experiments does
support UTF-8.
...
  5) Jackson Wang is proposing in a patch to modify readPackage like this:
      private Document readPackage(InputStream is) throws IOException, DocumentException
      {
 -        byte[] data = new byte[4096];
 +        //UTF-8 characters could cause encoding as continued bytes over 4096 boundary,
 +        // so change byte to char.  ---Jackson
 +        char[] data = new char[4096];
 +        BufferedReader in= new BufferedReader(new InputStreamReader(is));
          StringBuffer XmlFile = new StringBuffer();
          int Cnt;
 -        while ((Cnt = is.read(data, 0, 4096)) != -1) {
 +        while ((Cnt = in.read(data, 0, 4096)) != -1) {
              XmlFile.append(new String(data, 0, Cnt));
 -        }
 +       }
          return fromXml(XmlFile.toString());
      }
 However with my new understanding I'm not sure this would help as
 char are stored on 2 bytes in Java and UTF-8 encoding can store on up
 to 4 bytes. Am I correct? 
Yes, I think you are. I do not believe this is reliable:
for one, we should use the constructor String(data, 0, Cnt, encoding)
so that the encoding is named explicitly;
then there is the problem Jackson outlined: the data buffer may cut off
the end of the last Unicode character.
Using a StringWriter instead of building intermediate
Strings would make things easier.
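To make the two points above concrete, here is a minimal sketch, not the actual XWiki code. The class and method names (ReadPackageSketch, readByChunks, readWithReader) are made up for illustration, UTF-8 is assumed as the package encoding, and StandardCharsets needs a newer JDK than the one in use at the time of this thread. The first method mimics the byte-wise loop and decodes every chunk on its own; the second names the encoding on the InputStreamReader and collects the characters in a StringWriter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of XWiki.
class ReadPackageSketch {

    // Byte-wise loop: each chunk is decoded independently, so a multi-byte
    // UTF-8 sequence that straddles two chunks is decoded incorrectly.
    static String readByChunks(InputStream is, int bufferSize) throws IOException {
        byte[] data = new byte[bufferSize];
        StringBuffer sb = new StringBuffer();
        int cnt;
        while ((cnt = is.read(data, 0, data.length)) != -1) {
            sb.append(new String(data, 0, cnt, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Reader-based loop: the reader decodes the byte stream as a whole
    // (assumed here to be UTF-8), and the StringWriter accumulates the
    // characters without building intermediate Strings.
    static String readWithReader(InputStream is, int bufferSize) throws IOException {
        Reader in = new InputStreamReader(is, StandardCharsets.UTF_8);
        StringWriter out = new StringWriter();
        char[] data = new char[bufferSize];
        int cnt;
        while ((cnt = in.read(data, 0, data.length)) != -1) {
            out.write(data, 0, cnt);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // "aaa" followed by the euro sign, which takes 3 bytes in UTF-8.
        // With a 4-byte buffer the euro sign is split across two reads.
        String text = "aaa\u20ac";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        String chunked = readByChunks(new ByteArrayInputStream(utf8), 4);
        String decoded = readWithReader(new ByteArrayInputStream(utf8), 4);

        System.out.println("chunked equals original: " + chunked.equals(text));
        System.out.println("reader equals original:  " + decoded.equals(text));
    }
}
```

With this input the byte-wise variant no longer matches the original string, while the reader variant does. The buffer size is only a parameter here so that the boundary problem shows up on a tiny input.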
...
  However, I would rather use
 http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream)
 than code it ourselves... Sounds safer, shorter, less maintenance, etc to me... :) 
I agree.
Pablo
