[xwiki-dev] Trying to understand I18N...
Pablo Oliveira
pablo.oliveira at enst.fr
Fri Apr 6 13:33:37 CEST 2007
Hi Vincent,
On Apr 06, Vincent Massol wrote :
> 1) Is UTF8 supported on all platforms? Is it supported on mobile
> platforms for example?
I've had a quick look at mobile platforms. There is no simple answer.
In the Java world, J2ME supports Unicode and UTF-8. But if
Unicode-aware fonts are not present on the device, there is not much
you can do. Still, I believe most modern PDAs today have some form
of UTF-8 support.
As for mobile phones, some of them support UTF-8 and some do not.
I have not found any comprehensive list.
The Nokia 770, on which I'm doing my mobile XWiki experiments, does
support UTF-8.
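For what it's worth, a quick way to check whether a given Java runtime can at least encode and decode UTF-8 is to round-trip a non-ASCII character. This is only a sketch: Utf8Probe and supportsUtf8 are made-up names, and the check says nothing about whether the device has fonts to display the characters. I assume the two String methods it uses are also available in CLDC, but that should be checked on the device itself.

```java
import java.io.UnsupportedEncodingException;

public class Utf8Probe {
    /**
     * Returns true if the runtime can encode and decode UTF-8.
     * It does not tell whether fonts for the characters are installed.
     */
    static boolean supportsUtf8() {
        try {
            // U+00E9 (e with acute accent) takes two bytes in UTF-8.
            byte[] encoded = "\u00e9".getBytes("UTF-8");
            return encoded.length == 2
                && new String(encoded, "UTF-8").equals("\u00e9");
        } catch (UnsupportedEncodingException e) {
            // The runtime does not know the UTF-8 encoding at all.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 supported: " + supportsUtf8());
    }
}
```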
> 5) Jackson Wang is proposing in a patch to modify readPackage like this:
>
> private Document readPackage(InputStream is) throws
> IOException, DocumentException
> {
> - byte[] data = new byte[4096];
> + //UTF-8 characters could cause encoding as continued bytes over 4096 boundary,
> + // so change byte to char. ---Jackson
> + char[] data = new char[4096];
> + BufferedReader in= new BufferedReader(new InputStreamReader(is));
> StringBuffer XmlFile = new StringBuffer();
> int Cnt;
> - while ((Cnt = is.read(data, 0, 4096)) != -1) {
> + while ((Cnt = in.read(data, 0, 4096)) != -1) {
> XmlFile.append(new String(data, 0, Cnt));
> - }
> + }
> return fromXml(XmlFile.toString());
> }
>
> However, with my new understanding I'm not sure this would help, as
> chars are stored on 2 bytes in Java and UTF-8 can use up
> to 4 bytes per character. Am I correct?
Yes, I think you are. I do not believe this is reliable.
For one thing, we should use the constructor String(data, 0, Cnt, encoding);
for another, there is the problem Jackson outlined: the data buffer may
cut off the end of the last Unicode character.
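To make both points concrete, here is a small sketch (the class and method names are made up for the example). It shows that one Java char is not one UTF-8 byte, and that decoding fixed-size byte chunks one String at a time, as the original loop does, corrupts a character whose bytes straddle a chunk boundary, even when the encoding is given explicitly.

```java
import java.io.UnsupportedEncodingException;

public class Utf8BoundaryDemo {
    // U+00E9 (e with acute accent): one Java char, two UTF-8 bytes (0xC3 0xA9).
    static final String E_ACUTE = "\u00e9";
    // U+1D11E (musical G clef): two Java chars (a surrogate pair), four UTF-8 bytes.
    static final String G_CLEF = "\uD834\uDD1E";

    /** Decodes bytes in fixed-size chunks, one String per chunk, like the original loop. */
    static String decodeInChunks(byte[] data, int chunkSize, String encoding)
            throws UnsupportedEncodingException {
        StringBuffer result = new StringBuffer();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            // Each chunk is decoded on its own, so a multi-byte sequence that
            // straddles two chunks is seen as two malformed fragments.
            result.append(new String(data, off, len, encoding));
        }
        return result.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(E_ACUTE.length() + " char, "
            + E_ACUTE.getBytes("UTF-8").length + " bytes");  // prints "1 char, 2 bytes"
        System.out.println(G_CLEF.length() + " chars, "
            + G_CLEF.getBytes("UTF-8").length + " bytes");   // prints "2 chars, 4 bytes"

        String text = "caf" + E_ACUTE;  // 5 bytes in UTF-8
        byte[] utf8 = text.getBytes("UTF-8");
        // A chunk size of 4 cuts the two-byte accented e in half.
        System.out.println(decodeInChunks(utf8, 4, "UTF-8").equals(text));    // prints false
        // A chunk large enough for the whole input decodes correctly.
        System.out.println(decodeInChunks(utf8, 4096, "UTF-8").equals(text)); // prints true
    }
}
```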
Using a StringWriter instead of building intermediate
Strings would make things easier.
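Something along these lines, assuming the package file is UTF-8 (StreamText and readFully are hypothetical names, not existing XWiki code). The encoding is passed explicitly instead of relying on the platform default, and the Reader's decoder keeps an incomplete byte sequence until the next read, so the buffer boundary is no longer a problem:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;

public class StreamText {
    /**
     * Reads the whole stream as text in the given encoding.
     * The InputStreamReader's decoder keeps an incomplete multi-byte sequence
     * until more bytes arrive, so a character cut by a read boundary is still
     * decoded correctly. Closing the stream is left to the caller.
     */
    static String readFully(InputStream is, String encoding) throws IOException {
        Reader in = new InputStreamReader(is, encoding);
        StringWriter out = new StringWriter();
        char[] buffer = new char[4096];
        int count;
        while ((count = in.read(buffer, 0, buffer.length)) != -1) {
            // Append the decoded chars directly; no intermediate Strings.
            out.write(buffer, 0, count);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Several thousand two-byte characters, so the text spans many buffer fills.
        StringBuffer sb = new StringBuffer("a");
        for (int i = 0; i < 5000; i++) {
            sb.append("\u00e9");
        }
        String text = sb.toString();
        byte[] utf8 = text.getBytes("UTF-8");
        String back = readFully(new ByteArrayInputStream(utf8), "UTF-8");
        System.out.println(back.equals(text)); // prints true
    }
}
```

A surrogate pair that happens to be split across two char buffers is harmless here, since both halves are simply written to the StringWriter one after the other.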
> However, I would rather use
> http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream)
> than code it ourselves... Sounds safer, shorter, less maintenance, etc.
> to me... :)
I agree.
Pablo