Hi Vincent,
On Apr 06, Vincent Massol wrote:
1) Is UTF8 supported on all platforms? Is it supported on mobile
platforms for example?
I've had a quick look at mobile platforms. There is no simple answer.
In the Java world, J2ME supports Unicode and UTF-8. But if
Unicode-aware fonts are not present on the device, there is not much
you can do. Still, I believe most modern PDAs have some form of UTF-8
encoding support.
Concerning mobile phones, some of them have UTF-8 support and some
do not. I have not found any comprehensive list.
The Nokia 770, on which I'm doing my mobile XWiki experiments, does
support UTF-8.
5) Jackson Wang is proposing in a patch to modify readPackage like this:
private Document readPackage(InputStream is) throws IOException, DocumentException
{
- byte[] data = new byte[4096];
+ //UTF-8 characters could cause encoding as continued bytes over 4096 boundary,
+ // so change byte to char. ---Jackson
+ char[] data = new char[4096];
+ BufferedReader in= new BufferedReader(new InputStreamReader(is));
StringBuffer XmlFile = new StringBuffer();
int Cnt;
- while ((Cnt = is.read(data, 0, 4096)) != -1) {
+ while ((Cnt = in.read(data, 0, 4096)) != -1) {
XmlFile.append(new String(data, 0, Cnt));
- }
+ }
return fromXml(XmlFile.toString());
}
However, with my new understanding I'm not sure this would help, as
chars are stored on 2 bytes in Java and UTF-8 encoding can use up to
4 bytes per character. Am I correct?
Yes, I think you are. I do not believe this is reliable:
for one thing, we should use the constructor
String(data, 0, Cnt, encoding); then there is the problem Jackson
outlined: the data buffer may cut off the end of the last Unicode
character.
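
To make the boundary problem concrete, here is a minimal sketch. The
class and method names are made up for illustration (this is not XWiki
code), and it assumes the data is UTF-8. The first method decodes each
byte chunk on its own, the second lets a Reader with an explicit charset
do the decoding. As far as I can tell, InputStreamReader(is) without a
charset argument falls back to the platform default encoding, which is
why the charset is passed explicitly here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of XWiki.
final class Utf8BoundaryDemo {

    // Decodes every byte chunk independently. A multi-byte UTF-8 sequence
    // that straddles two chunks is decoded as two broken halves.
    static String decodePerChunk(InputStream is, int bufferSize) throws IOException {
        byte[] data = new byte[bufferSize];
        StringBuilder out = new StringBuilder();
        int cnt;
        while ((cnt = is.read(data, 0, data.length)) != -1) {
            out.append(new String(data, 0, cnt, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    // Lets the Reader decode. It holds back incomplete byte sequences until
    // more input arrives, so the size of the char buffer does not matter.
    static String decodeWithReader(InputStream is, int bufferSize) throws IOException {
        Reader in = new InputStreamReader(is, StandardCharsets.UTF_8);
        char[] data = new char[bufferSize];
        StringBuilder out = new StringBuilder();
        int cnt;
        while ((cnt = in.read(data, 0, data.length)) != -1) {
            out.append(data, 0, cnt);
        }
        return out.toString();
    }
}
```

Feeding both methods the UTF-8 bytes of "a\u00e9" (three bytes) with a
buffer size of 2 splits the two-byte character across reads: the
per-chunk version no longer returns the original text, while the Reader
version does.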
Using a StringWriter instead of building intermediate Strings would
make things easier.
However, I would rather use
http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream)
than code it ourselves... Sounds safer, shorter, less maintenance,
etc. to me... :)
I agree.
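
For comparison, if we did code it ourselves, the StringWriter variant
mentioned above could look roughly like the sketch below. The class and
method names are hypothetical, and UTF-8 is an assumption about how the
package XML is encoded. readPackage would then just pass the result to
fromXml.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of XWiki.
final class StreamToString {

    // Reads the whole stream as UTF-8 text (assumed encoding).
    static String readFully(InputStream is) throws IOException {
        // The Reader handles multi-byte sequences that cross buffer boundaries.
        Reader in = new InputStreamReader(is, StandardCharsets.UTF_8);
        StringWriter out = new StringWriter();
        char[] data = new char[4096];
        int cnt;
        while ((cnt = in.read(data, 0, data.length)) != -1) {
            // Write the chars directly, no intermediate String per chunk.
            out.write(data, 0, cnt);
        }
        return out.toString();
    }
}
```

With Commons IO this should reduce to IOUtils.toString(is, "UTF-8").
Note that the single-argument toString(InputStream) linked above uses
the platform default encoding, so the overload that takes an encoding
looks like the safer choice here.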
Pablo