As a person from a non-8859-1 country, I have some experience with such
problems.
Vincent Massol wrote:
1) Is UTF8 supported on all platforms? Is it supported on mobile
platforms for example?
All MIDP 2 devices I worked with supported it.
3) I see that in our standalone installation we use
-Dfile.encoding=iso-8859-1. Now that I've read Joel's tutorial it seems
to me this is not going to work for everyone and that we should rather
use -Dfile.encoding=UTF-8 by default. WDYT?
That is a problem if it's not your default encoding. You have two options:
* use the platform default encoding and don't use non-ASCII characters
in the default configuration
* use UTF-8
Although UTF-8 sounds better, note that:
* you need an editor that supports it, otherwise the local encoding will
creep in
* the encoding must be set manually, because it can't be detected for
plain text files
* you have to communicate this very clearly to users
* text will look funny in a non-UTF-8 editor and will be hard to change
there
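To keep the platform default from creeping in at all, every encode and decode can name the charset explicitly. A minimal sketch (the `EncodingDemo`/`encode` names are just for illustration):

```java
import java.io.*;
import java.nio.charset.Charset;

public class EncodingDemo {
    // Encode with an explicit charset so the byte output is the
    // same on every platform, whatever -Dfile.encoding says.
    static byte[] encode(String text, String charset) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(buf, charset);
        w.write(text);
        w.close();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The platform default is what -Dfile.encoding selects.
        System.out.println("Default charset: " + Charset.defaultCharset());
        // "café": the 'é' (U+00E9) becomes two bytes, 0xC3 0xA9, in UTF-8.
        System.out.println(encode("caf\u00e9", "UTF-8").length);  // 5
    }
}
```

The same string is 4 bytes in ISO-8859-1 and 5 in UTF-8, which is exactly why the two sides must agree on which one is in use.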
4) Should we use the platform encoding or default to using UTF-8 all the
time? (this question is related to 1)). I think we should use the
platform encoding but I'm curious to know what others think.
See above; you should either stick to UTF-8 or to the platform encoding.
5) Jackson Wang is proposing in a patch to modify
readPackage like this:
 private Document readPackage(InputStream is) throws IOException,
     DocumentException
 {
-    byte[] data = new byte[4096];
+    // UTF-8 characters could cause encoding as continued bytes over
+    // the 4096 boundary, so change byte to char. ---Jackson
+    char[] data = new char[4096];
+    BufferedReader in = new BufferedReader(new InputStreamReader(is));
     StringBuffer XmlFile = new StringBuffer();
     int Cnt;
-    while ((Cnt = is.read(data, 0, 4096)) != -1) {
+    while ((Cnt = in.read(data, 0, 4096)) != -1) {
         XmlFile.append(new String(data, 0, Cnt));
-    }
+    }
     return fromXml(XmlFile.toString());
 }
However, with my new understanding I'm not sure this would help, as chars
are stored on 2 bytes in Java and UTF-8 can take up to 4 bytes per
character. Am I correct?
I don't know where you read that, but Java can handle the encoding for
you if you tell it which one to use.
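Concretely, the patch still decodes with whatever the platform default happens to be, because `new InputStreamReader(is)` names no charset. Telling Java the encoding is one argument; a sketch (the `ReadUtf8`/`readAll` names are mine, not from the patch):

```java
import java.io.*;

public class ReadUtf8 {
    static String readAll(InputStream is) throws IOException {
        // Name the charset here: without it, InputStreamReader
        // silently falls back to the platform default encoding.
        BufferedReader in =
            new BufferedReader(new InputStreamReader(is, "UTF-8"));
        StringBuffer sb = new StringBuffer();
        char[] data = new char[4096];
        int cnt;
        // Reading chars (not bytes) means a multi-byte UTF-8 sequence
        // split across the 4096 boundary is reassembled for us.
        while ((cnt = in.read(data, 0, 4096)) != -1) {
            sb.append(data, 0, cnt);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u00e9l\u00e9ment".getBytes("UTF-8");
        System.out.println(readAll(new ByteArrayInputStream(utf8)));
    }
}
```

As for the 2-vs-4-byte worry: Java chars are UTF-16 code units, and the reader does the UTF-8 decoding; characters outside the basic plane simply come back as a surrogate pair of two chars, so nothing is lost.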
However, I would rather use
http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/IOUt…
than code it ourselves... Sounds safer, shorter, less maintenance, etc
to me... :)
If it adds value. I think that XWiki is plagued with different libraries
doing the same thing or adding a small amount of functionality. This
makes it harder to analyse.
Another place to avoid the local encoding: some source code files
contain French characters, which get messed up on non-8859-1 platforms.
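One way to keep such source files portable is to replace raw accented characters with Unicode escapes, so the .java file stays pure ASCII and no editor or compiler `-encoding` setting can mangle it (the JDK ships the native2ascii tool for exactly this conversion). A sketch with a made-up `Messages` class:

```java
public class Messages {
    // A raw "é" in the source depends on the compiler's -encoding flag;
    // the \u00e9 escape below is plain ASCII and always means U+00E9.
    static final String CREATED = "Document cr\u00e9\u00e9";  // "Document créé"

    public static void main(String[] args) {
        System.out.println(CREATED);
    }
}
```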