On 12/07/2012 04:26 PM, Vincent Massol wrote:
Hi,
On Dec 7, 2012, at 9:59 PM, Sergiu Dumitriu <sergiu(a)xwiki.org> wrote:
Hi devs,
We've moved more and more toward a UTF-8-only application, and XWiki
has only been tested with this configuration for several years.
I propose that we require UTF-8 for a valid, supported installation.
This means:
- JVM encoding (-Dfile.encoding=UTF8)
- Container default URL encoding (Tomcat has ISO-8859-1 by default)
- Database encoding (MySQL is still configured with latin1 on some distros)
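
For the JVM item, a quick sanity check could look like the sketch below. It is only an illustration, not something XWiki ships: the class name EncodingCheck and the helper isUtf8 are made up here, and it covers only the JVM default charset, not the container or database settings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    /** Returns true when the given charset is UTF-8. */
    static boolean isUtf8(Charset cs) {
        return StandardCharsets.UTF_8.equals(cs);
    }

    public static void main(String[] args) {
        // The default charset is what the JVM falls back to whenever code
        // converts between bytes and characters without naming an encoding.
        Charset def = Charset.defaultCharset();
        System.out.println("JVM default charset: " + def.name());
        System.out.println("Is UTF-8: " + isUtf8(def));
    }
}
```

Running it once with and once without -Dfile.encoding=UTF8 should show whether the flag is actually taking effect on a given installation.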
There's one big site to update on our side: xwiki.org.
Here's my +1. This is a move toward a future web, since more and more
standards require (or at least assume as a default) UTF-8.
After thinking a bit more, it would make sense to require a valid
Unicode encoding, including UTF-16, which is preferable in countries
that don't use a Latin alphabet. However, XWiki doesn't currently work
under 16-bit encodings at all.
For XWiki 4.x I'm -1, since it's a big change and we don't want to break
users who currently run 4.x with ISO-8859-1, for example.
For XWiki 5.x I'm not sure.
To be able to answer, I need to understand more. For example, what currently
breaks when the user picks an arbitrary encoding? Shouldn't we just be transparent
and use whatever encoding is specified, without hardcoding anything?
Non-ASCII-compatible encodings don't work at least because of the way we
read components.txt.
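
To illustrate what "non-ASCII-compatible" means here, the sketch below takes a line of ASCII bytes and decodes it with different charsets. It assumes, for the sake of the example, that a file like components.txt is stored as ASCII and decoded with whatever the default charset is; the class AsciiCompat and the component name org.example.MyComponent are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiCompat {
    /** Encodes the text as ASCII bytes, then decodes those bytes with the given charset. */
    static String decodeAsciiBytes(String text, Charset cs) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String line = "org.example.MyComponent"; // hypothetical component class name

        // ASCII-compatible charsets give back the original text.
        System.out.println(line.equals(decodeAsciiBytes(line, StandardCharsets.UTF_8)));
        System.out.println(line.equals(decodeAsciiBytes(line, StandardCharsets.ISO_8859_1)));

        // UTF-16 pairs the bytes up into 16-bit units, so the class name is lost.
        System.out.println(line.equals(decodeAsciiBytes(line, StandardCharsets.UTF_16)));
    }
}
```

The first two comparisons hold and the third does not, which is why a UTF-16 default would prevent the listed class names from being recognized under that assumption.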
For other encodings, the problem is that you never know when something
will break. Things may appear to work, but enter a non-ASCII character and:
- the data might be discarded
- the database might throw an exception
-- if a non-transactional database engine is used, then all future
access to the document or history or object properties will fail
- the document might become inaccessible since the URL is decoded
incorrectly
- the PDF export might break
- escaped XML entities might appear in the browser instead of characters
(just off the top of my head)
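
Two of these failure modes are easy to reproduce outside XWiki. The sketch below is only an illustration with made-up names (EncodingMismatch, roundTripUrl, throughLatin1): it percent-encodes a string as UTF-8 and decodes it as ISO-8859-1, the way a container with the wrong URL encoding would, and it pushes non-Latin text through ISO-8859-1 bytes to show the data being discarded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    /** Percent-encodes s with one charset and decodes the result with another. */
    static String roundTripUrl(String s, Charset encodeWith, Charset decodeWith) {
        try {
            String encoded = URLEncoder.encode(s, encodeWith.name());
            return URLDecoder.decode(encoded, decodeWith.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen for charsets obtained from Charset objects.
            throw new IllegalStateException(e);
        }
    }

    /** Converts s to ISO-8859-1 bytes and back; unmappable characters become '?'. */
    static String throughLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String cafe = "Caf\u00e9"; // "Cafe" with an acute accent on the e
        String privet = "\u041f\u0440\u0438\u0432\u0435\u0442"; // a Cyrillic word

        // Matching charsets: the document name survives.
        System.out.println(roundTripUrl(cafe, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // Mismatched charsets: the accented letter turns into two wrong characters.
        System.out.println(roundTripUrl(cafe, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
        // Storing Cyrillic text in a latin1 column: every letter is replaced.
        System.out.println(throughLatin1(privet));
    }
}
```

In the mismatched case the accented letter comes back as two unrelated characters, so a lookup by document name would miss. In the latin1 case every Cyrillic letter is replaced by a question mark and the original text cannot be recovered.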
This isn't a proposal to change our rules, it's a proposal to make
explicit what we've been doing anyway. There have been many issues and
emails where our answer starts with "make sure your <component> encoding
is set to UTF-8".
On xwiki.org, users often try to use non-ASCII characters, and that
doesn't work, and so we might be losing potential users if they assume
that XWiki simply doesn't support their language.
--
Sergiu Dumitriu
http://purl.org/net/sergiu