Hi Xavier, everybody,
After upgrading to XWiki 0.9.793, I ran into the same problem as you.
Strangely, I never saw it with the previous release of XWiki. The
problem is that JRCS is not configured to support Unicode. I changed
the "ArchiveParser.jj" file to enable Unicode streams (see below) and
regenerated the parser with JavaCC, and it now works better. I have
uploaded the updated jars to this page:
http://www.xwiki.org/xwiki/bin/view/Dev/CharactersSets
org.apache.commons.jrcs.rcs.ArchiveParser.jj file:
JAVA_UNICODE_ESCAPE=false; // RCS files are plain ASCII
[snip]
UNICODE_INPUT=false;
changed to:
JAVA_UNICODE_ESCAPE=true; // RCS files are plain ASCII
[snip]
UNICODE_INPUT=true;
I have also commented out the following lines of
"com.xpn.xwiki.web.Utils.java", because content.length() returns the
number of characters, not the length of the UTF-8 encoded content, so
the pages get truncated. Alternatively, converting the string to UTF-8
encoded bytes and using that length works fine, but is it absolutely
necessary to set the content length of the response?
if (context.getResponse() instanceof XWikiServletResponse) {
response.setContentLength(content.length());
}
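To illustrate the mismatch, here is a minimal sketch that compares the
character count with the UTF-8 byte count. The class and method names
(ContentLengthSketch, utf8Length) are made up for this example and are
not part of XWiki; the sample text is taken from the error message
quoted below.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not XWiki code).
public class ContentLengthSketch {

    // Number of bytes the content occupies once encoded as UTF-8.
    // This is the size the body has on the wire, as opposed to
    // String.length(), which counts UTF-16 code units.
    static int utf8Length(String content) {
        return content.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII text followed by two Chinese characters.
        String content = "Je parle chinois !\n\u4fe1\u606f";
        System.out.println("chars: " + content.length());
        System.out.println("bytes: " + utf8Length(content));
    }
}
```

If the content length has to be set at all, passing the byte count
(for example response.setContentLength(utf8Length(content))) should
avoid the truncation, assuming the response is really written as UTF-8.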
I am sending this message to the dev list too, as I assume the
discussion should continue on that list.
Stéphane
Xavier MOGHRABI wrote:
Hello
I configured everything as Stéphane explained. I have the correct
character sets in MySQL:
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
| collation_connection     | utf8_general_ci            |
| collation_database       | utf8_general_ci            |
| collation_server         | utf8_general_ci            |
However, it still doesn't work.
I get an error like this:
Caused by: org.apache.commons.jrcs.rcs.TokenMgrError: Lexical error at line
37, column 2. Encountered: "\u606f" (24687), after :
"(a)\n\n\nMain\nTest2\n\nen\n0\n\nXWiki.XWikiGuest\n1117191196650\n1117191218887\n1.2\nJe
parle chinois !\n\u4fe1"
at
org.apache.commons.jrcs.rcs.ArchiveParserTokenManager.getNextToken(ArchiveParserTokenManager.java:800)
I can see my Chinese characters correctly. It looks like the error
occurs when Tomcat reads them.
I start Tomcat with the Java option -Dfile.encoding=UTF-8.
My LANG variable is fr_FR.UTF-8 (I also tested en_US.UTF-8).
I changed the encoding in web.xml and xwiki.cfg.
>It might be worth checking whether all character-set-* variables are
>correctly set to "utf8" in the MySQL variables tables. I copy below what
>I get when running a "SHOW VARIABLES" against a MySQL 4.1 server that I
>