How safe are these changes? Will any existing code require changes, will it work for any encoding, and do any APIs need to be changed?

On 2/26/07, Gilles Serasset <Gilles.Serasset@imag.fr> wrote:
Hi all,

I'm currently working on allowing XWiki to manage documents (and
their URLs) in UTF-8 or other non-Latin-1 encodings.

I saw that the method:

  public static String getURLEncoded(String content)
  {
      try {
          return URLEncoder.encode(content, "UTF-8");
      } catch (UnsupportedEncodingException e) {
          return content;
      }
  }
in the XWiki class is hardcoded to UTF-8, which is strange since the
default encoding of XWiki is ISO-Latin-1.

The method is used in the core source:
1. To prepare "Content-Disposition" headers for responses (for
package export and file download)
    --> it encodes the filename for file downloads.
2. To generate the ids of TOC entries in TOCGenerator

It is also used through Velocity macros (mainly editrights, to allow
passing a full URL with GET attributes as a simple attribute value,
usually for xredirect).

Hence it becomes a problem as soon as a document can have a URL
involving non-ASCII characters.

Currently, everything works because the encoded URLs do not include
non-ASCII characters, as the method is used in few places, but it
will pose a problem even with the default wiki settings (i.e.
Latin-1). Moreover, the method is static, so it cannot fetch the
current XWiki encoding.
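
To illustrate the mismatch, here is a small standalone example (not
XWiki code, just java.net.URLEncoder; the page name "Résumé" is an
arbitrary example):

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    public class EncodingMismatchDemo {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String pageName = "Résumé";
            // Hardcoded UTF-8 escapes "é" as two bytes: R%C3%A9sum%C3%A9 ...
            System.out.println(URLEncoder.encode(pageName, "UTF-8"));
            // ... while a Latin-1 wiki expects a single byte: R%E9sum%E9
            System.out.println(URLEncoder.encode(pageName, "ISO-8859-1"));
        }
    }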

So I propose to:

1. make this method non-static and use the XWiki configuration to
specify the encoding to be used (see the first sketch after this list);
2. propose a way to encode Content-Disposition filenames that is
compatible with RFC 2231, which allows specifying filenames even if
they contain non-ASCII characters (names in Japanese or Thai, for
instance); see the second sketch below.
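
For proposal 1, here is a minimal sketch of what I have in mind. It
assumes an instance-level getEncoding() accessor returning the
encoding from the XWiki configuration (the accessor name is just an
assumption here), and the same java.net imports as the current code:

    public String getURLEncoded(String content)
    {
        try {
            // Use the configured wiki encoding instead of a hardcoded "UTF-8".
            return URLEncoder.encode(content, getEncoding());
        } catch (UnsupportedEncodingException e) {
            return content;
        }
    }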
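
For proposal 2, a rough sketch of building an RFC 2231 extended
filename parameter. It assumes UTF-8 as the parameter charset and
reuses URLEncoder for the percent-encoding (mapping '+' back to %20),
which is only an approximation of the exact RFC 2231 character rules:

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    public class ContentDispositionSketch {
        // Produces e.g.: attachment; filename*=UTF-8''%E6%97%A5%E6%9C%AC%E8%AA%9E.txt
        public static String contentDisposition(String filename, String encoding)
            throws UnsupportedEncodingException
        {
            String encoded = URLEncoder.encode(filename, encoding).replace("+", "%20");
            return "attachment; filename*=" + encoding + "''" + encoded;
        }

        public static void main(String[] args) throws UnsupportedEncodingException {
            System.out.println(contentDisposition("日本語.txt", "UTF-8"));
        }
    }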

Does anyone object to this proposal?

Regards, Gilles,
--
Gilles Sérasset
GETA-CLIPS-IMAG (UJF, INPG & CNRS)
BP 53 - F-38041 Grenoble Cedex 9
Phone: +33 4 76 51 43 80
Fax:   +33 4 76 44 66 75



--
http://purl.org/net/sergiu