On Jun 10, 2013, at 8:25 PM, Sergiu Dumitriu <sergiu(a)xwiki.org> wrote:
On 06/10/2013 11:00 AM, Thomas Mortagne wrote:
Hi devs,
Right now the XAR plugin format goal systematically empties the
<defaultLanguage> property.
This is wrong IMO since it means we have no idea what the default
document language is. It was not too visible before, but it's really not
very nice for things like the localization module and especially Solr,
which stores the content differently depending on the language (stop
words, etc).
I see several possibilities:
1) We don't touch the XAR maven plugin and we state that when default
language is not set, it's en (in the importer for example or in
XWikiDocument#getDefaultLanguage)
2) We stop filtering default language in the XAR plugin and we set it
to en for all documents in which it makes sense
3) We force default language to "en" in the XAR plugin
WDYT ?
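For illustration, option 1) could look roughly like the sketch below. It assumes the fallback lives in a small standalone helper; the class and method names are hypothetical and are not the actual XWikiDocument#getDefaultLanguage code.

```java
// Hypothetical helper illustrating option 1); not part of the XWiki API.
final class DefaultLanguageFallback {
    // Assumption: "en" is the language assumed when none is stored.
    static final String FALLBACK = "en";

    // Returns the stored default language, or "en" when it is missing or blank.
    static String effectiveDefaultLanguage(String stored) {
        if (stored == null || stored.trim().isEmpty()) {
            return FALLBACK;
        }
        return stored.trim();
    }
}
```

Note that with this approach a document can no longer express "no default language", which is the objection raised against 1) below.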
I don't like 1) too much since some technical documents could really be
seen as having no default language, e.g. documents without any literal
content. But it's more a -0 than a -1; I understand others would want
this for simplicity.
About 3), as I said, having an empty default language is a valid use case
IMO, so -0 for this one too. Still a bit better than 1) since the use
case is still possible.
+1 for 2)
Neither option is good in general. The main problem is that most
documents are written in the "Velocity" language, not in the "English"
language, meaning that they only contain code (which won't be seen by the
user) and translations, which depend on a lot of factors. It's not good
to say that the default language of a dynamically translated document is
en, since a wiki configured with a different language will only display
such documents in that language, never in en.
There are only a few documents that contain real text (normally only the
sandbox should have real text, everything else should be localized), and
for those it's OK to specify the actual language.
I'm not sure I agree with this vision. It really depends on the use case. So far we
haven't found a perfect solution.
Some pages will have more code than content, others will have more content than code. For
the former, keys are best and for the latter translations are best.
In any case I don't understand the problem. What is the issue with saying that all our
pages are in English by default? If a wiki is configured to be in another language and
there's no translation for that language, the default language (i.e. "English")
will be used.
What am I missing?
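The fallback described above can be sketched as follows. This is only an illustration under stated assumptions: the class and method are hypothetical, and the real lookup in XWiki involves more than a set membership test.

```java
import java.util.Set;

// Hypothetical sketch of the fallback described above; not XWiki code.
final class TranslationResolver {
    // Picks the language a page would be shown in: the requested (wiki) language
    // when a translation exists, otherwise the document's default language.
    static String resolve(String requested, Set<String> availableTranslations,
                          String defaultLanguage) {
        if (availableTranslations.contains(requested)) {
            return requested;
        }
        return defaultLanguage;
    }
}
```

For example, a page with default language "en" and only an "fr" translation would be served in "en" on a wiki configured for "de".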
(I'm not commenting on anything below yet because I feel it's important to agree
on what's before first)
Thanks
-Vincent
Other options:
4) Detect somehow localized documents and index:
- the raw content using a non-language-specific analyzer
- the content translated into all the languages registered in the
administration, each with the proper language-specific analyzer, if they
are supported by Solr; this includes the default wiki language.
4a) localized document = the default language is empty
4b) localized document = the default language is literally "localized"
4c) add another document flag for marking localized documents
5) When the defaultLanguage is empty, render in the configured wiki
default language
I like 4) since it makes localized documents really searchable in all
the languages "supported" by that wiki instance.
4a) is a behavior change, so it might cause some trouble
4b) is the safest and requires the least amount of changes
The number of document fields is increasing, so I'm not that fond of 4c)
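As a rough sketch of how 4) with variant 4b) might decide what to index, assuming a hypothetical planner class and using the label "raw" for the non-language-specific analyzer (none of these names come from XWiki or Solr):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical planner for option 4) with variant 4b); not XWiki or Solr API.
final class LocalizedIndexPlan {
    // 4b): a localized document has the literal default language "localized".
    static final String LOCALIZED_MARKER = "localized";
    // Assumed label for the non-language-specific analyzer.
    static final String RAW = "raw";

    // Returns the analyzers (by language code, or "raw") to index a document with.
    static List<String> analyzersFor(String defaultLanguage, List<String> wikiLanguages,
                                     Set<String> solrSupported) {
        List<String> plan = new ArrayList<>();
        if (LOCALIZED_MARKER.equals(defaultLanguage)) {
            // Raw content first, then one pass per registered language Solr supports.
            plan.add(RAW);
            for (String lang : wikiLanguages) {
                if (solrSupported.contains(lang) && !plan.contains(lang)) {
                    plan.add(lang);
                }
            }
        } else if (defaultLanguage == null || defaultLanguage.isEmpty()) {
            // Assumption: a non-localized document without a language is indexed raw only.
            plan.add(RAW);
        } else {
            plan.add(defaultLanguage);
        }
        return plan;
    }
}
```

Under 4a) the first condition would instead test for an empty default language, which is why 4a) changes the behavior of existing documents while 4b) does not.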
--
Sergiu Dumitriu
http://purl.org/net/sergiu