On 06/10/2013 03:12 PM, Vincent Massol wrote:
On Jun 10, 2013, at 8:25 PM, Sergiu Dumitriu <sergiu(a)xwiki.org> wrote:
On 06/10/2013 11:00 AM, Thomas Mortagne wrote:
Hi devs,
Right now the XAR plugin format goal systematically empties the
<defaultLanguage> property.
This is wrong IMO since it means we have no idea what the default
document language is. It was not too visible before, but it's really not
very nice for things like the localization module and especially Solr,
which stores content differently depending on the language (stop
words, etc.).
I see several possibilities:
1) We don't touch the XAR Maven plugin and we state that when the default
language is not set, it's en (in the importer, for example, or in
XWikiDocument#getDefaultLanguage)
2) We stop filtering the default language in the XAR plugin and we set it
to en for all documents where it makes sense
3) We force the default language to "en" in the XAR plugin
WDYT?
I don't like 1) very much, since some technical documents could really be
seen as having no default language, e.g. documents without any literal
content. But it's more a -0 than a -1; I understand others would want
this for simplicity.
About 3), as I said, having an empty default language is a valid use case
IMO, so -0 for this one too. Still a bit better than 1) since the use
case is still possible.
+1 for 2)
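To make the trade-off above concrete, here is a minimal Python sketch of how each option treats a purely technical page versus a page with real English text. Everything in it is hypothetical: a document is modeled as a plain dict, the page names are only illustrative, and the functions do not correspond to the actual XAR Maven plugin code or to XWikiDocument#getDefaultLanguage.

```python
# Hypothetical model of the <defaultLanguage> proposals; a document is a dict,
# not XWiki's XWikiDocument, and none of these functions exist in XWiki.

def current_xar_format(doc):
    """Current behavior: the XAR format goal always empties defaultLanguage."""
    return {**doc, "defaultLanguage": ""}


def option1_default_language(doc):
    """Option 1: keep emptying it, but readers treat an empty value as 'en'."""
    return current_xar_format(doc)["defaultLanguage"] or "en"


def option2_xar_format(doc):
    """Option 2: stop filtering; the value committed in the sources is kept."""
    return dict(doc)


def option3_xar_format(doc):
    """Option 3: the XAR format goal forces 'en' on every document."""
    return {**doc, "defaultLanguage": "en"}


# A technical page deliberately left without a language, and a text page
# whose sources were updated to declare 'en' (as option 2 proposes).
technical = {"name": "XWiki.SomeTechnicalPage", "defaultLanguage": ""}
sandbox = {"name": "Sandbox.WebHome", "defaultLanguage": "en"}

for doc in (technical, sandbox):
    print(doc["name"],
          option1_default_language(doc),
          option2_xar_format(doc)["defaultLanguage"] or "(none)",
          option3_xar_format(doc)["defaultLanguage"])
```

In this model only option 2 lets the technical page keep an empty default language; options 1 and 3 both end up reporting "en" for it.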
Neither option is good in general. The main problem is that most
documents are written in the "Velocity" language, not in the "English"
language, meaning that they only contain code (which won't be seen by the
user) and translations, which depend on a lot of factors. It's not good
to say that the default language of a dynamically translated document is
en, since a wiki configured with a different language will only display
such documents in that language, never in en.
There are only a few documents that contain real text (normally only the
sandbox should have real text, everything else should be localized), and
for those it's OK to specify the actual language.
I'm not sure I agree with this vision. It really depends on the use case. So far we
haven't found a perfect solution.
Some pages will have more code than content, others will have more content than code. For
the former, keys are best and for the latter translations are best.
In any case, I don't understand the problem. What is the issue with saying that all
our pages are in English by default? If a wiki is configured to be in another language and
there's no translation for that language, the default language (i.e. "English")
will be used.
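The fallback described above can be sketched as a translation lookup. This is a hypothetical Python model, not XWiki's actual lookup code: a page is reduced to its default language plus the set of languages that have an explicit translation.

```python
# Hypothetical translation lookup illustrating the fallback described above.

def displayed_language(translations, default_language, requested):
    """Return the language whose version is shown for a requested language.

    translations: set of language codes that have an explicit translation.
    default_language: language of the default (original) document.
    """
    if requested == default_language or requested in translations:
        return requested
    # No translation for the requested language: fall back to the default one.
    return default_language


# A page written in English with a French translation, viewed on a German wiki.
print(displayed_language({"fr"}, "en", "de"))  # falls back to "en"
print(displayed_language({"fr"}, "en", "fr"))  # explicit translation "fr"
```

The lookup only consults the stored language codes; it never looks at what the content actually renders as.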
There is no actual text in the document. How can you say that the language of
https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwi…
is English, since there's no English sentence in there? Depending on the
configuration, the same document will appear in German, Chinese, even
Klingon, without changing anything in the document, so it is definitely
not an English document.
Scenario: Set up a new XWiki instance, and change the default language
of the wiki to German. When you browse the wiki, everything is in
German. Yet all the documents say that they're in English.
Problem 1: The wiki is indexed as English text, so searching for text
that the user actually sees in the wiki won't return any results.
Problem 2: Editing such a document will automatically create a
translation: the original document is declared as English, while the user
wants to edit a German document, and since the two languages differ, the
edit is saved as a new translation. Now the code has been forked, and
automatic updates using the Distribution Wizard will update the hidden
English document, since that is the default one, while the forked
translation will stay behind.
That is why I'm saying that this kind of document doesn't have a
language, and never should. Such documents adapt themselves to the
user's language, so they're written in no language, yet they can match
all languages.
Of course, not all documents are like this, as I originally stated
myself, and there are valid cases where documents should have "en" as
the default translation.
What am I missing?
(I'm not commenting on anything below yet because I feel it's important to agree
on what comes before it first)
Thanks
-Vincent
> Other options:
>
> 4) Somehow detect localized documents and index:
> - the raw content using a non-language-specific analyzer
> - the content translated into all the languages registered in the
> administration, each with the proper language-specific analyzer, if they
> are supported by Solr; this includes the default wiki language.
> 4a) localized document = the default language is empty
> 4b) localized document = the default language is literally "localized"
> 4c) add another document flag for marking localized documents
>
> 5) When the defaultLanguage is empty, render in the configured wiki
> default language
>
>
> I like 4) since it makes localized documents really searchable in all
> the languages "supported" by that wiki instance.
>
> 4a) is a behavior change, so it might cause some trouble
> 4b) is the safest and requires the least amount of changes
> The number of document fields is increasing, so I'm not that fond of 4c)
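Option 4 can be sketched as a function that turns a document into the text fields sent to the index. This is a minimal Python sketch under stated assumptions: the field names content_general and content_<lang>, the render callback, the toy $msg.welcome placeholder and the three detection rules are all hypothetical stand-ins, not existing XWiki or Solr configuration. As the proposal says, the wiki languages passed in are expected to include the default wiki language.

```python
# Hypothetical sketch of option 4. Field names, the render callback and the
# detection rules are illustrative assumptions, not XWiki or Solr APIs.

def is_localized(doc, rule="4b"):
    """Decide whether a document is localized, per variant 4a, 4b or 4c."""
    lang = doc.get("defaultLanguage", "")
    if rule == "4a":
        return lang == ""                    # empty default language
    if rule == "4b":
        return lang == "localized"           # literal marker value
    if rule == "4c":
        return bool(doc.get("localized"))    # separate document flag
    raise ValueError("unknown rule: " + rule)


def index_fields(doc, wiki_languages, supported, render, rule="4b"):
    """Return a {field name: text} mapping to send to the search index."""
    if not is_localized(doc, rule):
        # Regular document: index it once, in its own language if supported.
        lang = doc.get("defaultLanguage", "")
        field = "content_" + lang if lang in supported else "content_general"
        return {field: render(doc, lang)}
    # Localized document: raw content with a language-neutral analyzer...
    fields = {"content_general": doc["content"]}
    # ...plus one rendering per wiki language that has a dedicated analyzer.
    for lang in wiki_languages:
        if lang in supported:
            fields["content_" + lang] = render(doc, lang)
    return fields


# Toy renderer: substitutes a placeholder key with per-language text.
MESSAGES = {"en": "Welcome", "de": "Willkommen", "fr": "Bienvenue"}

def render(doc, lang):
    return doc["content"].replace("$msg.welcome", MESSAGES.get(lang, "welcome"))


page = {"defaultLanguage": "localized", "content": "$msg.welcome"}
# "tlh" (Klingon) is configured in the wiki but has no dedicated analyzer.
print(index_fields(page, ["en", "de", "tlh"], {"en", "de", "fr"}, render))
```

Switching the rule argument between "4a", "4b" and "4c" changes only how localized documents are detected; the indexing plan itself stays the same.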
--
Sergiu Dumitriu
http://purl.org/net/sergiu