Hi Sergiu,
On Jun 10, 2013, at 9:49 PM, Sergiu Dumitriu <sergiu(a)xwiki.org> wrote:
  On 06/10/2013 03:12 PM, Vincent Massol wrote:
 On Jun 10, 2013, at 8:25 PM, Sergiu Dumitriu <sergiu(a)xwiki.org> wrote:
  On 06/10/2013 11:00 AM, Thomas Mortagne wrote:
  Hi devs,
 Right now the XAR plugin format goal systematically empties the
 <defaultLanguage> property.
 This is wrong IMO since it means we have no idea what the default
 document language is. It was not too visible before, but it's really not
 very nice for things like the localization module and especially Solr,
 which stores the content differently depending on the language (stop
 words, etc.).
 I see several possibilities:
 1) We don't touch the XAR maven plugin and we state that when default
 language is not set, it's en (in the importer for example or in
 XWikiDocument#getDefaultLanguage)
 2) We stop filtering default language in the XAR plugin and we set it
 to en for all documents in which it makes sense
 3) We force default language to "en" in the XAR plugin
 WDYT ?
 I don't like 1) too much since some technical documents could really be
 seen as having no default language, e.g. documents without any literal
 content. But it's more a -0 than a -1; I understand others would want
 this for simplicity.
 About 3): as I said, having an empty default language is a valid use case
 IMO, so -0 for this one too. Still a bit better than 1) since the use
 case is still possible.
 +1 for 2) 
 Neither option is good in general. The main problem is that most
 documents are written in the "Velocity" language, not in the "English"
 language, meaning that they only contain code (which won't be seen by the
 user) and translations, which depend on a lot of factors. It's not good
 to say that the default language of a dynamically translated document is
 en, since a wiki configured with a different language will only display
 it in that language, never in en.
 There are only a few documents that contain real text (normally only the
 sandbox should have real text, everything else should be localized), and
 for those it's OK to specify the actual language. 
 I'm not sure I agree with this vision. It really depends on the use case. So far we
 haven't found a perfect solution.
 Some pages will have more code than content, others will have more content than code. For
 the former, keys are best and for the latter translations are best.
 In any case I don't understand the problem. What is the issue with saying that all
 our pages are in English by default? If a wiki is configured to be in another language and
 there's no translation for that language, the default language (i.e. "English")
 will be used.
 There is no actual text in the document. How can you say that the
 language of
 https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwi…
 is English, since there's no English sentence in there? Depending on the
 configuration, the same document will appear in German, Chinese, even
 Klingon, without changing anything in the document, so it is definitely
 not an English document.
 Scenario: Set up a new XWiki instance, and change the default language
 of the wiki to German. When you browse the wiki, everything is in
 German. Yet all the documents say that they're in English.
 Problem 1: The wiki is indexed as English text, so searching for text
 that the user actually sees in the wiki won't return any results.
 Problem 2: Editing such a document will automatically create a
 translation, since the original document is in English, and the user
 wants to edit a German document. Since the two languages are not
 compatible, a translation will be created automatically. Now the code
 has been forked, and automatic updates using the Distribution Wizard
 will update the hidden English document, since that is the default one,
 while the forked translation will stay behind. 
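
Problem 2 can be illustrated with a toy model. This is only a sketch of the behavior described above, not XWiki's actual save or upgrade code: the `Document` class and the `edit` and `upgrade` functions are hypothetical names, and it assumes that an empty default language is compatible with any user language.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a wiki document (not the XWiki API)."""
    default_language: str              # "" means no default language
    content: str
    translations: dict = field(default_factory=dict)


def edit(doc, user_language, new_content):
    """Save an edit; fork a translation when the languages differ."""
    if doc.default_language in ("", user_language):
        # Assumption: an empty default language matches every user language.
        doc.content = new_content
    else:
        doc.translations[user_language] = new_content


def upgrade(doc, new_content):
    """Simulate an upgrade that only replaces the default version."""
    doc.content = new_content


# A German user edits a page marked "en", then the wiki is upgraded.
page = Document(default_language="en", content="script v1")
edit(page, "de", "script v1, edited")
upgrade(page, "script v2")
# The default version moved on, the forked "de" translation did not.
```

With `default_language=""` the same edit changes the default version directly, so no fork is created and a later upgrade touches the only copy there is.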
Thanks a lot for describing these 2 use cases that I definitely wouldn't have thought
about! That's very useful.
So it seems that suddenly it's becoming more complex ;)
Basically it means that if we have documents that mix content and scripting we're
going to have issues:
* Either they're marked as having no default language and the English content will be
indexed in the default language of the wiki
* Or they're marked as "en" and the user will not have the scripts in
the search results, and the DW/EM will update only the default version if the user has
created a translation
I'm sure we have lots of cases like this, the easiest one being the main home page:
-------------------
It's an easy-to-edit website that will help you work better together. This Wiki is
made of //pages// sorted by //spaces//. You're currently in the **Main** space,
looking at its home page (**WebHome**).
Learn how to use XWiki with the {{velocity}}[[Getting Started
Guide>>http://enterprise.xwiki.org/xwiki/bin/view/GettingStar…]{{/velocity}}.
{{velocity}}
#if($hasEdit)You can then use the [[Sandbox space>>Sandbox.WebHome]] to try
out your wiki's features.#end
{{/velocity}}
-------------------
It has both content and script… If it's marked as "en" then if the user
searches for "hasEdit" he won't get it if his wiki is in a language other
than "en".
Unless, when there's no translation in the user's language, we return the
default language results for that page. Would that make sense?
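
A minimal sketch of that fallback idea, assuming a simplified in-memory index rather than the real Solr one. The `search` function and the index layout (page name mapped to a default language and its language versions) are made up for illustration.

```python
def search(index, term, user_language):
    """Return (page, language) hits, falling back to the default version.

    index maps a page name to {"default": lang, "versions": {lang: text}}.
    """
    hits = []
    for page, entry in index.items():
        versions = entry["versions"]
        # Prefer the user's language; otherwise use the default version.
        if user_language in versions:
            language = user_language
        else:
            language = entry["default"]
        if term.lower() in versions[language].lower():
            hits.append((page, language))
    return hits


index = {
    "Main.WebHome": {
        "default": "en",
        "versions": {"en": "#if($hasEdit)You can then use the Sandbox#end"},
    },
}
# A user of a German wiki still finds the script in the "en" version.
print(search(index, "hasEdit", "de"))
```

Once a "de" translation exists for the page, only that translation is searched for a German user, which matches the "if there's no translation" condition above.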
But there's still the issue of editing the page, which will create a translation.
Imagine that we replace the velocity script in a future version: the user will
only get his default page updated and not his translation… However that's a general
problem I guess...
If it has no default language (as is the case now BTW) then it seems less of an
issue. It just means:
* If the user searches for "work" he'll get results even though he's not in an
"en" wiki. But then he's searching for an English word too ;)
* any other downside?
In view of all this, it seems that not setting any default language is a lesser evil,
doesn't it?
Thanks
-Vincent
  That is why I'm saying that this kind of document doesn't have a
 language, and never should have. Such documents adapt themselves to the
 user's language, so they're written in no language, yet they can match
 all languages.
 Of course, not all documents are like this, as I originally stated
 myself, and there are valid cases where documents should have "en" as
 the default translation.
> What am I missing?
>
> (I'm not commenting on anything below yet because I feel it's important to
> agree on what's before first)
>
> Thanks
> -Vincent
>
>> Other options:
>>
>> 4) Detect somehow localized documents and index:
>> - the raw content using a non-language-specific analyzer
>> - the content translated into all the languages registered in the
>> administration, each with the proper language-specific analyzer, if they
>> are supported by Solr; this includes the default wiki language.
>> 4a) localized document = the default language is empty
>> 4b) localized document = the default language is literally "localized"
>> 4c) add another document flag for marking localized documents
>>
>> 5) When the defaultLanguage is empty, render in the configured wiki
>> default language
>>
>>
>> I like 4) since it makes localized documents really searchable in all
>> the languages "supported" by that wiki instance.
>>
>> 4a) is a behavior change, so it might cause some trouble
>> 4b) is the safest and requires the least amount of changes
>> The number of document fields is increasing, so I'm not that fond of 4c)
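
Option 4 with the 4b marker could look roughly like the sketch below. It is not the Solr or XWiki API: `index_entries`, the "generic" analyzer name and the `render` callback are hypothetical, and it assumes that documents not marked "localized" are indexed once, with their own language analyzer when one is supported.

```python
def index_entries(default_language, raw_content, wiki_languages, supported, render):
    """Return the (analyzer, text) pairs to index for one document."""
    if default_language != "localized":
        # Regular document: index the content once, in its own language
        # when an analyzer exists, else with the non-language-specific one.
        analyzer = default_language if default_language in supported else "generic"
        return [(analyzer, raw_content)]
    # Localized document: raw content plus one rendering per wiki language.
    entries = [("generic", raw_content)]
    for language in wiki_languages:
        if language in supported:
            entries.append((language, render(raw_content, language)))
    return entries


# Hypothetical renderer that resolves a single translation key.
messages = {"en": "Welcome", "de": "Willkommen"}


def render(raw, language):
    return raw.replace("$msg.get('welcome')", messages[language])


entries = index_entries("localized", "$msg.get('welcome')",
                        ["en", "de", "tlh"], {"en", "de"}, render)
```

Languages registered in the wiki but without a supported analyzer (here "tlh") are skipped, following the "if they are supported by Solr" condition in 4).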