On Jun 11, 2013, at 11:09 AM, Denis Gervalle <dgl(a)softec.lu> wrote:
Thanks, Sergiu, for catching up here while I was off.
This is a situation I have been pointing out for a long time now: some
documents simply have no language, because all the text they display is
produced by the localization module. This affects not only search but
also the display of the language selection in a multi-wiki, and
currently this is not as nice as it should be.
IMO, with a wiki that can produce, from a single source document, a
displayed document in every available translation of the wiki (like the UI
does), we need the notion of a "no language" or "any language"
document. In my own projects, I have been able to manage that properly
with an empty "default language" and an empty "language". Since having
both is stupid, we will surely merge those columns in the future, but we
need to keep the idea of a "no language" or "any language" document,
however you see it, and to manage it properly, not only for our own XARs
but for user-produced documents as well.
For those reasons, I am -1 for 1) and 3), and +1 for 2)
BTW, I forgot to say it in my previous reply, but obviously I remove my -1 for 2) since
it's much more complex than I thought.
I agree with Thomas: the Welcome page is not a good example, since this
one really has a translation in all languages.
I do not understand the intent of indexing code statements in an index; it
is the rendered document that should be indexed. And an "any language"
document should be indexed separately for each language enabled in a
multi-language wiki.
ok, that's something very important to decide.
I think the 2 use cases are valid: XWiki can be used both by end users who only care
about pure content and by developers who develop wiki pages with scripts inside.
The former want to see only content results and the latter want to see script
results. For example, as a dev I want to see where I call a given Velocity macro, or
where I use a given rendering macro.
So I really believe that we need to index both types of content: rendered and raw.
Then we need to decide how to present that in the UI, but it could be an option in the
advanced search to include raw results too, for example (and remove duplicates…).
So, to respond to Vincent, properly using the "any language" case, by
clearing the default language where needed, is the right way to go, but we
also need to manage that special case properly elsewhere (indexing,
language selection on the page,…)
So what's the rule for putting <defaultLanguage>en</defaultLanguage>?
Whenever there's a page having at least one word of English in the rendered result?
Thanks
-Vincent
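
A minimal sketch of the "index both rendered and raw" idea discussed above, written in Python for brevity. It is not XWiki or Solr code: the `Entry` type, `build_index`, `search`, and the `include_raw` flag are all hypothetical names chosen for illustration. It only shows how an "include raw results" option could coexist with duplicate removal when a document matches in both forms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    doc: str   # document reference, e.g. "Main.WebHome"
    kind: str  # "rendered" or "raw"
    text: str  # the text that was indexed for this entry


def build_index(docs):
    """Index every document twice: once rendered, once raw.

    `docs` maps a document reference to a (raw, rendered) pair.
    """
    entries = []
    for ref, (raw, rendered) in docs.items():
        entries.append(Entry(ref, "rendered", rendered))
        entries.append(Entry(ref, "raw", raw))
    return entries


def search(entries, term, include_raw=False):
    """Return matching document references, each at most once.

    Raw entries are skipped unless `include_raw` is set (the hypothetical
    "advanced search" option).
    """
    hits = []
    seen = set()
    needle = term.lower()
    for entry in entries:
        if entry.kind == "raw" and not include_raw:
            continue
        if needle in entry.text.lower() and entry.doc not in seen:
            seen.add(entry.doc)
            hits.append(entry.doc)
    return hits
```

With this shape, an end user searching for visible text only sees rendered matches, while a developer who enables the option also finds documents by script content such as `hasEdit`.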
On Tue, Jun 11, 2013 at 10:54 AM, Thomas Mortagne <thomas.mortagne(a)xwiki.com> wrote:
> That's not the home page, that's the Welcome page, and it's not a very
> good example since this page does have translations already, so we
> already decided what we wanted for these pages in practice: each
> translation of the page copies the scripts.
>
> On Tue, Jun 11, 2013 at 10:19 AM, Vincent Massol <vincent(a)massol.net> wrote:
>> Hi Sergiu,
>>
>> On Jun 10, 2013, at 9:49 PM, Sergiu Dumitriu <sergiu(a)xwiki.org> wrote:
>>
>>> On 06/10/2013 03:12 PM, Vincent Massol wrote:
>>>>
>>>> On Jun 10, 2013, at 8:25 PM, Sergiu Dumitriu <sergiu(a)xwiki.org> wrote:
>>>>
>>>>> On 06/10/2013 11:00 AM, Thomas Mortagne wrote:
>>>>>> Hi devs,
>>>>>>
>>>>>> Right now the XAR plugin format goal systematically empties the
>>>>>> <defaultLanguage> property.
>>>>>>
>>>>>> This is wrong IMO since it means we have no idea what the default
>>>>>> document language is. It was not too visible before, but it's really
>>>>>> not very nice for things like the localization module and especially
>>>>>> Solr, which stores the content differently depending on the language
>>>>>> (stop words, etc.).
>>>>>>
>>>>>> I see several possibilities:
>>>>>>
>>>>>> 1) We don't touch the XAR Maven plugin and we state that when the
>>>>>> default language is not set, it's en (in the importer for example,
>>>>>> or in XWikiDocument#getDefaultLanguage)
>>>>>> 2) We stop filtering the default language in the XAR plugin and we
>>>>>> set it to en for all documents in which it makes sense
>>>>>> 3) We force the default language to "en" in the XAR plugin
>>>>>>
>>>>>> WDYT ?
>>>>>>
>>>>>> I don't like 1) too much since some technical documents could really
>>>>>> be seen as having no default language, such as documents without any
>>>>>> literal content. But it's more a -0 than a -1; I understand others
>>>>>> would want this for simplicity.
>>>>>>
>>>>>> About 3): as I said, having an empty default language is a valid use
>>>>>> case IMO, so -0 for this one too. Still a bit better than 1) since
>>>>>> the use case is still possible.
>>>>>>
>>>>>> +1 for 2)
>>>>>
>>>>> Neither option is good in general. The main problem is that most
>>>>> documents are written in the "Velocity" language, not in the
>>>>> "English" language, meaning that they only contain code (which won't
>>>>> be seen by the user) and translations, which depend on a lot of
>>>>> factors. It's not good to say that the default language of a
>>>>> dynamically translated document is en, since a wiki configured with
>>>>> a different language will only display it in that language, never
>>>>> in en.
>>>>>
>>>>> There are only a few documents that contain real text (normally only
>>>>> the sandbox should have real text, everything else should be
>>>>> localized), and for those it's OK to specify the actual language.
>>>>
>>>> I'm not sure I agree with this vision. It really depends on the use case. So far we haven't found a perfect solution.
>>>>
>>>> Some pages will have more code than content, others will have more content than code. For the former, keys are best and for the latter translations are best.
>>>>
>>>> In any case I don't understand the problem. What is the issue with saying that all our pages are in English by default. If a wiki is configured to be in another language and there's no translation for that language the default language (ie "English") will be used.
>>>
>>> There is no actual text in the document. How can you say that the
>>> language of
>>> https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwi…
>>> is English, since there's no English sentence in there? Depending on the
>>> configuration, the same document will appear in German, Chinese, even
>>> Klingon, without changing anything in the document, so it is definitely
>>> not an English document.
>>>
>>> Scenario: Set up a new XWiki instance, and change the default language
>>> of the wiki to German. When you browse the wiki, everything is in
>>> German. Yet all the documents say that they're in English.
>>>
>>> Problem 1: The wiki is indexed as English text, so searching for text
>>> that the user actually sees in the wiki won't return any results.
>>>
>>> Problem 2: Editing such a document will automatically create a
>>> translation, since the original document is in English, and the user
>>> wants to edit a German document. Since the two languages are not
>>> compatible, a translation will be created automatically. Now the code
>>> has been forked, and automatic updates using the Distribution Wizard
>>> will update the hidden English document, since that is the default one,
>>> while the forked translation will stay behind.
>>
>> Thanks a lot for describing these 2 use cases that I definitely wouldn't have thought about! That's very useful.
>>
>> So it seems that suddenly it's becoming more complex ;)
>>
>> Basically it means that if we have documents that mix content and scripting we're going to have issues:
>> * Either they're marked as having no default language and the English content will be indexed in the default language of the wiki
>> * Or they're marked as "en" and the user will not have the scripts in the search results and the DW/EM will update only the default version if the user has created a translation
>>
>> I'm sure we have lots of cases like this, the easiest one being the main home page:
>>
>> -------------------
>> It's an easy-to-edit website that will help you work better together. This Wiki is made of //pages// sorted by //spaces//. You're currently in the **Main** space, looking at its home page (**WebHome**).
>>
>> Learn how to use XWiki with the {{velocity}}[[Getting Started Guide>>http://enterprise.xwiki.org/xwiki/bin/view/GettingStarted/WebHome?version=$….
>>
>> {{velocity}}
>> #if($hasEdit)You can then use the [[Sandbox space>>Sandbox.WebHome]] to try out your wiki's features.#end
>> {{/velocity}}
>> -------------------
>>
>> It has both content and script… If it's marked as "en" then if the user searches for "hasEdit" he won't get it if his wiki is in a language other than "en".
>>
>> Unless, if there's no translation in the language of the user, we return the default language results for that page. Would that make sense?
>>
>> But there's still the issue of editing the page, which will create a translation. Imagine that we then replace the velocity script in a future version: the user will only get his default page updated and not his translation… However that's a general problem I guess...
>>
>> If it has no default language (as is the case now BTW) then it seems less of an issue. It just means:
>> * If the user searches for "work" he'll get results even though he's not in an "en" wiki. But then he's searching for an English word too ;)
>> * any other downside?
>>
>> In view of all this, it seems that not setting any default language is a lesser evil, doesn't it?
>>
>> Thanks
>> -Vincent
>>
>>> That is why I'm saying that documents of this kind don't have a
>>> language, and they never should. They adapt themselves to the
>>> user's language, so they're written in no language, yet they can
>>> match all languages.
>>>
>>> Of course, not all documents are like this, as I originally stated
>>> myself, and there are valid cases where documents should have "en"
>>> as the default translation.
>>>
>>>> What am I missing?
>>>>
>>>> (I'm not commenting on anything below yet because I feel it's important to agree on what's before first)
>>>>
>>>> Thanks
>>>> -Vincent
>>>>
>>>>> Other options:
>>>>>
>>>>> 4) Somehow detect localized documents and index:
>>>>> - the raw content using a non-language-specific analyzer
>>>>> - the content translated into all the languages registered in the
>>>>> administration, each with the proper language-specific analyzer, if
>>>>> they are supported by Solr; this includes the default wiki language.
>>>>> 4a) localized document = the default language is empty
>>>>> 4b) localized document = the default language is literally
>>>>> "localized"
>>>>> 4c) add another document flag for marking localized documents
>>>>>
>>>>> 5) When the defaultLanguage is empty, render in the configured wiki
>>>>> default language
>>>>>
>>>>>
>>>>> I like 4) since it makes localized documents really searchable in
>>>>> all the languages "supported" by that wiki instance.
>>>>>
>>>>> 4a) is a behavior change, so it might cause some trouble
>>>>> 4b) is the safest and requires the least amount of changes
>>>>> The number of document fields is increasing, so I'm not that fond
>>>>> of 4c)
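
Option 4) and its variants 4a) to 4c) can be sketched roughly as follows, in Python for brevity. This is an illustration under assumptions, not XWiki or Solr code: `is_localized`, `index_plan`, the `render` callback, and the analyzer names in `SUPPORTED_ANALYZERS` are hypothetical placeholders. The sketch only shows the decision being proposed: a localized document yields one raw entry with a non-language-specific analyzer, plus one rendered entry per wiki language that has an analyzer.

```python
# Hypothetical mapping of language codes to analyzer names; a real
# deployment would take this from its own search schema.
SUPPORTED_ANALYZERS = {"en": "text_en", "de": "text_de", "fr": "text_fr"}
GENERIC_ANALYZER = "text_general"  # stands in for "non-language-specific"


def is_localized(default_language, flag=False, strategy="4b"):
    """Decide whether a document is a 'localized' (any-language) document."""
    if strategy == "4a":  # the default language is empty
        return default_language == ""
    if strategy == "4b":  # the default language is literally "localized"
        return default_language == "localized"
    if strategy == "4c":  # a separate document flag marks localized documents
        return flag
    raise ValueError(f"unknown strategy: {strategy}")


def index_plan(raw, default_language, wiki_languages, render,
               strategy="4b", flag=False):
    """Return a list of (language, analyzer, text) entries to index.

    `render(raw, language)` is a caller-supplied function that produces the
    content as displayed in the given language.
    """
    if not is_localized(default_language, flag, strategy):
        analyzer = SUPPORTED_ANALYZERS.get(default_language, GENERIC_ANALYZER)
        return [(default_language, analyzer, render(raw, default_language))]

    # Raw content first, with the non-language-specific analyzer.
    plan = [("", GENERIC_ANALYZER, raw)]
    for language in wiki_languages:
        analyzer = SUPPORTED_ANALYZERS.get(language)
        if analyzer is None:
            continue  # "if they are supported by Solr"
        plan.append((language, analyzer, render(raw, language)))
    return plan
```

Switching `strategy` between "4a", "4b" and "4c" changes only how a localized document is recognized; the resulting index entries are the same in all three cases.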