On Tue, Jun 11, 2013 at 11:32 AM, Vincent Massol <vincent(a)massol.net> wrote:
On Jun 11, 2013, at 11:25 AM, Thomas Mortagne <thomas.mortagne(a)xwiki.com> wrote:
On Tue, Jun 11, 2013 at 11:21 AM, Vincent Massol <vincent(a)massol.net> wrote:
>
> On Jun 11, 2013, at 11:09 AM, Denis Gervalle <dgl(a)softec.lu> wrote:
>
>> Thanks Sergiu for catching up here while I was off.
>> This is a situation I have been pointing out for a long time now: some
>> documents simply have no language, because all the text they display is
>> produced by the localization module. This affects not only search, but
>> also the display of the language selection in a multi-wiki, and
>> currently this is not as nice as it should be.
>>
>> IMO, with a wiki that can produce, from a single source document, the
>> displayed document in all available translations of the wiki (like the UI
>> does), we need to have the notion of a "no language" or "any language"
>> document. In my own projects, I have been able to manage that properly
>> with the empty "default language" and empty "language" case. Since having
>> both is stupid, we will surely merge those columns in the future, but we
>> need to keep the idea of a "no language" or "any language" document,
>> however you see it, and to properly manage it, not only for our own XARs
>> but for user-produced documents as well.
>>
>> For those reasons, I am -1 for 1) and 3), and +1 for 2)
>
> BTW I forgot to say it in my previous reply, but obviously I remove my
> -1 for 2) since it's much more complex than I thought.
>
>> I follow Thomas: the Welcome page is not a good example, since this one
>> really has a translation in all languages.
>> I do not understand the intent of indexing code statements in an index;
>> it is the rendered document that should be indexed. And an "any language"
>> document should be indexed separately for each language enabled in a
>> multi-language wiki.
>
> ok, that's something very important to decide.
>
> I think the 2 use cases are valid; XWiki can be used both by end users who
> only care about pure content and by developers who develop wiki pages with
> scripts inside. The former want to see only content results and the latter
> want to see script results. For example, as a dev I want to see where I
> call such velocity macro, or where I use such rendering macro.
>
> So I really believe that we need to index both types of content: rendered
> and raw.
Right now only raw content is indexed, mostly because it's not easy to
index rendered content (most of it is dynamic and some is simply
dangerous or really not intended to be executed in the context of some
daemon thread; this is not a new issue). But putting aside the current
limitation, I don't think indexing raw content is useless, and saying
that scripting is English because the syntax looks like English is
simply nonsense…
Yes I agree it's hard to index rendered content perfectly...
I was just replying to Denis who was saying the opposite, i.e. that we
should only index rendered content. I see value in indexing the raw content
for the reasons I pointed out above.
And I was just saying what end users really expect. I know that there are
issues with indexing rendered content, but there is probably something that
could be done to improve that.
I agree that ideally non tech users shouldn't see in their results some
script portions… One thing we could do could be to render without executing
transformations and only index that content. It's not perfect but it could
be a first try to filter out tech content for simple users.
Thanks
-Vincent
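
A rough sketch of what "index without the script portions" could look like, assuming the raw XWiki 2.x content is available as a string. It is only a regex approximation and does not use the XWiki rendering API; `strip_script_macros` and `SCRIPT_MACROS` are hypothetical names introduced for illustration.

```python
import re

# Hypothetical list of script macros to drop; not taken from XWiki itself.
SCRIPT_MACROS = ("velocity", "groovy", "python")


def strip_script_macros(raw, macros=SCRIPT_MACROS):
    """Remove {{macro}}...{{/macro}} blocks and collapse leftover whitespace."""
    for name in macros:
        pattern = re.compile(
            r"\{\{" + re.escape(name) + r"(?:\s[^}]*)?\}\}"  # opening tag
            r".*?"                                            # macro body
            r"\{\{/" + re.escape(name) + r"\}\}",             # closing tag
            re.DOTALL,
        )
        raw = pattern.sub(" ", raw)
    return re.sub(r"\s+", " ", raw).strip()


sample = (
    "This Wiki is made of //pages// sorted by //spaces//.\n"
    "{{velocity}}\n"
    "#if($hasEdit)You can then use the Sandbox.#end\n"
    "{{/velocity}}"
)
indexable = strip_script_macros(sample)
```

The obvious downside is that any plain text sitting inside a script macro is dropped from the index together with the script.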
> Then we need to decide how to present that in the UI, but it could be an
> option in the advanced search to include raw results too, for example (and
> remove duplications…).
>
>> So, to respond to Vincent, properly using the "any language" case by
>> clearing the default language where needed is the right way to go, but we
>> also need to manage that special case properly elsewhere (indexing,
>> language selection on the page,…)
>
> So what's the rule for putting <defaultLanguage>en</defaultLanguage>?
> Whenever there's a page having at least one word of English in the rendered
> result?
>
> Thanks
> -Vincent
>
>> On Tue, Jun 11, 2013 at 10:54 AM, Thomas Mortagne <thomas.mortagne(a)xwiki.com> wrote:
>>
>>> That's not the home page, that's the Welcome page, and it's not a very
>>> good example since this page does have translations already, so we
>>> already decided what we wanted for these pages in practice: each
>>> translation of the page copies the scripts.
>>>
>>> On Tue, Jun 11, 2013 at 10:19 AM, Vincent Massol <vincent(a)massol.net>
>>> wrote:
>>>> Hi Sergiu,
>>>>
>>>> On Jun 10, 2013, at 9:49 PM, Sergiu Dumitriu <sergiu(a)xwiki.org> wrote:
>>>>
>>>>> On 06/10/2013 03:12 PM, Vincent Massol wrote:
>>>>>>
>>>>>> On Jun 10, 2013, at 8:25 PM, Sergiu Dumitriu <sergiu(a)xwiki.org> wrote:
>>>>>>
>>>>>>> On 06/10/2013 11:00 AM, Thomas Mortagne wrote:
>>>>>>>> Hi devs,
>>>>>>>>
>>>>>>>> Right now the XAR plugin format goal systematically empties the
>>>>>>>> <defaultLanguage> property.
>>>>>>>>
>>>>>>>> This is wrong IMO since it means we have no idea what the default
>>>>>>>> document language is. It was not too visible before, but it's really
>>>>>>>> not very nice for things like the localization module and especially
>>>>>>>> SOLR, which stores the content differently depending on the language
>>>>>>>> (stop words, etc).
>>>>>>>>
>>>>>>>> I see several possibilities:
>>>>>>>>
>>>>>>>> 1) We don't touch the XAR maven plugin and we state that when default
>>>>>>>> language is not set, it's en (in the importer for example or in
>>>>>>>> XWikiDocument#getDefaultLanguage)
>>>>>>>> 2) We stop filtering default language in the XAR plugin and we set it
>>>>>>>> to en for all documents in which it makes sense
>>>>>>>> 3) We force default language to "en" in the XAR plugin
>>>>>>>>
>>>>>>>> WDYT ?
>>>>>>>>
>>>>>>>> I don't like 1) too much since some technical documents could really
>>>>>>>> be seen as having no default language, e.g. documents without any
>>>>>>>> literal content. But it's more a -0 than a -1; I understand others
>>>>>>>> would want this for simplicity.
>>>>>>>>
>>>>>>>> About 3), as I said, having an empty default language is a valid use
>>>>>>>> case IMO, so -0 for this one too. Still a bit better than 1) since the
>>>>>>>> use case is still possible.
>>>>>>>>
>>>>>>>> +1 for 2)
>>>>>>>
>>>>>>> Neither option is good in general. The main problem is that most
>>>>>>> documents are written in the "Velocity" language, not in the "English"
>>>>>>> language, meaning that they only contain code (which won't be seen by
>>>>>>> the user), and translations, which depend on a lot of factors. It's not
>>>>>>> good to say that the default language of a dynamically translated
>>>>>>> document is en, since a wiki configured with a different language will
>>>>>>> only display them in that language, never in en.
>>>>>>>
>>>>>>> There are only a few documents that contain real text (normally only
>>>>>>> the sandbox should have real text, everything else should be
>>>>>>> localized), and for those it's OK to specify the actual language.
>>>>>>
>>>>>> I'm not sure I agree with this vision. It really depends on the use
>>>>>> case. So far we haven't found a perfect solution.
>>>>>>
>>>>>> Some pages will have more code than content, others will have more
>>>>>> content than code. For the former, keys are best and for the latter
>>>>>> translations are best.
>>>>>>
>>>>>> In any case I don't understand the problem. What is the issue with
>>>>>> saying that all our pages are in English by default? If a wiki is
>>>>>> configured to be in another language and there's no translation for
>>>>>> that language, the default language (i.e. "English") will be used.
>>>>>
>>>>> There is no actual text in the document. How can you say that the
>>>>> language of
>>>>> https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwi…
>>>>> is English, since there's no English sentence in there? Depending on the
>>>>> configuration, the same document will appear in German, Chinese, even
>>>>> Klingon, without changing anything in the document, so it is definitely
>>>>> not an English document.
>>>>>
>>>>> Scenario: Set up a new XWiki instance, and change the default language
>>>>> of the wiki to German. When you browse the wiki, everything is in
>>>>> German. Yet all the documents say that they're in English.
>>>>>
>>>>> Problem 1: The wiki is indexed as English text, so searching for text
>>>>> that the user actually sees in the wiki won't return any results.
>>>>>
>>>>> Problem 2: Editing such a document will automatically create a
>>>>> translation, since the original document is in English, and the user
>>>>> wants to edit a German document. Since the two languages are not
>>>>> compatible, a translation will be created automatically. Now the code
>>>>> has been forked, and automatic updates using the Distribution Wizard
>>>>> will update the hidden English document, since that is the default one,
>>>>> while the forked translation will stay behind.
>>>>
>>>> Thanks a lot for describing these 2 use cases that I definitely wouldn't
>>>> have thought about! That's very useful.
>>>>
>>>> So it seems that suddenly it's becoming more complex ;)
>>>>
>>>> Basically it means that if we have documents that mix content and
>>>> scripting we're going to have issues:
>>>> * Either they're marked as having no default language and the English
>>>> content will be indexed in the default language of the wiki
>>>> * Or they're marked as "en" and the user will not have the scripts in
>>>> the search results, and the DW/EM will update only the default version
>>>> if the user has created a translation
>>>>
>>>> I'm sure we have lots of cases like this, the easiest one being the main
>>>> home page:
>>>>
>>>> -------------------
>>>> It's an easy-to-edit website that will help you work better together.
>>>> This Wiki is made of //pages// sorted by //spaces//. You're currently in
>>>> the **Main** space, looking at its home page (**WebHome**).
>>>>
>>>> Learn how to use XWiki with the {{velocity}}[[Getting Started Guide>>http://enterprise.xwiki.org/xwiki/bin/view/GettingStarted/WebHome?version=$… .
>>>>
>>>> {{velocity}}
>>>> #if($hasEdit)You can then use the [[Sandbox space>>Sandbox.WebHome]] to try out your wiki's features.#end
>>>> {{/velocity}}
>>>> -------------------
>>>>
>>>> It has both content and script… If it's marked as "en" then if the user
>>>> searches for "hasEdit" he won't get it if his wiki is in a language other
>>>> than "en".
>>>>
>>>> Unless, when there's no translation in the language of the user, we
>>>> return the default language results for that page. Would that make sense?
>>>>
>>>> But there's still the issue of editing the page, which will create a
>>>> translation. Then imagine that we replace the velocity script in a
>>>> future version: the user will only get his default page updated and
>>>> not his translation… However that's a general problem I guess...
>>>>
>>>> If it has no default language (as is the case now BTW) then it seems
>>>> less of an issue. It just means:
>>>> * If the user searches for "work" he'll get results even though he's not
>>>> in an "en" wiki. But then he's searching for an English word too ;)
>>>> * any other downside?
>>>>
>>>> In view of all this, it seems that not setting any default language is a
>>>> lesser evil, doesn't it?
>>>>
>>>> Thanks
>>>> -Vincent
>>>>
>>>>> That is why I'm saying that this kind of document doesn't have a
>>>>> language, and never should have. They adapt themselves to the
>>>>> user's language, so they're written in no language, yet they can match
>>>>> all languages.
>>>>>
>>>>> Of course, not all documents are like this, as I originally stated
>>>>> myself, and there are valid cases where documents should have "en" as
>>>>> the default translation.
>>>>>
>>>>>> What am I missing?
>>>>>>
>>>>>> (I'm not commenting on anything below yet because I feel it's
>>>>>> important to agree on what's before first)
>>>>>>
>>>>>> Thanks
>>>>>> -Vincent
>>>>>>
>>>>>>> Other options:
>>>>>>>
>>>>>>> 4) Detect somehow localized documents and index:
>>>>>>> - the raw content using a non-language-specific analyzer
>>>>>>> - the content translated into all the languages registered in the
>>>>>>> administration, each with the proper language-specific analyzer, if
>>>>>>> they are supported by Solr; this includes the default wiki language.
>>>>>>> 4a) localized document = the default language is empty
>>>>>>> 4b) localized document = the default language is literally "localized"
>>>>>>> 4c) add another document flag for marking localized documents
>>>>>>>
>>>>>>> 5) When the defaultLanguage is empty, render in the configured wiki
>>>>>>> default language
>>>>>>>
>>>>>>>
>>>>>>> I like 4) since it makes localized documents really searchable in all
>>>>>>> the languages "supported" by that wiki instance.
>>>>>>>
>>>>>>> 4a) is a behavior change, so it might cause some trouble
>>>>>>> 4b) is the safest and requires the least amount of changes
>>>>>>> The number of document fields is increasing, so I'm not that fond of
>>>>>>> 4c)
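
To make option 4) a bit more concrete, here is a minimal sketch of the indexing decision under variant 4a (an empty default language marks a localized document). The analyzer names, `index_plan`, and the choice to skip languages without a dedicated analyzer are all assumptions made for illustration, not how the Solr integration actually works.

```python
# Hypothetical analyzer names; a real Solr schema defines its own field types.
SUPPORTED_ANALYZERS = {"en": "text_en", "de": "text_de", "fr": "text_fr"}
GENERIC_ANALYZER = "text_general"


def index_plan(default_language, wiki_languages, analyzers=SUPPORTED_ANALYZERS):
    """Return (content kind, language, analyzer) tuples for one document."""
    if default_language:
        # Regular document: index the raw content in its own language.
        analyzer = analyzers.get(default_language, GENERIC_ANALYZER)
        return [("raw", default_language, analyzer)]

    # Variant 4a: an empty default language marks a localized document.
    # Raw content goes through a non-language-specific analyzer.
    plan = [("raw", "", GENERIC_ANALYZER)]
    for lang in wiki_languages:
        # Assumption: languages without a specific analyzer are skipped.
        if lang in analyzers:
            plan.append(("rendered", lang, analyzers[lang]))
    return plan


plan_for_localized = index_plan("", ["de", "tlh", "en"])
plan_for_english = index_plan("en", ["de", "en"])
```

Switching to 4b or 4c would only change the test on the first line of the function (compare against the literal "localized", or read a separate flag); the rest of the plan stays the same.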