On Mon, Jun 10, 2013 at 10:49 PM, Sergiu Dumitriu <sergiu(a)xwiki.org> wrote:
On 06/10/2013 03:12 PM, Vincent Massol wrote:
On Jun 10, 2013, at 8:25 PM, Sergiu Dumitriu <sergiu(a)xwiki.org> wrote:
On 06/10/2013 11:00 AM, Thomas Mortagne wrote:
Hi devs,
Right now the XAR plugin format goal systematically empties the
<defaultLanguage> property.
This is wrong IMO since it means we have no idea what the default
document language is. It was not too visible before, but it's really not
very nice for things like the localization module and especially SOLR,
which stores the content differently depending on the language (stop
words, etc.).
I see several possibilities:
1) We don't touch the XAR maven plugin and we state that when default
language is not set, it's en (in the importer for example or in
XWikiDocument#getDefaultLanguage)
2) We stop filtering default language in the XAR plugin and we set it
to en for all documents in which it makes sense
3) We force default language to "en" in the XAR plugin
WDYT ?
I don't like 1) too much since some technical documents could really be
seen as having no default language: documents without any literal
content. But it's more a -0 than a -1; I understand others would want
this for simplicity.
About 3), as I said, having an empty default language is a valid use
case IMO, so -0 for this one too. Still a bit better than 1) since the
use case is still possible.
+1 for 2)
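To make the difference between the three options concrete, here is a minimal Python sketch. It is only an illustration of the proposals as worded above, not XWiki or XAR plugin code; every function name and the `FALLBACK_LANGUAGE` constant are hypothetical, and option 3 is read as "overwrite whatever was set".

```python
from typing import Optional

# Hypothetical constant; the thread only says the fallback would be "en".
FALLBACK_LANGUAGE = "en"


def option1_read(default_language: Optional[str]) -> str:
    """Option 1: the XAR keeps an empty value; readers treat empty as "en"."""
    return default_language or FALLBACK_LANGUAGE


def option2_package(default_language: Optional[str]) -> str:
    """Option 2: the plugin stops filtering; the author's value is kept as-is."""
    return default_language or ""


def option3_package(default_language: Optional[str]) -> str:
    """Option 3 (as read here): the plugin forces "en" at packaging time."""
    return FALLBACK_LANGUAGE


for value in ("", "fr"):
    print(repr(value), option1_read(value), repr(option2_package(value)),
          option3_package(value))
```

Under this reading, only option 2 lets a packaged document keep a deliberately empty default language.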
Neither option is good in general. The main problem is that most
documents are written in the "Velocity" language, not in the "English"
language, meaning that they only contain code (which won't be seen by
the user) and translations, which depend on a lot of factors. It's not
good to say that the default language of a dynamically translated
document is en, since a wiki configured with a different language will
only display them in that language, never in en.
There are only a few documents that contain real text (normally only the
sandbox should have real text, everything else should be localized), and
for those it's OK to specify the actual language.
I'm not sure I agree with this vision. It really depends on the use case. So far we
haven't found a perfect solution.
Some pages will have more code than content, others will have more content than code. For
the former, keys are best and for the latter translations are best.
In any case I don't understand the problem. What is the issue with saying that all
our pages are in English by default? If a wiki is configured to be in another language and
there's no translation for that language, the default language (i.e. "English")
will be used.
There is no actual text in the document. How can you say that the
language of
https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwi…
is English, since there's no English sentence in there? Depending on the
configuration, the same document will appear in German, Chinese, even
Klingon, without changing anything in the document, so it is definitely
not an English document.
It looks English to me: "if", "request", "valid", "key", "user",
"true", "login", "services", "localization", etc. are all English
words. There are no sentences, sure (although there can be code
comments), but the code is definitely written in English. If you index
the raw content then its language should be English IMO. If you index
the rendered content then I agree the result can be different
depending on the context language, but on the other hand some of this
code might not render anything (XWiki.LiveTableResultsMacros). Anyway,
I think most of the users won't care about such documents with code,
mainly because these documents are supposed to be hidden and thus not
included in the search results by default.
Let's not forget that an important part of the code is written in
objects (JSX/SSX) and they can also contain calls to the localization
service. Should we render that content and index it in multiple
languages?
Scenario: Set up a new XWiki instance, and change the default language
of the wiki to German. When you browse the wiki, everything is in
German. Yet all the documents say that they're in English.
Problem 1: The wiki is indexed as English text, so searching for text
that the user actually sees in the wiki won't return any results.
Problem 2: Editing such a document will automatically create a
translation, since the original document is in English, and the user
wants to edit a German document. Since the two languages are not
compatible, a translation will be created automatically. Now the code
has been forked, and automatic updates using the Distribution Wizard
will update the hidden English document, since that is the default one,
while the forked translation will stay behind.
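The fork described in Problem 2 can be illustrated with a small Python sketch. It is a simplified model of the behavior as described in this message, not the actual XWiki edit logic; the function name and the returned labels are hypothetical.

```python
def edit_target(doc_default_language: str, user_language: str) -> str:
    """Say where an edit lands in this simplified model.

    An empty default language is treated as compatible with every user
    language, which is the assumption behind the "no language" argument.
    """
    if not doc_default_language or doc_default_language == user_language:
        return "default"
    # Languages differ: a translation is created and the code is forked.
    return "translation:" + user_language


print(edit_target("en", "de"))
print(edit_target("", "de"))
```

In this model a German user editing a document marked "en" ends up on a separate translation, while the same edit on a document with an empty default language changes the default version that upgrades also touch.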
This is indeed a problem.
Thanks,
Marius
That is why I'm saying that these kinds of documents don't have a
language, and they never should have. They adapt themselves to the
user's language, so they're written in no language, yet they can match
all languages.
Of course, not all documents are like this, as I originally stated
myself, and there are valid cases where documents should have "en" as
the default translation.
What am I missing?
(I'm not commenting on anything below yet because I feel it's important to agree
on what's before first)
Thanks
-Vincent
> Other options:
>
> 4) Detect somehow localized documents and index:
> - the raw content using a non-language-specific analyzer
> - the content translated into all the languages registered in the
> administration, each with the proper language-specific analyzer, if they
> are supported by Solr; this includes the default wiki language.
> 4a) localized document = the default language is empty
> 4b) localized document = the default language is literally "localized"
> 4c) add another document flag for marking localized documents
>
> 5) When the defaultLanguage is empty, render in the configured wiki
> default language
>
>
> I like 4) since it makes localized documents really searchable in all
> the languages "supported" by that wiki instance.
>
> 4a) is a behavior change, so it might cause some trouble
> 4b) is the safest and requires the least amount of changes
> The number of document fields is increasing, so I'm not that fond of 4c)
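As a rough illustration of options 4a, 4b and 5, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `content_generic` and `content_<lang>` field names are not an actual Solr schema, `render` stands in for whatever would render a document in a given language, and the set of languages with a Solr analyzer is passed in by the caller.

```python
from typing import Callable, Dict, Iterable, Set

LOCALIZED_MARKER = "localized"  # literal value proposed in option 4b


def is_localized(default_language: str, variant: str = "4a") -> bool:
    """Detect a localized document under option 4a or 4b."""
    if variant == "4a":
        return default_language == ""
    if variant == "4b":
        return default_language == LOCALIZED_MARKER
    raise ValueError("unknown variant: " + variant)


def index_fields(raw: str, default_language: str,
                 wiki_languages: Iterable[str], solr_supported: Set[str],
                 render: Callable[[str, str], str],
                 variant: str = "4a") -> Dict[str, str]:
    """Option 4: raw content in a generic field, plus one rendered field
    per wiki language that has a language-specific analyzer."""
    if not is_localized(default_language, variant):
        key = "content_" + default_language if default_language else "content_generic"
        return {key: raw}
    fields = {"content_generic": raw}
    for lang in wiki_languages:
        if lang in solr_supported:
            fields["content_" + lang] = render(raw, lang)
    return fields


def rendering_language(default_language: str, wiki_default: str) -> str:
    """Option 5: an empty defaultLanguage renders in the wiki default."""
    return default_language or wiki_default


def fake_render(raw: str, lang: str) -> str:
    # Placeholder renderer, only for the demo.
    return "[" + lang + "] " + raw


print(index_fields("$msg", "", ["de", "fr", "tlh"], {"de", "en", "fr"}, fake_render))
print(rendering_language("", "de"))
```

The sketch leaves out 4c, since a separate flag would only change how `is_localized` gets its answer.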
--
Sergiu Dumitriu
http://purl.org/net/sergiu
_______________________________________________
devs mailing list
devs(a)xwiki.org
http://lists.xwiki.org/mailman/listinfo/devs