Re: [xwiki-devs] [DISCUSSION] Handling document translations in Solr Search

28 Nov 2012

2012/11/28 Eduard Moraru &lt;enygma2002(a)gmail.com&gt;
...
  Hi Ludovic,
 Thanks for the reply. Please read below...
 On Tue, Nov 27, 2012 at 5:44 PM, Ludovic Dubost &lt;ludovic(a)xwiki.com&gt; wrote:
  Hi Edy,
 I'm not a huge fan of the title_fr title_en title_morelanguages approach  as
  indeed it seems to be quite complex at the query
level. I was more  leaning
  towards multiple indexes if we can query them
globally but I understand
 this is complex too.
 Now let's see the use cases that are hugely important:
 1/ Make sure that if you decide your wiki is monolingual:
    - the indexing uses the specific language analyzer
    - make sure the query uses the specific language analyzer
    - make sure the search looks in all content even if the language  setting
  of the document is wrongly set (consider all
documents being of the
 specific language)

 You mean that, if the wiki is monolingual, we should ignore the language
 filter and hardcode it to "All languages", right?
 However, what would be the advantage of this? Why would be want to pollute
 the results with irrelevant documents (caused by a probable recent
 configuration change that went from multi-lingual to mono-lignual)? Wasn't
 that the whole reason why the admin switched to mono-lingual?

If the user is monolingual, we can safely ignore the language setting of
each document and only the "main" document will be shown anyway to the user.
So we should make sure we search on ALL documents that are available to the
user.
The user might have to "reindex" to make sure this is properly taken into
account by the search engine.
What is important here, is that even if the wiki is set to "fr", then even
if documents have "en" for the main language they will still show up in the
search. The opposite would be bad.
...

  2/ Allow is a wiki is multi-lingual:
   - search in the language you decide (maybe the UI should display a
 language choice for the query)

 We already support this, by using the "Filtered Search" option and
 selecting the language.

Sure. I was just repeating what is important.
...

   - search in content that is analyzed in the proper language when the
  content is declared in this language
   - allow to specify if you want to restrict your search to documents
 declared in the language of your query, versus search more widely in all
 documents accross languages. If you search in only the language of the
 query only one document can show up but it should point to the right
 translation that matches, if you search in multiple languages then you  can
  show individual translations.

 I think this one is the same as the first bullet above.
 No I think it's different. The user can decide to make his search in
"French" but look in the "French" + "English" dataset.
For me if both
documents with the same page name match, both should come out separately.
...

    - allow technical users to search for all
documents across all  languages
  (where the language analysis does not really
matter)

 Do you mean as an API?

Not specifically as an API
...

 What exactly do you mean by "language analysis does not really matter"? Any
 Example?

I mean here that as a technical user your objective is to make sure your
search spans ALL content in the wiki. In this case you don't care about
stemming and such.
The standard UI could take this into account by choosing "any language
search" and with data set "all languages". We just need to make sure that
this won't exclude any content of the search
(Here is an example of a case where such exclusion might occur. Suppose you
have done only "French" and "English" indexing and there is
"German"
content also in your wiki but since you have not asked for German search in
your wiki you don't have title_de and content_de fields or don't have a
german specific solr index (in the other method), then your german content
would not be indexed at all ?)
...

 From an admin point of view it makes good sense for the admin to be able  to
  specify in a multilingual wiki which language
analysis should be  activated,
  and then have this transmitted to SOLR to
properly configure the engine.
 Reindexing is ok when changing the configuration.
 I believe in the end wether you use multiple fields with _fr _en or
 multiple SOLR cores, as long as you can query accross SOLR cores is a bit
 the same. If you cannot run a query merging multiple indexes then the  first
  solution is kind of absolutely necessary as it
would be the only one
 allowing to search across all languages.
 Maybe a solution would be to create one index per language and index ALL
 content regardless of it's language using the language analyzer of that
 index. This would allow to have better results even though the users have
 badly tagged the language of a document, and it's only the job of the UI  to
  limit the search to only the language of the
query, or all documents. 
  So you could have a configuration in the admin
that says:
 1/ Create an English Index
 2/ Create an additional French index
 The UI would allow to search in English and French, + would add a  language
  restriction for the documents.

 Applying the language specific analyzers (for Chinese, for example) to all
 the documents will just create a mess for all the documents that do not
 match the analyzer's language. I`m not sure the results for the
 badly-indexed languages will make any sense to users.
 I undestand the issue here, but in most cases the user will say "french
search" on "french content", he will only expand to non french content if
he was not satisfied by his search. What is just allowed here is to also
search in "french" inside all content. That would cover content that would
have the bad language setting as well as any other content. The results
might be a bit noisy but I don't think it's a big issue.
...
  Also, this is very similar to the multi-core approach
(one core per
 language), just that you also add documents that are indexed with the wrong
 analyzers. We have the same problem regarding merging relevance scores
 across indexes (cores) that is a big turn-off for the original multi-core
 approach.
 This is a more serious issue. If it's hard to merge the results spanning over
multiple cores this could be a showstopper. However the solution of
having only one Lucene document for all languages is not so cool either as
it would make it difficult to know which ones of the languages has matched
and present them separately with separate scores.
It's really the core issue to decide on. What are the benefits and
drawbacks of the different solutions. For each solution is there something
in the UI that you cannot do ?
So far I've heard:
1/ Presenting different scores for documents in different languages with
the same doc name if the title_fr,content_fr method is used
2/ Merging scores accross indexes in multicore approach
Other ? Can we list them in a wiki page ?
...

 In the future if we are able to "detect" the language of the documents we
 could add a lucene field with the "detected" language instead of the
 "provided" language of the documents, therefore increasing the quality of
 searches only on documents of a specific language.

 In the previous discussions (on the GSoC thread) we agreed that language in
 XWiki is known before-hard, so no recognition is required, at least not at
 document level.
 Let's forget this 
...

 This later solution would be the only one that would really work on file
 attachements as we have no information about the specific language of  file
  attachements (or even XWiki objects) which are
attached to the main
 document and not to the translated document.

 Yes, this is a problem right now. AFAIU, the plan [1] is to support
 translated objects and maybe attachments as well. Until then, we could
 either:
 1) Use the original document's language to index the attachment's content

This is not a good solution. If I understand correctly we could not end up
not searching in a french attachment because the original document is in
marked "en".
I'm for Paul's solution to index objects and attachments in each
translation (if we have separate entities for translated documents). I
understand that in the title_fr,content_fr approach this problem does not
happen.
...
  2) Use a language detection library to try to detect
the attachment
 content's language and index it accordingly.
 Not sure we can for now 
...
  The above could also be applied for objects and their
properties.
 ----------
 [1] http://jira.xwiki.org/browse/XWIKI-69

 This later issues shows that a search on "only french content" should 
still
  include the attachements because we have no idea
if the attachments are
 "french" or "english".

 (The paragraphs below discuss on what currently exists and what could be
 done, ignoring the possible language detection mentioned above)
 Right now a document also indexes the object's properties in a field called
 "objcontent". I do this for all translations, thus duplicating the field's
 value in all translations. I can do the same for attachments. The purpose
 is, indeed, to be able to find document translations based on hits in their
 objects/attachments. If a language filter is used and there is a hit in an
 object, only one document is returned. If there are no language filters,
 all translations will be returned.

It seems we have to do this for now
...

 However, if we search for the object/property/attachment itself, it will
 only be assigned to one language: the language of the original document.
 This means that if we search for all languages, the object itself will be
 found too (there is no language filter used). If we add a language filter
 that is different from the object/property/attachment's original document
 language, the object/property/attachment will not be found.
 Maybe we can come up with some processing of the query in the search
 application, that applies the language filter only for documents:
 ((-type:"OBJECT" OR -type:"OBJECT_PROPERTY" OR
-type:"ATTACHMENT") OR
 lang:"<userSelectedLanguage>") -- writing it like this because the
default
 operand is AND in the query filter clause that we use in the Search
 application.
 The problem with this is that that, when a language filter is used, the
 object/property/attachments that are now included in the results might not
 have the specified language and will pollute the results.

I'm not sure I understand. We have an "objcontent" field for each
translation that has the full content of objects and properties, but
individual object fields we don't have in each translations ?
The more I see all the issues, the more I lean towards a separate index per
language solution. The reason I do is that the main need is for a non
English user to have very relevant results in his own language. Therefore
we need to make sure that all content that the users have published has the
chance to be analyzed using the non English language analyzer. So indexing
all objects and attachments with the relevant language analyzer is the
solution. This is also why I proposed to index all content in this specific
index regardless of the language declared, which would only be used in the
UI to limit searches to the specific language.
In this view:
0/ There would be a language specific index per language with the objects
and attachments indexed only in the language of the index
1/ The user chooses the language in which he searches
2/ Automatically that sets the index to be used to be the "french" index
3/ Automatically that presets to limit the span of the search to declared
"french" documents
4/ The user can decide to go for non french documents at his own risks
knowing that the results might be weird because of wrong analysis (this is
what happens today with english analysis over french documents)
The benefit here is that you don't have a merging score over multiple index
issue, since you would never have to do a search across multiple indexes.
Searches are still simple to write. By default results are quite relevant
since you limit the search the french declared documents (this would be the
same as limiting your search to title_fr, content_fr) and still cover what
needs to be covered (objects and attachments).
Another benefit is that this falls back gracefully to monolingual as you
just have to have one index in the language declared for the monolingual
wiki.
The drawback is that the indexing is more costly and there is duplicated
content in the index. Howerver it is the Admin that say which languages he
wants available and he takes responsibility of the ressources this needs.
Could this solution work ?
Ludovic
...

 Thanks,
 Eduard
  Ludovic
 2012/11/26 Eduard Moraru &lt;enygma2002(a)gmail.com&gt;
 > Hi devs,
 >
 > Any other input on this matter?
 >
 > To summarize a bit, if we go with the multiple fields for each  language,
  we
 > end up with an index like:
 >
 > English version:
 > id: xwiki:Main.SomeDocument_en
 > language: en
 > space: Main
 > title_en: XWiki document
 > doccontent_en: This is some content
 >
 > French version:
 > id: xwiki:Main.SomeDocument_fr
 > language: fr
 > space: Main
 > title_fr: XWiki document
 > doccontent_fr: This is some content
 >
 > The Solr configuration is generated by some XWiki UI that returns a zip
 > that the admin has to unpack in his (remote) Solr instance. This could  be
   automated
for the embedded instance. This operation is to be performed  each
  time an admin changes the indexed languages
(rarely or even only once).
 Querying such a schema is a bit tricky when you are interested in more  than
  one language, because you have to add all the
clauses (title_en,  title_fr,
  etc.) specific to the languages you are
interested in.
 Some extra fields might also be added like title_ws (for whitespace
 tokenization only) that have various approaches to the indexing  operation,
 > with the aim of improving the relevancy.
 >
 > One solution to simplify the query for API clients would be to use  fields
   like
"title" and "doccontent" and to put as values very lightly (or not
 at
  all) analyzed content, as Paul suggested. This
would allow applications  to
  write simple (and backwards compatible maybe)
queries that will still  work,
 > but will not catch some of the nuances of specific languages. As far as
 > I`ve seen until now, applications are not very interested in nuances,  but
  > rather in filtering the results, a task for
which this solution might  be
   well
suited. Of course, nothing stops applications from using the *new*  and
 > more expressive fields that are properly analized.
 >
 > Thus, the search application will be the major beneficiary of these
 > analyzed fields (title_en, title_fr, etc.), while still allowing
 > applications to get their job done (trough generic, but less/not  analized
  > fields like "title",
"doccontent", etc.).
 >
 > WDYT?
 >
 > Thanks,
 > Eduard
 >
 >
 >
 >
 > On Wed, Nov 21, 2012 at 10:49 PM, Eduard Moraru &lt;enygma2002(a)gmail.com
 > >wrote:
 >
 > > Hi Paul,
 > >
 > > I was counting on your feedback :)
 > >
 > > On Wed, Nov 21, 2012 at 3:04 PM, Paul Libbrecht &lt;paul(a)hoplahup.net&gt;
 > wrote:
 > >
 > >>
 > >> Hello Eduard,
 > >>
 > >> it's nice of you to see you take this further.
 > >>
 > >> > This issue has already been previously [1] discussed during the 
GSoC
  > >> > project, but I am not
particularly happy with the chosen approach.
 > >> > When handling multiple languages, there are generally[2][3] 3
 > different
 > >> > approaches:
 > >> >
 > >> > 1) Indexing the content in a single field (like title, doccontent,
 > etc.)
 > >> > - This has the advantage that queries are clear and fast
 > >> > - The disadvantage is that you can not run very well tuned 
analyzers
  > on
 > >> the
 > >> > fields, having to resort to (at best) basic tokenization and
 > >> lowercasing.
 > >> >
 > >> > 2) Indexing the content in multiple fields, one field for each
 > language
 > >> > (like title_en, title_fr, doccontent_en, doccontent_fr, etc.)
 > >> > - This has the advantage that you can easily specify (as dynamic
 > fields)
 > >> > that *_en fields are of type text_en (and analyzed by an
 > >> english-centered
 > >> > chain of analyzers); *_fr of type text_fr (focused on french, 
etc.),
  > >> thus
 > >> > making the results much better.
 > >>
 > >> I would add one more field here: title_ws and text_ws where the full
 > text
 > >> is analyzed just as words (using the whitespace-tokenizer?).
 > >> A match there would even be preferred to a match in the below
 > text-fields.
 > >>
 > >> (maybe that would be called title and text?)
 > >>
 > >> > - The disadvantage is that querying such a schema is a pain. If 
you
   want
 >> > all the results in all languages, you end up with a big and 
expensive
 > >> > query.
 > >>
 > >> Why is this an issue?
 > >> Dismax does it for you for free (thanks to the "form" parameter
that
 > >> gives weight to each of the fields).
 > >> This is an issue only if you start to have more than 100 languages 
or
   >>
so...
 >> Lucene, the underlying engine of solr, handles thousands of clauses  in a
 > >> query without an issue (this is how prefix-queries are handled... 
they
  > are
 > >> expanded to a query for any of the term that matches the prefix, a
 > setting
 > >> deep somewhere, which is about 2000 avoids this to explode).
 > >>
 > >
 > > Sure, Solr is great when you want to do simple queries like "XWiki 
Open
    Source", however, since in XWiki we also expose
the Solr/Lucene query  APIs
 > to the platform, there will be (as as it is currently with Lucene) a  lot
 > of
 > > extensions wanting to do search using this API. These extensions  (like
  > the
 > > search suggest for example, rest search, etc) want to do something  like
  > > "title:'Open Source' AND
type:document AND doccontent:XWiki". Because
 > > option 2) is so messy in it's fields, it would mean that the 
extension
   >
would have to come up with a query like "title_en:'Open Source' AND
 > type:document AND doccontent_en:XWiki" (assuming that it is only  limited
  to
 > the current -- english or whatever -- language; what happens if it  wants
  to
 > do that no matter what language? It will have to specify each  combination
 > > possible because we can't use generic field names).
 > >
 > > Solr's approach works for using it in your web application's search
 > input,
 > > in a specific usecase, where you have precisely specified the default
 > > search fields and their boosts inside your schema.xml. However, as a
 > search
 > > API, using option 2) you are making the life of anyone else wanting  to
   use
 > the Solr search API really hard. Also, your search application will  work
  > nicely when the user enters a simple query
in the input field, but an
 > advanced user will suffer the same fate when trying to write an  advanced
 > > query, thus not relying on the default query (computed by solr based  on
  > > schema.xml).
 > >
 > > Also, based on your note above regarding improvements like title_ws  and
   >
such, again, all of these are very well suited for the search  application
 > > use case, together with the default query that you configure in
 > schema.xml,
 > > making the search results perform really well. However, what does all
 > these
 > > fields mean to another extension wanting to do search? Will it have  to
  > > handle all these implementation details
to query for title, content  and
  > > such? I`m not sure how well this would
work in practice.
 > >
 > > Unrealistic idea(?): perhaps we should come up with an abstract  search
   >
language (Solr/Lucene clone) that parses the searched fields andhides  the
  > complexities of all the indexed fields,
allowing to write simple  queries
 > > like "title:XWiki", while this gets translated to
"title_en:XWiki OR
 > > title_fr:XWiki OR title_de:XWiki..." :)
 > >
 > > Am I approaching this wrong by trying to have both a  tweakable/tweaked
  > > search application AND a search API?
Are the two not compatible? Do  we
  > have
 > > to sacrifice search result performance (no language-specific stuff)  to
  be
  > able to have a usable API?
 >
 >
 >> > If you want just some language, you have to read the right fields
 >> > (ex title_en) instead of just getting a clear field name (title).
 >>
 >> You have to be careful, this is really only if you want to be  specific.
 > >> In this case, it is likely that you also do not want so much 
stemming.
  > >> My experience, which was before
dismax on curriki.org, has made it  so
   >>
that any query that is a bit specific is likely to not desire  stemming.
  >>
 >
 > Can you please elaborate on this? I`m not sure I understand the  problem.
 > >
 > >
 > >>
 > >> > -- Also, the schema.xml definition is a static one in this 
concern,
  > >> > requiring you to know
beforehand which languages you want to  support
   >>
(for
 >> > example when defining the default fields to search for). Adding a 
new
 > >> > language requires you to start editing the xml files by hand.
 > >>
 > >> True but the available languages are almost all hand-coded.
 > >> You could generate the schema.xml based on the available languages 
if
  > not
 > >> hand-generated?
 > >>
 > >
 > > Basically I would have to output a zip with schema.xml,  solrconfig.xml
   and
 > then all the resources specific to all the selected languages  (stopwords,
  > synonims, etc) for the languages that we can
provide out of the box.  For
  > other languages, the admin would have to get
dirty with the xmls.
 >
 >
 >>
 >> There's one catch with this approach which is new to me but seems to 
be
  >> quite important to implement this
approach: the idf should be  modified,
  the
 >> Similarity class should be, so that the total number of documents is  the
  > total
number of documents having that language.
> See:
>
> 

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201211.mbox/%3Cza…
   > The solution sketched there sounds easy but I have
not tried it.
>
> > 3) Indexing the content in different Solr cores (indexes), one for  each
 > > language. Each core requires it's on
directory and configuration  files.
 >> > - The advantage is that queries are clean to write (like option 1) 
and
 > >> that
 > >> > you have a nice separation
 > >> > - The disadvantage is that it's difficult to get it right
 > >> (administrative
 > >> > issues) and then you also have the (considerable) problem of 
having
  to
  >> fix
 >> > the relevancy score of a query result that has entries from 
different
  >> > cores; each core has it's own
relevancy computed and does not  consider
 > >> the
 > >> > others.
 > >> > - To make it even worst, it seems that you can not [5] also push
 to
  a
  >> > remote Solr instance the
configuration files when creating a new  core
  >> > programatically. However, if we are
running an embedded Solr  instance,
  >> we
 >> > could provide a way to generate the config files and write them to 
the
 > >> data
 > >> > directory.
 > >>
 > >> Post-processing results is very very very dangerous as performance 
is
  at
 > >> risk (e.g. if a core does not answer)... I would tend to avoid that 
as
  > much
 > >> as possible.
 > >>
 > >
 > > Not really related, but this reminds me about the post processing  that
  I
  > do for checking view rights over the
returned result, but that's  another
 > > discussion that we will probably need to have :)
 > >
 > >
 > >>
 > >> > Currently I have implemented option 1) in our existing Solr
 > integration,
 > >> > which is also more or less compatible with our existing Lucene
 > queries,
 > >> but
 > >> > I would like to find a better solution that actually analyses the
 > >> content.
 > >> >
 > >> > During GSoC, option 2) was preferred but the implementation did 
not
   >>
> consider practical reasons like the ones described above (query
 >> complexity,
 >> > user configuration, etc.)
 >>
 >> True, Savitha surfed the possibility of having different solr  documents
 > >> per language.
 > >> I still could not be sure that this was not showing the document 
match
   >>
single in one language.
 >>
 >> However, indicating which language it is matched into is probably
 >> useful...
 >>
 >
 > Already doing that.
 >
 >
 >> Funnily, cross-language-retrieval is a mature research field but
 >> retrieval for multilanguage user is not so!
 >>
 >> > On a related note, I have also watched an interesting presentation 
[3]
 > >> > about how Drupal handles its Solr integration and, particularly, a
 > >> plugin
 > >> > [4] that handles the multilingual aspect.
 > >> > The idea seen there is that you have this UI that helps you 
generate
   > > configuration files, depending you your
needs. For instance, you  (admin)
 >> > check that you need search for language English, French and German 
and
 > >> the
 > >> > ui/extension gives you a zip with the configuration you need to 
use
  in
  > your
> > (remote or embedded) solr instance. The configuration for each  language
 > > comes preset with the analyzers you
should use for it and the  additional
 >> > resources (stopwords.txt, synonims.txt, etc.).
 >> > This approach helps with avoiding the need for admins to be forced 
to
  >> edit
 >> > xml files and could also still be useful for other cases, not only
 >> option
 >> > 2).
 >>
 >> Generating sounds like an easy approach to me.
 >>
 >
 > Yes, however I don`t like the fact that we can not do everything from  the
  > webapp and the admin needs to access the
filesystem to install the  given
  > configuration on the embedded/remote solr
directory. Lucene does not  have
 > > this problem now. It just works with XWiki and everything is done  from
  > > XWiki UI. I feel that losing this
commodity will not be very well
 > received
 > > by users that now have some new install steps to get XWiki running.
 > >
 > > Well, of course, for the embedded solr version, we could handle it  like
  > we
 > > do now and push the files directly from the webapp to the filesystem.
 > Since
 > > embedded will be default, it should be OK and avoid the extra install
 > step.
 > > Users with a remote solr machine should have the option to get the  zip
  > > instead.
 > >
 > > Not sure if we can apply the new configuration without a restart, but
 > I`ll
 > > have to look more into it. I know the multi-core architecture  supports
  > > something like this but will have to
see the details.
 > >
 > >
 > >>
 > >> > All these problems basically come from the fact that there is no
 way
  > to
 > >> > specify in the schema.xml that, based on the value of a field 
(like
   the
 >> > field "lang" that stores the document language), you want to run
 this
 > or
 > >> > that group of analyzers.
 > >>
 > >> Well, this is possible with ThreadLocal but is not necessarily a 
good
   >>
idea.
 >> Also, it is very common that users formulate queries without  formulating
  >> their language and thus you need to
"or" the user's queries through
 >> multiple languages (e.g. given by the browser).
 >>
 >> > Perhaps a solution would be a custom kind of "AggregatorAnalyzer"
 that
  >> > would call other analyzers at
runtime, based on the value of the  lang
  > >
field. However, this solution could only be applied at index time,  when
 > you
> > have the lang information (in the solrDocument to be indexed), but  when
 >> you
 >> > perform the query, you can not analyze the query text since you do 
not
 > >> know
 > >> > the language of the field you're querying (it was determined at
 > runtime
 > >> -
 > >> > at index time) and thus do not know what operations to apply to 
the
   >>
query
 >> > (to reduce it to the same form as the indexed values).
 >>
 >> How would that look at query time?
 >>
 >
 > That's what I was saying, that at query time, the searched term will  not
 > > get analyzed by the right chain. When you search for a single  language,
  > you
 > > could add that language as a query filter and then you could apply  the
    right chain, but when searching in 2 or more (or no,
meaning all)  languages
 > you are stuck.
 >
 >>
 >> > I have also read another interesting analysis [6] on this problem 
that
 > >> > elaborates on the complexities and limitations of each options.
 > (Ignore
 > >> the
 > >> > Rosette stuff mentioned there)
 > >> >
 > >> > I have been thinking about this for some time now, but the 
solution
  is
 > >> > probably somewhere in between, finding an option that is 
acceptable
   >>
while
 >> > not restrictive. I will probably also send a mail on the Solr list 
to
 > >> get
 > >> > some more input from there, but I get the feeling that whatever
 > >> solution we
 > >> > choose, it will most likely require the users to at least copy (or
 > even
 > >> > edit) some files into some directories (configurations and/or 
jars),
  > >> since
 > >> > it does not seem to be easy/possible to do everything on-the-fly,
 > >> > programatically.
 > >>
 > >> The only hard step is when changing the supported languages, I 
think.
   >>
In this case, when automatically generating the index, you need to  warn
  >> the user.
 >> The admin UI should have a checkbox "use generated schema" or a 
textarea
  >> for the schema.
 >>
 >
 > Please see above regarding configuration generation. Basically, since  we
 > > are going to support both embedded and remote solr instances, we  could
  > > support things like editing the schema
from XWiki only for the  embedded
   >
instance, but not for the remote one. We might end up having separate  UIs
 > > for each case, since we might want to exploit the flexibility of the
 > > embedded one as much as possible.
 > >
 > >
 > >>
 > >> Those that want particular fields and tunings need to write their 
own
  > >> schema.
 > >>
 > >> The same UI could also include whether to include a phonetic track 
or
   not
 >> (then require reindexing).
 >
 >
 >> hope it helps.
 >>
 >
 > Yes, very helpful so far. I`m counting on your expertise with  Lucene/Solr
 > > on the details. My current approach is a practical one without  previous
    experience on the topic, so I`m still doing mostly
guesswork in some  areas.

 Thanks,
 Eduard
> paul
> _______________________________________________
> devs mailing list
> devs(a)xwiki.org
> http://lists.xwiki.org/mailman/listinfo/devs
>
  _______________________________________________
 devs mailing list
 devs(a)xwiki.org
 http://lists.xwiki.org/mailman/listinfo/devs

 --
 Ludovic Dubost
 Founder and CEO
 Blog: http://blog.ludovic.org/
 XWiki: http://www.xwiki.com
 Skype: ldubost GTalk: ldubost
 _______________________________________________
 devs mailing list
 devs(a)xwiki.org
 http://lists.xwiki.org/mailman/listinfo/devs
  _______________________________________________
 devs mailing list
 devs(a)xwiki.org
 http://lists.xwiki.org/mailman/listinfo/devs

--
Ludovic Dubost
Founder and CEO
Blog: http://blog.ludovic.org/
XWiki: http://www.xwiki.com
Skype: ldubost GTalk: ldubost

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [xwiki-devs] [DISCUSSION] Handling document translations in Solr Search