Hi,
looks very cool!
Indeed, I guess stemming still needs some improvements. For instance
searching for "embeter":
returns 2 results whereas both should be the same.
Keep up the good work!
Guillaume
On Sat, Jun 30, 2012 at 11:03 PM, Paul Libbrecht <paul(a)hoplahup.net> wrote:
Ah, feedback! This is really good.
On 30 June 2012 at 20:26, Ludovic Dubost wrote:
This is nice progress. I've had a look and I have a few remarks:
1/ Some weird results
It seems the results are not always ok. For instance this page
http://ec2-50-19-181-163.compute-1.amazonaws.com:8080/xwiki/bin/view/Search…
comes up if I search for "SearchTest"
but it does not come up for "liste"
Also, these 2 searches report 6 and 1 results but show only 3 and 0 results.
This is due to the multi-lingual document (one document in four languages).
Multilinguality is, I think, at the top of Savitha's priorities.
2/ Advanced queries
I was also wondering if we can use advanced queries.
I've been trying
SearchTest +space:SearchTest
and this does not seem to work.
There's a good reason for this: the syntax for search currently in use is
"Dismax". This is a query-parser that is rather less technical, so it
avoids such issues as considering an apostrophe as a separator (an issue
that was reported).
The queries you are suggesting, which I think can be useful, only work
with the Lucene Query-Parser, and not with dismax. This will be
configurable but I am not sure which one should be the default.
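To make the parser choice concrete, here is a minimal sketch of the request parameters for both cases. `defType`, `q`, `qf` and `fq` are standard Solr request parameters; the helper `build_query` and the field names `title` and `content` are assumptions for illustration, not the actual XWiki-Solr schema. As far as I know, `fq` is parsed with the Lucene syntax by default even when the main query uses dismax, so a field restriction could go there.

```python
from urllib.parse import urlencode

def build_query(text, parser="dismax", filters=()):
    """Return a URL-encoded Solr query string for the chosen query parser.

    build_query is a hypothetical helper, not part of XWiki or Solr.
    """
    params = [("q", text), ("defType", parser)]
    if parser == "dismax":
        # dismax treats q as plain user text; qf lists the fields to search.
        # "title" and "content" are assumed field names.
        params.append(("qf", "title content"))
    for f in filters:
        # Each filter query restricts the result set without affecting the score.
        params.append(("fq", f))
    return urlencode(params)

# dismax for the user text, field restriction moved into a filter query
print(build_query("SearchTest", filters=["space:SearchTest"]))
# Lucene query parser: field syntax is allowed directly in q
print(build_query("SearchTest +space:SearchTest", parser="lucene"))
```

With something like this, the default could stay dismax for free-text input while advanced users switch to the Lucene parser explicitly.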
3/ It's important that we end up with at least the same features as in
lucene.
Mmmh, not *all* of the features.
E.g. that all fields are stored is really not desired (and almost never
used in search results).
For instance being able to query all the fields we could query in lucene
is important. For instance:
object:XWiki.XWikiUsers
should return only users
Something of this sort will be needed to achieve the advanced search
scenario.
Ordering and Scoring is also something that existed in lucene. How would
this work in SOLR ?
A score is already displayed currently.
4/ we also want of course the advantages of SOLR, which means faceting.
Tags, Spaces, Wikis can be interesting facets
The reason multilingual documents have been a problem thus far is that
Savitha is also trying to make the language a facet, which is really
interesting but is raising a number of difficulties.
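For reference, a faceted request could look roughly like the sketch below. `facet`, `facet.field` and `facet.mincount` are standard Solr parameters; the helper `facet_params` and the field names `tags`, `space`, `wiki` and `language` are assumptions and may not match the fields the current schema defines.

```python
from urllib.parse import urlencode

def facet_params(text, facet_fields):
    """Build URL-encoded Solr parameters asking for counts per facet field.

    facet_params is a hypothetical helper; the field names are assumed.
    """
    params = [
        ("q", text),
        ("facet", "true"),        # switch faceting on
        ("facet.mincount", "1"),  # hide facet values with no matching document
    ]
    # facet.field is repeated once per field to facet on
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)

print(facet_params("liste", ["tags", "space", "wiki", "language"]))
```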
5/ in terms of multilingual search (in case of a multilingual wiki) we
need to make sure that you can say that you make a search in a
specific language and the correct stemmer is used (if stemming is used
at indexing time we need to index the content in each language with
the correct stemmer). I saw that you did some things with languages so
maybe SOLR has also other ways to handle this.
If you look into the source, you can see some of that.
Solr can do this very nicely declaratively with the schema.xml and
solrconfig.xml.
Part of Savitha's intent was to offer an administrative UI to manipulate
this but I'd personally prefer editing files manually. Or maybe we even
have to invent an extended schema syntax for XWiki-Solr (thus indicating
that a field of solr, of this and that type, tokenization and storage, is
fed by a property x/yz of an xwiki document).
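One common way to get the correct stemmer per language is one content field per language, each bound in schema.xml to a field type with that language's analyzer. The sketch below only shows the routing side of that idea. The `content_<lang>` naming, the language set and both helpers are assumptions for illustration, not what the current code does.

```python
# Hypothetical set of languages for which the schema defines a stemmed field.
STEMMED_LANGUAGES = {"en", "fr", "de", "es"}

def content_field(language):
    """Return the (assumed) Solr field holding content for this language."""
    lang = (language or "").strip().lower()
    if lang in STEMMED_LANGUAGES:
        return "content_" + lang
    return "content"  # unstemmed fallback field

def to_solr_fields(translations):
    """Map {language: text} for one document to {field name: text}."""
    return {content_field(lang): text for lang, text in translations.items()}

# Each translation lands in the field whose analyzer stems its language.
print(to_solr_fields({"fr": "liste de pages", "en": "list of pages"}))
```

At query time the same `content_field` lookup would pick the field to search, so the query is stemmed with the same analyzer as the indexed text.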
paul