David Ward wrote:
I am currently adding some enhancements to the Lucene plugin for
Curriki, and have a few questions.
I have submitted a patch for XPLUCENE-25 so that password fields are
not indexed; the patch seems to have been accepted by Sergiu (with
some adjustments) and committed.
The other patch (XPLUCENE-26) is to allow better sorting of results.
I haven't had time to review it yet.
The issue is that when sorting by a field that has been tokenized by
Lucene, the sorting is by any of the tokens (seemingly random), so
titles, for example, are sorted by random words within the title. The
patch that I have created so far increases the size of the index by
about 50%, though (it indexes non-tokenized versions of each object
field), and I am not sure if that is acceptable to the XWiki community
at large.
Could you run some tests to see how the search time is affected by this
change? The size increase depends on how the wiki is structured: for a
wiki holding only objects it would increase more, and for a wiki holding
mostly text it would stay almost the same.
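To make the sorting problem concrete, here is a minimal plain-Java sketch that does not use Lucene at all. It compares sorting titles by their whole (non-tokenized) value with sorting by a single token. Choosing the last token as the key is only an assumption for illustration, not a description of what Lucene actually does, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

class SortKeyDemo {
    // Sort key for a non-tokenized field: the whole title, lowercased.
    static String wholeValueKey(String title) {
        return title.toLowerCase(Locale.ROOT);
    }

    // Stand-in for sorting on a tokenized field: one token becomes the key.
    // Using the last token is an assumption made purely for illustration.
    static String singleTokenKey(String title) {
        String[] tokens = title.toLowerCase(Locale.ROOT).split("\\s+");
        return tokens[tokens.length - 1];
    }

    // Returns a sorted copy of the titles, ordered by the given key function.
    static List<String> sortBy(List<String> titles, Function<String, String> key) {
        List<String> copy = new ArrayList<>(titles);
        copy.sort(Comparator.comparing(key));
        return copy;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
            "Zebra Anatomy", "Algebra Workbook", "Middle School Chemistry");
        // Ordered by the full title, as a reader would expect.
        System.out.println(sortBy(titles, SortKeyDemo::wholeValueKey));
        // Ordered by a word inside the title, which looks random to a reader.
        System.out.println(sortBy(titles, SortKeyDemo::singleTokenKey));
    }
}
```

The extra index size comes from keeping the whole-value key next to the tokenized field, which is the trade-off discussed above.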
One item that I noted was that the object data is being stored in the
index, but there does not seem to be anything in the SearchResult
interface that allows for getting the values back. Is there a reason
the data is stored? I see two options here, first would be to add a
method in SearchResult that lets one get the object data out (but that
ends up having security issues for pages that one would not normally
be able to see), the other would be to just index the data and not
store it (which should reduce the index size). Any thoughts on the
best direction here?
Rights are a pretty serious topic here. I'd be in favor of displaying
the context of a search hit, but that will require a rights check inside
the Lucene plugin. We should vote on this.
For me, +1 for displaying the hit context, i.e. keeping the data in the index.
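As a rough sketch of what such a rights check could look like, the following plain-Java example only returns the stored context for hits whose page the current user may view. Everything here is hypothetical: the `Hit` class, the `visibleContexts` method, and the `canView` predicate are illustrative stand-ins, not part of the Lucene plugin or the XWiki rights API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class HitContextFilter {
    // Hypothetical search hit: the page name plus the stored text shown as context.
    static final class Hit {
        final String page;
        final String context;

        Hit(String page, String context) {
            this.page = page;
            this.context = context;
        }
    }

    // Returns "page: context" only for hits on pages the user may view.
    // canView is an assumed stand-in for a real rights check.
    static List<String> visibleContexts(List<Hit> hits, Predicate<String> canView) {
        List<String> visible = new ArrayList<>();
        for (Hit hit : hits) {
            if (canView.test(hit.page)) {
                visible.add(hit.page + ": " + hit.context);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("Main.WebHome", "welcome to the wiki"));
        hits.add(new Hit("Private.Notes", "confidential text"));
        // Assumed rule for this demo only: pages in the "Private" space are hidden.
        System.out.println(visibleContexts(hits, page -> !page.startsWith("Private.")));
    }
}
```

Doing the check before the stored data leaves the plugin means a caller never sees context from pages it could not open directly.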
The last question I have is how do I create a string array (String[])
in a Velocity script so that I can have a secondary sort column?
Velocity seems to create object arrays but the LucenePluginApi
requires a string array for the sortField argument of
getSearchResults.
Any comments/input/suggestions/answers are welcome.
#set($sortBy = "title,date")
#set($sortBy = $sortBy.split(","))
Now $sortBy is a String[], because Velocity calls Java's String.split(),
which returns a String array.
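The same thing can be checked in plain Java. This is a minimal sketch in which `countSortFields` is a hypothetical stand-in for any method that requires a `String[]` argument (such as the sortField parameter mentioned above); it is not the real plugin API.

```java
class SplitDemo {
    // Hypothetical stand-in for an API method that only accepts String[].
    static int countSortFields(String[] sortFields) {
        return sortFields.length;
    }

    // String.split returns String[], so no conversion from Object[] is needed.
    static String[] toSortFields(String commaSeparated) {
        return commaSeparated.split(",");
    }

    public static void main(String[] args) {
        String[] sortBy = toSortFields("title,date");
        System.out.println(sortBy.getClass().getSimpleName()
            + " with " + countSortFields(sortBy) + " entries");
    }
}
```

The primary sort column goes first in the string and the secondary one after the comma.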
--
Sergiu Dumitriu
http://purl.org/net/sergiu/