On 01/27/2010 05:54 PM, Paul Libbrecht wrote:
Hello devs,
I'm working on a few optimizations of the Lucene plugin and am trying to keep
them flexible rather than specialized for intergeo or curriki.
The fact is that this plugin uses Lucene in a very blind and heavy
fashion, which offers a lot of power that goes unused. Mostly,
I'd like to be able, in a configurable way:
- to decide whether to store and/or index particular objects or object
properties
- to exclude some documents
- to use specific analyzers for specific fields (in
particular the "token fields")
I know this would almost be possible by replacing Lucene with Solr and
letting users tune Solr.
But maybe it is simpler to provide this configurability within XWiki.
Probably the nicest way I see would be to do it the way the
notifications are done: a central configuration field names a page
containing Groovy source that implements an interface such as
"LuceneIndexProfile", which would answer these questions (maybe even
covering more, such as the Data classes).
Would this nicest option be easy to implement?
Or do we prefer an XML configuration?
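The message does not spell out the interface, so the following is only a guess at its shape, written in Java. Each method answers one of the three wishes listed above; under the proposal, a Groovy page referenced from a central configuration field would supply the implementation. The method names and the example values (excluded spaces, token field names, the "keyword" analyzer name) are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface: a Groovy page named in a central configuration
// field would implement it. Method names are illustrative only.
interface LuceneIndexProfile {
    /** Whether the document (e.g. "Space.Page") should be indexed at all. */
    boolean includeDocument(String fullName);

    /** Whether a property value should be stored in the index. */
    boolean storeProperty(String className, String propertyName);

    /** Whether a property value should be indexed (made searchable). */
    boolean indexProperty(String className, String propertyName);

    /** Analyzer name for a field, or null to keep the plugin default. */
    String analyzerFor(String fieldName);
}

// Example implementation; the excluded spaces and field names are made up.
class ExampleIndexProfile implements LuceneIndexProfile {
    private final Set<String> excludedSpaces = new HashSet<>(Arrays.asList("Scheduler", "Panels"));
    private final Set<String> tokenFields = new HashSet<>(Arrays.asList("keywords", "tags"));

    @Override
    public boolean includeDocument(String fullName) {
        int dot = fullName.indexOf('.');
        String space = dot < 0 ? fullName : fullName.substring(0, dot);
        return !excludedSpaces.contains(space);
    }

    @Override
    public boolean storeProperty(String className, String propertyName) {
        // Example rule: never store password values.
        return !"password".equals(propertyName);
    }

    @Override
    public boolean indexProperty(String className, String propertyName) {
        return !"password".equals(propertyName);
    }

    @Override
    public String analyzerFor(String fieldName) {
        // "Token fields" get a whole-token analyzer; everything else keeps the default.
        return tokenFields.contains(fieldName) ? "keyword" : null;
    }
}
```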
Hi Paul,
I'm not sure I understood your approach; could you explain it in more
detail? What do you mean by "central field"?
The way I see it, each indexed field will have a reference, given by
some coordinates (this is related to the thread about object and
property references), such as
"wiki:Space.Document^classname[index].property". There should be a
collection of filters (components implementing LuceneIndexFilter), each
having the following method:
boolean filter(Reference entity, LuceneIndexProfile profile);
The meaning is the following:
- entity is the entity to process (a document, an object property, or
an attachment)
- profile is the indexing profile built by the filters: it is initialized
with some default values by the Lucene plugin and then modified by each
filter it passes through
- returning true means that the filtering process should stop, since the
current filter decided that the profile is ready (for example, if a
filter decides that the document must not be indexed due to security
restrictions, it's useless to run all the other filters); by
default filters return false, letting the other filters adjust the
profile
- each filter looks at the reference and, based on some internal rules,
decides whether it should alter the profile for this entity, and whether
it considers that further filtering is useful/needed
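The filtering process described above can be sketched in Java as a simple chain: the plugin creates a profile with default values, hands it to each filter in turn, and stops as soon as a filter returns true. Only the filter method signature comes from this message; the Reference stand-in, the profile fields (index, store, analyzer) and the LuceneIndexFilterChain class are hypothetical, since the actual settings are still to be defined.

```java
import java.util.List;

// Hypothetical stand-in for an entity reference such as
// "wiki:Space.Document^classname[index].property".
class Reference {
    final String value;

    Reference(String value) {
        this.value = value;
    }

    @Override
    public String toString() {
        return value;
    }
}

// Hypothetical profile; which settings it holds is still an open question.
class LuceneIndexProfile {
    boolean index = true;         // defaults supplied by the Lucene plugin
    boolean store = true;
    String analyzer = "standard";
}

interface LuceneIndexFilter {
    /** Adjusts the profile for the entity; returns true to stop further filtering. */
    boolean filter(Reference entity, LuceneIndexProfile profile);
}

class LuceneIndexFilterChain {
    private final List<LuceneIndexFilter> filters;

    LuceneIndexFilterChain(List<LuceneIndexFilter> filters) {
        this.filters = filters;
    }

    /** Starts from the defaults and lets each filter adjust them, in order. */
    LuceneIndexProfile buildProfile(Reference entity) {
        LuceneIndexProfile profile = new LuceneIndexProfile();
        for (LuceneIndexFilter filter : filters) {
            if (filter.filter(entity, profile)) {
                break; // this filter declared the profile ready
            }
        }
        return profile;
    }
}
```

In this sketch the order of the filters matters: a filter that can make a final decision, such as a security check, saves the most work when it runs early.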
After the filtering is done, the plugin indexes (or not) the entity
according to the values in the profile.
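The last step, turning the final profile into an indexing decision, could look roughly like this. FieldSpec and ProfileApplier are hypothetical names describing what the plugin would hand to Lucene for one field; the rule that an entity which is neither indexed nor stored is skipped entirely is an assumption of this sketch.

```java
import java.util.Optional;

// Hypothetical profile with plugin defaults.
class LuceneIndexProfile {
    boolean index = true;
    boolean store = true;
    String analyzer = "standard";
}

// Hypothetical description of one field to add to the Lucene document.
class FieldSpec {
    final String name;
    final String value;
    final boolean stored;
    final boolean indexed;
    final String analyzer; // null when the field is not indexed

    FieldSpec(String name, String value, boolean stored, boolean indexed, String analyzer) {
        this.name = name;
        this.value = value;
        this.stored = stored;
        this.indexed = indexed;
        this.analyzer = analyzer;
    }
}

class ProfileApplier {
    /** Turns the final profile into a field description, or nothing if the entity is skipped. */
    static Optional<FieldSpec> apply(String name, String value, LuceneIndexProfile profile) {
        if (!profile.index && !profile.store) {
            return Optional.empty(); // neither searchable nor retrievable: leave it out
        }
        String analyzer = profile.index ? profile.analyzer : null;
        return Optional.of(new FieldSpec(name, value, profile.store, profile.index, analyzer));
    }
}
```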
This means that we could have several components affecting the Lucene
behavior, each one with particular goals in mind (security, performance,
searchability), and each one with its own configuration.
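Here is a sketch of two independent filters, each with its own configuration and goal: a security-oriented one that drops whole spaces and ends the chain, and a searchability-oriented one that picks analyzers per property. To let filters inspect the coordinates, it includes a small parser for the "wiki:Space.Document^classname[index].property" form quoted earlier. The parser (which handles only that object-property form), the class names and the configuration shapes are all hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reference type; only "wiki:Space.Document^classname[index].property" is parsed.
class Reference {
    private static final Pattern PROPERTY_FORM =
        Pattern.compile("([^:]+):([^.]+)\\.([^^]+)\\^([^\\[]+)\\[(\\d+)\\]\\.(.+)");

    final String wiki, space, document, className, property;
    final int objectIndex;

    private Reference(Matcher m) {
        wiki = m.group(1);
        space = m.group(2);
        document = m.group(3);
        className = m.group(4);
        objectIndex = Integer.parseInt(m.group(5));
        property = m.group(6);
    }

    static Reference parse(String text) {
        Matcher m = PROPERTY_FORM.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported reference: " + text);
        }
        return new Reference(m);
    }
}

class LuceneIndexProfile {
    boolean index = true;
    boolean store = true;
    String analyzer = "standard";
}

interface LuceneIndexFilter {
    boolean filter(Reference entity, LuceneIndexProfile profile);
}

// Security-oriented filter, configured with spaces that must never be indexed.
class RestrictedSpacesFilter implements LuceneIndexFilter {
    private final Set<String> restrictedSpaces;

    RestrictedSpacesFilter(Set<String> restrictedSpaces) {
        this.restrictedSpaces = restrictedSpaces;
    }

    @Override
    public boolean filter(Reference entity, LuceneIndexProfile profile) {
        if (restrictedSpaces.contains(entity.space)) {
            profile.index = false;
            profile.store = false;
            return true; // final decision: skip the remaining filters
        }
        return false;
    }
}

// Searchability-oriented filter, configured with "classname.property" -> analyzer name.
class AnalyzerFilter implements LuceneIndexFilter {
    private final Map<String, String> analyzers;

    AnalyzerFilter(Map<String, String> analyzers) {
        this.analyzers = analyzers;
    }

    @Override
    public boolean filter(Reference entity, LuceneIndexProfile profile) {
        String analyzer = analyzers.get(entity.className + "." + entity.property);
        if (analyzer != null) {
            profile.analyzer = analyzer;
        }
        return false; // other filters may still adjust the profile
    }
}
```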
So, apart from writing the code, what needs to be done is: define the
possible settings in LuceneIndexProfile, define the needed filters, and
decide how to configure them. XML files on the server are an option, but
not a flexible enough one. Objects inside the wiki might give application
developers more flexibility, so another task is to decide which fields
such a configuration class needs.
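To illustrate configuration through wiki objects, here is a sketch of a filter whose rules come from such objects, each represented as a plain map of property values. The property names it reads (target, index, store, analyzer), the prefix matching on the textual reference and the ConfigurableIndexFilter name are hypothetical, since choosing the real fields is exactly the open question. For brevity, the reference is a plain string rather than a Reference object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal stand-in for the profile (hypothetical settings).
class LuceneIndexProfile {
    boolean index = true;
    boolean store = true;
    String analyzer = "standard";
}

// One configuration object read from the wiki; the property names are assumptions.
class IndexRule {
    final String target;   // reference prefix the rule applies to
    final Boolean index;   // null means "leave the current value"
    final Boolean store;
    final String analyzer; // null means "leave the current value"

    IndexRule(Map<String, String> props) {
        target = props.getOrDefault("target", "");
        index = parseFlag(props.get("index"));
        store = parseFlag(props.get("store"));
        analyzer = props.get("analyzer");
    }

    private static Boolean parseFlag(String value) {
        return (value == null || value.isEmpty()) ? null : Boolean.valueOf(value);
    }
}

// Filter whose rules come from wiki objects instead of files on the server.
class ConfigurableIndexFilter {
    private final List<IndexRule> rules = new ArrayList<>();

    ConfigurableIndexFilter(List<Map<String, String>> configObjects) {
        for (Map<String, String> object : configObjects) {
            rules.add(new IndexRule(object));
        }
    }

    /** Applies every matching rule in order; later rules override earlier ones. */
    boolean filter(String reference, LuceneIndexProfile profile) {
        for (IndexRule rule : rules) {
            if (!reference.startsWith(rule.target)) {
                continue;
            }
            if (rule.index != null) profile.index = rule.index;
            if (rule.store != null) profile.store = rule.store;
            if (rule.analyzer != null) profile.analyzer = rule.analyzer;
        }
        return false; // never final: other filters may still adjust the profile
    }
}
```

Because the rules live in ordinary objects, an application developer could change indexing behavior by editing a page, without touching files on the server.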
Of course, if somebody needs a new filter, it's easy to add a new jar or
write a new Groovy page in the wiki.
--
Sergiu Dumitriu
http://purl.org/net/sergiu/