Hello community,
Here are my gut responses; I have not synced with Savitha.
On 28 August 2012 at 14:04, Vincent Massol wrote:
Search-api:
* Is the Search API supposed to be independent of SOLR?
I think we can aim for that, but it does not mean that every function will have a valid
implementation.
* Search interface is strange, it has implementation
details such as: getImplementation(), initialize(),
These two methods are not implementation details to me.
Maybe initialize() naturally belongs to the component lifecycle, in which case it
should be hidden.
Did you mean getImplementation() should rather be a cast?
* It also has other concerns such as getStatus(),
getStatusAsJson(),
These are useful!
I am not sure a unified UI is possible, but if it is, they have to be there.
getVelocityUtils(), getSearchRequest()
* Why do we need a Search interface? Why not instead use the Query module and introduce a
new query type? (note return List from Query.execute() probably needs to be clarified).
Replace SearchRequest with Query impl
List is definitely a problem, one needs at least something that says the total match
count, the availability and parameters of a previous and next result page and probably
some debug information.
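To make the point about List concrete, here is a minimal sketch of what a result object could carry instead. The class name SearchResultPage and all of its members are hypothetical; nothing here is taken from the existing search-api or Query module.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical result holder: carries what a bare List cannot. */
final class SearchResultPage<T> {
    private final List<T> hits;
    private final long totalMatches;
    private final int offset;
    private final int pageSize;
    private final String debugInfo;

    SearchResultPage(List<T> hits, long totalMatches, int offset, int pageSize, String debugInfo) {
        if (offset < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and pageSize > 0");
        }
        this.hits = Collections.unmodifiableList(new ArrayList<>(hits));
        this.totalMatches = totalMatches;
        this.offset = offset;
        this.pageSize = pageSize;
        this.debugInfo = debugInfo;
    }

    List<T> getHits() { return hits; }

    /** Total number of matches in the index, not just on this page. */
    long getTotalMatches() { return totalMatches; }

    boolean hasPrevious() { return offset > 0; }

    /** True if matches remain after the last hit of this page. */
    boolean hasNext() { return (long) offset + hits.size() < totalMatches; }

    /** Offset to request for the previous page, never below zero. */
    int previousOffset() { return Math.max(0, offset - pageSize); }

    /** Offset to request for the next page. */
    int nextOffset() { return offset + pageSize; }

    /** Free-form debug information, e.g. the query actually executed. */
    String getDebugInfo() { return debugInfo; }
}
```

A UI would then read getTotalMatches() for the result count and use previousOffset() and nextOffset() to build the paging links.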
I also note that using strings as the single entry point for queries is bad practice, and
that Savitha has partially changed it.
It is bad because a string lets extra query clauses be injected, which means, for example,
that the current Lucene plugin has no way to force an added condition
(e.g. to build a UI that queries all tasks by adding a clause requiring the existence of a
task object).
I find the Lucene Query object very workable for this, and Solr's as well, though I do not
think the two are compatible.
Also, the normal Solr way is to use a query parser, which performs transformations such as
stemming the query (you query for "arbr" when searching for "arbres", for example).
All search engines I've worked on use a customizable query expansion mechanism, where part
of the user query is turned into a decomposed query and then expanded across the various
fields (title matches with a bigger boost, precise matches with a bigger boost, matches
allowed in other languages...).
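As an illustration of why a query object helps, here is a rough sketch in plain Java. It is neither the Lucene nor the Solr API: QueryNode, FieldTerm, BoolNode and QueryExpander are hypothetical names, and the field names and boosts are made up. It shows a user term being expanded over several boosted fields and a mandatory condition being attached around it. Stemming is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical structured query node. */
interface QueryNode {
    /** Readable form for debugging only; not meant to be parsed back. */
    String render();
}

/** A single term searched in one field, with a boost. */
final class FieldTerm implements QueryNode {
    private final String field;
    private final String term;
    private final float boost;

    FieldTerm(String field, String term, float boost) {
        this.field = field;
        this.term = term;
        this.boost = boost;
    }

    @Override
    public String render() {
        return field + ":" + term + "^" + boost;
    }
}

/** Combines child nodes with one boolean operator ("AND" or "OR"). */
final class BoolNode implements QueryNode {
    private final String operator;
    private final List<QueryNode> children;

    BoolNode(String operator, List<QueryNode> children) {
        this.operator = operator;
        this.children = new ArrayList<>(children);
    }

    @Override
    public String render() {
        List<String> parts = new ArrayList<>();
        for (QueryNode child : children) {
            parts.add(child.render());
        }
        return "(" + String.join(" " + operator + " ", parts) + ")";
    }
}

final class QueryExpander {
    private QueryExpander() { }

    /** Expands one user term over several fields (assumed names and boosts). */
    static QueryNode expand(String userTerm) {
        return new BoolNode("OR", Arrays.<QueryNode>asList(
            new FieldTerm("title", userTerm, 3.0f),
            new FieldTerm("exact", userTerm, 2.0f),
            new FieldTerm("text", userTerm, 1.0f)));
    }

    /** Wraps the user query so that the mandatory condition always applies. */
    static QueryNode restrict(QueryNode userQuery, QueryNode mandatory) {
        return new BoolNode("AND", Arrays.<QueryNode>asList(mandatory, userQuery));
    }
}
```

Because the mandatory clause is attached at the object level, a "tasks only" UI could call restrict() on whatever the user typed, which is the case that a single query string makes hard.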
* Naming of interfaces is a bit strange. For example:
BuildIndex; should it be IndexBuilder instead? What about DeleteIndex, should it be
IndexDeleter?
* I don't think we need deleteDocumentIndex(), deleteWikiIndex(), deleteSpaceIndex(),
etc. We need a single deleteEntity(EntityReference reference, EntityType type). Same for
IndexBuilder.
* Why is there a DocumentIndexer interface?
I believe that the reason here is that it is of utmost importance to allow the indexing
process to be customized.
On the one hand, this can be done in the Solr schema.
But that is not enough: it is quite common for an application to fetch related data
(for example the quality from an external source, or related documents) and index it
alongside the page. An interface through which added data can be injected for each page
therefore sounds like a strict requirement to me.
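A minimal sketch of such an injection point, assuming a flat field map per page. IndexContributor and ContributingIndexer are hypothetical names, and the document reference is simplified to a String here rather than an XWiki reference type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical extension point: lets an application add fields for a page. */
interface IndexContributor {
    void contribute(String documentReference, Map<String, Object> fields);
}

/** Builds the field map for one page, then lets each contributor extend it. */
final class ContributingIndexer {
    private final List<IndexContributor> contributors;

    ContributingIndexer(List<IndexContributor> contributors) {
        this.contributors = new ArrayList<>(contributors);
    }

    Map<String, Object> buildFields(String documentReference, String title, String content) {
        Map<String, Object> fields = new LinkedHashMap<>();
        // Fields the core indexer always provides (assumed names).
        fields.put("title", title);
        fields.put("content", content);
        // Application-specific data, e.g. fetched from an external source.
        for (IndexContributor contributor : contributors) {
            contributor.contribute(documentReference, fields);
        }
        return fields;
    }
}
```

An application would register a contributor that looks up its external data and puts it into the map before the page is sent to the index.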
Why is a Document different from other entities? For
ex I can see DocumentIndexer.deleteIndex() why not
IndexDeleter.deleteEntity(documentRef)?
* Why is there a need for RebuildIndex (which I assume is IndexRebuilder) and why cannot
we use the IndexBuilder?
An IndexRebuilder object is necessary because rebuilding an index may take several days. It
needs to expose a monitoring status (e.g. the queue size) and probably additional actions.
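Roughly what I have in mind, as a sketch only: MonitoredIndexRebuilder and its methods are hypothetical, the actual indexing call is left as a comment, and pause/resume stand in for the "additional actions".

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical long-running rebuilder that can be monitored and paused. */
final class MonitoredIndexRebuilder {
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    private final AtomicLong processed = new AtomicLong();
    private final AtomicBoolean paused = new AtomicBoolean(false);

    /** Adds one document reference to the rebuild queue. */
    void enqueue(String documentReference) {
        pending.add(documentReference);
    }

    /** Processes one queued reference; returns false if paused or empty. */
    boolean processNext() {
        if (paused.get()) {
            return false;
        }
        String reference = pending.poll();
        if (reference == null) {
            return false;
        }
        // The real indexing of 'reference' would happen here.
        processed.incrementAndGet();
        return true;
    }

    void pause() { paused.set(true); }

    void resume() { paused.set(false); }

    int getQueueSize() { return pending.size(); }

    long getProcessedCount() { return processed.get(); }

    /** Status snapshot for a monitoring UI. */
    String getStatusAsJson() {
        return "{\"queueSize\":" + getQueueSize()
            + ",\"processed\":" + getProcessedCount()
            + ",\"paused\":" + paused.get() + "}";
    }
}
```

A background thread would call processNext() in a loop while an admin page polls getStatusAsJson().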
* Why the need for SearchIndex?
Search-solrj:
* solrj server in embedded mode is used.
This should be flexible, either embedded or not.
* Shouldn't use system property but the xwiki
configuration instead for the solrj home (see below in misc)
A challenge, but hopefully addressed.
[...] (responded elsewhere)
Misc:
* all API to review and improve/stabilize
* typos to fix
* licenses to fix
* pom to fix
Maybe you can be more precise?
Thanks for your comments.
Paul