Hi Savitha,
I've started a quick review of the SOLR code in preparation for integrating it into the
platform, and I have some questions, which I jotted down below as I was reviewing the
code. Sorry for the terse format: I actually wrote the questions to myself and then decided
to send them as is :)
General:
* Need an architecture diagram showing the main modules and threads and how they interact
with the platform
Search-api:
* Is the Search API supposed to be independent of SOLR?
* The Search interface is strange: it exposes implementation details such as
getImplementation() and initialize()
* It also mixes in unrelated concerns such as getStatus(), getStatusAsJson(),
getVelocityUtils() and getSearchRequest()
* Why do we need a Search interface? Why not use the Query module instead and introduce a
new query type? (Note: the List returned by Query.execute() probably needs to be clarified.)
We could then replace SearchRequest with a Query implementation.
* The naming of the interfaces is a bit strange. For example, BuildIndex: should it be
IndexBuilder instead? And DeleteIndex: should it be IndexDeleter?
* I don't think we need deleteDocumentIndex(), deleteWikiIndex(), deleteSpaceIndex(),
etc. We need a single deleteEntity(EntityReference reference, EntityType type). Same for
IndexBuilder.
* Why is there a DocumentIndexer interface? Why is a Document different from other
entities? For example, I can see DocumentIndexer.deleteIndex(); why not
IndexDeleter.deleteEntity(documentRef)?
* Why is there a need for RebuildIndex (which I assume would become IndexRebuilder), and
why can't we use the IndexBuilder?
* Why the need for SearchIndex?
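To make the Query module suggestion more concrete, here's a rough sketch, assuming a simplified query module where each query language is backed by a pluggable executor. QueryLanguageExecutor, QueryLanguageRegistry, registerLanguage() and the "solr" language name are hypothetical stand-ins, not the real org.xwiki.query interfaces, and the executor fakes SOLR with substring matching so the sketch runs without a SOLR instance. Returning List<String> (document references) is one way to make explicit what the results contain.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; NOT the real org.xwiki.query interfaces.
interface QueryLanguageExecutor {
    // Typed results (here: document references) make explicit what the
    // returned List contains.
    List<String> execute(String statement);
}

final class QueryLanguageRegistry {
    private final Map<String, QueryLanguageExecutor> executors = new HashMap<>();

    void registerLanguage(String language, QueryLanguageExecutor executor) {
        executors.put(language, executor);
    }

    List<String> execute(String statement, String language) {
        QueryLanguageExecutor executor = executors.get(language);
        if (executor == null) {
            throw new IllegalArgumentException("Unsupported query language: " + language);
        }
        return executor.execute(statement);
    }
}

// Fake "solr" language: substring matching over an in-memory list instead of
// calling SolrJ, so the sketch runs without a SOLR instance.
final class InMemorySolrExecutor implements QueryLanguageExecutor {
    private final List<String> indexedDocuments;

    InMemorySolrExecutor(List<String> indexedDocuments) {
        this.indexedDocuments = new ArrayList<>(indexedDocuments);
    }

    @Override
    public List<String> execute(String statement) {
        List<String> results = new ArrayList<>();
        for (String reference : indexedDocuments) {
            if (reference.contains(statement)) {
                results.add(reference);
            }
        }
        return results;
    }
}
```

With something like this, a search is just another query language, and a dedicated Search interface and SearchRequest become unnecessary.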
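Similarly, a rough sketch of the single-entry-point indexing API, assuming simplified stand-ins for EntityType and EntityReference (not the platform classes). buildEntity(), IndexOperations and RecordingIndexer are hypothetical names. The point is that one deleteEntity()/buildEntity() pair covers documents, spaces and wikis alike, and a rebuild can be composed as a delete followed by a build instead of needing its own interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the platform's entity model.
enum EntityType { WIKI, SPACE, DOCUMENT, ATTACHMENT }

final class EntityReference {
    private final String name;

    EntityReference(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return name;
    }
}

// One method for every entity type, instead of deleteDocumentIndex(),
// deleteWikiIndex(), deleteSpaceIndex(), ...
interface IndexDeleter {
    void deleteEntity(EntityReference reference, EntityType type);
}

// Same idea for building; "buildEntity" is a hypothetical name.
interface IndexBuilder {
    void buildEntity(EntityReference reference, EntityType type);
}

// A rebuild composed from the two interfaces above, rather than a separate
// RebuildIndex contract.
final class IndexOperations {
    private IndexOperations() {
    }

    static void rebuildEntity(IndexDeleter deleter, IndexBuilder builder,
            EntityReference reference, EntityType type) {
        deleter.deleteEntity(reference, type);
        builder.buildEntity(reference, type);
    }
}

// Toy implementation that only records the calls it receives.
final class RecordingIndexer implements IndexDeleter, IndexBuilder {
    final List<String> calls = new ArrayList<>();

    @Override
    public void deleteEntity(EntityReference reference, EntityType type) {
        calls.add("delete " + type + " " + reference);
    }

    @Override
    public void buildEntity(EntityReference reference, EntityType type) {
        calls.add("build " + type + " " + reference);
    }
}
```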
Search-solrj:
* The SolrJ server is used in embedded mode.
* We shouldn't use a system property for the SolrJ home but the XWiki configuration
instead (see Misc below)
* Does EmbeddedSolrServer depend on the Servlet API? "Also, if using EmbeddedSolrServer, keep
in mind that Solr depends on the Servlet API." (from
http://wiki.apache.org/solr/Solrj)
* IMO, EmbeddedSolrServer should be started by listening to the application started event
instead of lazily when the component is initialized (Initializable)
* Since we use EmbeddedSolrServer, how do we handle clustering? One instance per wiki
instance? How do they reconcile their indexes? We need an architecture diagram of our
solution for heavy loads.
Misc:
* all APIs to review and improve/stabilize
* typos to fix
* licenses to fix
* pom to fix
* missing class javadoc (e.g. BuildIndex, DeleteIndex, etc.)
* exception handling to verify (e.g. SolrjSearchEngine)
* Remove unneeded javadoc on methods annotated with @Override
* Need to use the XWiki Permanent Directory for storing the SOLR configuration data (the SOLR
home). The data currently in solr/ needs to move into a solr-configuration JAR module, which
gets used as a fallback if the data doesn't exist in the SOLR home dir.
* Idea: use SOLR JMX to provide admin features (http://wiki.apache.org/solr/SolrJmx)
* TODO: Think about how to migrate users to SOLR from Lucene or DB searches.
Need a plan.
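For the SOLR home point, a sketch of the fallback logic, assuming the SOLR home lives under the permanent directory and the defaults are shipped on the classpath under solr/ (e.g. by the hypothetical solr-configuration JAR). SolrHomeInitializer and resolve() are illustrative names. Files already present in the SOLR home win; missing ones are copied from the classpath on first use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: resolves configuration files in the SOLR home and
// falls back to defaults found on the classpath.
final class SolrHomeInitializer {
    private final Path solrHome;
    private final ClassLoader classLoader;
    private final String resourcePrefix; // e.g. "solr/" in the defaults JAR

    SolrHomeInitializer(Path solrHome, ClassLoader classLoader, String resourcePrefix) {
        this.solrHome = solrHome;
        this.classLoader = classLoader;
        this.resourcePrefix = resourcePrefix;
    }

    Path resolve(String relativePath) throws IOException {
        Path target = solrHome.resolve(relativePath);
        if (Files.exists(target)) {
            // Files already in the SOLR home (possibly customized) take precedence.
            return target;
        }
        String resource = resourcePrefix + relativePath;
        // try-with-resources skips close() if the stream is null.
        try (InputStream defaults = classLoader.getResourceAsStream(resource)) {
            if (defaults == null) {
                throw new IOException("No default configuration on the classpath for " + resource);
            }
            Files.createDirectories(target.getParent());
            Files.copy(defaults, target);
        }
        return target;
    }
}
```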
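And for the JMX idea, a small starting point that lists the SOLR MBeans visible in the current JVM using only the standard javax.management API. It assumes JMX is enabled in solrconfig.xml (<jmx/>), that SOLR registers its beans on the platform MBean server, and that their domains start with "solr"; all three should be checked against the SOLR version we ship. SolrJmxBrowser is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for admin features built on SOLR's JMX beans.
final class SolrJmxBrowser {
    private SolrJmxBrowser() {
    }

    // Lists MBean names whose domain starts with "solr" (assumption) on the
    // platform MBean server. With an embedded server, SOLR runs in this JVM.
    static Set<String> listSolrMBeans() throws MalformedObjectNameException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<String> names = new TreeSet<>();
        for (ObjectName name : server.queryNames(new ObjectName("solr*:*"), null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }
}
```

From there, attributes could be read with MBeanServer.getAttribute() to expose statistics in an admin UI.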
Thanks!
-Vincent