Re: [xwiki-devs] [Solr] What do we search for?

14 Nov 2013

On Wed, Nov 13, 2013 at 8:08 PM, Ludovic Dubost <ludovic(a)xwiki.com> wrote:
...
  Hi Marius,
 I have a quick question as I start reading your proposal. I don't see
 anything about multi-language indexing.
 I remember that in the current Solr implementation there are multiple
 fields for each language. Would there be a field for each language indexed
 for each property?
Yes. Right now I'm struggling to find a way to define an alias for a
group of dynamic fields. For document title we have this in
solrconfig.xml:
<str name="f.title.qf">title__ title_ar title_bg title_ca ...</str>
which makes 'title' an alias for all its translations and allows us to
write title:text in the search query. I need to do the same, but
dynamically, for each object property:
property_Blog.BlogPostClass_title =
property_Blog.BlogPostClass_title__,
property_Blog.BlogPostClass_title_en,
property_Blog.BlogPostClass_title_fr, ...
I'll keep you posted.
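
One possible direction, sketched in Python: build the alias at query time instead of declaring it statically. This assumes the alias can be passed as a request parameter in the same f.<field>.qf form used in solrconfig.xml, and that the list of languages is known to the caller. The function name alias_param is hypothetical, and whether Solr accepts such long dotted field names as aliases is not verified here.

```python
def alias_param(field, languages):
    """Build a per-field alias parameter of the form f.<field>.qf.

    The value lists the language-neutral variant (<field>__) followed by
    one variant per language (<field>_<lang>), mirroring the title example.
    """
    variants = [f"{field}__"] + [f"{field}_{lang}" for lang in languages]
    return {f"f.{field}.qf": " ".join(variants)}


# Hypothetical usage: the language list is an assumption for illustration.
params = alias_param("property_Blog.BlogPostClass_title", ["en", "fr"])
print(params)
```

The returned dictionary would be merged into the other query parameters before the request is sent.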
Thanks,
Marius
...

 Ludovic
 2013/10/14 Marius Dumitru Florea <mariusdumitru.florea(a)xwiki.com>
  I started writing
 http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema . I need help
 with two things:
 * test cases
 http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema#HTestCases
 * if time permits, review the proposal, especially
 http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema#HAMixedApproach
 .
 Thanks,
 Marius
 On Fri, Oct 11, 2013 at 12:55 PM, Marius Dumitru Florea
 <mariusdumitru.florea(a)xwiki.com> wrote:
  Hi devs,
 This is a very important question so think carefully. Let me explain:
 In XWiki (model) we have a few entity types. There are *wikis* which
 have *spaces* which have *documents*. A document can have *objects*
 and *attachments*. A document can also define a *class*.
 At the same time we like to say that in XWiki "everything is a
 document" because everything revolves around documents. The document
 is the central notion.
 We can query the database (using HQL or XWQL) for any of the
 previously mentioned entities but what should a Solr query return
 (semantically)? In other words:
 * are you searching for an object without caring about the document
 that holds the object? Same for an object property.
 * how often are you searching for an attachment without caring about
 the document that holds the attachment?
 * are you searching for a class or for the document that defines that
 class?
 * are you searching for a wiki without caring about the documents it
 contains? Same for a space.
 IMO the result of a Solr query should be, semantically, a list of
 documents. But maybe I'm wrong.
 -----------------------
 Technical Details
 -----------------------
 Unlike a relational database, a Solr/Lucene index has a single 'table'.
 So normally you index a single entity type. Each row in the index
 represents an entity of that type. As a consequence the result of a
 Solr query is semantically a list of entities of that type. In our
 case the entity type is (naturally) *document*.
 If you want to index more entity types (e.g. index attachments and
 objects _separately_, not as part of a document) then, since there is
 only one 'table' in the index, you need to add a 'type' column that
 specifies the type of entity you have on each row (e.g. type=document,
 type=attachment, type=object etc.). The result of a Solr query is now,
 semantically, a list of different entity types, unless you filter by a
 specific type. It smells like a hack to me.
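
To make the 'type' column concrete, here is a minimal in-memory sketch in Python. It is not Solr code: the index is a plain list of dictionaries, and the field names (type, name, document, class, filename) and the search helper are assumptions chosen for illustration.

```python
# A single 'table' holding rows of different entity types.
index = [
    {"type": "document", "name": "Blog.Post1", "title": "Hello"},
    {"type": "object", "document": "Blog.Post1", "class": "Blog.BlogPostClass"},
    {"type": "attachment", "document": "Blog.Post1", "filename": "photo.png"},
]


def search(index, entity_type=None):
    """Return all rows, or only the rows of the given entity type."""
    return [
        row for row in index
        if entity_type is None or row["type"] == entity_type
    ]


mixed = search(index)                   # documents, objects and attachments
only_docs = search(index, "document")   # homogeneous again after filtering
print(len(mixed), [row["name"] for row in only_docs])
```

Without the filter the result mixes entity types, which is the situation described above.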
 Let's imagine what happens if we want to search for blog posts that
 have a specific tag. With the first approach this is easy because all
 the (indexed) information is on a single row. With the second approach
 this is considerably more complex because the information is spread across
 multiple rows:
 * one row with type=document for the blog post document
 * one row with type=object for the blog post object
 * one row with type=object for the tag object
 In a relational database when you have the information spread in
 multiple places (tables) you do joins. Fortunately (you would say)
 Solr supports joins. In this particular case we would have to perform
 2 joins which means:
 index X index X index
 where X represents the Cartesian product. The document name would be
 the join key. Pretty complex even before trying to write this in Solr
 query syntax.
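
The difference between the two approaches can be sketched in plain Python over in-memory rows. This is an illustration only, not Solr query syntax. The field names, the XWiki.TagClass class name and both helper functions are assumptions, and the triple loop is a naive rendering of index X index X index.

```python
from itertools import product

# First approach: one row per document, object data folded into it.
single = [
    {"name": "Blog.Post1",
     "object_classes": ["Blog.BlogPostClass", "XWiki.TagClass"],
     "tags": ["solr"]},
    {"name": "Main.WebHome", "object_classes": [], "tags": []},
]


def posts_with_tag_single(index, tag):
    """One pass: everything needed is on the same row."""
    return [d["name"] for d in index
            if "Blog.BlogPostClass" in d["object_classes"] and tag in d["tags"]]


# Second approach: separate rows per entity, linked by document name.
multi = [
    {"type": "document", "name": "Blog.Post1"},
    {"type": "object", "document": "Blog.Post1", "class": "Blog.BlogPostClass"},
    {"type": "object", "document": "Blog.Post1", "class": "XWiki.TagClass",
     "tags": ["solr"]},
    {"type": "document", "name": "Main.WebHome"},
]


def posts_with_tag_joined(index, tag):
    """Two joins: scan index X index X index, keyed on the document name."""
    hits = []
    for doc, post, tag_obj in product(index, repeat=3):
        if (doc["type"] == "document"
                and post["type"] == "object"
                and post.get("class") == "Blog.BlogPostClass"
                and tag_obj["type"] == "object"
                and tag_obj.get("class") == "XWiki.TagClass"
                and post["document"] == doc["name"] == tag_obj["document"]
                and tag in tag_obj.get("tags", [])):
            hits.append(doc["name"])
    return hits


print(posts_with_tag_single(single, "solr"))
print(posts_with_tag_joined(multi, "solr"))
```

Both functions find the same blog post, but the second has to correlate three rows to do so.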
 So basically the question becomes: is it worth indexing more entities
 _separately_ instead of indexing just documents (with info about their
 objects and attachments) considering the complexity that it brings in
 writing Solr queries? Do we search for objects and attachments alone
 as separate entities often enough to justify this complexity? My
 answer is no.
 Thanks,
 Marius
 _______________________________________________
 devs mailing list
 devs(a)xwiki.org
 http://lists.xwiki.org/mailman/listinfo/devs

 --
 Ludovic Dubost
 Founder and CEO
 Blog: http://blog.ludovic.org/
 XWiki: http://www.xwiki.com
 Skype: ldubost GTalk: ldubost
