On Fri, Oct 11, 2013 at 1:09 PM, Guillaume Lerouge <guillaume(a)xwiki.com> wrote:
Hi,
I don't want to answer this too broadly (I don't have the technical chops
to make a really informed comment). However, here's what I can state from
my experience with XWiki projects:
- When searching for an attached file, we always want to know (and
display) the document to which that file is attached
- When searching for an object, we're always looking for the document
which the object is part of, especially since we don't have an
"object-only" or "property-only" view anyway
- When searching for a class, again, we don't have a displayer for that
class in view mode outside of the document holding the class
- We don't really search for a space right now since technically it's
just a collection of pages anyway
- Searching for a wiki would be done through the wiki index, other than
that you're just searching for documents (some of which might happen to be
in a wiki)
All of which would tend to agree with Marius' suggestion.
In terms of UX impact, I think this would mean that documents should always
be returned in search results, with attachments indented under the document
itself (instead of having separate entries for attachments and documents as
we do now).
Yes, the results would always be documents but for each result we
would display where the search term has been matched:
* in document title
* in document content
* in the attachment name
* in the attachment content
* in an object property
* etc.
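
As a rough illustration of "documents as results, with match locations", here is a minimal Python sketch. It is not how the XWiki Solr module does it: the document layout (title, content, attachments, objects) and the match_locations helper are hypothetical, and plain substring matching stands in for real full-text search.

```python
# Hypothetical in-memory document; the field names are assumptions,
# not the actual XWiki Solr schema.
DOCUMENT = {
    "id": "Dev.SolrNotes",
    "title": "Solr migration notes",
    "content": "We moved search from Lucene.",
    "attachments": [{"name": "solr-schema.xml", "content": "<schema/>"}],
    "objects": [{"class": "XWiki.TagClass", "properties": {"tags": "search|solr"}}],
}


def match_locations(document, term):
    """List the places inside one document where the term was matched."""
    needle = term.lower()
    found = []
    if needle in document.get("title", "").lower():
        found.append("document title")
    if needle in document.get("content", "").lower():
        found.append("document content")
    for attachment in document.get("attachments", []):
        if needle in attachment["name"].lower():
            found.append("attachment name: " + attachment["name"])
        if needle in attachment.get("content", "").lower():
            found.append("attachment content: " + attachment["name"])
    for obj in document.get("objects", []):
        for prop, value in obj["properties"].items():
            if needle in str(value).lower():
                found.append("object property: " + obj["class"] + "." + prop)
    return found


# The result is always the document; the list says where the term matched.
print(DOCUMENT["id"], match_locations(DOCUMENT, "solr"))
```

The search result stays a single document entry, and the returned list is what the UI would show underneath it.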
Thanks,
Marius
Guillaume
On Fri, Oct 11, 2013 at 11:55 AM, Marius Dumitru Florea <
mariusdumitru.florea(a)xwiki.com> wrote:
Hi devs,
This is a very important question so think carefully. Let me explain:
In XWiki (model) we have a few entity types. There are *wikis* which
have *spaces* which have *documents*. A document can have *objects*
and *attachments*. A document can also define a *class*.
At the same time we like to say that in XWiki "everything is a
document" because everything revolves around documents. The document
is the central notion.
We can query the database (using HQL or XWQL) for any of the
previously mentioned entities but what should a Solr query return
(semantically)? In other words:
* are you searching for an object without caring about the document
that holds the object? Same for an object property.
* how often are you searching for an attachment without caring about
the document that holds the attachment?
* are you searching for a class or for the document that defines that
class?
* are you searching for a wiki without caring about the documents it
contains? Same for a space.
IMO the result of a Solr query should be, semantically, a list of
documents. But maybe I'm wrong.
-----------------------
Technical Details
-----------------------
Unlike a relational database, a Solr/Lucene index has a single 'table'.
So normally you index a single entity type. Each row in the index
represents an entity of that type. As a consequence the result of a
Solr query is semantically a list of entities of that type. In our
case the entity type is (naturally) *document*.
If you want to index more entity types (e.g. index attachments and
objects _separately_, not as part of a document) then, since there is
only one 'table' in the index, you need to add a 'type' column that
specifies the type of entity you have on each row (e.g. type=document,
type=attachment, type=object etc.). The result of a Solr query is now,
semantically, a list of different entity types, unless you filter by a
specific type. It smells like a hack to me.
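
To make the two layouts concrete, here is a minimal Python sketch that uses plain lists of dicts in place of a real Solr index. All field names, row ids and the search helper are made up for illustration; this is not the actual XWiki Solr schema, and substring matching stands in for full-text search.

```python
# Approach 1: one row per document, with attachment and object data
# flattened into the document row. Field names are hypothetical.
DOCUMENT_INDEX = [
    {
        "id": "Blog.FirstPost",
        "title": "First post",
        "attachment_names": ["diagram.png"],
        "object_classes": ["Blog.BlogPostClass", "XWiki.TagClass"],
    },
]

# Approach 2: one row per entity, told apart by a 'type' column.
MIXED_INDEX = [
    {"type": "document", "id": "Blog.FirstPost", "title": "First post"},
    {"type": "attachment", "id": "Blog.FirstPost@diagram.png",
     "document": "Blog.FirstPost", "name": "diagram.png"},
    {"type": "object", "id": "Blog.FirstPost^Blog.BlogPostClass[0]",
     "document": "Blog.FirstPost", "class": "Blog.BlogPostClass"},
]


def row_text(row):
    """Concatenate all indexed values of a row, except the 'type' column."""
    parts = []
    for key, value in row.items():
        if key == "type":
            continue
        if isinstance(value, list):
            parts.extend(value)
        else:
            parts.append(value)
    return " ".join(parts).lower()


def search(index, term, entity_type=None):
    """Return matching rows; rows without a 'type' count as documents."""
    return [
        row for row in index
        if term.lower() in row_text(row)
        and (entity_type is None or row.get("type", "document") == entity_type)
    ]


# Approach 1 always yields documents.
print([row["id"] for row in search(DOCUMENT_INDEX, "first")])
# Approach 2 yields a mix of entity types unless we filter by type.
print([row["type"] for row in search(MIXED_INDEX, "first")])
print([row["type"] for row in search(MIXED_INDEX, "first", entity_type="document")])
```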
Let's imagine what happens if we want to search for blog posts that
have a specific tag. With the first approach this is easy because all
the (indexed) information is on a single row. With the second approach
this is considerably more complex because the information is spread
over multiple rows:
* one row with type=document for the blog post document
* one row with type=object for the blog post object
* one row with type=object for the tag object
In a relational database when you have the information spread across
multiple places (tables) you do joins. Fortunately (you would say)
Solr supports joins. In this particular case we would have to perform
2 joins which means:
index X index X index
where X represents the cartesian product. The document name would be
the join key. Pretty complex even before trying to write this in Solr
query syntax.
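
The difference can be sketched in plain Python, with lists of dicts standing in for the index. This is only an illustration under assumptions: the field names, sample rows and both helper functions are hypothetical, and the triple loop is a naive way to mimic "index X index X index", not how Solr evaluates joins internally.

```python
from itertools import product

# Approach 1: everything about a document sits on one row.
FLAT_INDEX = [
    {"id": "Blog.FirstPost",
     "object_classes": ["Blog.BlogPostClass", "XWiki.TagClass"], "tags": ["solr"]},
    {"id": "Blog.SecondPost",
     "object_classes": ["Blog.BlogPostClass", "XWiki.TagClass"], "tags": ["wiki"]},
    {"id": "Main.WebHome", "object_classes": ["XWiki.TagClass"], "tags": ["solr"]},
]

# Approach 2: documents and objects are separate rows of the same index.
SPLIT_INDEX = [
    {"type": "document", "id": "Blog.FirstPost"},
    {"type": "object", "document": "Blog.FirstPost", "class": "Blog.BlogPostClass"},
    {"type": "object", "document": "Blog.FirstPost", "class": "XWiki.TagClass",
     "tags": ["solr"]},
    {"type": "document", "id": "Blog.SecondPost"},
    {"type": "object", "document": "Blog.SecondPost", "class": "Blog.BlogPostClass"},
    {"type": "object", "document": "Blog.SecondPost", "class": "XWiki.TagClass",
     "tags": ["wiki"]},
    {"type": "document", "id": "Main.WebHome"},
    {"type": "object", "document": "Main.WebHome", "class": "XWiki.TagClass",
     "tags": ["solr"]},
]


def posts_with_tag_flat(index, tag):
    """Single pass: each row already holds the classes and the tags."""
    return [
        row["id"] for row in index
        if "Blog.BlogPostClass" in row["object_classes"] and tag in row["tags"]
    ]


def posts_with_tag_joined(index, tag):
    """Two self-joins: document row x blog post object row x tag object row."""
    hits = []
    for doc, post, tag_obj in product(index, index, index):
        if (doc["type"] == "document"
                and post["type"] == "object"
                and post.get("class") == "Blog.BlogPostClass"
                and tag_obj["type"] == "object"
                and tag_obj.get("class") == "XWiki.TagClass"
                # The document name is the join key.
                and post.get("document") == doc["id"] == tag_obj.get("document")
                and tag in tag_obj.get("tags", [])):
            hits.append(doc["id"])
    return hits


print(posts_with_tag_flat(FLAT_INDEX, "solr"))
print(posts_with_tag_joined(SPLIT_INDEX, "solr"))
```

Both functions return the same documents, but the second one has to line up three rows per hit, which is the complexity the separate-entity layout pushes into every such query.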
So basically the question becomes: is it worth indexing more entities
_separately_ instead of indexing just documents (with info about their
objects and attachments) considering the complexity that it brings in
writing Solr queries? Do we search for objects and attachments alone
as separate entities often enough to justify this complexity? My
answer is no.
Thanks,
Marius
_______________________________________________
devs mailing list
devs(a)xwiki.org
http://lists.xwiki.org/mailman/listinfo/devs