I resend this message that was accidentally sent to Sergiu only.
---------- Forwarded message ----------
From: Guillaume Delhumeau <guillaume.delhumeau(a)xwiki.com>
Date: 2018-07-02 16:06 GMT+02:00
Subject: Re: [xwiki-devs] [Brainstorm] Notifications filters capabilities
and performances.
To: Sergiu Dumitriu <sergiu(a)xwiki.org>
Hi and thanks you for your answers.
2018-06-29 5:41 GMT+02:00 Sergiu Dumitriu <sergiu(a)xwiki.org>rg>:
The two main problems that I see are that you're
mixing two separate
things:
1. Tags which are stored in one place, and events which are stored in a
different place
2. Again tags, and document locations. They may seem similar, but they
are completely different as they are implemented. That's why saying "not
in this wiki, except for this tag" cannot ever be implemented in a sane
way since the exception is on a different plane.
The logical approach would be to also store the tags in the events, but
this is not a good thing to do.
Also, a tag could be added to a page AFTER the event has been triggered. In
this case, if you still want to fetch these pages, it won't work.
Where do we stop? For future-proofing
filters, we would have to store the entire document in the event, in
case someone wants to add a filter on "documents with a specific type of
object attached", or "documents with a specific value for a specific
property of a specific object", or "documents with attachments ending in
.extension".
On 06/28/2018 04:44 PM, Guillaume Delhumeau wrote:
For the tags filter, I can also:
* perform a query to fetch all documents that correspond to the given
tags
(it doesn't need to be permanent, but it
would be better to cache it)
* add a HQL filter on these pages (OR document IN :list_of_pages).
It's a little variation of solution A. It's ugly but it could do the job.
It's the sanest proposal so far, but still impossible.
Apparently it's the way it's done on Activity Stream:
https://github.com/xwiki/xwiki-platform/blob/553afe7287dcc4ce8b588276e639bf
283352e805/xwiki-platform-core/xwiki-platform-activitystream/xwiki-platform-
activitystream-ui/src/main/resources/Main/Activity.xml#L1008
Since "not in LOCATION except WITH TAG" doesn't work because it mixes
two types of information in the same query fragment, how about making
this a one-dimensional query as "only WITH TAG". And since this
information is in a different place, pre-querying it is needed. However,
if there are thousands or tens/hundreds of thousands of documents with a
tag, won't the query crash anyway?
Sure, it won't scale.
2018-06-27 12:58 GMT+02:00 Guillaume Delhumeau
<
guillaume.delhumeau(a)xwiki.com>gt;:
> Hi developers.
>
> I am trying to add a new filter to the notifications to be able to
follow
> pages
> that are marked with a given tag. And it leads me to some questions
about
> the
> technical implementation of the notifications.
>
> To remind the context: notifications are computed on top of the events
> recorded
> by the event stream (a.k.a activity stream). We take events from the
event
> stream SQL table, we apply some
transformations on them, and we display
> them to
> the user.
>
> Then we have implemented the ability to filter on these events: for
> example
> "don't show events concerning the document A nor the wiki B". Filters
are
> implemented with 2 distinct ways:
>
> 1/ SQL injections: each filter can add SQL elements in the query we
make
> to
> fetch the events from the event stream table. We made this
mechanism
> so we
> can let the database do a lot of the filtering process. After all,
> it's its
> job and it's supposed to perform well. To be precise, Clement has
> even
> created an Abstract Syntax Tree (AST) so it's easier to inject some
> content
> in the query and it creates an abstraction over the SQL language so
> we can
> even consider to change the storage of the event stream someday.
>
> The bad thing is that some complex filtering are difficult to write
> with
> the SQL language (event with the AST) or even impossible.
>
> 2/ Post-filtering: after the events have been fetched from the
database,
> each
> filter can still decide to keep or filter them. This is useful for
> complex filtering that cannot be expressed with the SQL language.
It
> is
> also needed by the real-time notification email sender, because it
> takes
> the events immediately when they occurs without fetching them in
the
> database (so SQL filters are bypassed).
>
> The bad thing is that some events are loaded in the memory to
finally
> be
> rejected, and these filters can perform costly operations such as
> loading
> documents.
>
> Until now, this double mechanism was working quite well, with each
> mechanism
> filling the lacks of the other.
>
> However, we still have technical limitations in our design:
>
> 1/ Users who have a lot of filter preferences can end up with a giant
> SQL
> query that is almost impossible to perform by the database.
Actually
> we had
> a user complaining about an OutOfMemory problem in the HQL to SQL
> translator !
>
> 2/ I cannot implement the tag filter !
>
> The tag filter is supposed to show events that concern pages that hold a
> given
> tag, EVEN IF THE PAGE WAS EXCLUDED BY THE USER. Example of use-case: "I
> don't
> want to receive notifications about wiki A except for pages marked with
> the tag
> T".
>
> And it is not working. First because it is difficult to write a SQL
query
> for
> that. It requires to make a join clause with the document and the object
> tables,
> which our SQL injection mechanism does not support. Even if it were
> possible,
> creating a SQL join with the document table will de-facto filter events
> that do
> not concern any pages or pages that do not have any objects: so many
other
> filters will be broken. I also don't
consider creating a SQL subquery, I
> think
> the whole query would became too big. So I decided to just not inject
any
> SQL
> code for this filter and only implement the post-filtering mechanism.
>
> But the other filter "EXCLUDE WIKI A" generates a SQL injection such as
> "WIKI <> 'WIKI A'" so the events concerning the wiki A are
not fetched
> from the
> database. Consequence: the tag filter never see the events that it is
> supposed
> to keep. It would be actually possible to by-pass the first SQL
injections
> by
> injecting something like "OR 1=1". But doing something like this is like
> dropping the all SQL injections mechanism.
>
> I see some solutions for this problem:
>
> A/ For each tag, create a permanent list of pages that hold it. So I
can
> inject "OR document IN
(that_list)". I think this is heavy.
Definitely not. You'll have to maintain this list in parallel to the
canonical storage for tags, the xwikidoc table itself, with all the
headaches that data duplication brings.
I agree.
> B/ Drop the SQL injection mechanism and
only rely on the
post-filtering
> mechanism. It would require to load from
the database A LOT of
events,
> but maybe we could cache this.
Nope, filtering millions of events by inspecting them one by one is a
performance nightmare.
> C/ Don't drop the SQL injection
mechanism completely but use it as
> little as
> possible (for example, do not use it for LOCATION filtering). Seems
> hard to
> determine when a filter should use this feature or not.
I think location filtering should still happen in the query itself,
since it is a lot faster.
> D/ Don't implement the "tags"
filter, since it is the root of the
issue.
> But
> it is like sweeping dirt under the carpet!
Well, not implementing something is always the fastest, least intrusive
solution to any problem...
Humor aside, I think that in this case it is the best approach.
Thanks for saying it :)
> Since we have the OutOfMemory problem with
the SQL injections becoming
too
> huge,
> I am more in favor of solution B or C. But I'm not sure for now, since I
> do not
> know how much it would impact the performances and the scalability of
the
whole
notifications feature.
This is a complex topic, but I hope this message will inspire you some
suggestions or things I have not seen with my own eyes.
Thanks for your help,
Guillaume
Other options:
E/ Drop SQL altogether, move the event stream into a nosql database, and
do stream filtering; kind of like the B proposal
That's unfortunately out of the scope right now.
F/ Drop SQL and querying altogether, switch to a pub-sub mechanism where
a subscriber is created for every user and his filters, and matching
events are placed in a queue until they are all sent (consumed) in a
notification. Obviously, this only works for email notifications, not
for browsing past events in a webpage.
Indeed it cannot replace Activity Stream. However, it was my first design
attempt:
http://design.xwiki.org/xwiki/bin/view/Proposal/
NotificationCenterforAppsImplementation#HIteration1
But then we decided to rely on what we had (the event stream) and to kill 2
birds with the same stones (having notifications + replacing activity
stream).
Thanks,
Guillaume
--
Guillaume Delhumeau (guillaume.delhumeau(a)xwiki.com)
Research & Development Engineer at XWiki SAS
Committer on the
XWiki.org project
--
Guillaume Delhumeau (guillaume.delhumeau(a)xwiki.com)
Research & Development Engineer at XWiki SAS
Committer on the
XWiki.org project