On 02/13/2012 04:45 AM, Ludovic Dubost wrote:
Hi,
I've looked a bit at the activity stream performance while investigating
the performance issue present since 3.2+ (
http://jira.xwiki.org/browse/XWIKI-7520).
Beyond this issue, I've been a bit puzzled by the logic of the activity
stream implementation.
Right now it seems the activity stream is generating many many queries on
the base data stored in the activity stream.
However I've not been able to identify the exact logic it is following as
it seems to be quite complex.
The whole point of the activity stream when it was initially implemented
was to do the work at save time instead of doing it at display
time.
As the feature got more complex, it seems we moved away from that solution
and now we again have a huge amount of work at display time.
Now maybe the actual logic of what we want to display requires this, or
maybe not and we haven't gone in the right direction to implement this.
I think that before we reimplement the activity stream in Java, as I've
seen mentioned in the feature survey, we should put the actual feature and
logic on paper and make sure we are going the right way.
Otherwise, reimplementing in Java won't solve anything.
I think it would be really good to go back to the initial objective of
having the effort at save time and then having the display only read data
and display it with simple templating.
Is there any documentation about the feature itself and about the logic?
Can we put somebody on writing down the logic and then discussing whether
it's the right thing to do?
I can help on this if I'm given some more information about why it was done
the way it's done now.
The problem is that now we have two different kinds of grouping that
should be displayed in the stream:
- event groups, which means that a DocumentUpdatedEvent can be a side
effect of an XarImportEvent, in which case we shouldn't list 100 update
events but a single "somebody imported a XAR" event;
- daily document activity, which means that instead of displaying each
change individually, or just the most recent change on a document (like
we did with the old activity implementation), we show all the changes
done on a document in a day.
Both kinds of groupings are relevant, so they shouldn't be removed.
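
To make the two groupings concrete, here is a rough sketch in Python rather
than XWiki code. The Event fields (group_id, document, when) and both
function names are made up for illustration; the real activity event schema
may well differ. It assumes that events which are side effects of one action
share a group id with the triggering event, and that the trigger is listed
first.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    type: str        # e.g. "XarImportEvent" or "DocumentUpdatedEvent"
    document: str    # affected document, "" when the event is not tied to one
    when: datetime   # event timestamp
    group_id: str    # hypothetical: shared by an event and its side effects


def collapse_event_groups(events):
    """Keep only the first event of each group (assumed to be the trigger)."""
    first_by_group = {}
    for event in events:
        first_by_group.setdefault(event.group_id, event)
    return list(first_by_group.values())


def daily_document_activity(events):
    """Bucket events by (document, calendar day), keeping first-seen order."""
    buckets = {}
    for event in events:
        key = (event.document, event.when.date())
        buckets.setdefault(key, []).append(event)
    return buckets
```

Both passes are linear in the number of events, so in principle they could
run over the result of a single query instead of one query per group.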
The problem is that they were introduced on top of the existing Activity
code, without a clean refactoring, which means that instead of modifying
the base logic, we just "patched" it with extra queries to get the
grouping right. It should be possible to write queries so that they
select the right events without the need for subqueries. One problem is
that I don't know how to define "daily" when timezones are involved,
since HQL doesn't know timezone conversions. One option is to do some
date math, as in:
  group by day(event.date + user timezone offset hours)
but this isn't something standard, as far as I know.
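
For what it's worth, here is a minimal sketch of that date math using SQLite
from Python, chosen only because it is easy to run. It is not HQL and says
nothing about what Hibernate or the databases behind XWiki would accept. The
events table, its columns and the daily_activity helper are invented for the
example, timestamps are assumed to be stored in UTC, and a fixed offset in
hours ignores daylight saving changes.

```python
import sqlite3


def daily_activity(conn, offset_hours):
    """Count events per (document, local day) for a fixed UTC offset in hours."""
    modifier = f"{offset_hours:+d} hours"  # e.g. "+2 hours" or "-5 hours"
    return conn.execute(
        """
        SELECT document, date(created, ?) AS local_day, count(*) AS changes
        FROM events
        GROUP BY document, local_day
        ORDER BY local_day, document
        """,
        (modifier,),
    ).fetchall()


# Hypothetical sample data: three changes to one document, stored in UTC.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (document TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("Main.WebHome", "2012-02-13 21:30:00"),
        ("Main.WebHome", "2012-02-13 23:30:00"),
        ("Main.WebHome", "2012-02-14 08:00:00"),
    ],
)
```

With an offset of 0 the three sample rows split 2/1 between 13 and 14
February; with an offset of +2 they split 1/2, since the 23:30 change moves
to the next local day. The grouping needs no subquery, only the shifted date
in the GROUP BY.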
--
Sergiu Dumitriu
http://purl.org/net/sergiu/