On Feb 22, 2010, at 8:11 AM, Vincent Massol wrote:
On Feb 21, 2010, at 11:18 PM, Ludovic Dubost wrote:
Hi XWiki devs,
We have a project where we need to show which users have seen which page, in addition
to aggregate statistics.
Now there are multiple ways to implement this, namely either using the Activity Stream
module (which records page-level edit activity) or using the Statistics module (which
records aggregate-level view activity).
Which of the two systems would be better to use?
IMO, in the near future the Stats module will be rewritten to use the Activity Stream
module and perform computations to store aggregated data.
Some more details. I believe we have only 2 general choices:
1) Decide to save raw data and then perform any aggregated computations (allowing
OLAP-based queries, business reporting, data mining)
2) Decide to perform aggregation in memory and only save a few metrics
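
To make the difference between the two options concrete, here is a minimal Python sketch. The event tuples and the names `raw_events` and `view_counts` are invented for illustration; they are not XWiki APIs or data structures.

```python
from collections import Counter

# Hypothetical view events as (user, page) pairs; not an XWiki structure.
events = [
    ("alice", "Main.WebHome"),
    ("bob", "Main.WebHome"),
    ("alice", "Blog.Post1"),
]

# Option 1: keep every raw event and compute aggregates later, on demand.
raw_events = list(events)
views_per_page = Counter(page for _, page in raw_events)
viewers_of_home = sorted(
    {user for user, page in raw_events if page == "Main.WebHome"}
)

# Option 2: aggregate in memory and keep only the counters.
view_counts = Counter()
for _, page in events:
    view_counts[page] += 1
# The per-user detail is gone here: view_counts cannot say who viewed a page.
```

Both options give the same per-page totals, but only the raw events can answer "which users have seen which page".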
Obviously option 1) is much more powerful and from an architectural POV is the best.
For option 1) the main issue is write performance. The solution I have put in place in the
past for this kind of architecture is the following:
* Have the application write to a local queue (JMS queue to be standard)
* Have several physical servers poll the queue and get data from it as fast as they can
and then save the data in a database different from the main application database. Note
that the scalability is perfect since servers get data from the queue as fast as they can
process it but not more and you just need to add more servers to get better throughput (in
addition the servers don't need to be of the same type).
* Have some separate application query the raw data database, perform computations
on it, and store the results in different tables or a different database.
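
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-process `queue.Queue` stands in for the JMS queue, threads stand in for the separate physical servers, and an in-memory SQLite database stands in for the separate raw-data database. All table and function names are hypothetical.

```python
import queue
import sqlite3
import threading

event_queue = queue.Queue()  # stand-in for the JMS queue (assumption)
# Stand-in for a database separate from the main application database.
db = sqlite3.connect(":memory:", check_same_thread=False)
db_lock = threading.Lock()
db.execute("CREATE TABLE raw_events (user TEXT, page TEXT)")
db.execute("CREATE TABLE page_views (page TEXT PRIMARY KEY, views INTEGER)")

def record_view(user, page):
    """Called by the application: enqueue the event and return at once."""
    event_queue.put((user, page))

def worker():
    """Consumer: takes events as fast as it can process them, stores them raw."""
    while True:
        event = event_queue.get()
        if event is None:  # sentinel value: stop this worker
            break
        with db_lock:
            db.execute("INSERT INTO raw_events VALUES (?, ?)", event)
            db.commit()

def aggregate():
    """Separate step: compute aggregates from raw data into another table."""
    with db_lock:
        db.execute("DELETE FROM page_views")
        db.execute(
            "INSERT INTO page_views "
            "SELECT page, COUNT(*) FROM raw_events GROUP BY page"
        )
        db.commit()

# Adding more workers raises throughput without touching the producer.
workers = [threading.Thread(target=worker) for _ in range(3)]
for t in workers:
    t.start()
for user, page in [("alice", "Main.WebHome"), ("bob", "Main.WebHome"),
                   ("alice", "Blog.Post1")]:
    record_view(user, page)
for _ in workers:
    event_queue.put(None)  # one sentinel per worker
for t in workers:
    t.join()

aggregate()
page_views = dict(db.execute("SELECT page, views FROM page_views"))
raw_count = db.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
```

The producer only ever touches the queue, so the number of consumers can change without any change on the application side.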
The key point for option 1) is to ensure that the performance of your app is not impacted
at all by the storage of the raw events.
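
One possible way to guarantee this on the producer side is a non-blocking enqueue, sketched below. Dropping events when the queue is full is an assumption made for this illustration (a trade-off not discussed in this thread), and the names `record_view` and `stats` are hypothetical.

```python
import queue

events = queue.Queue(maxsize=2)  # tiny bound, only for the demonstration
stats = {"dropped": 0}

def record_view(user, page):
    """Never blocks the calling thread; returns False if the event was dropped."""
    try:
        events.put_nowait((user, page))
        return True
    except queue.Full:
        # Assumed policy: losing a statistic is preferable to slowing the app.
        stats["dropped"] += 1
        return False

# No consumer is running here, so the third call finds the queue full.
results = [record_view("alice", f"Page{i}") for i in range(3)]
```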
So we'll need to decide if we want option 1 or 2.
Thanks
-Vincent
>
> In any case, I would like to implement this as a patch to the standard module with a
> setting to activate it at a Wiki level.
>
> My first choice would be to use the Activity Stream module but I see that we have
> some code to clean up the activity stream (is it active?). I don't think I would want
> this data to be deleted.
>
> WDYT ?
>
> Ludovic