The custom statistics data structure is not that
complicated.
I've already the SQL query that does the initial version (based on
xwikircs for now)
create table l10n_stats_author as select year(xwr_date)*100+
month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct
XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs group by
1,2;
delete from l10n_stats_author where l10n_month=201301 or
l10n_month=201302;
insert into l10n_stats_author (select year(xwr_date)*100+
month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct
XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs where
xwr_date >= '2012-12-01' group by 1,2);
select * from l10n_stats_author where l10n_month=201301 or
l10n_month=201302;
Putting this in a scheduler job should not be complicated
Ludovic
2013/1/24 Sergiu Dumitriu <sergiu(a)xwiki.com>
On 01/24/2013 12:03 AM, Ludovic Dubost wrote:
Working on l10n change to custom mapping showed
up an issue with the
l10n
history.
We have a huge history table with 2,3M lines (as well as an activity
stream) and we use the history table for extracting contributor
statistics.
However looking at it more closely shows that the source of most data
in
the history table is useless.
See here a grouping by dates showing that localisation of already
close to
2M lines in 6 month in 2012.
| 201201 | 318346 |
| 201202 | 311403 |
| 201203 | 728703 |
| 201204 | 271657 |
| 201205 | 296384 |
| 201206 | 120463 |
Here is an example of history for an entry
mysql> select
xwr_author,xwr_version1,xwr_version2,xwr_date,xwr_comment,cast(xwr_isdiff
as unsigned), length(xwr_patch) from xwikircs
where
xwr_docid=596494851 ;
+-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
> | xwr_author | xwr_version1 | xwr_version2 | xwr_date
> | xwr_comment | cast(xwr_isdiff as unsigned) | length(xwr_patch) |
+-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
| XWiki.XWikiTranslator | 1 |
1 | 2010-02-23
11:38:27
| | 1 |
163 |
| XWiki.XWikiTranslator | 2 | 1 | 2010-02-23
11:40:26
| | 1 |
330 |
| XWiki.rbuj | 3 | 1 | 2010-03-04
00:24:24
| | 1 |
189 |
| XWiki.rbuj | 4 | 1 | 2010-03-04
01:02:52
| | 1 |
223 |
| XWiki.rbuj | 5 | 1 | 2010-07-30
01:12:58
| | 0 |
5026 |
| XWiki.XWikiTranslator | 6 | 1 | 2012-01-23
11:40:25
| | 1 |
115 |
| XWiki.XWikiTranslator | 7 | 1 | 2012-01-23
11:58:25
| | 1 |
115 |
| XWiki.XWikiTranslator | 8 | 1 | 2012-01-23
13:31:52
| | 1 |
234 |
| XWiki.XWikiTranslator | 8 | 2 | 2012-01-23
15:36:02
| Prepared | 1 |
115 |
| XWiki.XWikiTranslator | 8 | 3 | 2012-01-24
02:01:12
| Prepared | 0 |
5670 |
| XWiki.XWikiTranslator | 8 | 4 | 2012-01-25
02:04:00
| Prepared | 1 |
115 |
| XWiki.XWikiTranslator | 8 | 5 | 2012-01-26
02:02:29
| Prepared | 1 |
115 |
| XWiki.XWikiTranslator | 8 | 6 | 2012-01-27
02:01:18
| Prepared | 1 |
115 |
| XWiki.XWikiTranslator | 8 | 7 | 2012-02-04
02:01:08
| Prepared | 1 |
115 |
| XWiki.XWikiTranslator | 8 | 8 | 2012-02-07
02:02:03
| Prepared | 0 |
5670 |
| XWiki.XWikiTranslator | 8 | 9 | 2012-02-08
02:01:20
| Prepared | 1 |
115 |
| XWiki.XWikiTranslator | 8 | 10 | 2012-02-14
02:01:07
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 11 | 2012-02-28
02:01:32
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 12 | 2012-03-03
02:01:20
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 13 | 2012-03-14
02:01:50
| Prepared | 0 |
5671 |
| XWiki.XWikiTranslator | 8 | 14 | 2012-03-14
18:23:06
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 15 | 2012-03-14
18:32:38
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 16 | 2012-03-14
18:37:17
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 17 | 2012-03-24
02:01:34
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 18 | 2012-03-26
16:40:35
| Prepared | 0 |
5671 |
| XWiki.XWikiTranslator | 8 | 19 | 2012-03-28
02:01:23
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 20 | 2012-03-29
02:01:45
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 21 | 2012-04-11
02:01:36
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 22 | 2012-04-19
15:20:10
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 24 | 2012-04-29
02:01:45
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 25 | 2012-05-04
18:55:30
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 26 | 2012-05-04
19:02:09
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 27 | 2012-05-10
02:04:36
| Prepared | 1 |
116 |
| XWiki.XWikiTranslator | 8 | 28 | 2012-05-30
02:02:08
| Prepared | 0 |
5671 |
| XWiki.XWikiTranslator | 8 | 29 | 2012-06-07
10:44:28
> | Prepared | 0 | 5782 |
+-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
http://l10n.xwiki.org/xwiki/bin/view/XE/XEXWikiCoreResources_2107067965_cor…
> Most entries are "no
changes" and have been cause by the L10NUpdater
which
wrongefully saved the document with no changes. I
believe this must
have
> been fixed (by Thomas M.?) mid 2012.
> Now the 2M lines impact performance
significantly and loads the DB for
> nothing (and in the activity stream as well).
> I suggest we clean up the history and
activity stream. We have 2
> possibilities:
> For xwikircs:
> 1/ Clean up only the bad data from
XWikiTranslator when there are no
> changes:
> This is complicated as you need to
verify if the change is actually a
> change and you cannot do that just with sql queries. It could be very
long
> 2/ Clean up old data from pre-201206
from XWikiTranslator
> Simpler if it is safe to delete by
date in the DB. After discussion
with
sergui this is a bit complicated because you need
to make sure you
don't
delete the latest full version before the
versions you keep. So you
would
> have to do it by API which will take ages.
> 3/ Clean up up old data from
pre-201206 from all users
> This is simpler as you can safely
delete from the database everything
older
than a certain versions. Cleans-up even more but
would loose
contributor
statistics unless we store 2012 contributor
counts in an alternate
table
which would then be regularly updated
+1 for 3/, without the extra stats table yet. See below.
Technically, we could also use /2, since we'll still have history
summary (who changed when), but some actual versions won't be
retrievable. Is the actual revision important? IMHO, only if we want to
investigate some foul play.
> In any case we should probably create this intermediary table for
> statistics as it would be much faster anyway.
> For activitystream:
> 1/ Clean up old data from
XWikiTranslator 201206 or earlier
> 2/ Clean up old date from everybody
201206 or earlier
+1 for 1/
> What value do we see in the l10n history and actvity stream and which
> solutions do the other commiters suggest ?
> I would say it's interesting for
contributor statistics (counting
number of
contribution by translators) but beyond that we
can delete the data.
So we would fix that by storing monthly statistics in a table and
updating
the latest 2 month through a scheduler job. This
means that we can also
delete history over 2 month.
We can get contributor activity from the activity stream, independently
from the RCS table. Since the activity stream doesn't have inter-version
dependencies like the RCS does, we can freely discard irrelevant rows
from it, and use the remaining valid data. The same information is
present in the activity stream and in the RCS table: who did what, when,
and where.
I mean, this is a good fast solution for the moment. We could still make
a custom statistics data structure for translation contributors, but
that would take longer, and thus it delays the migration of the l10n
wiki to a newer version, and delays the performance improvement that
we'd gain by dropping data.
--
Sergiu Dumitriu
http://purl.org/net/sergiu
_______________________________________________
devs mailing list
devs(a)xwiki.org
http://lists.xwiki.org/mailman/listinfo/devs