Working on l10n change to custom mapping showed up an issue with the l10n
history.
We have a huge history table with 2,3M lines (as well as an activity
stream) and we use the history table for extracting contributor statistics.
However looking at it more closely shows that the source of most data in
the history table is useless.
See here a grouping by dates showing that localisation of already close to
2M lines in 6 month in 2012.
| 201201 | 318346 |
| 201202 | 311403 |
| 201203 | 728703 |
| 201204 | 271657 |
| 201205 | 296384 |
| 201206 | 120463 |
Here is an example of history for an entry
mysql> select
xwr_author,xwr_version1,xwr_version2,xwr_date,xwr_comment,cast(xwr_isdiff
as unsigned), length(xwr_patch) from xwikircs where xwr_docid=596494851 ;
+-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
| xwr_author | xwr_version1 | xwr_version2 | xwr_date
| xwr_comment | cast(xwr_isdiff as unsigned) | length(xwr_patch) |
+-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
| XWiki.XWikiTranslator | 1 | 1 | 2010-02-23 11:38:27
| | 1 | 163 |
| XWiki.XWikiTranslator | 2 | 1 | 2010-02-23 11:40:26
| | 1 | 330 |
| XWiki.rbuj | 3 | 1 | 2010-03-04 00:24:24
| | 1 | 189 |
| XWiki.rbuj | 4 | 1 | 2010-03-04 01:02:52
| | 1 | 223 |
| XWiki.rbuj | 5 | 1 | 2010-07-30 01:12:58
| | 0 | 5026 |
| XWiki.XWikiTranslator | 6 | 1 | 2012-01-23 11:40:25
| | 1 | 115 |
| XWiki.XWikiTranslator | 7 | 1 | 2012-01-23 11:58:25
| | 1 | 115 |
| XWiki.XWikiTranslator | 8 | 1 | 2012-01-23 13:31:52
| | 1 | 234 |
| XWiki.XWikiTranslator | 8 | 2 | 2012-01-23 15:36:02
| Prepared | 1 | 115 |
| XWiki.XWikiTranslator | 8 | 3 | 2012-01-24 02:01:12
| Prepared | 0 | 5670 |
| XWiki.XWikiTranslator | 8 | 4 | 2012-01-25 02:04:00
| Prepared | 1 | 115 |
| XWiki.XWikiTranslator | 8 | 5 | 2012-01-26 02:02:29
| Prepared | 1 | 115 |
| XWiki.XWikiTranslator | 8 | 6 | 2012-01-27 02:01:18
| Prepared | 1 | 115 |
| XWiki.XWikiTranslator | 8 | 7 | 2012-02-04 02:01:08
| Prepared | 1 | 115 |
| XWiki.XWikiTranslator | 8 | 8 | 2012-02-07 02:02:03
| Prepared | 0 | 5670 |
| XWiki.XWikiTranslator | 8 | 9 | 2012-02-08 02:01:20
| Prepared | 1 | 115 |
| XWiki.XWikiTranslator | 8 | 10 | 2012-02-14 02:01:07
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 11 | 2012-02-28 02:01:32
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 12 | 2012-03-03 02:01:20
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 13 | 2012-03-14 02:01:50
| Prepared | 0 | 5671 |
| XWiki.XWikiTranslator | 8 | 14 | 2012-03-14 18:23:06
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 15 | 2012-03-14 18:32:38
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 16 | 2012-03-14 18:37:17
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 17 | 2012-03-24 02:01:34
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 18 | 2012-03-26 16:40:35
| Prepared | 0 | 5671 |
| XWiki.XWikiTranslator | 8 | 19 | 2012-03-28 02:01:23
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 20 | 2012-03-29 02:01:45
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 21 | 2012-04-11 02:01:36
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 22 | 2012-04-19 15:20:10
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 24 | 2012-04-29 02:01:45
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 25 | 2012-05-04 18:55:30
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 26 | 2012-05-04 19:02:09
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 27 | 2012-05-10 02:04:36
| Prepared | 1 | 116 |
| XWiki.XWikiTranslator | 8 | 28 | 2012-05-30 02:02:08
| Prepared | 0 | 5671 |
| XWiki.XWikiTranslator | 8 | 29 | 2012-06-07 10:44:28
| Prepared | 0 | 5782 |
+-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
http://l10n.xwiki.org/xwiki/bin/view/XE/XEXWikiCoreResources_2107067965_cor…
Most entries are "no changes" and have been cause by the L10NUpdater which
wrongefully saved the document with no changes. I believe this must have
been fixed (by Thomas M.?) mid 2012.
Now the 2M lines impact performance significantly and loads the DB for
nothing (and in the activity stream as well).
I suggest we clean up the history and activity stream. We have 2
possibilities:
For xwikircs:
1/ Clean up only the bad data from XWikiTranslator when there are no
changes:
This is complicated as you need to verify if the change is actually a
change and you cannot do that just with sql queries. It could be very long
2/ Clean up old data from pre-201206 from XWikiTranslator
Simpler if it is safe to delete by date in the DB. After discussion with
sergui this is a bit complicated because you need to make sure you don't
delete the latest full version before the versions you keep. So you would
have to do it by API which will take ages.
3/ Clean up up old data from pre-201206 from all users
This is simpler as you can safely delete from the database everything older
than a certain versions. Cleans-up even more but would loose contributor
statistics unless we store 2012 contributor counts in an alternate table
which would then be regularly updated
In any case we should probably create this intermediary table for
statistics as it would be much faster anyway.
For activitystream:
1/ Clean up old data from XWikiTranslator 201206 or earlier
2/ Clean up old date from everybody 201206 or earlier
What value do we see in the l10n history and actvity stream and which
solutions do the other commiters suggest ?
I would say it's interesting for contributor statistics (counting number of
contribution by translators) but beyond that we can delete the data.
So we would fix that by storing monthly statistics in a table and updating
the latest 2 month through a scheduler job. This means that we can also
delete history over 2 month.
WDYT ?
Ludovic
--
Ludovic Dubost
Founder and CEO
Blog:
http://blog.ludovic.org/
XWiki:
http://www.xwiki.com
Skype: ldubost GTalk: ldubost