FYI, here are the running times of these
queries:
mysql> create table l10n_stats_author as select year(xwr_date)*100+
month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct
XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs group by
1,2;
Query OK, 763 rows affected (1 min 28.36 sec)
Records: 763 Duplicates: 0 Warnings: 0
mysql> delete from l10n_stats_author where l10n_month=201301 or
l10n_month=201302;
Query OK, 29 rows affected (0.00 sec)
mysql> insert into l10n_stats_author (select year(xwr_date)*100+
month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct
XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs where
xwr_date >= '2012-12-01' group by 1,2);
Query OK, 35 rows affected (0.07 sec)
Records: 35 Duplicates: 0 Warnings: 0
Update is very fast
Ludovic
2013/1/24 Ludovic Dubost <ludovic(a)xwiki.com>
The custom statistics data structure is not that
complicated.
I've already the SQL query that does the initial version (based on
xwikircs for now)
create table l10n_stats_author as select year(xwr_date)*100+
month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct
XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs group by
1,2;
delete from l10n_stats_author where l10n_month=201301 or
l10n_month=201302;
insert into l10n_stats_author (select year(xwr_date)*100+
month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct
XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs where
xwr_date >= '2012-12-01' group by 1,2);
select * from l10n_stats_author where l10n_month=201301 or
l10n_month=201302;
Putting this in a scheduler job should not be complicated
Ludovic
2013/1/24 Sergiu Dumitriu <sergiu(a)xwiki.com>
> On 01/24/2013 12:03 AM, Ludovic Dubost wrote:
> > Working on l10n change to custom mapping showed up an issue with the
> l10n
> > history.
> > We have a huge history table with 2,3M lines (as well as an activity
> > stream) and we use the history table for extracting contributor
> statistics.
> >
> > However looking at it more closely shows that the source of most
> data in
> > the history table is useless.
> > See here a grouping by dates showing that localisation of already
> close to
> > 2M lines in 6 month in 2012.
> >
> > | 201201 | 318346 |
> > | 201202 | 311403 |
> > | 201203 | 728703 |
> > | 201204 | 271657 |
> > | 201205 | 296384 |
> > | 201206 | 120463 |
> >
> > Here is an example of history for an entry
> >
> > mysql> select
> >
> xwr_author,xwr_version1,xwr_version2,xwr_date,xwr_comment,cast(xwr_isdiff
> > as unsigned), length(xwr_patch) from xwikircs where
> xwr_docid=596494851 ;
> >
>
+-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
> > | xwr_author | xwr_version1 | xwr_version2 | xwr_date
> > | xwr_comment | cast(xwr_isdiff as unsigned) | length(xwr_patch) |
> >
>
+-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
> > | XWiki.XWikiTranslator | 1 | 1 | 2010-02-23
> 11:38:27
> > | | 1 | 163 |
> > | XWiki.XWikiTranslator | 2 | 1 | 2010-02-23
> 11:40:26
> > | | 1 | 330 |
> > | XWiki.rbuj | 3 | 1 | 2010-03-04
> 00:24:24
> > | | 1 | 189 |
> > | XWiki.rbuj | 4 | 1 | 2010-03-04
> 01:02:52
> > | | 1 | 223 |
> > | XWiki.rbuj | 5 | 1 | 2010-07-30
> 01:12:58
> > | | 0 | 5026 |
> > | XWiki.XWikiTranslator | 6 | 1 | 2012-01-23
> 11:40:25
> > | | 1 | 115 |
> > | XWiki.XWikiTranslator | 7 | 1 | 2012-01-23
> 11:58:25
> > | | 1 | 115 |
> > | XWiki.XWikiTranslator | 8 | 1 | 2012-01-23
> 13:31:52
> > | | 1 | 234 |
> > | XWiki.XWikiTranslator | 8 | 2 | 2012-01-23
> 15:36:02
> > | Prepared | 1 | 115 |
> > | XWiki.XWikiTranslator | 8 | 3 | 2012-01-24
> 02:01:12
> > | Prepared | 0 | 5670 |
> > | XWiki.XWikiTranslator | 8 | 4 | 2012-01-25
> 02:04:00
> > | Prepared | 1 | 115 |
> > | XWiki.XWikiTranslator | 8 | 5 | 2012-01-26
> 02:02:29
> > | Prepared | 1 | 115 |
> > | XWiki.XWikiTranslator | 8 | 6 | 2012-01-27
> 02:01:18
> > | Prepared | 1 | 115 |
> > | XWiki.XWikiTranslator | 8 | 7 | 2012-02-04
> 02:01:08
> > | Prepared | 1 | 115 |
> > | XWiki.XWikiTranslator | 8 | 8 | 2012-02-07
> 02:02:03
> > | Prepared | 0 | 5670 |
> > | XWiki.XWikiTranslator | 8 | 9 | 2012-02-08
> 02:01:20
> > | Prepared | 1 | 115 |
> > | XWiki.XWikiTranslator | 8 | 10 | 2012-02-14
> 02:01:07
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 11 | 2012-02-28
> 02:01:32
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 12 | 2012-03-03
> 02:01:20
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 13 | 2012-03-14
> 02:01:50
> > | Prepared | 0 | 5671 |
> > | XWiki.XWikiTranslator | 8 | 14 | 2012-03-14
> 18:23:06
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 15 | 2012-03-14
> 18:32:38
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 16 | 2012-03-14
> 18:37:17
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 17 | 2012-03-24
> 02:01:34
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 18 | 2012-03-26
> 16:40:35
> > | Prepared | 0 | 5671 |
> > | XWiki.XWikiTranslator | 8 | 19 | 2012-03-28
> 02:01:23
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 20 | 2012-03-29
> 02:01:45
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 21 | 2012-04-11
> 02:01:36
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 22 | 2012-04-19
> 15:20:10
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 24 | 2012-04-29
> 02:01:45
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 25 | 2012-05-04
> 18:55:30
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 26 | 2012-05-04
> 19:02:09
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 27 | 2012-05-10
> 02:04:36
> > | Prepared | 1 | 116 |
> > | XWiki.XWikiTranslator | 8 | 28 | 2012-05-30
> 02:02:08
> > | Prepared | 0 | 5671 |
> > | XWiki.XWikiTranslator | 8 | 29 | 2012-06-07
> 10:44:28
> > | Prepared | 0 | 5782 |
> >
>
+-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
> >
> >
>
http://l10n.xwiki.org/xwiki/bin/view/XE/XEXWikiCoreResources_2107067965_cor…
> >
> > Most entries are "no changes" and have been cause by the L10NUpdater
> which
> > wrongefully saved the document with no changes. I believe this must
> have
> > been fixed (by Thomas M.?) mid 2012.
> >
> > Now the 2M lines impact performance significantly and loads the DB
> for
> > nothing (and in the activity stream as well).
> >
> > I suggest we clean up the history and activity stream. We have 2
> > possibilities:
> >
> > For xwikircs:
> >
> > 1/ Clean up only the bad data from XWikiTranslator when there are no
> > changes:
> >
> > This is complicated as you need to verify if the change is actually a
> > change and you cannot do that just with sql queries. It could be
> very long
> >
> > 2/ Clean up old data from pre-201206 from XWikiTranslator
> >
> > Simpler if it is safe to delete by date in the DB. After discussion
> with
> > sergui this is a bit complicated because you need to make sure you
> don't
> > delete the latest full version before the versions you keep. So you
> would
> > have to do it by API which will take ages.
> >
> > 3/ Clean up up old data from pre-201206 from all users
> >
> > This is simpler as you can safely delete from the database
> everything older
> > than a certain versions. Cleans-up even more but would loose
> contributor
> > statistics unless we store 2012 contributor counts in an alternate
> table
> > which would then be regularly updated
>
> +1 for 3/, without the extra stats table yet. See below.
>
> Technically, we could also use /2, since we'll still have history
> summary (who changed when), but some actual versions won't be
> retrievable. Is the actual revision important? IMHO, only if we want to
> investigate some foul play.
>
> > In any case we should probably create this intermediary table for
> > statistics as it would be much faster anyway.
> >
> > For activitystream:
> >
> > 1/ Clean up old data from XWikiTranslator 201206 or earlier
> >
> > 2/ Clean up old date from everybody 201206 or earlier
>
> +1 for 1/
>
> > What value do we see in the l10n history and actvity stream and which
> > solutions do the other commiters suggest ?
> >
> > I would say it's interesting for contributor statistics (counting
> number of
> > contribution by translators) but beyond that we can delete the data.
> > So we would fix that by storing monthly statistics in a table and
> updating
> > the latest 2 month through a scheduler job. This means that we can
> also
> > delete history over 2 month.
>
> We can get contributor activity from the activity stream, independently
> from the RCS table. Since the activity stream doesn't have
> inter-version
> dependencies like the RCS does, we can freely discard irrelevant rows
> from it, and use the remaining valid data. The same information is
> present in the activity stream and in the RCS table: who did what,
> when,
> and where.
>
> I mean, this is a good fast solution for the moment. We could still
> make
> a custom statistics data structure for translation contributors, but
> that would take longer, and thus it delays the migration of the l10n
> wiki to a newer version, and delays the performance improvement that
> we'd gain by dropping data.
>
> --
> Sergiu Dumitriu
>
http://purl.org/net/sergiu
> _______________________________________________
> devs mailing list
> devs(a)xwiki.org
>
http://lists.xwiki.org/mailman/listinfo/devs
>
--
Ludovic Dubost
Founder and CEO
Blog:
http://blog.ludovic.org/
XWiki:
http://www.xwiki.com
Skype: ldubost GTalk: ldubost