On 03/15/2011 07:27 PM, Víctor A. Rodríguez (Bit-Man) wrote:
> Hi,
> we're using XWiki as our main local knowledge repository (of sorts) and
> we've started some gardening.
> We did an assessment and some minimal cleanup, but as far as we can
> see we'll need to do some automated wiki gardening: broken link
> detection, duplicated content, etc.
> Do you use automated tools for this, or do you simply clean up manually?
No, not really. What can be done is to detect such content.
There's the "Orphaned Pages" tab in the "Document Index", which lists
pages that don't have a valid parent.
This snippet can detect broken links:
http://extensions.xwiki.org/xwiki/bin/Extension/All+Broken+Links
Duplicated content is harder to find, and depends on what you understand
by "duplicate content". If that means an exact copy, character by
character, you could use something like this (works on MySQL; it depends
on the RDBMS implementing an MD5 function):
{{velocity}}
#foreach($doc in $xwiki.search("select doc.fullName from XWikiDocument doc where MD5(doc.content) in (select MD5(d.content) from XWikiDocument d group by MD5(d.content) having count(*) > 1)"))
* [[$doc]]
#end
{{/velocity}}
This only checks the content field, so it will also report all documents
built on the template+sheet pattern, since their wiki content is identical.
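If your RDBMS doesn't provide an MD5 function, the same grouping can be
done outside the database. A minimal Python sketch, assuming you can get
each page's content as a string (for example from a XAR export); the page
names and contents below are made up:

```python
import hashlib
from collections import defaultdict

def find_exact_duplicates(pages):
    """Group page names by the MD5 hash of their content and
    return only the groups containing more than one page."""
    by_hash = defaultdict(list)
    for name, content in pages.items():
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()
        by_hash[digest].append(name)
    # Groups with more than one member are exact duplicates.
    return [names for names in by_hash.values() if len(names) > 1]

# Hypothetical example data:
pages = {
    "Main.WebHome": "Welcome to the wiki",
    "Sandbox.Copy": "Welcome to the wiki",
    "Main.About": "About this site",
}
print(find_exact_duplicates(pages))  # → [['Main.WebHome', 'Sandbox.Copy']]
```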
If you mean "fairly similar to", then that's not something that can be
done out of the box, but you could integrate a third-party tool
dedicated to finding similar documents, feed it the content of the wiki,
and check its results.
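As a rough illustration of that approach, here is a minimal Python sketch
using the standard library's difflib; the page names and the 0.8 threshold
are made up, it compares every pair (quadratic in the number of pages), and
dedicated similarity tools use smarter fingerprinting and scale far better:

```python
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicates(pages, threshold=0.8):
    """Report page pairs whose content similarity ratio
    (as computed by SequenceMatcher) is at least `threshold`."""
    pairs = []
    for (a, ta), (b, tb) in combinations(pages.items(), 2):
        ratio = SequenceMatcher(None, ta, tb).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs

# Hypothetical example data:
pages = {
    "A": "Gardening the wiki: broken links",
    "B": "Gardening the wiki: broken link",
    "C": "Totally different content",
}
print(near_duplicates(pages))  # → [('A', 'B', 0.98)]
```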
As for the cleanup part, that has to be done manually. Or, with a bit of
scripting, you can do whatever you want with the reported documents.
--
Sergiu Dumitriu
http://purl.org/net/sergiu/