[xwiki-users] Wiki gardening
Hi,
we're using XWiki as our main local knowledge repository (kind of) and we've started some gardening. We did some assessment and some minimal cleanup, but as far as we can see we'll need to do some automated wiki gardening: broken link detection, duplicated content, etc.
Do you use automated tools for this, or do you simply do the cleaning manually?
Thanks!
--
Víctor A. Rodríguez (http://www.bit-man.com.ar)
El bit Fantasma (Bit-Man)
Programming: love it or leave it
On 03/15/2011 07:27 PM, Víctor A. Rodríguez (Bit-Man) wrote:
Hi,
we're using XWiki as our main local knowledge repository (kind of) and we've started some gardening. We did some assessment and some minimal cleanup, but as far as we can see we'll need to do some automated wiki gardening: broken link detection, duplicated content, etc.
Do you use automated tools for this, or do you simply do the cleaning manually?
No, not really. What can be done is to detect such content.
There's the "Orphaned Pages" tab in "Document Index", which can list pages which don't have a valid parent.
This snippet can detect broken links: http://extensions.xwiki.org/xwiki/bin/Extension/All+Broken+Links
Duplicated content is harder to find, and depends on what you understand by "duplicate content". If that's an exact copy, character by character, you could use something like this (works on MySQL; depends on the RDBMS implementing an MD5 function):
{{velocity}}
#foreach($doc in $xwiki.search("select doc.fullName from XWikiDocument doc where MD5(doc.content) in (select MD5(d.content) from XWikiDocument d group by MD5(d.content) having count(*) > 1)"))
* [[$doc]]
#end
{{/velocity}}
This only checks the content field, so it will also report all documents based on the template+sheet pattern, since those documents share the same content.
If you mean "fairly similar to", then that's not something that can be done out of the box, but you could integrate a third party tool dedicated to finding similar documents, feed it the content of the wiki, and check its results.
As for the cleanup part, that has to be done manually. Or, with a bit of scripting, you can do whatever you want with the reported documents.
--
Sergiu Dumitriu
http://purl.org/net/sergiu/
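For wikis whose database has no MD5 function, the same exact-duplicate check can be sketched outside the wiki. This is only an illustration under assumptions: it presumes the page contents have already been exported into a Python dict, and the function name `find_exact_duplicates` and the sample page names are hypothetical, not part of XWiki.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(pages):
    """Return groups of page names whose content is byte-for-byte identical.

    `pages` maps a page name to its content (assumed to be exported from
    the wiki beforehand; the export step is not shown here).
    """
    groups = defaultdict(list)
    for name, content in pages.items():
        # Same idea as the HQL query: group documents by the MD5 of their content.
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()
        groups[digest].append(name)
    # Keep only digests shared by more than one page ("having count(*) > 1").
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical sample data, for illustration only.
pages = {
    "Main.WebHome": "Welcome to the wiki",
    "Sandbox.Copy": "Welcome to the wiki",
    "Main.Other": "Something else",
}
duplicates = find_exact_duplicates(pages)
```

Like the query, this compares only the content field, so pages that differ only in their objects would still be grouped together.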
Hi Sergiu,
On Tue, Mar 15, 2011 at 16:17, Sergiu Dumitriu <[email protected]> wrote:
No, not really. What can be done is to detect such content.
There's the "Orphaned Pages" tab in "Document Index", which can list pages which don't have a valid parent.
This snippet can detect broken links: http://extensions.xwiki.org/xwiki/bin/Extension/All+Broken+Links
Thanks a lot, I'll take a look at the extensions.
Duplicated content is harder to find, and depends on what you understand by "duplicate content". If that's an exact copy, character by character, you could use something like this (works on MySQL; depends on the RDBMS implementing an MD5 function):
{{velocity}}
#foreach($doc in $xwiki.search("select doc.fullName from XWikiDocument doc where MD5(doc.content) in (select MD5(d.content) from XWikiDocument d group by MD5(d.content) having count(*) > 1)"))
* [[$doc]]
#end
{{/velocity}}
This only checks the content field, so it will also report all documents based on the template+sheet pattern, since those documents share the same content.
If you mean "fairly similar to", then that's not something that can be done out of the box, but you could integrate a third party tool dedicated to finding similar documents, feed it the content of the wiki, and check its results.
Agreed, it's a rather ambitious goal.
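As a rough illustration of the "fairly similar to" case, the sketch below compares every pair of pages with `difflib.SequenceMatcher` from the Python standard library. This is a stand-in for the dedicated third party tool suggested above, not a replacement for one: it assumes the page contents were exported into a dict, the name `find_similar_pages`, the 0.8 threshold and the sample pages are all hypothetical, and comparing every pair becomes slow on a large wiki.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_similar_pages(pages, threshold=0.8):
    """Return (name_a, name_b, ratio) for page pairs with similar content.

    `pages` maps a page name to its content. `threshold` is an arbitrary
    cut-off between 0 and 1 and would need tuning on real data.
    """
    similar = []
    # Compare each unordered pair of pages exactly once.
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(pages.items()), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            similar.append((name_a, name_b, ratio))
    return similar


# Hypothetical sample data, for illustration only.
pages = {
    "Docs.InstallTomcat6": "How to install XWiki on Tomcat 6",
    "Docs.InstallTomcat7": "How to install XWiki on Tomcat 7",
    "Team.MeetingNotes": "Meeting notes from March",
}
similar = find_similar_pages(pages)
```

The reported pairs are only candidates; deciding which page to keep or merge still has to be done by hand.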
As for the cleanup part, that has to be done manually. Or, with a bit of scripting, you can do whatever you want with the reported documents.
Thanks a lot for your help!
--
Víctor A. Rodríguez (http://www.bit-man.com.ar)
El bit Fantasma (Bit-Man)
Programming: love it or leave it
participants (2): Sergiu Dumitriu, Víctor A. Rodríguez (Bit-Man)