Hi Vincent,
1) Add props.setPruneTags("script, style"); to DefaultHTMLCleaner.
This removes all script and style tags together with their contents.
IMO, script and style tags are of no use in the later processing.
2) Remove the first p tag directly inside an li tag.
<ul>
<li><p>test</p></li>
</ul>
does not render properly in xwiki syntax 1.0 or with
xhtmlparser+xwikisyntaxrendering.
It should be changed to
<ul>
<li>test</li>
</ul>
I have a filter written with w3c dom:
http://svn.xwiki.org/svnroot/xwiki/sandbox/xwiki-plugin-officeimporter/src/…
It may help. If you need a jdom version, I can provide it later.
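To make the idea in 2) concrete, here is a minimal sketch using only the JDK's w3c dom API. It is not the sandbox filter linked above; the class name ListItemParagraphFilter and its methods are hypothetical, and it assumes well-formed XHTML with lowercase tag names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XWiki: unwraps the leading <p> of each <li>.
final class ListItemParagraphFilter {

    // Replaces <li><p>text</p></li> with <li>text</li> throughout the document.
    static void filter(Document document) {
        NodeList items = document.getElementsByTagName("li");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            Node first = firstNonBlankChild(item);
            if (first instanceof Element && "p".equals(first.getNodeName())) {
                // Move the children of <p> up into <li>, then drop the empty <p>.
                while (first.getFirstChild() != null) {
                    item.insertBefore(first.getFirstChild(), first);
                }
                item.removeChild(first);
            }
        }
    }

    // Returns the first child that is not a whitespace-only text node, or null.
    private static Node firstNonBlankChild(Node parent) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            boolean blankText = child.getNodeType() == Node.TEXT_NODE
                    && child.getTextContent().trim().isEmpty();
            if (!blankText) {
                return child;
            }
        }
        return null;
    }

    // Parses an XHTML fragment into a w3c dom Document (assumes well-formed input).
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XHTML", e);
        }
    }
}
```

Only a p that comes first inside the li is unwrapped; later paragraphs in the same item are left alone.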
3) Remove empty links, such as <a/>, <a href="">test</a> and
<a>something</a>.
http://svn.xwiki.org/svnroot/xwiki/sandbox/xwiki-plugin-officeimporter/src/…
This filter removes empty link tags.
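For illustration, a minimal w3c dom sketch of 3) could look like the following. It is not the sandbox filter itself. The class name EmptyLinkFilter is hypothetical, and the sketch assumes that "empty" means a missing or blank href and that the link text should be kept when the tag is dropped. Anchors that only carry a name attribute would need an extra check if they must survive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XWiki: drops <a> tags with no usable href.
final class EmptyLinkFilter {

    // Removes every <a> whose href is missing or blank, keeping its children.
    static void filter(Document document) {
        // getElementsByTagName returns a live list, so copy it before editing the tree.
        NodeList live = document.getElementsByTagName("a");
        List<Element> anchors = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            anchors.add((Element) live.item(i));
        }
        for (Element anchor : anchors) {
            // getAttribute returns "" when the attribute is absent.
            if (anchor.getAttribute("href").trim().isEmpty()) {
                Node parent = anchor.getParentNode();
                while (anchor.getFirstChild() != null) {
                    parent.insertBefore(anchor.getFirstChild(), anchor);
                }
                parent.removeChild(anchor);
            }
        }
    }

    // Parses an XHTML fragment into a w3c dom Document (assumes well-formed input).
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XHTML", e);
        }
    }
}
```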
Thanks
Wang Ning