Hi Wang,
On Aug 26, 2008, at 7:49 PM, Wang Ning wrote:
Hi Vincent,
1) add props.setPruneTags("script, style"); in DefaultHTMLCleaner.
This will remove all the script and style tags and their contents.
Script and style tags are of no use later on, IMO.
Done (I'm not 100% sure but we'll see).
2) remove the first p tag following the li tag.
<ul>
<li><p>test</p></li>
</ul>
could not render properly in xwiki syntax 1.0 and
xhtmlparser+xwikisyntaxrendering.
1.0 syntax doesn't matter since the office converter is meant to work
with the 2.0 syntax.
It should change to
<ul>
<li>test</li>
</ul>
I have a filter with w3c dom:
http://svn.xwiki.org/svnroot/xwiki/sandbox/xwiki-plugin-officeimporter/src/…
Maybe it can help. If you need a JDOM version, I can provide it later
if necessary.
It's probably more complex than it looks. What about this:
<ul>
<li><p>test</p><p>test2</p></li>
</ul>
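To make the two cases concrete, here is a minimal sketch of one conservative rule, using only the JDK's W3C DOM: unwrap the p only when it is the sole non-whitespace child of the li, and leave items with several paragraphs untouched. The class and method names are hypothetical, this is not the filter from the sandbox, and whether leaving the multi-paragraph case alone is the right behaviour is still an open question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, for illustration only.
class ListItemParagraphFilter {

    // Unwraps <li><p>x</p></li> into <li>x</li>; other list items are left as they are.
    static void filter(Document doc) {
        NodeList items = doc.getElementsByTagName("li");
        for (int i = 0; i < items.getLength(); i++) {
            Element li = (Element) items.item(i);
            Element p = soleParagraph(li);
            if (p == null) {
                continue; // e.g. <li><p>test</p><p>test2</p></li>
            }
            // Move the paragraph's children up into the li, then drop the empty p.
            while (p.getFirstChild() != null) {
                li.insertBefore(p.getFirstChild(), p);
            }
            li.removeChild(p);
        }
    }

    // Returns the p element if it is the only non-whitespace child of li, else null.
    private static Element soleParagraph(Element li) {
        Element found = null;
        for (Node n = li.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE && n.getTextContent().trim().isEmpty()) {
                continue; // ignore whitespace between tags
            }
            if (n.getNodeType() == Node.ELEMENT_NODE && "p".equals(n.getNodeName()) && found == null) {
                found = (Element) n;
            } else {
                return null; // a second p, other markup, or bare text
            }
        }
        return found;
    }

    // Convenience wrapper: parse well-formed XHTML, filter it, serialize it back.
    static String apply(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            filter(doc);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XHTML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply("<ul><li><p>test</p></li></ul>"));
        System.out.println(apply("<ul><li><p>test</p><p>test2</p></li></ul>"));
    }
}
```

The second call in main is the two-paragraph case above; with this rule it passes through unchanged, which at least avoids merging "test" and "test2" into one run of text.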
3) empty links, like <a/> <a href="">test</a> <a>something</a>
http://svn.xwiki.org/svnroot/xwiki/sandbox/xwiki-plugin-officeimporter/src/…
This filter can remove empty link tags.
I don't think this is correct. For example the following is valid and
should be kept:
<a name="id"/>
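A filter that respects this could look roughly like the sketch below, again with the JDK's W3C DOM. It rests on assumptions that are mine, not taken from the sandbox filter: an a element counts as "empty" only if it has no name or id attribute and no non-blank href, and "removing" it means unwrapping it so that any text inside survives. The class name is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, for illustration only.
class EmptyLinkFilter {

    static void filter(Document doc) {
        NodeList links = doc.getElementsByTagName("a");
        // The NodeList is live, so walk it backwards while removing nodes.
        for (int i = links.getLength() - 1; i >= 0; i--) {
            Element a = (Element) links.item(i);
            // Assumption: named anchors such as <a name="id"/> are link targets and must stay.
            boolean isAnchor = a.hasAttribute("name") || a.hasAttribute("id");
            // getAttribute returns "" when the attribute is absent.
            boolean hasTarget = !a.getAttribute("href").trim().isEmpty();
            Node parent = a.getParentNode();
            if (isAnchor || hasTarget || parent.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            // Assumption: keep the link's content and drop only the useless tag.
            while (a.getFirstChild() != null) {
                parent.insertBefore(a.getFirstChild(), a);
            }
            parent.removeChild(a);
        }
    }

    // Convenience wrapper: parse well-formed XHTML, filter it, serialize it back.
    static String apply(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            filter(doc);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XHTML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply(
            "<p><a/> <a href=\"\">test</a> <a>something</a> <a name=\"id\"/></p>"));
    }
}
```

With these rules the three examples from point 3 lose their a tag, while the named anchor is kept.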
Thanks
-Vincent