Hi Vincent,
On Mon, Aug 18, 2008 at 2:59 AM, Vincent Massol <vincent(a)massol.net> wrote:
Hi Wang,
I have some questions/remarks on the commit below:
1) It's hard to see what you're modifying from the default tag infos
(in default.xml). I think a better solution might be to extend
DefaultTagProvider and use the constructor accepting an info provider,
see
http://htmlcleaner.sourceforge.net/doc/org/htmlcleaner
HtmlCleaner.html#HtmlCleaner(org.htmlcleaner.ITagInfoProvider)
. In this manner we just extend the default and not duplicate it.
WDYT? Is this possible? Are you adding tags or redefining default one?
I have seen this DefaultTagProvider, but I don't think it can provide
what I want.
I want to replace some tags with others. Such like replace <p> in <li>
to <span>, replace deprecated tags with others. Like replace <s> to
<del>. But the default.xml can define this.
2) The following code is not required:
+ private void removeHead(TagNode node)
+ {
+ logger.debug("remove the head tag of the html");
+ TagNode head = node.findElementByName("head", true);
+ head.removeFromTree();
+ }
instead you can just use props.setOmitHtmlEnvelope(true);
I have tried
props.setOmitHtmlEnvelope(true), but there are some
errors. If input is
<html>
<head/>
<body>
<p>p1</p>
<p>p2</p>
</body>
</html>
the output TagNode is not <p>p1</p><p>p2</p> but only
<p>p2</p>. So I
have to use removeHead() to remove the head tag.
3) Could you explain why we need convertAllAttribute2Lowercase?
In xhtml, all the attributes value should be lowercase. And some
attributes value in the conversion result of openoffice is uppercase,
like algin="CENTER". I think lowercase will be better. But if
xhtmlparser could handler uppercase attributes value, I will remove
this method.
4) Could you refactor your code to use the new
HTMLCleaner code I have
put in xwiki-xml module?
I will do it today.
Feel free to provide patches for CleaningFilters.
[snip]
--
Thanks
Wang Ning