On Aug 18, 2008, at 4:37 AM, Wang Ning wrote:
Hi Vincent,
On Mon, Aug 18, 2008 at 2:59 AM, Vincent Massol <vincent(a)massol.net> wrote:
Hi Wang,
I have some questions/remarks on the commit below:
1) It's hard to see what you're modifying from the default tag infos
(in default.xml). I think a better solution might be to extend
DefaultTagProvider and use the constructor accepting an info provider,
see
http://htmlcleaner.sourceforge.net/doc/org/htmlcleaner/HtmlCleaner.html#HtmlCleaner(org.htmlcleaner.ITagInfoProvider)
This way we just extend the default instead of duplicating it.
WDYT? Is this possible? Are you adding tags or redefining default
ones?
I have seen this DefaultTagProvider, but I don't think it can provide
what I want. I want to replace some tags with others, such as replacing
<p> inside <li> with <span>, or replacing deprecated tags with newer
ones, like <s> with <del>. But default.xml can define this.
You could do something like:
public class MyProvider extends DefaultTagProvider
{
    public TagInfo getTagInfo(String tagName)
    {
        if ("s".equals(tagName)) {
            // Custom definition for <s>; constructor arguments elided
            return new TagInfo(...);
        } else {
            return super.getTagInfo(tagName);
        }
    }
}
or something like that.
I guess this looks better if we only have a few changes to make. If we
have a lot, then the file solution might be better. In any case, make
sure you separate and document your changes from the defaults in
default.xml.
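For comparison, a rename such as <s> to <del> can also be done as a post-processing pass over an already cleaned, well-formed document instead of through the tag provider. A minimal sketch, assuming well-formed XHTML input and using the JDK's W3C DOM rather than HtmlCleaner's TagNode API; TagRenamer and rename() are hypothetical names:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of HtmlCleaner or XWiki): renames elements,
// e.g. the deprecated <s> to <del>, in well-formed XHTML.
class TagRenamer {

    static String rename(String xhtml, Map<String, String> renames) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

            // Snapshot the live NodeList, since nodes are replaced while iterating.
            NodeList live = doc.getElementsByTagName("*");
            List<Element> elements = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                elements.add((Element) live.item(i));
            }

            for (Element old : elements) {
                String target = renames.get(old.getTagName());
                if (target == null) {
                    continue; // not a tag we want to rename
                }
                Element replacement = doc.createElement(target);
                // Keep the attributes of the original element.
                NamedNodeMap attrs = old.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Attr attr = (Attr) attrs.item(i);
                    replacement.setAttribute(attr.getName(), attr.getValue());
                }
                // Move (not copy) the children into the new element.
                while (old.getFirstChild() != null) {
                    replacement.appendChild(old.getFirstChild());
                }
                old.getParentNode().replaceChild(replacement, old);
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot rename tags in: " + xhtml, e);
        }
    }
}
```

A context-dependent rename such as <p> inside <li> to <span> would additionally need a check on old.getParentNode() before replacing the element.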
2) The following code is not required:
+ private void removeHead(TagNode node)
+ {
+ logger.debug("remove the head tag of the html");
+ TagNode head = node.findElementByName("head", true);
+ head.removeFromTree();
+ }
Instead, you can just use props.setOmitHtmlEnvelope(true);
I have tried props.setOmitHtmlEnvelope(true), but it gives wrong
results. If the input is
<html>
<head/>
<body>
<p>p1</p>
<p>p2</p>
</body>
</html>
the output TagNode is not <p>p1</p><p>p2</p> but only <p>p2</p>, so I
have to use removeHead() to remove the head tag.
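For reference, the result Wang expects (every child of <body>, with the <html> and <head> envelope gone) can be produced with the JDK's own XML APIs. A minimal sketch, assuming the input is already well-formed XHTML like the example above; BodyExtractor and bodyContent() are hypothetical names, and this neither fixes nor replaces HtmlCleaner's setOmitHtmlEnvelope:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: returns the content of <body>, dropping <html> and <head>.
class BodyExtractor {

    static String bodyContent(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            NodeList bodies = doc.getElementsByTagName("body");
            if (bodies.getLength() == 0) {
                throw new IllegalArgumentException("No <body> element in input");
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // Serialize every child of <body> in order, not just the last one,
            // skipping whitespace-only text between block elements.
            for (Node child = bodies.item(0).getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                boolean blankText = child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty();
                if (!blankText) {
                    serializer.transform(new DOMSource(child), new StreamResult(out));
                }
            }
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                | TransformerException e) {
            throw new IllegalArgumentException("Cannot parse XHTML input", e);
        }
    }
}
```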
htmlcleaner seems quite buggy... Even with setOmitHtmlEnvelope set to
false, it still generates something strange, with an extra line between
the two paragraphs of your example. Do you think you could report an
issue in the htmlcleaner project's issue tracker so that they can
fix it in the next release?
3) Could you explain why we need convertAllAttribute2Lowercase?
In XHTML, all attribute values should be lowercase.
Do you have a link to the XHTML spec for this? I can only find
http://www.w3.org/TR/xhtml1/#h-4.2
but that section is about element and attribute names, not values.
I have only found this
http://lists.xml.org/archives/xml-dev/200109/msg00718.html
I'd like to know what we gain by converting to lowercase. Since it's
extra effort, we need to justify it.
Also, some attribute values in the OpenOffice conversion output are
uppercase, like align="CENTER". I think lowercase would be better. But
if xhtmlparser can handle uppercase attribute values, I will remove
this method.
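If the conversion is kept, one option is to limit it to attributes whose values are case-insensitive keywords. A minimal sketch using the JDK's W3C DOM; KeywordAttributeLowercaser and its whitelist (align, valign, dir) are assumptions for illustration, not the code from the commit:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical variant of convertAllAttribute2Lowercase: only lowercases
// attributes whose values are keywords, e.g. align="CENTER" -> align="center".
class KeywordAttributeLowercaser {

    // Assumed whitelist; values of href, src, alt or title must keep their case.
    private static final Set<String> KEYWORD_ATTRIBUTES =
        new HashSet<>(Arrays.asList("align", "valign", "dir"));

    static String lowercaseKeywordValues(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            // Changing attribute values does not alter the element list,
            // so iterating the live NodeList directly is safe here.
            NodeList elements = doc.getElementsByTagName("*");
            for (int i = 0; i < elements.getLength(); i++) {
                NamedNodeMap attrs = ((Element) elements.item(i)).getAttributes();
                for (int j = 0; j < attrs.getLength(); j++) {
                    Attr attr = (Attr) attrs.item(j);
                    if (KEYWORD_ATTRIBUTES.contains(attr.getName().toLowerCase(Locale.ROOT))) {
                        // Locale.ROOT avoids locale surprises such as the Turkish dotless i.
                        attr.setValue(attr.getValue().toLowerCase(Locale.ROOT));
                    }
                }
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot process XHTML input", e);
        }
    }
}
```

Lowercasing every attribute value, as the name convertAllAttribute2Lowercase suggests, would also rewrite URLs, alt text and titles, which is why this sketch restricts itself to a whitelist.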
4) Could you refactor your code to use the new HTMLCleaner code I have
put in the xwiki-xml module?
I will do it today.
cool
Thanks
-Vincent
Feel free to provide patches for CleaningFilters.
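One possible shape for packaging steps such as removeHead() as separate filters applied in sequence to a DOM Document. The DomCleaningStep interface, RemoveHeadStep and CleaningPipeline are hypothetical names and do not reproduce the actual CleaningFilter API in the xwiki-xml module:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical filter contract; not XWiki's real interface.
interface DomCleaningStep {
    void filter(Document document);
}

// Hypothetical step mirroring removeHead(): drops every <head> element.
class RemoveHeadStep implements DomCleaningStep {
    @Override
    public void filter(Document document) {
        NodeList heads = document.getElementsByTagName("head");
        // Iterate backwards: the NodeList is live and shrinks on each removal.
        for (int i = heads.getLength() - 1; i >= 0; i--) {
            Node head = heads.item(i);
            head.getParentNode().removeChild(head);
        }
    }
}

// Hypothetical pipeline applying the steps in the given order.
class CleaningPipeline {
    private final List<DomCleaningStep> steps;

    CleaningPipeline(List<? extends DomCleaningStep> steps) {
        this.steps = new ArrayList<>(steps);
    }

    Document clean(Document document) {
        for (DomCleaningStep step : steps) {
            step.filter(document);
        }
        return document;
    }

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XHTML input", e);
        }
    }
}
```

Each step only sees the Document, so new cleaning rules can be added or reordered without touching the others; RemoveHeadStep simply does nothing when there is no <head>.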
[snip]
--
Thanks
Wang Ning
_______________________________________________
devs mailing list
devs(a)xwiki.org
http://lists.xwiki.org/mailman/listinfo/devs