Pablo:
Yes, I found the JTidy package (as an option to another one in XWiki's
libs), but the HTML parser's constructor seemed to depend on some other
stuff that I don't think I can synthesize; I think there's yet another
one in the XWiki distribution, but I haven't looked at it yet (low
priority for now).
Of course, I can always use JavaScript, yuck (not that I dislike it per
se, but due to the hegemonic lusts of a certain corporation - the bane
of my professional existence, which I am ashamed to say is from my own
country - cross-browser compatibility rates just above un-anaesthetized
oral surgery on my personal list of preferences)...
I will investigate TagSoup, though; thanks.
brain[sic]
-----Original Message-----
From: Pablo Oliveira [mailto:pablo.oliveira@enst.fr]
Sent: Thursday, April 12, 2007 9:07 AM
To: xwiki-users(a)objectweb.org
Subject: Re: [xwiki-users] Xwiki.com API stability and Class/Object model
On Apr 06, THOMAS, BRIAN M (ATTSI) wrote :
From: Sergiu Dumitriu [mailto:sergiu.dumitriu@gmail.com]
Sent: Thursday, April 05, 2007 4:16 PM
To: xwiki-users(a)objectweb.org
Subject: Re: [xwiki-users] Xwiki.com API stability and Class/Object model
On 4/4/07, THOMAS, BRIAN M (ATTSI) <bt0008(a)att.com> wrote:
The only reason I haven't already made a start of it is that I haven't
found an HTML DOM parser. Is there one in the myriad of libraries that
come with XWiki?
What do you mean by "HTML DOM parser"? You can use any DOM parser as
long as it's well-formed XML, and it should be.
--
http://purl.org/net/sergiu
Unfortunately, it isn't:
Nested exception: org.xml.sax.SAXParseException: The declaration for
the entity "HTML.Version" must end with '>'.
This exception is thrown regardless of which of the javadoc pages I use...
Just my two cents:
you might have a look at TagSoup (http://home.ccil.org/~cowan/XML/tagsoup/)
or JTidy (http://jtidy.sourceforge.net/), which I think is already
distributed as part of XWiki; those should help you when dealing with
HTML that is not well-formed XML.
Pablo
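
For anyone following the thread, the well-formedness point above can be sketched with nothing but the JDK's own DOM parser. This is only an illustration, not code from XWiki: the class name WellFormedCheck, the helper parseOrNull, and the two sample strings are made up for the example. It shows a strict XML parser accepting XHTML-style markup and rejecting ordinary HTML with an unclosed <br>, which is the kind of input a lenient parser such as TagSoup or JTidy is meant to handle.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /**
     * Hypothetical helper: returns the parsed DOM, or null when the
     * markup is not well-formed XML (the parser raises a SAXException).
     */
    public static Document parseOrNull(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (SAXException e) {
            // Includes SAXParseException, the error type quoted in the thread.
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // XHTML style: every element is closed, so a plain XML parser copes.
        String xhtml = "<html><body><p>Hello<br/></p></body></html>";
        // Ordinary HTML: <br> and <p> are left open, which is not well-formed XML.
        String html = "<html><body><p>Hello<br></body></html>";

        System.out.println("XHTML parsed: " + (parseOrNull(xhtml) != null));
        System.out.println("HTML parsed:  " + (parseOrNull(html) != null));
    }
}
```

The samples deliberately carry no DOCTYPE, so the parser never tries to fetch a DTD. The "HTML.Version" error quoted above appears to come from the DTD declarations rather than from the page body, so this sketch reproduces the general failure mode, not that exact message.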