[xwiki-users] HTML DOM parser? (was: Xwiki.com API stability and Class/Object model)

THOMAS, BRIAN M (ATTSI) bt0008 at att.com
Thu Apr 12 19:03:53 CEST 2007


Pablo:

Yes, I found the JTidy package (as an alternative to another one in XWiki's
libs), but the HTML parser's constructor seemed to depend on other objects
that I don't think I can construct myself. I think there's yet another
parser in the XWiki distribution, but I haven't looked at it yet (low
priority for now).

Of course, I can always use JavaScript, yuck (not that I dislike it per
se, but due to the hegemonic lusts of a certain corporation - the bane
of my professional existence, which I am ashamed to say is from my own
country - cross-browser compatibility rates just above un-anaesthetized
oral surgery on my personal list of preferences)...

I will investigate TagSoup, though; thanks.
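
For anyone following the thread, here is a minimal sketch of the failure
mode under discussion: a strict XML DOM parser accepts well-formed markup
but rejects ordinary HTML with an unclosed tag, which is the gap that tools
like TagSoup or JTidy are meant to fill. It uses only the JDK's own
javax.xml.parsers API; the class and method names (StrictDomCheck,
rootNameOrNull) and the sample markup are made up for illustration, and it
does not reproduce the exact "HTML.Version" exception quoted below.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of XWiki, JTidy, or TagSoup.
class StrictDomCheck {

    /** Returns the root element name, or null if the markup is not well-formed XML. */
    static String rootNameOrNull(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            return doc.getDocumentElement().getNodeName();
        } catch (SAXException e) {
            // Thrown (as SAXParseException) when the input is not well-formed.
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Well-formed: every tag is closed, so a DOM is built.
        System.out.println(rootNameOrNull("<html><body><p>one<br/>two</p></body></html>"));
        // Typical HTML: <br> and <p> are left open, so the strict parser gives up.
        System.out.println(rootNameOrNull("<html><body><p>one<br>two</body></html>"));
    }
}
```

The first call prints "html" and the second prints "null"; a tolerant
parser would instead be expected to repair the second input before handing
over a tree.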

brain[sic]  

> -----Original Message-----
> From: Pablo Oliveira [mailto:pablo.oliveira at enst.fr] 
> Sent: Thursday, April 12, 2007 9:07 AM
> To: xwiki-users at objectweb.org
> Subject: Re: [xwiki-users] Xwiki.com API stability and Class/Object model
> 
> On Apr 06, THOMAS, BRIAN M (ATTSI) wrote :
>  
> > 	From: Sergiu Dumitriu [mailto:sergiu.dumitriu at gmail.com] 
> > 	Sent: Thursday, April 05, 2007 4:16 PM
> > 	To: xwiki-users at objectweb.org
> > 	Subject: Re: [xwiki-users] Xwiki.com API stability and Class/Object model
> > 
> > 	On 4/4/07, THOMAS, BRIAN M (ATTSI) <bt0008 at att.com> wrote: 
> > 
> > 
> > 		The only reason I haven't already made a start of it is that I
> > 		haven't found an HTML DOM parser.  Is there one in the myriad of
> > 		libraries that come with XWiki?
> > 		
> > 		
> > 
> > 
> > 	What do you mean by "HTML DOM parser"? You can use any DOM parser as
> > 	long as it's well formed XML, and it should be.
> > 	
> > 	
> > 	-- 
> > 	http://purl.org/net/sergiu
> > 	 
> > 
> >  Unfortunately, it isn't:
> >  
> > Nested exception: org.xml.sax.SAXParseException: The declaration for
> > the entity "HTML.Version" must end with '>'.
> > 
> >  
> > 
> > This exception is thrown regardless of which of the javadoc pages I 
> > use...
> 
> Just my two cents:
> you might have a look at TagSoup 
> (http://home.ccil.org/~cowan/XML/tagsoup/) or JTidy 
> (http://jtidy.sourceforge.net/) which I think is distributed 
> already as part of XWiki, those should help you when dealing 
> with non xml-valid HTML.
> 
> Pablo
> 
> 



