Hi Asiri,
On Feb 23, 2009, at 1:37 PM, asiri (SVN) wrote:
Author: asiri
Date: 2009-02-23 13:37:50 +0100 (Mon, 23 Feb 2009)
New Revision: 16999
Modified:
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/OpenOfficeHTMLCleanerTest.java
Log:
XWIKI-3259: Table headers are not handled properly
* Added a unit test.
[snip]
/**
+ * Test proper cleaning of {@code <th>} elements.
+ */
+ public void testTableHeaderItemCleaning()
+ {
+ // Isolated paragraph elements inside 'th' elements should be removed.
+ String html =
+ header + "<table><thead><tr><th><p>Test</p></th></tr></thead><tbody><tr><td/></tr></tbody></table>"
+ + footer;
+ Document doc = cleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName("th");
+ Node hearderItemContent = nodes.item(0).getFirstChild();
+ assertEquals(Node.TEXT_NODE, hearderItemContent.getNodeType());
+ assertEquals("Test", hearderItemContent.getNodeValue());
Why is this only for th and not for td cells too?
Is this specific to the office importer? It looks very generic to me.
Why do paragraphs need to be removed?
What happens if there are 2 paragraph elements in the same cell? Do you have a
test for that too?
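
To make the questions above concrete, here is a rough sketch of a generic version of the rule. It is not the office importer's cleaner: the class and method names are hypothetical, it only uses the standard JAXP/DOM API, and it assumes that "isolated" means the paragraph is the only child of the cell. The tag name is a parameter, so the same code covers both th and td.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of xwiki-officeimporter.
class IsolatedParagraphSketch
{
    /** Parses a well-formed XHTML fragment into a DOM document. */
    public static Document parse(String xml)
    {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input", e);
        }
    }

    /**
     * Unwraps a p element when it is the only child of a cell.
     * Assumption: cells with several children (e.g. 2 paragraphs) are left untouched.
     */
    public static void unwrapIsolatedParagraphs(Document doc, String cellTag)
    {
        NodeList cells = doc.getElementsByTagName(cellTag);
        for (int i = 0; i < cells.getLength(); i++) {
            Node cell = cells.item(i);
            Node only = cell.getFirstChild();
            // Skip cells that are empty, have several children, or whose only child is not a p.
            if (only == null || only.getNextSibling() != null || !"p".equals(only.getNodeName())) {
                continue;
            }
            // Move the paragraph's children up into the cell, then drop the now empty p.
            while (only.getFirstChild() != null) {
                cell.insertBefore(only.getFirstChild(), only);
            }
            cell.removeChild(only);
        }
    }
}
```

With this sketch, calling unwrapIsolatedParagraphs(doc, "th") leaves a text node as the first child of a th that contained a single paragraph, and calling it with "td" applies the same rule to td cells. A cell holding 2 paragraphs keeps both p elements, which is the case I would like to see covered by a test.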
Thanks
-Vincent