Hi Asiri,
If you want to run your tests with different initial conditions, you
should use a Test Suite instead and pass the cleaner to the
constructor of the Test Case class.
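A rough sketch of that pattern, kept free of JUnit so it compiles on its own. In real JUnit 3 code SharedCleanerChecks would extend junit.framework.TestCase and suite() would return a junit.framework.TestSuite. The class names, the Function<String, String> stand-in for HTMLCleaner and the two toy cleaners are all made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Shared checks; the cleaner under test arrives through the constructor. */
final class SharedCleanerChecks {
    private final String cleanerHint;
    // Hypothetical stand-in for HTMLCleaner: HTML text in, cleaned HTML text out.
    private final Function<String, String> cleaner;

    SharedCleanerChecks(String cleanerHint, Function<String, String> cleaner) {
        this.cleanerHint = cleanerHint;
        this.cleaner = cleaner;
    }

    String cleanerHint() {
        return cleanerHint;
    }

    /** Returns true when no style element survives cleaning. */
    boolean styleTagsAreStripped() {
        String html = "<html><head><style>h1 {color:red}</style></head><body></body></html>";
        return !cleaner.apply(html).contains("<style");
    }
}

final class CleanerSuite {
    /** Builds one SharedCleanerChecks instance per cleaner. */
    static List<SharedCleanerChecks> suite() {
        // Toy cleaners, for illustration only; real code would look up the components.
        Function<String, String> stripStyle = html -> html.replaceAll("(?s)<style.*?</style>", "");
        Function<String, String> stripStyleAndTrim = html -> stripStyle.apply(html).trim();

        List<SharedCleanerChecks> checks = new ArrayList<>();
        checks.add(new SharedCleanerChecks("openoffice-default", stripStyle));
        checks.add(new SharedCleanerChecks("wysiwyg-default", stripStyleAndTrim));
        return checks;
    }
}
```

Each cleaner gets its own instance of the shared checks, so covering a third cleaner would mean adding one line to suite().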
That said, I don't see why you'd want to run the same tests again.
Since the implementation is the same, the result will be the same! What
you need to test are the extra things you've added to
WysiwygDefaultHTMLCleaner.
Thanks
-Vincent
On Jan 2, 2009, at 7:04 AM, asiri (SVN) wrote:
Author: asiri
Date: 2009-01-02 07:04:20 +0100 (Fri, 02 Jan 2009)
New Revision: 15010
Added:
sandbox/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/html/cleaner/AbstractHTMLCleanerTestCase.java
sandbox/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/html/cleaner/WysiwygDefaultHTMLCleanerTest.java
Modified:
sandbox/xwiki-officeimporter/pom.xml
sandbox/xwiki-officeimporter/src/main/java/org/xwiki/officeimporter/html/filter/TableFilter.java
sandbox/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/html/cleaner/OpenOfficeDefaultHTMLCleanerTest.java
Log:
* Added a few more test cases.
Modified: sandbox/xwiki-officeimporter/pom.xml
===================================================================
--- sandbox/xwiki-officeimporter/pom.xml 2009-01-02 03:59:29 UTC (rev 15009)
+++ sandbox/xwiki-officeimporter/pom.xml 2009-01-02 06:04:20 UTC (rev 15010)
@@ -73,4 +73,17 @@
<scope>test</scope>
</dependency>
</dependencies>
+ <build>
+ <plugins>
+ <plugin>
+ <groupId>org.apache.maven.plugins</groupId>
+ <artifactId>maven-surefire-plugin</artifactId>
+ <configuration>
+ <excludes>
+ <exclude>**/Abstract*.java</exclude>
+ </excludes>
+ </configuration>
+ </plugin>
+ </plugins>
+ </build>
</project>
\ No newline at end of file
Modified: sandbox/xwiki-officeimporter/src/main/java/org/xwiki/officeimporter/html/filter/TableFilter.java
===================================================================
--- sandbox/xwiki-officeimporter/src/main/java/org/xwiki/officeimporter/html/filter/TableFilter.java 2009-01-02 03:59:29 UTC (rev 15009)
+++ sandbox/xwiki-officeimporter/src/main/java/org/xwiki/officeimporter/html/filter/TableFilter.java 2009-01-02 06:04:20 UTC (rev 15010)
@@ -36,12 +36,12 @@
public class TableFilter implements HTMLFilter
{
/**
- * Tags that need to be removed from cell items.
+ * Tags that need to be removed from cell items while preserving their children.
*/
private static final String[] filterTags = new String[] {"p"};
/**
- * Tags that need to be removed from cell items.
+ * Tags that need to be completely removed from cell items.
*/
private static final String[] removeTags = new String[] {"br"};
@@ -54,7 +54,7 @@
for (int i = 0; i < cellItems.getLength(); i++) {
Node cellItem = cellItems.item(i);
cleanNode(cellItem);
- // Workaround empty cells.
+ // Workaround empty cells.
if (cellItem.getTextContent().equals("")) {
boolean empty = true;
for (int j = 0; j < cellItem.getChildNodes().getLength(); j++) {
@@ -81,7 +81,13 @@
NodeList children = node.getChildNodes();
boolean removed = false;
if (node.getNodeType() == Node.TEXT_NODE) {
- node.setTextContent(node.getTextContent().trim());
+ String trimmedContent = node.getTextContent().trim();
+ if (trimmedContent.equals("")) {
+ parent.removeChild(node);
+ removed = true;
+ } else {
+ node.setTextContent(trimmedContent);
+ }
} else if (Arrays.binarySearch(filterTags, node.getNodeName()) >= 0) {
while (children.getLength() > 0) {
Node child = children.item(0);
@@ -95,7 +101,9 @@
removed = true;
} else {
for (int i = 0; i < children.getLength(); i++) {
- i = cleanNode(children.item(i)) ? i-- : i;
+ if (cleanNode(children.item(i))) {
+ --i;
+ }
}
}
return removed;
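A note on the loop change in the hunk above: the old form `i = cond ? i-- : i` assigns the value of `i--` (the old `i`) back to `i`, so the index never steps back after a removal and the element that slides into the freed slot is skipped. A minimal sketch of the difference, assuming a plain List as a stand-in for the node's children; the class and method names are hypothetical and not part of TableFilter:

```java
import java.util.ArrayList;
import java.util.List;

final class RemoveWhileIterating {
    /** Old form: the post-decrement yields the old i, which is assigned straight back. */
    static List<String> buggy(List<String> items) {
        List<String> copy = new ArrayList<>(items);
        for (int i = 0; i < copy.size(); i++) {
            boolean removed = removeIfBlank(copy, i);
            i = removed ? i-- : i; // net effect: i is unchanged
        }
        return copy;
    }

    /** New form: step back one slot so the element that moved into index i is visited. */
    static List<String> fixed(List<String> items) {
        List<String> copy = new ArrayList<>(items);
        for (int i = 0; i < copy.size(); i++) {
            if (removeIfBlank(copy, i)) {
                --i;
            }
        }
        return copy;
    }

    /** Removes the element at index if it is blank; reports whether it was removed. */
    private static boolean removeIfBlank(List<String> list, int index) {
        if (list.get(index).trim().isEmpty()) {
            list.remove(index);
            return true;
        }
        return false;
    }
}
```

With the input ["a", " ", "", "b"], the old form leaves the empty string in place because it is never visited, while the new form removes both blank entries.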
Added: sandbox/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/html/cleaner/AbstractHTMLCleanerTestCase.java
===================================================================
--- sandbox/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/html/cleaner/AbstractHTMLCleanerTestCase.java (rev 0)
+++ sandbox/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/html/cleaner/AbstractHTMLCleanerTestCase.java 2009-01-02 06:04:20 UTC (rev 15010)
@@ -0,0 +1,112 @@
+/*
+ * See the NOTICE file distributed with this work for additional
+ * information regarding copyright ownership.
+ *
+ * This is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as
+ * published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This software is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this software; if not, write to the Free
+ * Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA, or see the FSF site: http://www.fsf.org.
+ */
+package org.xwiki.officeimporter.html.cleaner;
+
+import java.io.StringReader;
+
+import org.w3c.dom.Document;
+import org.w3c.dom.NodeList;
+import org.xwiki.xml.html.HTMLCleaner;
+
+import com.xpn.xwiki.test.AbstractXWikiComponentTestCase;
+
+/**
+ * Abstract test case class for all HTMLCleaner tests.
+ *
+ * @version $Id$
+ * @since 1.8M1
+ */
+public class AbstractHTMLCleanerTestCase extends AbstractXWikiComponentTestCase
+{
+ /**
+ * Beginning of the test html document.
+ */
+ protected String header = "<html><head><title>Title</title></head><body>";
+
+ /**
+ * Beginning of the test html document, which has a {@code <style>} tag.
+ */
+ protected String headerWithStyles =
+ "<html><head><style type=\"text/css\">h1 {color:red} p {color:blue} </style><title>Title</title></head><body>";
+
+ /**
+ * Ending of the test html document.
+ */
+ protected String footer = "</body></html>";
+
+ /**
+ * Test most basic cleaning.
+ */
+ public void testBasicCleaning(HTMLCleaner cleaner)
+ {
+ String html = header + footer;
+ Document doc = cleaner.clean(new StringReader(html));
+ assertNotNull(doc.getDoctype());
+ }
+
+ /**
+ * Test stripping of {@code <style>} and {@code <script>} tags.
+ */
+ public void testTagStripping(HTMLCleaner cleaner)
+ {
+ String html = headerWithStyles + footer;
+ Document doc = cleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName("style");
+ assertEquals(0, nodes.getLength());
+ html =
+ header + "<script type=\"text/javascript\">document.write(\"Hello World!\")</script>"
+ + footer;
+ doc = cleaner.clean(new StringReader(html));
+ nodes = doc.getElementsByTagName("script");
+ assertEquals(0, nodes.getLength());
+ }
+
+ /**
+ * Test stripping of redundant tags.
+ */
+ public void testRedundancyFiltering(HTMLCleaner cleaner)
+ {
+ // <span> & <div> tags without attributes should be stripped off.
+ String htmlTemplate = header + "<p>Test%sRedundant%sFiltering<p>" + footer;
+ String[] attributeWiseFilteredTags = new String[] {"span", "div"};
+ for (String tag : attributeWiseFilteredTags) {
+ String startTag = "<" + tag + ">";
+ String endTag = "</" + tag + ">";
+ String html = String.format(htmlTemplate, startTag, endTag);
+ Document doc = cleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName(tag);
+ assertEquals(0, nodes.getLength());
+ }
+ // Tags that usually contain textual information like <strong>, <code>, <em> etc. etc.
+ // should be filtered if they do not contain any textual content.
+ htmlTemplate = header + "<p>Test%sRedundant%s%s%sFiltering<p>" + footer;
+ String[] contentWiseFilteredTags =
+ new String[] {"em", "strong", "dfn", "code", "samp", "kbd", "var", "cite", "abbr",
+ "acronym", "address", "blockquote", "q", "pre", "h1", "h2", "h3", "h4", "h5", "h6"};
+ for(String tag: contentWiseFilteredTags) {
+ String startTag = "<" + tag + ">";
+ String endTag = "</" + tag + ">";
+ String html = String.format(htmlTemplate, startTag, endTag, startTag, endTag);
+ Document doc = cleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName(tag);
+ assertEquals(1, nodes.getLength());
+ }
+ }
+}
Modified: sandbox/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/html/cleaner/OpenOfficeDefaultHTMLCleanerTest.java
===================================================================
--- sandbox/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/html/cleaner/OpenOfficeDefaultHTMLCleanerTest.java 2009-01-02 03:59:29 UTC (rev 15009)
+++ sandbox/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/html/cleaner/OpenOfficeDefaultHTMLCleanerTest.java 2009-01-02 06:04:20 UTC (rev 15010)
@@ -22,48 +22,135 @@
import java.io.StringReader;
import org.w3c.dom.Document;
+import org.w3c.dom.Element;
+import org.w3c.dom.Node;
+import org.w3c.dom.NodeList;
import org.xwiki.xml.html.HTMLCleaner;
-import com.xpn.xwiki.test.AbstractXWikiComponentTestCase;
-
/**
* Test case for default open office html cleaner.
- *
+ *
* @version $Id$
* @since 1.8M1
*/
-public class OpenOfficeDefaultHTMLCleanerTest extends AbstractXWikiComponentTestCase
+public class OpenOfficeDefaultHTMLCleanerTest extends AbstractHTMLCleanerTestCase
{
/**
- * Beginning of the test html document.
- */
- private String header = "<html><head><title>Title</title></head><body>";
-
- /**
- * Ending of the test html document..
- */
- private String footer = "</body></html>";
-
- /**
* Open office html cleaner.
*/
private HTMLCleaner cleaner;
-
+
/**
* {@inheritDoc}
*/
protected void setUp() throws Exception
{
super.setUp();
- cleaner = (HTMLCleaner) getComponentManager().lookup(HTMLCleaner.ROLE, "openoffice-default");
+ cleaner =
+ (HTMLCleaner) getComponentManager().lookup(HTMLCleaner.ROLE, "openoffice-default");
}
-
+
/**
* Test most basic cleaning.
*/
- public void testBasicCleaning() {
- String html = header + footer;
+ public void testBasicCleaning()
+ {
+ super.testBasicCleaning(cleaner);
+ }
+
+ /**
+ * Test stripping of {@code <style>} and {@code <script>} tags.
+ */
+ public void testTagStripping()
+ {
+ super.testTagStripping(cleaner);
+ }
+
+ /**
+ * Test stripping of redundant tags.
+ */
+ public void testRedundancyFiltering()
+ {
+ super.testRedundancyFiltering(cleaner);
+ }
+
+ /**
+ * Test filtering of html links.
+ */
+ public void testLinkFiltering()
+ {
+ String html = header + "<a href=\"http://www.xwiki.org\">xwiki</a>" + footer;
Document doc = cleaner.clean(new StringReader(html));
- assertNotNull(doc.getDoctype());
+ NodeList nodes = doc.getElementsByTagName("a");
+ assertEquals(1, nodes.getLength());
+ Node link = nodes.item(0);
+ Element span = (Element) link.getParentNode();
+ assertEquals("span", span.getNodeName());
+ assertEquals("wikiexternallink", span.getAttribute("class"));
+ Node startComment = span.getPreviousSibling();
+ assertTrue(startComment.getNodeValue().startsWith("startwikilink"));
+ Node stopComment = span.getNextSibling();
+ assertTrue(stopComment.getNodeValue().startsWith("stopwikilink"));
}
+
+ /**
+ * Test filtering of html lists.
+ */
+ public void testListFiltering()
+ {
+ // Leading spaces inside list items are not allowed.
+ String html = header + "<ol><li> Test</li></ol>" + footer;
+ Document doc = cleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName("li");
+ Node listContent = nodes.item(0).getFirstChild();
+ assertEquals(Node.TEXT_NODE, listContent.getNodeType());
+ assertEquals("Test", listContent.getNodeValue());
+ // Paragraphs inside list items are not allowed.
+ html = header + "<ol><li><p>Test</p></li></ol>" + footer;
+ doc = cleaner.clean(new StringReader(html));
+ listContent = nodes.item(0).getFirstChild();
+ assertEquals(Node.TEXT_NODE, listContent.getNodeType());
+ assertEquals("Test", listContent.getNodeValue());
+ // Leading space plus a starting paragraph.
+ html = header + "<ol><li> <p>Test</p></li></ol>" + footer;
+ doc = cleaner.clean(new StringReader(html));
+ listContent = nodes.item(0).getFirstChild();
+ assertEquals(Node.TEXT_NODE, listContent.getNodeType());
+ assertEquals("Test", listContent.getNodeValue());
+ }
+
+ /**
+ * Test filtering of html tables.
+ */
+ public void testTableFiltering()
+ {
+ // Leading or trailing spaces inside cell items are not allowed.
+ String html = header + "<table><tr><td> Test </td></tr></table>" + footer;
+ Document doc = cleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName("td");
+ Node cellContent = nodes.item(0).getFirstChild();
+ assertEquals(Node.TEXT_NODE, cellContent.getNodeType());
+ assertEquals("Test", cellContent.getNodeValue());
+ // Paragraphs are not allowed inside cell items.
+ html = header + "<table><tr><td> <p>Test</p> </td></tr></table>" + footer;
+ doc = cleaner.clean(new StringReader(html));
+ nodes = doc.getElementsByTagName("td");
+ cellContent = nodes.item(0).getFirstChild();
+ assertEquals(Node.TEXT_NODE, cellContent.getNodeType());
+ assertEquals("Test", cellContent.getNodeValue());
+ // Line breaks are not allowed inside cell items.
+ html = header + "<table><tr><td><br/><p><br/>Test</p> </td></tr></table>" + footer;
+ doc = cleaner.clean(new StringReader(html));
+ nodes = doc.getElementsByTagName("td");
+ cellContent = nodes.item(0).getFirstChild();
+ assertEquals(Node.TEXT_NODE, cellContent.getNodeType());
+ assertEquals("Test", cellContent.getNodeValue());
+ // Empty cells should be replaced by a '-' (this is temporary)
+ html = header + "<table><tr><td><br/><p><br/></p> </td></tr></table>" + footer;
+ doc = cleaner.clean(new StringReader(html));
+ nodes = doc.getElementsByTagName("td");
+ cellContent = nodes.item(0).getFirstChild();
+ assertEquals(Node.TEXT_NODE, cellContent.getNodeType());
+ assertEquals("-", cellContent.getNodeValue());
+ }
}
Added: sandbox/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/html/cleaner/WysiwygDefaultHTMLCleanerTest.java
===================================================================
--- sandbox/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/html/cleaner/WysiwygDefaultHTMLCleanerTest.java (rev 0)
+++ sandbox/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/html/cleaner/WysiwygDefaultHTMLCleanerTest.java 2009-01-02 06:04:20 UTC (rev 15010)
@@ -0,0 +1,71 @@
+/*
+ * See the NOTICE file distributed with this work for additional
+ * information regarding copyright ownership.
+ *
+ * This is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as
+ * published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This software is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this software; if not, write to the Free
+ * Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA, or see the FSF site: http://www.fsf.org.
+ */
+package org.xwiki.officeimporter.html.cleaner;
+
+import org.xwiki.xml.html.HTMLCleaner;
+
+/**
+ * Test case for default wysiwyg html cleaner.
+ *
+ * @version $Id$
+ * @since 1.8M1
+ */
+public class WysiwygDefaultHTMLCleanerTest extends AbstractHTMLCleanerTestCase
+{
+
+ /**
+ * Wysiwyg html cleaner.
+ */
+ private HTMLCleaner cleaner;
+
+ /**
+ * {@inheritDoc}
+ */
+ protected void setUp() throws Exception
+ {
+ super.setUp();
+ cleaner =
+ (HTMLCleaner) getComponentManager().lookup(HTMLCleaner.ROLE, "wysiwyg-default");
+ }
+
+ /**
+ * Test most basic cleaning.
+ */
+ public void testBasicCleaning()
+ {
+ super.testBasicCleaning(cleaner);
+ }
+
+ /**
+ * Test stripping of {@code <style>} and {@code <script>} tags.
+ */
+ public void testTagStripping()
+ {
+ super.testTagStripping(cleaner);
+ }
+
+ /**
+ * Test stripping of redundant tags.
+ */
+ public void testRedundancyFiltering()
+ {
+ super.testRedundancyFiltering(cleaner);
+ }
+}