Hi Asiri,
On Feb 27, 2009, at 12:32 PM, asiri (SVN) wrote:
Author: asiri
Date: 2009-02-27 12:32:21 +0100 (Fri, 27 Feb 2009)
New Revision: 17078
Added:
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/AbstractHTMLCleaningTest.java
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/EmptyLineParagraphOpenOfficeCleaningTest.java
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/ImageOpenOfficeCleaningTest.java
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/InvalidTagOpenOfficeCleaningTest.java
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/LineBreakOpenOfficeCleaningTest.java
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/LinkOpenOfficeCleaningTest.java
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/ListOpenOfficeCleaningTest.java
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/MiscWysiwygCleaningTest.java
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/RedundantTagOpenOfficeCleaningTest.java
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/TableOpenOfficeCleaningTest.java
Removed:
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/AbstractHTMLCleanerTest.java
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/OpenOfficeHTMLCleanerTest.java
platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/WysiwygHTMLCleanerTest.java
Modified:
platform/core/trunk/xwiki-officeimporter/src/main/java/org/xwiki/officeimporter/filter/LineBreakFilter.java
Log:
XWIKI-3265: Restructure officeimporter test cases + write more tests
* Completed.
[snip]
+public class InvalidTagOpenOfficeCleaningTest extends AbstractHTMLCleaningTest
+{
+ /**
+ * {@code <style>} tags should be stripped from html content.
+ */
+ public void testStyleTagRemoving()
+ {
+ String html =
+ "<html><head><title>Title</title>" + "<style type=\"text/css\">h1 {color:red} p {color:blue} </style>"
+ + "</head><body>" + footer;
+ Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName("style");
+ assertEquals(0, nodes.getLength());
+ }
+
+ /**
+ * {@code <style>} tags should be stripped from html content.
Copy/paste error: this should be <script>.
+ */
+ public void testScriptTagRemoving()
+ {
+ String html = header + "<script type=\"text/javascript\">document.write(\"Hello World!\")</script>" + footer;
+ Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName("script");
+ assertEquals(0, nodes.getLength());
+ }
+}
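If this moves to the default cleaner, it is a generic "drop the element and its content" pass. A minimal sketch on the plain JDK DOM (not the HTMLCleaner API); the class and method names are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical filter: removes whole elements (tag plus content), e.g. <style> and <script>.
class TagStrippingSketch {

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xhtml, e);
        }
    }

    // getElementsByTagName() returns a live list, so copy it before mutating the tree.
    static List<Element> snapshot(NodeList nodes) {
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            elements.add((Element) nodes.item(i));
        }
        return elements;
    }

    static void stripTags(Document document, String... tagNames) {
        for (String tagName : tagNames) {
            for (Element element : snapshot(document.getElementsByTagName(tagName))) {
                element.getParentNode().removeChild(element);
            }
        }
    }
}
```

Usage would be `stripTags(doc, "style", "script")`, which keeps the rest of the document (title, body content) untouched.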
[snip]
+ /**
+ * {@code <br/>} elements placed next to paragraph elements should be converted to {@code<div
+ * class="wikimodel-emptyline"/>} elements.
+ */
+ public void testLineBreaksNextToParagraphElements()
+ {
+ checkLineBreakReplacements("<br/><br/><p>para</p>", 0, 2);
+ checkLineBreakReplacements("<p>para</p><br/><br/>", 0, 2);
+ checkLineBreakReplacements("<p>para</p><br/><br/><p>para</p>", 0, 2);
+ }
Shouldn't this be done by the default HTML Cleaner?
Same for the other tests in this category.
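For reference, the rule these tests describe (a run of <br/> bordering a <p> becomes empty-line markers) is small enough to live in a generic filter. A minimal sketch on the plain JDK DOM, assuming "next to" means adjacent after skipping other <br/>s and blank text; the class and method names are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical filter: turns runs of <br/> that border a <p> into WikiModel empty-line markers.
class EmptyLineSketch {

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xhtml, e);
        }
    }

    // getElementsByTagName() returns a live list, so copy it before mutating the tree.
    static List<Element> snapshot(NodeList nodes) {
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            elements.add((Element) nodes.item(i));
        }
        return elements;
    }

    static void replaceLineBreaksNextToParagraphs(Document document) {
        // Decide for every <br/> first, then mutate, so one replacement cannot affect another decision.
        List<Element> toReplace = new ArrayList<>();
        for (Element br : snapshot(document.getElementsByTagName("br"))) {
            if (isParagraph(neighbourOutsideRun(br, true)) || isParagraph(neighbourOutsideRun(br, false))) {
                toReplace.add(br);
            }
        }
        for (Element br : toReplace) {
            Element emptyLine = document.createElement("div");
            emptyLine.setAttribute("class", "wikimodel-emptyline");
            br.getParentNode().replaceChild(emptyLine, br);
        }
    }

    // First sibling in the given direction that is neither another <br/> nor blank text, or null.
    static Node neighbourOutsideRun(Node start, boolean forward) {
        Node current = forward ? start.getNextSibling() : start.getPreviousSibling();
        while (current != null) {
            boolean lineBreak = current.getNodeType() == Node.ELEMENT_NODE && "br".equals(current.getNodeName());
            boolean blankText = current.getNodeType() == Node.TEXT_NODE && current.getNodeValue().trim().isEmpty();
            if (!lineBreak && !blankText) {
                return current;
            }
            current = forward ? current.getNextSibling() : current.getPreviousSibling();
        }
        return null;
    }

    static boolean isParagraph(Node node) {
        return node != null && node.getNodeType() == Node.ELEMENT_NODE && "p".equals(node.getNodeName());
    }
}
```

A <br/> between two runs of plain text (e.g. `text<br/>more`) is left alone, since neither neighbour is a paragraph.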
+ /**
+ * The html generated by open office server includes anchors of the form {@code<a name="table1"><h1>Sheet 2:
+ * <em>Hello</em></h1></a>} and the default html cleaner converts them to {@code <a name="table1"/><h1><a
+ * name="table1">Sheet 2: <em>Hello</em></a></h1>}. This is because of the close-before-copy-inside
+ * behaviour of the default html cleaner. Thus the additional (copy-inside) anchor needs to be ripped off.
This looks like a bug in the default HTML cleaner, no?
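If a workaround has to stay until the cleaner is fixed, it could be a small DOM pass. A minimal sketch on the plain JDK DOM (not the HTMLCleaner API), with hypothetical names, handling only the exact pattern from the javadoc: an empty named anchor immediately followed by an element whose first child is an anchor with the same name, which gets unwrapped:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical filter: removes the "copy-inside" duplicate produced by close-before-copy-inside.
class DuplicateAnchorSketch {

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xhtml, e);
        }
    }

    // getElementsByTagName() returns a live list, so copy it before mutating the tree.
    static List<Element> snapshot(NodeList nodes) {
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            elements.add((Element) nodes.item(i));
        }
        return elements;
    }

    static void removeCopiedInsideAnchors(Document document) {
        for (Element anchor : snapshot(document.getElementsByTagName("a"))) {
            String name = anchor.getAttribute("name");
            // Candidates are the empty named anchors left behind by "close-before".
            if (name.isEmpty() || anchor.hasChildNodes()) {
                continue;
            }
            Node next = anchor.getNextSibling();
            Node copy = next == null ? null : next.getFirstChild();
            // The duplicate is the first child of the following sibling and carries the same name.
            if (copy instanceof Element && "a".equals(copy.getNodeName())
                    && name.equals(((Element) copy).getAttribute("name"))) {
                unwrap((Element) copy);
            }
        }
    }

    // Replaces an element with its own children, preserving their order.
    static void unwrap(Element element) {
        Node parent = element.getParentNode();
        while (element.getFirstChild() != null) {
            parent.insertBefore(element.getFirstChild(), element);
        }
        parent.removeChild(element);
    }
}
```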
+ /**
+ * If there are leading spaces within the content of a list item ({@code<li/>}) they should be trimmed.
+ */
+ public void testListItemContentLeadingSpaceTrimming()
+ {
+ String html = header + "<ol><li> Test</li></ol>" + footer;
+ Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName("li");
+ Node listContent = nodes.item(0).getFirstChild();
+ assertEquals(Node.TEXT_NODE, listContent.getNodeType());
+ assertEquals("Test", listContent.getNodeValue());
+ }
Shouldn't this be done in the default HTML cleaner? Actually I think
this is already done in the XHTML parser by the whitespace XML filter.
If not, then it's a bug in the whitespace filter.
For all bugs, please reference the JIRA issue in the javadoc and explain
that the code will be removed once the bug is fixed.
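Whichever component ends up owning it, the trimming itself is tiny. A minimal sketch on the plain JDK DOM (not the whitespace filter's actual code), with hypothetical names; it only touches a leading text node and drops it if it was whitespace only:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical filter: strips leading whitespace from the content of the given elements (e.g. <li>).
class LeadingSpaceSketch {

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xhtml, e);
        }
    }

    // getElementsByTagName() returns a live list, so copy it before mutating the tree.
    static List<Element> snapshot(NodeList nodes) {
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            elements.add((Element) nodes.item(i));
        }
        return elements;
    }

    static void trimLeadingWhitespace(Document document, String tagName) {
        for (Element element : snapshot(document.getElementsByTagName(tagName))) {
            Node first = element.getFirstChild();
            if (first == null || first.getNodeType() != Node.TEXT_NODE) {
                continue;
            }
            String trimmed = first.getNodeValue().replaceFirst("^\\s+", "");
            if (trimmed.isEmpty()) {
                element.removeChild(first); // the text node was whitespace only
            } else {
                first.setNodeValue(trimmed);
            }
        }
    }
}
```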
+
+ /**
+ * If there is a leading paragraph inside a list item, it should be replaced with its content.
+ */
+ public void testListItemContentIsolatedParagraphCleaning()
+ {
+ String html = header + "<ol><li><p>Test</p></li></ol>" + footer;
+ Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName("li");
+ Node listContent = nodes.item(0).getFirstChild();
+ assertEquals(Node.TEXT_NODE, listContent.getNodeType());
+ assertEquals("Test", listContent.getNodeValue());
+ }
+}
This should be handled by a combination of both XHTML parser and Wiki
Syntax Renderer and/or by the default HTML cleaner.
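If it does end up in the default HTML cleaner, one generic pass can serve both list items and table cells. A minimal sketch on the plain JDK DOM, with hypothetical names, that only handles the isolated case (a <p> that is the only significant child of the container); a "leading" paragraph followed by other content is deliberately left alone:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical filter: replaces a lone <p> inside a container (<li>, <td>, ...) with its content.
class IsolatedParagraphSketch {

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xhtml, e);
        }
    }

    // getElementsByTagName() returns a live list, so copy it before mutating the tree.
    static List<Element> snapshot(NodeList nodes) {
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            elements.add((Element) nodes.item(i));
        }
        return elements;
    }

    static void unwrapIsolatedParagraphs(Document document, String containerTag) {
        for (Element container : snapshot(document.getElementsByTagName(containerTag))) {
            Element paragraph = null;
            int significantChildren = 0;
            for (Node child = container.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                    continue; // ignore formatting whitespace
                }
                significantChildren++;
                if (child.getNodeType() == Node.ELEMENT_NODE && "p".equals(child.getNodeName())) {
                    paragraph = (Element) child;
                }
            }
            // Only a <p> that is the sole significant child is unwrapped.
            if (significantChildren == 1 && paragraph != null) {
                unwrap(paragraph);
            }
        }
    }

    // Replaces an element with its own children, preserving their order.
    static void unwrap(Element element) {
        Node parent = element.getParentNode();
        while (element.getFirstChild() != null) {
            parent.insertBefore(element.getFirstChild(), element);
        }
        parent.removeChild(element);
    }
}
```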
+ /**
+ * Test cleaning of html paragraphs bearing namespaces.
+ */
+ public void testParagraphsWithNamespaces()
+ {
+ String html = header + "<w:p>paragraph</w:p>" + footer;
+ Document doc =
+ wysiwygHTMLCleaner.clean(new StringReader(html), Collections.singletonMap(HTMLCleaner.NAMESPACES_AWARE,
+ "false"));
+ NodeList nodes = doc.getElementsByTagName("p");
+ assertEquals(1, nodes.getLength());
+ }
Hmm... I think this needs to be reviewed, and we need to check whether the
WikiModel XHTML parser supports namespaces.
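For comparison, when the markup is parsed without namespace awareness, `<w:p>` is just an element literally named "w:p", and "cleaning" it amounts to dropping the prefix. A minimal sketch of that on the plain JDK DOM (the default `DocumentBuilderFactory` is not namespace aware); the names are hypothetical, and this is not how HTMLCleaner or the WikiModel parser handle it:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical filter: renames prefixed elements such as <w:p> to their local name (<p>).
class NamespacePrefixSketch {

    // The default factory is not namespace aware, so undeclared prefixes parse as plain names.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xhtml, e);
        }
    }

    // getElementsByTagName() returns a live list, so copy it before mutating the tree.
    static List<Element> snapshot(NodeList nodes) {
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            elements.add((Element) nodes.item(i));
        }
        return elements;
    }

    static void stripPrefixes(Document document) {
        for (Element element : snapshot(document.getElementsByTagName("*"))) {
            String qualifiedName = element.getTagName();
            int colon = qualifiedName.indexOf(':');
            if (colon < 0) {
                continue;
            }
            // Build an unprefixed copy, keeping attributes and moving the children over.
            Element renamed = document.createElement(qualifiedName.substring(colon + 1));
            NamedNodeMap attributes = element.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                renamed.setAttribute(attribute.getNodeName(), attribute.getNodeValue());
            }
            while (element.getFirstChild() != null) {
                renamed.appendChild(element.getFirstChild());
            }
            element.getParentNode().replaceChild(renamed, element);
        }
    }
}
```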
+
+ /**
+ * The source of the images in copy-pasted html content should be replaced with 'Missing.png' since they can't be
+ * uploaded automatically.
+ */
+ public void testImageFiltering()
+ {
+ String html = header + "<img src=\"file://path/to/local/image.png\"/>" + footer;
+ Document doc = wysiwygHTMLCleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName("img");
+ assertEquals(1, nodes.getLength());
+ Element image = (Element) nodes.item(0);
+ Node startComment = image.getPreviousSibling();
+ Node stopComment = image.getNextSibling();
+ assertEquals(Node.COMMENT_NODE, startComment.getNodeType());
+ assertTrue(startComment.getNodeValue().equals("startimage:Missing.png"));
It should be lowercase "missing.png". Does this mean a missing.png
image needs to be present in all skins?
Has this been discussed, and is everyone aware of it?
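Making the placeholder name a parameter would keep the filter independent of whatever the skins end up shipping. A minimal sketch on the plain JDK DOM, with hypothetical names; the "startimage:" comment matches the test above, while the closing "stopimage" comment is an assumption about the XHTML annotation format:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical filter: points every <img> at a placeholder and wraps it in image annotation comments.
class ImagePlaceholderSketch {

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xhtml, e);
        }
    }

    // getElementsByTagName() returns a live list, so copy it before mutating the tree.
    static List<Element> snapshot(NodeList nodes) {
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            elements.add((Element) nodes.item(i));
        }
        return elements;
    }

    static void replaceImageSources(Document document, String placeholder) {
        for (Element image : snapshot(document.getElementsByTagName("img"))) {
            image.setAttribute("src", placeholder);
            Node parent = image.getParentNode();
            parent.insertBefore(document.createComment("startimage:" + placeholder), image);
            // Assumed closing marker; insertBefore(x, null) appends when the image is the last child.
            parent.insertBefore(document.createComment("stopimage"), image.getNextSibling());
        }
    }
}
```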
+ /**
+ * Test filtering of tags that don't have any attributes set.
+ */
+ public void testFilterIfZeroAttributes()
+ {
+ String htmlTemplate = header + "<p>Test%sRedundant%sFiltering</p>" + footer;
+ String[] filterIfZeroAttributesTags = new String[] {"span", "div"};
+ for (String tag : filterIfZeroAttributesTags) {
+ String startTag = "<" + tag + ">";
+ String endTag = "</" + tag + ">";
+ String html = String.format(htmlTemplate, startTag, endTag);
+ Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName(tag);
+ assertEquals(0, nodes.getLength());
+ }
+ }
Shouldn't this be done in the default HTML cleaner?
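For the default cleaner this would be a generic "unwrap if attribute-less" rule. A minimal sketch on the plain JDK DOM, with hypothetical names; the element is replaced by its children so the text itself is kept:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical filter: unwraps elements such as <span> and <div> that carry no attributes.
class AttributelessSketch {

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xhtml, e);
        }
    }

    // getElementsByTagName() returns a live list, so copy it before mutating the tree.
    static List<Element> snapshot(NodeList nodes) {
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            elements.add((Element) nodes.item(i));
        }
        return elements;
    }

    static void unwrapIfNoAttributes(Document document, String... tagNames) {
        for (String tagName : tagNames) {
            for (Element element : snapshot(document.getElementsByTagName(tagName))) {
                if (!element.hasAttributes()) {
                    unwrap(element);
                }
            }
        }
    }

    // Replaces an element with its own children, preserving their order.
    static void unwrap(Element element) {
        Node parent = element.getParentNode();
        while (element.getFirstChild() != null) {
            parent.insertBefore(element.getFirstChild(), element);
        }
        parent.removeChild(element);
    }
}
```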
+
+ /**
+ * Test filtering of tags that don't have any textual content in them.
+ */
+ public void testFilterIfNoContent()
+ {
+ String htmlTemplate = header + "<p>Test%sRedundant%s%s%sFiltering</p>" + footer;
+ String[] filterIfNoContentTags =
+ new String[] {"em", "strong", "dfn", "code", "samp", "kbd", "var", "cite", "abbr", "acronym", "address",
+ "blockquote", "q", "pre", "h1", "h2", "h3", "h4", "h5", "h6"};
+ for (String tag : filterIfNoContentTags) {
+ String startTag = "<" + tag + ">";
+ String endTag = "</" + tag + ">";
+ String html = String.format(htmlTemplate, startTag, endTag, startTag, endTag);
+ Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName(tag);
+ assertEquals(1, nodes.getLength());
+ }
+ }
+}
Shouldn't this be done in the default HTML cleaner?
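Same remark here: this is a generic rule. A minimal sketch on the plain JDK DOM, with hypothetical names, assuming "no textual content" means blank text and no child elements (so e.g. a <strong> wrapping an image survives):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical filter: removes the given elements when they hold no text and no child elements.
class EmptyContentSketch {

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xhtml, e);
        }
    }

    // getElementsByTagName() returns a live list, so copy it before mutating the tree.
    static List<Element> snapshot(NodeList nodes) {
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            elements.add((Element) nodes.item(i));
        }
        return elements;
    }

    static void removeIfNoContent(Document document, String... tagNames) {
        for (String tagName : tagNames) {
            List<Element> elements = snapshot(document.getElementsByTagName(tagName));
            // Reverse document order: inner empty elements go first, so an outer one that becomes empty goes too.
            for (int i = elements.size() - 1; i >= 0; i--) {
                Element element = elements.get(i);
                boolean noText = element.getTextContent().trim().isEmpty();
                boolean noChildElements = element.getElementsByTagName("*").getLength() == 0;
                if (noText && noChildElements) {
                    element.getParentNode().removeChild(element);
                }
            }
        }
    }
}
```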
+ /**
+ * An isolated paragraph inside a table cell should be replaced with the paragraph's content.
+ */
+ public void testTableCellItemIsolatedParagraphCleaning()
+ {
+ String html = header + "<table><tr><td><p>Test</p></td></tr></table>" + footer;
+ Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName("td");
+ Node cellContent = nodes.item(0).getFirstChild();
+ assertEquals(Node.TEXT_NODE, cellContent.getNodeType());
+ assertEquals("Test", cellContent.getNodeValue());
+ }
Isn't this already tested above?
In any case, shouldn't this be moved out of the importer?
Same for the other tests in the same category.
+ /**
+ * If multiple paragraphs are found inside a table cell item, they should be wrapped in an embedded document.
+ */
+ public void testTableCellItemMultipleParagraphWrapping()
+ {
+ assertEquals(true, checkEmbeddedDocumentGeneration("<table><tr><td><p>Test</p><p>Test</p></td></tr></table>",
+ "td"));
+ }
This looks like a bug in the XHTML parser.
Same for other tests in the same category.
+
+ /**
+ * Empty rows should be removed.
+ */
+ public void testEmptyRowRemoving()
+ {
+ String html = header + "<table><tbody><tr><td>cell</td></tr><tr></tr></tbody></table>" + footer;
+ Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
+ NodeList nodes = doc.getElementsByTagName("tr");
+ assertEquals(1, nodes.getLength());
+ }
Shouldn't this be done in the default HTML cleaner?
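This one is also generic. A minimal sketch on the plain JDK DOM, with hypothetical names, assuming a row is "empty" when it contains no <td> or <th> cells:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical filter: removes table rows that contain no cells.
class EmptyRowSketch {

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xhtml, e);
        }
    }

    // getElementsByTagName() returns a live list, so copy it before mutating the tree.
    static List<Element> snapshot(NodeList nodes) {
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            elements.add((Element) nodes.item(i));
        }
        return elements;
    }

    static void removeEmptyRows(Document document) {
        for (Element row : snapshot(document.getElementsByTagName("tr"))) {
            boolean hasCells = row.getElementsByTagName("td").getLength() > 0
                    || row.getElementsByTagName("th").getLength() > 0;
            if (!hasCells) {
                row.getParentNode().removeChild(row);
            }
        }
    }
}
```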
Thanks
-Vincent
http://xwiki.com
http://xwiki.org
http://massol.net