On Feb 24, 2009, at 7:24 PM, Sergiu Dumitriu wrote:
Vincent Massol wrote:
On Feb 24, 2009, at 4:48 PM, Sergiu Dumitriu wrote:
Asiri Rathnayake wrote:
Hi Vincent,
> But the story
>> is different for OO generated html which puts a paragraph element
>> when there
>> shouldn't be one.
> I don't agree since it's very valid to have <p> inside cells and
> not a
> OO problem.
It's very valid to have <p> elements inside table cells. But my
point is
this:
The original word document when viewed through _oo writer_ displays
content
within table cells with a particular size. But when saved as html
and viewed
from a browser, the same table cell becomes enlarged. And this is
because
there is a paragraph element inside each table cell element
generated by oo
html generator.
Now, since we wanted officeimporter to generate wiki content that
would
ultimately render an output which looks close to the original
document, I
decided to strip the paragraph element (to make it look smaller and
close to
the sizing of original document rendered in oo writer)
But if it's only a matter of convention (wiki is wiki, office is
office) and
the paragraph should be left alone, I can make that change easily.
WDYT?
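
To make the stripping described above concrete, here is a minimal sketch. It is not the officeimporter code: the function name is hypothetical, and unwrapping a <p> only when it is the sole child of its cell is an assumption made for illustration, so that cells with several paragraphs are left alone.

```python
import xml.etree.ElementTree as ET


def unwrap_single_paragraphs(xhtml):
    """Unwrap <p> when it is the only content of a <td> (hypothetical helper)."""
    root = ET.fromstring(xhtml)
    # Snapshot the cells first, since the tree is modified while looping.
    for cell in list(root.iter("td")):
        children = list(cell)
        if len(children) != 1 or children[0].tag != "p":
            continue  # zero or several children: leave the cell untouched
        p = children[0]
        # Any text next to the paragraph means the <p> is not a pure wrapper.
        if (cell.text or "").strip() or (p.tail or "").strip():
            continue
        # Hoist the paragraph's text and children into the cell itself.
        cell.remove(p)
        cell.text = p.text
        cell.extend(list(p))
    return ET.tostring(root, encoding="unicode")
```

With this, <td><p>a</p></td> becomes <td>a</td>, while a cell holding two paragraphs keeps both.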
I for one prefer removing the paragraph. For me, this is clearly
an OO
shortcoming. Vincent, the idea is not about paragraphs inside table
cells in general, but about this particular paragraph that obviously
shouldn't be there. The HTML generated by OO is just an
intermediary,
we're not interested in keeping it as much as possible in the
wiki, we
just want to extract the data from it and convert it to wiki syntax.
The
Office importer transforms office documents to wiki documents, and
not
HTML to wiki. OO wrongly puts paragraphs in there, and the fact that
the
same HTML looks much different in a browser than the document
looks in
OO is a good enough argument, IMO.
This is generic and not specific to OO. HTML allows putting one or
several paragraphs in table cells, list items, etc., so we need to handle
those independently of OO.
If we handle it at the rendering module level then it fixes both OO
and direct HTML input.
No. We should not strip all the paragraphs that are found inside table
cells.
I've never said this! What I told Asiri is that the XHTML parser
should generate the following events:
beginCell + beginDocument + beginPara + onWord(sometext) + endPara +
endDocument + endCell.
Maybe the user wants those there.
I don't agree. We're making transformations and we're not leaving the
user content untouched. For example if the user enters "**hello" it'll
get converted to "**hello**". There are several cases where we're
transforming what the user enters.
Here I'm proposing that the XWiki Syntax Renderer transforms the
events above into:
| sometext
instead of:
| (((sometext)))
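
A rough sketch of that renderer rule, to show the intent only. The event names are the ones listed above; representing events as tuples and the function name render_cell are assumptions for illustration, not the XWiki rendering API. The cell is written inline when its content boils down to a single paragraph, and as a ((( ))) group only when there are several blocks.

```python
def render_cell(events):
    """Render one table cell's events as wiki text (illustrative sketch)."""
    names = [name for name, *_ in events]
    if names[:1] != ["beginCell"] or names[-1:] != ["endCell"]:
        raise ValueError("expected the events of exactly one cell")

    blocks, run = [], []

    def flush():
        # Close the current run of words and record it as one block.
        if run:
            blocks.append(" ".join(run))
            run.clear()

    for name, *args in events[1:-1]:
        if name in ("beginPara", "endPara"):
            flush()  # paragraph boundaries separate blocks
        elif name == "onWord":
            run.append(args[0])
        # beginDocument/endDocument carry no text of their own here
    flush()

    if len(blocks) <= 1:
        # Single paragraph (or plain inline content): no group needed.
        return "| " + "".join(blocks)
    return "| (((" + "\n\n".join(blocks) + ")))"
```

The group syntax is kept when a cell really has several paragraphs, so nothing that matters to the user is lost.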
But we know for sure that the
_intermediary_ HTML generated by OO contains Ps where it shouldn't. It
is specific. In general we should respect the markup, but in this
specific case it is just a workaround for a third-party bug. HTML
generated by office suites is messy in general. I for one really hate
the bulky sh1t that MS Word names HTML.
I still don't agree. See above.
Thanks
-Vincent