On Feb 24, 2009, at 4:48 PM, Sergiu Dumitriu wrote:
Asiri Rathnayake wrote:
Hi Vincent,
> But the story is different for OO generated html which puts a
> paragraph element when there shouldn't be one.
I don't agree since it's very valid to have <p> inside cells and it's
not an OO problem.
It's very valid to have <p> elements inside table cells. But my point is
this:
The original Word document, when viewed through _oo writer_, displays
content within table cells at a particular size. But when saved as HTML
and viewed in a browser, the same table cell becomes enlarged. This is
because there is a paragraph element inside each table cell element
generated by the OO HTML generator.
Now, since we wanted officeimporter to generate wiki content that would
ultimately render an output which looks close to the original document,
I decided to strip the paragraph element (to make it look smaller and
close to the sizing of the original document rendered in oo writer).
But if it's only a matter of convention (wiki is wiki, office is office)
and the paragraph should be left alone, I can make that change easily.
WDYT?
I for one prefer removing the paragraph. For me, this is clearly an OO
shortcoming. Vincent, the idea is not about paragraphs inside table
cells in general, but about this particular paragraph that obviously
shouldn't be there. The HTML generated by OO is just an intermediary,
we're not interested in keeping it as much as possible in the wiki, we
just want to extract the data from it and convert it to wiki syntax. The
Office importer transforms office documents to wiki documents, and not
HTML to wiki. OO wrongly puts paragraphs in there, and the fact that the
same HTML looks much different in a browser than the document looks in
OO is a good enough argument, IMO.
This is generic and not specific to OO. HTML allows putting one or
several paragraphs in table cells, list items, etc., so we need to handle
those, independently of OO.
If we handle it at the rendering module level then it fixes both OO
and direct HTML input.
No. We should not strip all the paragraphs that are found inside table
cells. Maybe the user wants those there. But we know for sure that the
_intermediary_ HTML generated by OO contains Ps where it shouldn't. It
is specific. In general we should respect the markup, but in this
specific case it is just a workaround for a third-party bug. HTML
generated by office suites is messy in general. I for one really hate
the bulky sh1t that MS Word names HTML.
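
To make the narrow workaround concrete, here is a minimal sketch in
Python. It is not the officeimporter code: the function name
unwrap_single_paragraphs is hypothetical, and the sketch assumes the
intermediary HTML has already been cleaned into well-formed XHTML. It
unwraps a <p> only when it is the sole content of a table cell, and
leaves cells with several paragraphs or mixed content untouched.

```python
import xml.etree.ElementTree as ET


def unwrap_single_paragraphs(table_xml: str) -> str:
    """Unwrap a <p> that is the only content of a <td>/<th> cell.

    Hypothetical helper for illustration; assumes well-formed XHTML input.
    Attributes on the removed <p> (e.g. alignment) are dropped.
    """
    root = ET.fromstring(table_xml)
    # Snapshot the elements first so the tree can be modified safely.
    for cell in list(root.iter()):
        if cell.tag not in ("td", "th"):
            continue
        children = list(cell)
        # Only handle the case of exactly one child, and it must be a <p>.
        if len(children) != 1 or children[0].tag != "p":
            continue
        p = children[0]
        # Leave the cell alone if there is real text around the paragraph.
        if (cell.text or "").strip() or (p.tail or "").strip():
            continue
        # Move the paragraph's text and child elements up into the cell.
        cell.remove(p)
        cell.text = p.text
        cell.extend(list(p))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<table><tr>"
        "<td><p>one <b>bold</b></p></td>"
        "<td><p>a</p><p>b</p></td>"
        "</tr></table>"
    )
    print(unwrap_single_paragraphs(sample))
```

In the sample, the first cell loses its wrapper paragraph while the
second cell, which holds two paragraphs, is kept as written. Whether
such a rule runs only for OO-generated input or for all HTML input is
exactly the point under discussion above.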
--
Sergiu Dumitriu