Re: [xwiki-devs] [xwiki-notifications] r16999 - platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner

24 Feb 2009

On Feb 24, 2009, at 7:24 PM, Sergiu Dumitriu wrote:
...
  Vincent Massol wrote:
  On Feb 24, 2009, at 4:48 PM, Sergiu Dumitriu wrote:
  Asiri Rathnayake wrote:
  Hi Vincent,
> But the story
>> is different for OO generated html which puts a paragraph element
>> when there
>> shouldn't be one.
> I don't agree since it's very valid to have <p> inside cells and
> not a
> OO problem.
 It's very valid to have <p> elements inside table cells. But my
 point is
 this:
 The original word document when viewed through _oo writer_ displays
 content
 within table cells with a particular size. But when saved as html
 and viewed
 from a browser, the same table cell becomes enlarged. And this is
 because
 there is a paragraph element inside each table cell element
 generated by oo
 html generator.
 Now, since we wanted officeimporter to generate wiki content that would
 ultimately render an output which looks close to the original document, I
 decided to strip the paragraph element (to make it look smaller and close to
 the sizing of the original document rendered in oo writer).
 But if it's only a matter of convention (wiki is wiki, office is office) and
 the paragraph should be left alone, I can make that change easily.
 WDYT?
  I for one prefer removing the paragraph. For me, this is clearly
 an OO
 shortcoming. Vincent, the idea is not about paragraphs inside table
 cells in general, but about this particular paragraph that obviously
 shouldn't be there. The HTML generated by OO is just an
 intermediary,
 we're not interested in keeping it as much as possible in the
 wiki, we
 just want to extract the data from it and convert it to wiki syntax.
 The
 Office importer transforms office documents to wiki documents, and
 not
 HTML to wiki. OO wrongly puts paragraphs in there, and the fact that
 the
 same HTML looks much different in a browser than the document
 looks in
 OO is a good enough argument, IMO. 
 This is generic and not specific to OO. HTML allows putting one or
 several paragraphs in table cells, list items, etc., so we need to handle
 those independently of OO.
 If we handle it at the rendering module level then it fixes both OO
 and direct HTML input. 
 No. We should not strip all the paragraphs that are found inside table
 cells. 
I've never said this! What I told Asiri is that the XHTML parser
should generate the following events:
beginCell + beginDocument + beginPara + onWord(sometext) + endPara +
endDocument + endCell.
...
  Maybe the user wants those there. 
I don't agree. We're making transformations and we're not leaving the
user content untouched. For example if the user enters "**hello" it'll
get converted to "**hello**". There are several cases where we're
transforming what the user enters.
Here I'm proposing that the XWiki Syntax Renderer transforms the
events above into:
| sometext
instead of:
| (((sometext)))
...
  But we know for sure that the
 _intermediary_ HTML generated by OO contains Ps where it shouldn't. It
 is specific. In general we should respect the markup, but in this
 specific case it is just a workaround for a third party bug. HTML
 generated by office suites is messy in general. I for one really hate
 the bulky sh1t that MS Word names HTML. 
I still don't agree. See above.
Thanks
-Vincent
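
The renderer behaviour proposed above can be illustrated with a minimal sketch. It is not the XWiki rendering API: the tuple-based event list and the `render_cell` function are hypothetical, and only the event names and the two output forms (`| sometext` versus `| (((sometext)))`) come from the message. The assumption is that a cell whose embedded document holds exactly one paragraph is written inline, while anything else keeps the group syntax.

```python
def render_cell(events):
    """Render the events of one table cell to wiki syntax (illustrative only).

    `events` is a list of tuples such as ("beginPara",) or ("onWord", "text").
    This representation is an assumption made for the sketch.
    """
    names = [event[0] for event in events]
    if names[:2] != ["beginCell", "beginDocument"] or \
            names[-2:] != ["endDocument", "endCell"]:
        raise ValueError("expected a cell wrapping an embedded document")

    paragraphs = []   # text of each completed paragraph
    current = None    # words of the paragraph being read, if any
    for name, *args in events[2:-2]:
        if name == "beginPara":
            current = []
        elif name == "onWord":
            if current is None:
                raise ValueError("onWord outside a paragraph")
            current.append(args[0])
        elif name == "endPara":
            paragraphs.append(" ".join(current))
            current = None
        else:
            raise ValueError("unsupported event: " + name)

    # A lone paragraph is collapsed to inline cell content.
    if len(paragraphs) == 1:
        return "| " + paragraphs[0]
    # Otherwise keep the group syntax so paragraph boundaries survive.
    return "| (((" + "\n\n".join(paragraphs) + ")))"


cell = [
    ("beginCell",), ("beginDocument",), ("beginPara",),
    ("onWord", "sometext"),
    ("endPara",), ("endDocument",), ("endCell",),
]
print(render_cell(cell))
```

With this approach the decision is made once in the renderer, so it would apply equally to HTML produced by OO and to HTML entered directly, which is the point argued in the message.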
