On 06 Sep 2016, at 16:59, Caleb James DeLisle
<cjd(a)cjdns.fr> wrote:
Hello all,
Recently we discovered that in XWiki 8.2 the {{html}} macro cleaner now removes
<video>
tag whereas in 7.4 it did not and this has unfortunately caused problems for the
{{video}}
macro.
1. After some helpful investigation by a few XWiki developers, we have found that in
fact
the {{html}} macro is cleaning for XHTML and not for HTML5 which is what XWiki uses and
the
change has seemingly gone unnoticed since the upgrade to 8.0M1.
The HTML macro has always been cleaning to generate clean XHTML 1.0. This hasn’t changed.
We do need to make it generate HTML5 for skins that output HTML5 but this hasn’t been done
yet (see
http://jira.xwiki.org/browse/XCOMMONS-901).
What has happened (most likely) is that the HTML Cleaner library was updated from version
2.10 to 2.16 in 8.0M1 and somewhere in there the HTML Cleaner project must have fixed a
bug where previously they were not stripping the video tag when they should have (since
it’s not valid XHTML 1.0 probably).
2. After some conversation with developers, I have
observed that it is rare for an application
developer to *intend* to have their HTML cleaned. In the XWiki repositories we find that
amongst non-one-liner {{html}} macro invocations, 109 out of 292 (37%) of the
invocations
explicitly request clean=false [1].
Yes, developers should always use clean=false for 2 reasons:
* performance
* get an HTML validation error so that the error can be noticed and fixed in the HTML.
Note: I hope that our validation tests are able to notice all the HTML errors and that as
a consequence we won’t generate invalid X(HTML).
FTR I wasn’t initially in agreement but got convinced by Marius, see
https://github.com/xwiki-contrib/macro-video/commit/d986b8346a4c21d55472dd8…
I even mentioned introducing a new {{rawhtml}} macro or something like this
({{cleanhtml}}, {{validhtml}}, …) if we want to avoid having to write clean=false.
3. Nowhere is clean="true" specified and
though one might think it obvious as it is a default,
we find that wiki="false" (also default) is indeed specified 28 times [2].
That’s normal because you’re looking at development code and this code is supposed to
generate valid HTML by default so there’s no need to clean it :)
4. We see also that the HTML macro proves to be quite
slow. During an investigation of SOLR
performance Thomas found that "about 23% of the request is spend executing html
macros" [3].
I suppose it should be obvious that without the cleaner, the {{html}} macro should not
cause
any significant performance degradation.
Given the status of this feature, I would like to propose that we make an intentional
change
of behavior and change the default to clean=false. This way we application developers
will not
need to type it all of the time and XWiki will become measurably faster.
Here is my +1 for changing the default.
We should not forget that the HTML macro is also meant to be used by users of the wiki
(and not only by developers).
BTW we even said in the past that we should try to avoid using the {{html}} macro at all
in our code so that it can be disabled as a macro, see
http://jira.xwiki.org/browse/XRENDERING-27 and even
http://jira.xwiki.org/browse/XWIKI-7894
And most users don’t set the clean=false parameter.
So I guess the question is whether we want to protect an XWiki instance against to
generate invalid (X)HTML by default when the HTML macro is used by users, or whether we
don’t care.
TBH I don’t remember exactly why we introduced the HTML Cleaner but we probably had issues
at some point. Anyone remembers?
Thanks
-Vincent
Thanks,
Caleb
[1]:
$ grep -nr '{{html' | grep '.xml:' | grep
'clean=['\''"]*false' | grep -v '{{/html}}' | wc -l
109
$ grep -nr '{{html' | grep '.xml:' | grep -v
'clean=['\''"]*false' | grep -v '{{/html}}' | wc -l
183
[2]:
$ grep -nr '{{html' | grep '.xml:' | grep
'wiki=['\''"]*false' | wc -l
28
[3]:
https://jira.xwiki.org/browse/XWIKI-12043