Help me escape!

15 Jun 2006

As a member of an organization whose name has an ampersand in it, I am
pleased to say that I am discovering many of the ways that XML and HTML
can choke on it.  This is because as a developer, I thoroughly enjoy
finding bugs (in other people's code).

However, I am spending more time than I want (greater than zero) using
my admin privileges to clean up after users who shouldn't have to worry
about what kind of text they enter into their documents - in particular,
they shouldn't have to know that every time they type the company name,
it has to be AT&amp;T.

First the document headings in the RSS feeds caused readers to fail,
then the CSS validator refused even to look at a document, and now the
Tomcat logfile is growing by dozens of megabytes per minute on a system
with ten or fewer active users.  All because somebody innocently entered
"AT&T" somewhere in a document.

I have found several methods for transforming text, such as
$xwiki.getURLEncoded(String) and $doc.getEscapedContent() (which
apparently hides the entire content of a document from Velocity, but not
from Radeox).  There is also the JavaScript in some form documents that
makes sure that accented characters don't get into document names.
Nowhere, however, have I yet found a method that will generally escape
things in user-entered text that will break XML parsing.

Is there such a thing?  I note several regular expressions in some of
the config files for Radeox, etc.; there ought - somewhere - to be a
general method for doing this, n'est-ce pas?

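To make the request concrete, here is a minimal sketch of the kind of
general method I have in mind, written in plain Java.  The class and
method names (XmlEscaper, escapeXml) are hypothetical and are not part
of XWiki, Velocity, or Radeox; the sketch only assumes that user text is
escaped once, at the point where it is written into XML or HTML output.

```java
// Hypothetical helper, not an XWiki API: replaces the five characters
// that have special meaning in XML with their entity references.
public final class XmlEscaper {

    private XmlEscaper() {
        // utility class, no instances
    }

    public static String escapeXml(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                case '\'':
                    // numeric reference, since &apos; is not defined in HTML 4
                    out.append("&#39;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

With something like this, XmlEscaper.escapeXml("AT&T") would yield
AT&amp;T, so users could keep typing the plain company name.  Note that
the sketch escapes every ampersand, so text that is already escaped
would be escaped a second time; it should be applied to raw input only.
Apache Commons Lang offers StringEscapeUtils.escapeXml(String) for the
same job, though whether it is reachable from a Velocity template here
is an assumption I have not verified.
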
Brian M. Thomas - Senior Technical Architect
AT&T Services, Inc.
One SBC Center, Room 24D3
St. Louis, MO 63101
314 235 3141
