andrewjwelch.com
LexEv XMLReader
...doesn't ignore Lexical Events :)

LexEv Parser is free for non-commercial use and is supplied under the Creative Commons Attribution-NonCommercial 3.0 license: http://creativecommons.org/licenses/by-nc/3.0/

Email lexev@andrewjwelch.com for customisations and a commercial license.

Download

- process CDATA sections as markup

- preserve entity references in the output

- preserve character references in the output

- preserve the DOCTYPE in the output

LexEv (Lexical Event) wraps the standard XMLReader and converts lexical events into markup so that they can be processed. Typical uses are:
  • Converting CDATA sections into markup:
    <![CDATA[ <p> a para </p> ]]>
    to:
    <lexev:cdata> <p> a para </p> </lexev:cdata>
  • Preserving entity references:
    hello&mdash;world
    is converted to:
    hello<lexev:entity name="mdash">—</lexev:entity>world
  • Preserving character references:
    hello&#160;world
    is converted to:
    hello<lexev:entity name="#160"> </lexev:entity>world
  • Preserving the doctype declaration:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    is converted to processing instructions:
    <?doctype-public -//W3C//DTD XHTML 1.0 Transitional//EN?>
    <?doctype-system http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd?>
  • Marking up comments:
    <!-- a comment -->
    is converted to:
    <lexev:comment> a comment </lexev:comment>
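
For background, here is a minimal sketch of where lexical events come from in plain SAX: a standard XMLReader only reports comments, CDATA boundaries and entity boundaries to a LexicalHandler, which most pipelines never register. This is not LexEv's implementation, just an illustration using the JDK's own parser; the class name LexicalEventsDemo and the method lexicalEvents are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class LexicalEventsDemo {

    /** Parses the XML and returns the lexical events the parser reported, in order. */
    static List<String> lexicalEvents(String xml) throws Exception {
        final List<String> events = new ArrayList<>();

        // DefaultHandler2 implements ContentHandler and LexicalHandler with no-op methods.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void comment(char[] ch, int start, int length) {
                events.add("comment:" + new String(ch, start, length));
            }
            @Override public void startCDATA() { events.add("startCDATA"); }
            @Override public void endCDATA() { events.add("endCDATA"); }
            @Override public void startEntity(String name) {
                events.add("startEntity:" + name);
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Lexical events are only delivered if a handler is set through this SAX2 property.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc [<!ENTITY mdash \"&#8212;\">]>"
                   + "<doc>hello&mdash;world<!-- a comment -->"
                   + "<![CDATA[ <p> a para </p> ]]></doc>";
        for (String event : lexicalEvents(xml)) {
            System.out.println(event);
        }
    }
}
```

Without the lexical-handler property the same parse would deliver only the expanded text, which is why the information is normally lost before a stylesheet sees it.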

Instructions

To use LexEv with Saxon:
java -cp saxon9.jar;LexEv.jar net.sf.saxon.Transform -x:com.andrewjwelch.lexev.LexEv input.xml stylesheet.xslt
Make sure LexEv.jar is on the classpath, then tell Saxon to use it with the -x switch (copy and paste this: -x:com.andrewjwelch.lexev.LexEv). The command above uses the Windows classpath separator (;); on Linux and macOS use a colon (:) instead.
To use LexEv in an XSLT transform from Java:
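import javax.xml.transform.*;
import javax.xml.transform.sax.*;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.XMLReader;
import com.andrewjwelch.lexev.LexEv;

// stylesheet is a javax.xml.transform.Source (or Templates) for the XSLT
SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
TransformerHandler handler = stf.newTransformerHandler(stylesheet);

// Send the transform output to stdout
Result result = new StreamResult(System.out);
handler.setResult(result);

// xml is an org.xml.sax.InputSource or a system ID string for the input document
XMLReader xmlReader = new LexEv();
xmlReader.setContentHandler(handler);
xmlReader.parse(xml);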
Instead of the more typical Transformer.transform() way of running a transform, you need to use a TransformerHandler so that you can specify the XMLReader to be used. The transform starts when you call parse() on the XMLReader, which sends SAX events to the handler. This is the technique to use when you want to slot in filters between the XMLReader and the transform, so it's a reasonably good idea to use it even if you're not using LexEv.
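
To show what slotting a filter between the XMLReader and the transform can look like, here is a rough sketch that uses the JDK's default parser in place of LexEv, so it runs without LexEv.jar. The UpperCaseFilter class is purely hypothetical, and the handler is an identity transform rather than a real stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

class FilterPipelineDemo {

    /** Hypothetical filter: upper-cases all character data passing through it. */
    static class UpperCaseFilter extends XMLFilterImpl {
        UpperCaseFilter(XMLReader parent) { super(parent); }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            char[] upper = new String(ch, start, length).toUpperCase(Locale.ROOT).toCharArray();
            super.characters(upper, 0, upper.length);
        }
    }

    /** Runs reader -> filter -> identity transform and returns the serialised result. */
    static String run(String xml) throws Exception {
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        // No-argument form gives an identity transform; pass a Source to use a stylesheet.
        TransformerHandler handler = stf.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader(); // stand-in for new LexEv()

        // The filter wraps the reader; the handler only ever sees the filtered events.
        XMLReader filtered = new UpperCaseFilter(reader);
        filtered.setContentHandler(handler);
        filtered.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<doc>hello <b>world</b></doc>"));
    }
}
```

Because the filter is itself an XMLReader, several can be chained in the same way, each one wrapping the previous.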

Options

You can control the following features of LexEv:
  • enable/disable the marking up of entity references
  • enable/disable the marking up of character references
  • enable/disable the marking up of CDATA sections
  • set the default namespace for the CDATA section markup
  • enable/disable the reporting of the DOCTYPE
  • enable/disable the marking up of comments
You can set these through the API (if you are including LexEv in an application), or from the command line using the following system properties:
  • com.andrewjwelch.lexev.inline-entities
  • com.andrewjwelch.lexev.character-references
  • com.andrewjwelch.lexev.cdata
  • com.andrewjwelch.doctype.cdataNamespace
  • com.andrewjwelch.lexev.doctype
  • com.andrewjwelch.lexev.comments
For example, to set a system property from the command line, use: -Dcom.andrewjwelch.lexev.comments=false
All of these are enabled by default.
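
If you are embedding LexEv rather than launching it from the command line, the same properties can in principle be set with System.setProperty before the reader is created. This is only a sketch: it assumes LexEv reads the properties when it is constructed, which is not confirmed here, and the class name LexEvProperties is made up for illustration.

```java
class LexEvProperties {

    /**
     * Switches off comment and CDATA markup using the system properties listed above.
     * ASSUMPTION: LexEv picks these up when it is instantiated, so call this first.
     */
    static void disableCommentsAndCdata() {
        System.setProperty("com.andrewjwelch.lexev.comments", "false");
        System.setProperty("com.andrewjwelch.lexev.cdata", "false");
    }

    public static void main(String[] args) {
        disableCommentsAndCdata();
        // XMLReader xmlReader = new LexEv();  // create the reader only after the properties are set
        System.out.println(System.getProperty("com.andrewjwelch.lexev.comments"));
    }
}
```

System properties are global to the JVM, so this affects every LexEv instance created afterwards in the same process.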

XSLT Samples

To output an entity or character reference:
<xsl:template match="lexev:entity | lexev:char-ref">
  <xsl:value-of disable-output-escaping="yes" select="concat('&amp;', @name, ';')"/>
</xsl:template>
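
To try the entity template without LexEv on the classpath, the sketch below feeds the JDK's built-in XSLT 1.0 processor some input that already contains the kind of markup described above, and adds an identity template so the rest of the document is copied through. The namespace URI is a placeholder invented for this example (the real URI bound to the lexev prefix is not given here), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EntityRoundTripDemo {

    // ASSUMPTION: placeholder URI, not the real LexEv namespace.
    static final String LEXEV_NS = "urn:example:lexev-placeholder";

    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:lexev='" + LEXEV_NS + "'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // Identity template: copy everything that is not handled below.
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // The template from the sample above: write the reference back out unescaped.
      + "<xsl:template match='lexev:entity | lexev:char-ref'>"
      + "<xsl:value-of disable-output-escaping='yes'"
      + " select=\"concat('&amp;', @name, ';')\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Hand-written stand-in for what LexEv would emit for hello&mdash;world. */
    static String sampleInput() {
        return "<doc>hello<lexev:entity xmlns:lexev='" + LEXEV_NS
             + "' name='mdash'>\u2014</lexev:entity>world</doc>";
    }

    /** Applies the stylesheet to the given XML and returns the serialised result. */
    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(sampleInput()));
    }
}
```

Note that disable-output-escaping only takes effect when the processor serialises the result itself, as it does here with a StreamResult.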
To process a CDATA section as markup:
<xsl:template match="lexev:cdata">
  <xsl:apply-templates/>
</xsl:template>
To output a DOCTYPE from the processing instructions:
In XSLT 1.0 the doctype-public and doctype-system attributes on xsl:output are static and need to be known at compile time, which means I'm afraid you have to do this:
<xsl:template match="/">
	<xsl:value-of disable-output-escaping="yes"
		select="concat('&lt;!DOCTYPE ', name(/*), '&#xa;  PUBLIC &quot;', 
			processing-instruction('doctype-public'), '&quot; &quot;',
			processing-instruction('doctype-system'), '&quot;&gt;')"/>
	<xsl:apply-templates/>
</xsl:template>
In XSLT 2.0 you can use xsl:result-document, where doctype-public and doctype-system are AVTs, which means their values can be determined at runtime:
<xsl:template match="/">
	<xsl:result-document
		doctype-public="{processing-instruction('doctype-public')}"
		doctype-system="{processing-instruction('doctype-system')}">
		<xsl:apply-templates/>
	</xsl:result-document>
</xsl:template>


For support or suggestions, email lexev@andrewjwelch.com