Lexical Event wraps the standard XMLReader to convert lexical events into markup so that they can be processed. Typical uses are:
- Converting CDATA sections into markup:
  <![CDATA[ <p> a para </p> ]]>
  is converted to:
  <lexev:cdata> <p> a para </p> </lexev:cdata>
- Preserving entity references:
  hello&mdash;world
  is converted to:
  hello<lexev:entity name="mdash">—</lexev:entity>world
- Preserving character references:
  hello&#160;world
  is converted to:
  hello<lexev:entity name="#160"> </lexev:entity>world
- Preserving the doctype declaration:
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  is converted to processing instructions:
  <?doctype-public -//W3C//DTD XHTML 1.0 Transitional//EN?>
  <?doctype-system http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd?>
- Marking up comments:
  <!-- a comment -->
  is converted to:
  <lexev:comment> a comment </lexev:comment>
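The lexical events in question are the ones a SAX parser reports through the standard org.xml.sax.ext.LexicalHandler interface rather than through the ContentHandler. The following is a minimal sketch of how those events can be observed with the JDK parser alone. It is not LexEv's implementation, and the class name LexicalEventDemo and the method lexicalEvents are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class; not part of LexEv.
final class LexicalEventDemo {

    /** Parses the XML string and returns the lexical events the parser reported. */
    static List<String> lexicalEvents(String xml) {
        List<String> events = new ArrayList<>();
        // DefaultHandler2 implements both ContentHandler and LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                events.add("comment:" + new String(ch, start, length));
            }

            @Override
            public void startCDATA() {
                events.add("startCDATA");
            }

            @Override
            public void endCDATA() {
                events.add("endCDATA");
            }

            @Override
            public void startDTD(String name, String publicId, String systemId) {
                events.add("doctype:" + name + "|" + publicId + "|" + systemId);
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Lexical events are only delivered if a handler is registered under this property.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<doc><!-- a comment --><![CDATA[ <p> a para </p> ]]></doc>";
        lexicalEvents(xml).forEach(System.out::println);
    }
}
```

A plain ContentHandler never sees these events, so a stylesheet fed by an ordinary XMLReader cannot match on them. Turning them into elements and processing instructions, as listed above, makes them visible to the transform.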
Instructions
To use LexEv with Saxon:
java -cp saxon9.jar;LexEv.jar net.sf.saxon.Transform -x:com.andrewjwelch.lexev.LexEv input.xml stylesheet.xslt
Make sure LexEv.jar is on the classpath, and then tell Saxon to use it with the -x switch (copy and paste this option: -x:com.andrewjwelch.lexev.LexEv). The ; classpath separator shown here is the Windows one; on Unix-like systems use : instead.
To use LexEv in an XSLT transform from Java:
// stylesheet is a javax.xml.transform.Source (or Templates) for the XSLT
SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
TransformerHandler handler = stf.newTransformerHandler(stylesheet);
Result result = new StreamResult(System.out);
handler.setResult(result);
XMLReader xmlReader = new LexEv();
xmlReader.setContentHandler(handler);
// xml is an org.xml.sax.InputSource or a system ID string
xmlReader.parse(xml);
Instead of the more typical Transformer.transform() way of running a transform, you need to use a TransformerHandler so that you can specify the XMLReader to be used. The transform starts when you call parse() on the XMLReader, which sends SAX events to the handler. This is the technique to use when you want to slot filters in between the XMLReader and the transform, so it's a reasonably good idea to use it even if you're not using LexEv.
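As a rough illustration of slotting a filter in between the XMLReader and the transform, here is a minimal sketch. So that it runs without LexEv.jar, it uses the JDK's own parser where you would write new LexEv(), and an identity transform instead of a stylesheet. The names FilterChainDemo, UpperCaseText and run are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class; not part of LexEv.
final class FilterChainDemo {

    /** Example filter: upper-cases all character data on its way to the transform. */
    static final class UpperCaseText extends XMLFilterImpl {
        UpperCaseText(XMLReader parent) {
            super(parent);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            char[] upper = new String(ch, start, length).toUpperCase(Locale.ROOT).toCharArray();
            super.characters(upper, 0, upper.length);
        }
    }

    /** Runs parser -> filter -> identity transform and returns the serialized result. */
    static String run(String xml) {
        try {
            SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
            // No-argument form gives an identity transform; pass a stylesheet Source to use XSLT.
            TransformerHandler handler = stf.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            // Assumption: with LexEv on the classpath this line would be "new LexEv()".
            XMLReader parser = spf.newSAXParser().getXMLReader();

            // The filter wraps the reader; the handler listens to the filter.
            XMLReader filter = new UpperCaseText(parser);
            filter.setContentHandler(handler);
            filter.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<doc>hello</doc>"));
    }
}
```

Because the filter is itself an XMLReader, further filters can be chained the same way, each wrapping the previous one, with the TransformerHandler always attached to the outermost.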
Options
You can control the following features of LexEv:
- enable/disable the marking up of entity references
- enable/disable the marking up of character references
- enable/disable the marking up of CDATA sections
- set the default namespace for the CDATA section markup
- enable/disable the reporting of the DOCTYPE
- enable/disable the marking up of comments
You can set these through the API (if you are including LexEv in an application), or from the command line using the following system properties:
com.andrewjwelch.lexev.inline-entities
com.andrewjwelch.lexev.character-references
com.andrewjwelch.lexev.cdata
com.andrewjwelch.doctype.cdataNamespace
com.andrewjwelch.lexev.doctype
com.andrewjwelch.lexev.comments
For example, to set a system property from the command line, use:
-Dcom.andrewjwelch.lexev.comments=false
All of these are enabled by default.
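If you are embedding LexEv and would rather not pass -D flags, the same system properties can be set from Java with System.setProperty. The sketch below uses only property names listed above and the value false from the command-line example. The class name LexEvOptionsDemo and the helper disable are hypothetical, and the sketch assumes the properties are read no earlier than when the LexEv reader is created, which is not stated here.

```java
// Hypothetical demo class; not part of LexEv.
final class LexEvOptionsDemo {

    // Property names copied from the list of system properties above.
    static final String COMMENTS = "com.andrewjwelch.lexev.comments";
    static final String CDATA = "com.andrewjwelch.lexev.cdata";

    /** Sets each named system property to "false", mirroring -Dname=false. */
    static void disable(String... propertyNames) {
        for (String name : propertyNames) {
            System.setProperty(name, "false");
        }
    }

    public static void main(String[] args) {
        disable(COMMENTS, CDATA);
        // Assumption: create the LexEv reader only after this point,
        // so that it sees the values whenever it reads them.
        System.out.println(COMMENTS + "=" + System.getProperty(COMMENTS));
        System.out.println(CDATA + "=" + System.getProperty(CDATA));
    }
}
```

System properties are global to the JVM, so this affects every LexEv reader created afterwards in the same process.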
XSLT Samples
To output an entity or character reference:
<xsl:template match="lexev:entity | lexev:char-ref">
<xsl:value-of disable-output-escaping="yes" select="concat('&amp;', @name, ';')"/>
</xsl:template>
To process a CDATA section as markup:
<xsl:template match="lexev:cdata">
<xsl:apply-templates/>
</xsl:template>
To output a DOCTYPE from the processing instructions:
In XSLT 1.0 the doctype-public and doctype-system attributes on xsl:output are static and need to be known at compile time, which means, I'm afraid, you have to do this:
<xsl:template match="/">
<xsl:value-of disable-output-escaping="yes"
select="concat('&lt;!DOCTYPE ', name(/*), ' PUBLIC &quot;',
processing-instruction('doctype-public'), '&quot; &quot;',
processing-instruction('doctype-system'), '&quot;&gt;')"/>
<xsl:apply-templates/>
</xsl:template>
In XSLT 2.0 you can use xsl:result-document, where doctype-public and doctype-system are AVTs (attribute value templates), which means their values can be determined at runtime:
<xsl:template match="/">
<xsl:result-document
doctype-public="{processing-instruction('doctype-public')}"
doctype-system="{processing-instruction('doctype-system')}">
<xsl:apply-templates/>
</xsl:result-document>
</xsl:template>
For support or suggestions, email lexev@andrewjwelch.com