net.sf.saxon.dotnet
Class DotNetRegexTranslator
java.lang.Object
net.sf.saxon.regex.RegexTranslator
net.sf.saxon.regex.SurrogateRegexTranslator
net.sf.saxon.dotnet.DotNetRegexTranslator
public class DotNetRegexTranslator
- extends SurrogateRegexTranslator
This class translates XML Schema regex syntax into .NET regex syntax.
Author: James Clark, Thai Open Source Software Center Ltd. See statement at end of file.
Modified by Michael Kay (a) to integrate the code into Saxon, (b) to support XPath additions
to the XML Schema regex syntax, (c) to target the .NET regex syntax instead of JDK 1.4.
This version of the regular expression translator treats each half of a surrogate pair as a separate
character, translating anything in an XPath regex that can match a non-BMP character into a Java
regex that matches the two halves of a surrogate pair independently. This approach doesn't work
under JDK 1.5, whose regex engine treats a surrogate pair as a single character.
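As a minimal illustration of what "each half of a surrogate pair as a separate character" means, the sketch below splits one non-BMP code point into its two UTF-16 code units. The class and method names are hypothetical and are not part of Saxon; only standard java.lang.Character methods are used.

```java
class SurrogateHalvesDemo {

    /** Returns the high and low surrogate code units for a non-BMP code point. */
    static char[] surrogateHalves(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a non-BMP code point: " + codePoint);
        }
        return new char[] {
            Character.highSurrogate(codePoint),
            Character.lowSurrogate(codePoint)
        };
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E; // MUSICAL SYMBOL G CLEF, outside the BMP
        char[] halves = surrogateHalves(gClef);
        // One code point, but two UTF-16 code units: 0xD834 followed by 0xDD1E.
        System.out.println(Integer.toHexString(halves[0]) + " " + Integer.toHexString(halves[1]));
        String s = new String(Character.toChars(gClef));
        System.out.println(s.length() + " code units, "
                + s.codePointCount(0, s.length()) + " code point");
    }
}
```

A regex engine that works on code units sees two characters here, which is why a translated expression has to match the two halves separately.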
This translator is currently used for Saxon on .NET 1.1. It's almost the same as the JDK 1.4 version,
except that it avoids use of the "&&" operator, which isn't available on .NET.
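For readers unfamiliar with "&&", it is the character-class intersection operator of java.util.regex. The sketch below shows one way an intersection can be rewritten as plain ranges. It is an illustration only, with hypothetical class and method names, and is not claimed to be the rewriting this translator performs.

```java
import java.util.regex.Pattern;

class IntersectionFreeDemo {

    // Lowercase consonants written with the Java-only intersection operator.
    static final Pattern WITH_INTERSECTION = Pattern.compile("[a-z&&[^aeiou]]");

    // The same set written as explicit ranges, with no "&&".
    static final Pattern WITHOUT_INTERSECTION = Pattern.compile("[b-df-hj-np-tv-z]");

    /** True if both patterns give the same answer for a single character. */
    static boolean agreeOn(char c) {
        String s = String.valueOf(c);
        return WITH_INTERSECTION.matcher(s).matches()
                == WITHOUT_INTERSECTION.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (char c = 'a'; c <= 'z'; c++) {
            if (!agreeOn(c)) {
                throw new IllegalStateException("patterns disagree on " + c);
            }
        }
        System.out.println("both patterns accept the same lowercase letters");
    }
}
```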
Fields inherited from class net.sf.saxon.regex.RegexTranslator
ALL, captures, caseBlind, curChar, currentCapture, eos, ignoreWhitespace, inCharClassExpr, isXPath, length, NONE, NOT_ALLOWED_CLASS, pos, regExp, result, SOME, SURROGATES1_CLASS, SURROGATES2_CLASS, xmlVersion
Method Summary
int getNumberOfCapturedGroups()
- Get the number of captured groups for this regular expression.
static void main(java.lang.String[] args)
- Convenience main method for testing purposes.
java.lang.String translate(java.lang.CharSequence regExp, int xmlVersion, boolean xpath, boolean ignoreWhitespace, boolean caseBlind)
- Translates a regular expression in the syntax of XML Schemas Part 2 into a regular expression in the syntax of java.util.regex.Pattern.
protected boolean translateAtom()
Methods inherited from class net.sf.saxon.regex.RegexTranslator
absorbSurrogatePair, advance, copyCurChar, expect, highSurrogateRanges, isAsciiAlnum, isBlock, isJavaMetaChar, lowSurrogateRanges, makeException, makeException, parseQuantExact, recede, sortRangeList, translateBranch, translateQuantifier, translateQuantity, translateRegExp, translateTop
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DotNetRegexTranslator
public DotNetRegexTranslator()
- Create a regular expression translator for the .NET platform
translate
public java.lang.String translate(java.lang.CharSequence regExp,
int xmlVersion,
boolean xpath,
boolean ignoreWhitespace,
boolean caseBlind)
throws RegexSyntaxException
- Translates a regular expression in the syntax of XML Schemas Part 2 into a regular
expression in the syntax of java.util.regex.Pattern. The translation
assumes that the string to be matched against the regex uses surrogate pairs correctly.
If the string comes from XML content, a conforming XML parser will automatically
check this; if the string comes from elsewhere, it may be necessary to check
surrogate usage before matching.
- Parameters:
regExp - a String containing a regular expression in the syntax of XML Schemas Part 2
xmlVersion - the version of XML in use; this affects the meanings of the \i and \c character class escapes
xpath - a boolean indicating whether the XPath 2.0 F+O extensions to the schema regex syntax are permitted
ignoreWhitespace - true if the x flag is set, allowing ignorable whitespace in the regex
caseBlind - true if the i flag is set, allowing case-blind comparisons
- Returns:
- a String containing a regular expression in the syntax of java.util.regex.Pattern
- Throws:
RegexSyntaxException - if regExp is not a regular expression in the syntax of XML Schemas Part 2, or XPath 2.0, as appropriate
- See Also:
Pattern, XML Schema Part 2
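The description of translate notes that strings from sources other than a conforming XML parser may need their surrogate usage checked before matching. A minimal sketch of such a check follows. The class and method names are hypothetical, and this is not a Saxon API.

```java
class SurrogateCheck {

    /**
     * Returns true if every high surrogate is immediately followed by a low
     * surrogate and no low surrogate appears on its own.
     */
    static boolean usesSurrogatesCorrectly(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false; // high surrogate without its partner
                }
                i++; // skip the low half that was just validated
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate with no preceding high surrogate
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String paired = new String(Character.toChars(0x1D11E));
        String unpaired = String.valueOf((char) 0xD834);
        System.out.println(usesSurrogatesCorrectly(paired));
        System.out.println(usesSurrogatesCorrectly(unpaired));
    }
}
```

Running the check once on untrusted input before matching keeps the translated regex's assumption about paired surrogates valid.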
getNumberOfCapturedGroups
public int getNumberOfCapturedGroups()
- Get the number of captured groups for this regular expression
- Returns:
- the number of captured groups
translateAtom
protected boolean translateAtom()
throws RegexSyntaxException
- Specified by:
translateAtom in class RegexTranslator
- Throws:
RegexSyntaxException
main
public static void main(java.lang.String[] args)
throws RegexSyntaxException
- Convenience main method for testing purposes. Note that the actual testing is done using the
Java regex engine.
- Parameters:
args - (1) the regex, (2) xpath|schema, (3) target string to be matched
- Throws:
RegexSyntaxException