minixsv: A Lightweight XML Schema Validator
New release 0.9.0 available!
minixsv is a lightweight XML schema validator package written in pure Python (Python 2.4 or later is required).
It is based on genxmlif, a generic XML interface package, which currently supports the Python DOM implementations minidom and 4DOM as well as Fredrik Lundh's elementtree module. The implementation is selected with the parameter "xmlIfClass", which can be XMLIF_MINIDOM, XMLIF_4DOM or XMLIF_ELEMENTTREE.
Other DOM implementations can be adapted by implementing a new derived XML interface class.
minixsv provides a programming interface (API), which is described below.
minixsv was developed for using XML schema in code generators, but it can be used in any other application as well.
New features of release 0.9.0:
- check of facets of derived primitive types added
- Unicode support added (except wide Unicode characters)
- major improvements for pattern matching (some restrictions remain, see below)
- limited XInclude support added (the fallback tag is not supported)
- performance optimizations (caching option introduced)
- several bugs fixed
Release 0.9.0 has been tested against the W3C XML Schema Test Suite (new test suite from 2006-11-06).
Results:
- NIST tests: 3943 of 3953 test groups passed
- Microsoft tests: 8645 of 9745 test groups passed
- SUN tests: 559 of 679 test groups passed
Most of the test groups that did not pass correspond to the limitations listed below!
Constructor of the API class pyxsval.XsValidator:
    def __init__(self,
                 xmlIfClass=XMLIF_MINIDOM,
                 warningProc=IGNORE_WARNINGS,
                 errorLimit=_XS_VAL_DEFAULT_ERROR_LIMIT,
                 verbose=0,
                 useCaching=1,
                 processXInclude=1):
- xmlIfClass: XMLIF_MINIDOM, XMLIF_4DOM or XMLIF_ELEMENTTREE
- warningProc: IGNORE_WARNINGS, PRINT_WARNINGS or STOP_ON_WARNINGS
- verbose: 0, 1 or 2
- useCaching: 0 or 1 (1: use internal caching for performance optimization; new in release 0.9.0)
- processXInclude: 0 or 1 (1: process XInclude instructions before validation; new in release 0.9.0)
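The convenience functions below forward their extra keyword options (**kw) to this constructor. Any option a caller leaves out keeps the default listed above. The following standalone Python 3 sketch models only that forwarding. XsValidatorSketch, parse_and_validate_sketch and the constant values are hypothetical stand-ins, not minixsv code. The error-limit default is a placeholder because its real value is not given here.

```python
# Hypothetical stand-ins for the minixsv constants (the real values may differ).
XMLIF_MINIDOM = "minidom"
IGNORE_WARNINGS = "ignore"
ERROR_LIMIT_PLACEHOLDER = None  # real default (_XS_VAL_DEFAULT_ERROR_LIMIT) not stated here


class XsValidatorSketch:
    """Records its options; mirrors the documented constructor defaults."""

    def __init__(self, xmlIfClass=XMLIF_MINIDOM, warningProc=IGNORE_WARNINGS,
                 errorLimit=ERROR_LIMIT_PLACEHOLDER, verbose=0,
                 useCaching=1, processXInclude=1):
        self.options = {
            "xmlIfClass": xmlIfClass,
            "warningProc": warningProc,
            "errorLimit": errorLimit,
            "verbose": verbose,
            "useCaching": useCaching,
            "processXInclude": processXInclude,
        }


def parse_and_validate_sketch(inputFile, xsdFile=None, **kw):
    # Keywords the function does not consume are passed on unchanged;
    # a misspelled option name raises TypeError in the constructor.
    validator = XsValidatorSketch(**kw)
    return inputFile, xsdFile, validator


_, _, v = parse_and_validate_sketch("Test.xml", verbose=1, useCaching=0)
print(v.options["verbose"], v.options["useCaching"], v.options["processXInclude"])  # 1 0 1
```

Forwarding through **kw keeps each convenience function's own signature short. Because the constructor has fixed parameter names, a misspelled option fails loudly instead of being silently ignored.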
Convenience functions:
parseAndValidate(inputFile, xsdFile=None, **kw):
By default, minixsv uses the schema file referred to in the "schemaLocation" or "noNamespaceSchemaLocation" attribute of the root tag of "inputFile".
The schema file given by the parameter xsdFile is used only if the input file does not specify a schema file, i.e. the schema specification in the input file has priority (changed in release 0.8)!
Other options (**kw) are forwarded to the XsValidator class.
The return value is a wrapper object containing the PSVI (Post-Schema-Validation Infoset).
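The priority rule above can be sketched in standalone Python 3 with xml.dom.minidom. The schema declared on the root element wins, and the caller's xsdFile is only a fallback. choose_schema is a hypothetical helper, not minixsv's internal code. For simplicity it takes XML text, so it applies equally to the string variant below, and it only looks at the first namespace/location pair of xsi:schemaLocation.

```python
from xml.dom import minidom

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def choose_schema(xml_text, xsd_file=None):
    """Return the schema location declared on the root element, else xsd_file.

    Hypothetical helper illustrating the documented priority rule."""
    root = minidom.parseString(xml_text).documentElement
    # getAttributeNS() returns "" when the attribute is absent.
    location = root.getAttributeNS(XSI_NS, "noNamespaceSchemaLocation").strip()
    if location:
        return location
    # xsi:schemaLocation holds whitespace-separated "namespace location" pairs;
    # this sketch only uses the location of the first pair.
    pairs = root.getAttributeNS(XSI_NS, "schemaLocation").split()
    if len(pairs) >= 2:
        return pairs[1]
    return xsd_file


declared = ('<r xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
            'xsi:noNamespaceSchemaLocation="Test.xsd"/>')
print(choose_schema(declared, "Other.xsd"))  # Test.xsd (declared schema wins)
print(choose_schema("<r/>", "Other.xsd"))    # Other.xsd (fallback)
```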
parseAndValidateString(inputText, xsdText=None, **kw):
This function expects text strings containing XML code instead of filenames.
By default, minixsv uses the schema file referred to in the "schemaLocation" or "noNamespaceSchemaLocation" attribute of the root tag of "inputText".
The schema given by the parameter xsdText is used only if "inputText" does not specify a schema file, i.e. the schema specification of "inputText" has priority (changed in release 0.8)!
Other options (**kw) are forwarded to the XsValidator class.
The return value is a wrapper object containing the PSVI (Post-Schema-Validation Infoset).
parseAndValidateXmlInput(inputFile, xsdFile=None, validateSchema=0, **kw):
Same as parseAndValidate, but the schema file is not validated by default.
By default, minixsv uses the schema file referred to in the "schemaLocation" or "noNamespaceSchemaLocation" attribute of the root tag of "inputFile".
The schema file given by the parameter xsdFile is used only if the input file does not specify a schema file, i.e. the schema specification in the input file has priority (changed in release 0.8)!
Other options (**kw) are forwarded to the XsValidator class.
The return value is a wrapper object containing the PSVI (Post-Schema-Validation Infoset).
parseAndValidateXmlInputString(inputText, xsdText=None, validateSchema=0, **kw):
Same as parseAndValidateString, but the schema is not validated by default.
This function expects text strings containing XML code instead of filenames.
By default, minixsv uses the schema file referred to in the "schemaLocation" or "noNamespaceSchemaLocation" attribute of the root tag of "inputText".
The schema given by the parameter xsdText is used only if "inputText" does not specify a schema file, i.e. the schema specification of "inputText" has priority (changed in release 0.8)!
Other options (**kw) are forwarded to the XsValidator class.
The return value is a wrapper object containing the PSVI (Post-Schema-Validation Infoset).
parseAndValidateXmlSchema(xsdFile, **kw):
This function validates only the given schema file.
Other options (**kw) are forwarded to the XsValidator class.
The return value is a wrapper object containing the PSVI (Post-Schema-Validation Infoset).
parseAndValidateXmlSchemaString(xsdText, **kw):
This function validates only the given schema text string.
Other options (**kw) are forwarded to the XsValidator class.
The return value is a wrapper object containing the PSVI (Post-Schema-Validation Infoset).
Examples for invoking minixsv:
    from genxmlif import GenXmlIfError
    from minixsv import pyxsval

    try:
        # use default values of minixsv; the location of the schema file
        # must be specified in the XML file
        domTreeWrapper = pyxsval.parseAndValidate("Test.xml")
        # domTree is a minidom document object
        domTree = domTreeWrapper.getTree()

        # call validator with non-default values
        elementTreeWrapper = pyxsval.parseAndValidate("Test.xml",
                                 xsdFile="TestSchema.xsd",
                                 xmlIfClass=pyxsval.XMLIF_ELEMENTTREE,
                                 warningProc=pyxsval.PRINT_WARNINGS,
                                 errorLimit=200, verbose=1,
                                 useCaching=0, processXInclude=0)
        # get elementtree object after validation
        elemTree = elementTreeWrapper.getTree()
    except pyxsval.XsvalError, errstr:
        print errstr
        print "Validation aborted!"
    except GenXmlIfError, errstr:
        print errstr
        print "Parsing aborted!"
Steps of validation performed by minixsv:
1. Parse the XML input file and the XML schema file (calls the parser of the configured DOM implementation)
2. Validate the XML schema
3. Validate the XML input
Since the validator is written in pure Python, it is not very fast.
Instead of the function "parseAndValidate()", the functions "parseAndValidateXmlSchema()" and "parseAndValidateXmlInput()" can be used.
To speed up validation, "parseAndValidateXmlSchema()" can be skipped if you are sure that the XML schema file is valid.
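The benefit of splitting the calls can be counted with a small standalone Python 3 model of the three steps above. It does not call minixsv. It assumes that parseAndValidate() repeats step 2 on every call. It also assumes that parseAndValidateXmlInput() still parses the schema but does not re-validate it. Whether useCaching avoids any of the repeated parsing is not modelled. plan_combined, plan_split and count are hypothetical names.

```python
def plan_combined(input_files, schema_file):
    """Steps if parseAndValidate() is called once per input file."""
    plan = []
    for f in input_files:
        plan += [("parse", f), ("parse", schema_file),
                 ("validate schema", schema_file), ("validate input", f)]
    return plan


def plan_split(input_files, schema_file, schema_known_valid=False):
    """Steps if the schema is checked once up front (parseAndValidateXmlSchema())
    and each input is then validated without re-checking the schema
    (parseAndValidateXmlInput())."""
    plan = []
    if not schema_known_valid:
        plan += [("parse", schema_file), ("validate schema", schema_file)]
    for f in input_files:
        plan += [("parse", f), ("parse", schema_file), ("validate input", f)]
    return plan


def count(plan, action):
    """Number of steps of the given kind in a plan."""
    return sum(1 for step, _ in plan if step == action)


files = ["a.xml", "b.xml", "c.xml"]
print(count(plan_combined(files, "s.xsd"), "validate schema"))  # 3
print(count(plan_split(files, "s.xsd"), "validate schema"))     # 1
print(count(plan_split(files, "s.xsd", schema_known_valid=True), "validate schema"))  # 0
```

With many input files and one schema, the schema check drops from once per file to once overall. It drops to zero when the schema is already known to be valid.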
The 4DOM interface is rather slow. For best performance, use the elementtree interface.
Note: The XML schema file must be valid for the XML input to be validated correctly.
Otherwise runtime errors may occur inside minixsv.
The input file and the XSD file for validation may each be given as a path or a URL.
Caution: The interface changed in release 0.9.0!
Parser and XInclude errors now raise "GenXmlIfError" exceptions; validation errors raise "XsvalError" exceptions.
After successful validation, "parseAndValidate...()" returns an XML tree wrapper object (containing a DOM document or an elementtree) to the caller.
minixsv automatically inserts "default" and "fixed" attributes into the XML tree if they are not specified in the XML input file.
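The following standalone Python 3 sketch illustrates this insertion with xml.dom.minidom. The declared mapping is a hypothetical stand-in for the attribute declarations minixsv reads from the schema, and insert_missing_attributes is not minixsv's actual code. Under XML Schema, a "fixed" attribute that is present must also carry the fixed value. This sketch only adds missing attributes and does not check that rule.

```python
from xml.dom import minidom


def insert_missing_attributes(doc, declared):
    """Add default/fixed attributes that an element does not specify.

    declared: hypothetical mapping {element name: {attribute name: value}}."""
    for tag, attrs in declared.items():
        for elem in doc.getElementsByTagName(tag):
            for name, value in attrs.items():
                # Attributes given in the input are left untouched.
                if not elem.hasAttribute(name):
                    elem.setAttribute(name, value)
    return doc


doc = minidom.parseString('<order><item/><item unit="g"/></order>')
insert_missing_attributes(doc, {"item": {"unit": "kg"}})
print([item.getAttribute("unit") for item in doc.getElementsByTagName("item")])  # ['kg', 'g']
```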
minixsv also normalizes and collapses white space in the XML input as specified in the XML schema.
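The XML Schema whiteSpace facet defines three modes: preserve, replace and collapse. For example, xs:string uses preserve, xs:normalizedString uses replace and xs:token uses collapse. The following standalone Python 3 sketch implements these rules as the standard defines them. It is not minixsv's own code, and the function name is hypothetical.

```python
import re


def apply_whitespace_facet(value, mode):
    """Normalize a value per the XML Schema whiteSpace facet."""
    if mode == "preserve":
        return value
    # replace: each tab (#x9), line feed (#xA) and carriage return (#xD) becomes a space
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # collapse: additionally merge runs of spaces and trim both ends
        return re.sub(r" {2,}", " ", replaced).strip(" ")
    raise ValueError("unknown whiteSpace mode: %r" % (mode,))


print(repr(apply_whitespace_facet("  a\t\tb\n c ", "collapse")))  # 'a b c'
```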
Note that the PSVI (Post-Schema-Validation Infoset) provided by minixsv does not contain all of the information specified by the XML Schema 1.0 standard.
Limitations
minixsv is in beta state (version 0.9.0) and supports a subset of the XML Schema 1.0 standard.
minixsv currently has at least the following limitations/restrictions:
- no check whether derived type and base type match
- no check of the attributes "final" and "finalDefault"
- no support for substitution groups
- no support for abstract elements and types
- restrictions regarding pattern matching:
  * subtraction of character sets is not supported, e.g. regex = "[\w-[ab]]"
  * character sets with \I, \C, \P{...} are not supported, e.g. regex = "[\S\I\?a-c\?]"
    (character sets with \i, \c, \p{...} are supported!)
Note: This list may not be complete!
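Where a schema needs character-set subtraction, the same set can often be expressed with a negative lookahead in a regex engine that supports lookaheads. The following standalone Python 3 sketch uses re for this. It is a workaround idea, not a minixsv feature, and subtract_class is a hypothetical helper. Python's \w only approximates the XML Schema \w class. For example, Python's includes the underscore, which XML Schema classifies as punctuation. XML Schema patterns must match the whole value, hence fullmatch().

```python
import re


def subtract_class(base, excluded):
    """Emulate the XML Schema class subtraction [base-[excluded]]:
    the lookahead rejects excluded characters before base consumes one."""
    return "(?:(?!%s)%s)" % (excluded, base)


# Approximates the XML Schema pattern "[\w-[ab]]+".
pattern = re.compile(subtract_class(r"\w", "[ab]") + "+")

print(bool(pattern.fullmatch("xyz")))  # True
print(bool(pattern.fullmatch("xaz")))  # False
```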
Advanced features
addUserSpecXmlIfClass(xmlIfKey, factory):
Convenience function to add a user-defined XML interface class.
This function expects a key to identify the interface class and a factory function that creates an instance of the user-defined XML interface class.
Example for minidom:
    def _minidomInterfaceFactory(verbose):
        import minidomif
        return minidomif.MiniDomInterface(verbose)
Download
You can get the current release of minixsv here.
Copyright 2004-2008 by Roland Leuthe