
xs:NCName puzzler #911

Closed
GaryGapinski opened this issue Apr 22, 2021 · 3 comments · Fixed by #936 or #948
Labels
question; Scope: Metaschema (Issues targeted at the metaschema pipeline); Scope: Modeling (Issues targeted at development of OSCAL formats)

Comments

@GaryGapinski

I have found that NCNames containing leading or trailing whitespace provoke no validation errors but cannot reliably be used in XPath references. The same problem occurs even when these are constrained to xs:ID or xs:IDREF. xs:NCName, xs:ID, and xs:IDREF all have a whiteSpace facet of "collapse" (inherited from xs:token).

E.g.,

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="https://github.com/raw/usnistgov/OSCAL/master/xml/schema/oscal_catalog_schema.xsd" schematypens="http://www.w3.org/2001/XMLSchema" title="OSCAL Catalog schema"?>
<catalog xmlns="http://csrc.nist.gov/ns/oscal/1.0" uuid="fafe5776-b69f-41ea-a5cf-2192bdc4242b">
    <metadata>
        <title>catalog 1</title>
        <last-modified>2021-04-22T17:09:41Z</last-modified>
        <version>one</version>
        <oscal-version>1.0.0-rc1</oscal-version>
    </metadata>
    <control id="   control-1   ">
        <title>Control one</title>
        <param id="param-1">
            <label>organization-defined time</label>
        </param>
        <part name="statement" id="statement-1">
            <p>Open the perimeter gate <insert type="param" id-ref="param-1"/>:</p>
        </part>
    </control>
</catalog>
```

has three spaces on each side of the control id attribute value (<control id="␢␢␢control-1␢␢␢">) and validates successfully, but the following stylesheet

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="xs math" version="3.0"
    xmlns:oscal="http://csrc.nist.gov/ns/oscal/1.0" xpath-default-namespace="http://csrc.nist.gov/ns/oscal/1.0">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <xsl:text expand-text="true">//control/@id contains «{//control/@id}»&#x0a;</xsl:text>
        <xsl:text expand-text="true">"if (//control[@id = 'control-1']) then 'true' else 'false'" is {if (//control[@id = 'control-1']) then 'true' else 'false'} &#x0a;</xsl:text>
        <xsl:text expand-text="true">"if (//control[@id = ' control-1 ']) then 'true' else 'false'" is {if (//control[@id = ' control-1 ']) then 'true' else 'false'} &#x0a;</xsl:text>
        <xsl:text expand-text="true">"if (//control[@id = '   control-1   ']) then 'true' else 'false'" is {if (//control[@id = '   control-1   ']) then 'true' else 'false'} &#x0a;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

produces

```
//control/@id contains «   control-1   »
"if (//control[@id = 'control-1']) then 'true' else 'false'" is false 
"if (//control[@id = ' control-1 ']) then 'true' else 'false'" is false 
"if (//control[@id = '   control-1   ']) then 'true' else 'false'" is true 
```

Note that the whitespace has been processed as if whiteSpace="preserve" were in effect; whiteSpace="collapse" should make leading and trailing whitespace inconsequential.

The above was tested with XML Editor 23.1 with the transformation using Saxon-PE 9.9.1.7. Validation also succeeds using xmllint (xmllint: using libxml version 20910).
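A likely explanation for the results above, offered tentatively: the whiteSpace facet governs schema validation (and, in a schema-aware processor, the typed value of a validated node), but Saxon-PE is not schema-aware, so @id is an untyped attribute and XPath compares its raw value. Until padded values are ruled out by the schema itself, a stylesheet can defend itself by normalizing before comparing. The sketch below is a trimmed variant of the test stylesheet above; fn:normalize-space() strips leading and trailing whitespace and collapses internal runs of the same four characters (#x20, #x9, #xD, #xA) that whiteSpace="collapse" acts on.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: match padded @id values by normalizing them before comparison.
     Does not require a schema-aware processor. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    xpath-default-namespace="http://csrc.nist.gov/ns/oscal/1.0">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- normalize-space() trims and collapses whitespace, mirroring whiteSpace="collapse" -->
        <xsl:variable name="hit" select="//control[normalize-space(@id) = 'control-1']"/>
        <xsl:text expand-text="true">match on normalized @id: {exists($hit)}&#x0a;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Run against the sample catalog, this reports `match on normalized @id: true`. It only masks the problem in one consumer, which is why fixing the datatype is the more durable option.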

I have not yet tried a variant schema with a type derived from xs:NCName by restriction to exclude whitespace.
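For reference, such a variant might look like the sketch below (the type name is hypothetical). One caveat, to the best of our understanding of XML Schema: whitespace normalization is applied before the pattern facet is checked, and a restriction may not relax whiteSpace from "collapse" back to "preserve". Any type derived from xs:NCName therefore sees only the already-trimmed value, so a padded id would most likely still validate.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical variant: restrict xs:NCName with an explicit no-whitespace pattern. -->
  <xs:simpleType name="PaddingCheckedNCName">
    <xs:restriction base="xs:NCName">
      <!-- The pattern is checked against the collapsed value: "   control-1   "
           is first reduced to "control-1" and still passes.
           Adding whiteSpace="preserve" here would be a schema error, because
           the inherited value "collapse" cannot be relaxed by restriction. -->
      <xs:pattern value="\S+"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```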

@wendellpiez
Contributor

wendellpiez commented Apr 30, 2021

I suggest we replace or supplement NCName with a datatype along the lines of tokenOnly (appropriate string of characters, no whitespace). This would inhibit the introduction of "padded values" to begin with, thus forestalling the question of normalization.

Not relying on NCName at all (simply deriving from string) would also address the root of the problem, since it would turn whitespace normalization off (under schema-aware parsing) so that " control1" and "control1" are not the same.
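A minimal sketch of the string-based alternative, assuming a hypothetical type name `TokenOnlyDatatype` (this is an illustration of the idea, not the datatype OSCAL eventually adopted). Deriving from xs:string keeps whiteSpace="preserve", so the pattern sees the raw value; the pattern `[\i-[:]][\c-[:]]*` is the one the XML Schema datatypes recommendation uses for NCName, and neither `\i` nor `\c` admits whitespace. Because XSD patterns are implicitly anchored to the whole value, a padded value fails validation instead of being silently normalized.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical name; a sketch of the "tokenOnly" idea. -->
  <xs:simpleType name="TokenOnlyDatatype">
    <!-- xs:string has whiteSpace="preserve": nothing is trimmed before the pattern check -->
    <xs:restriction base="xs:string">
      <!-- NCName-style lexical rule: a name start character (minus ':')
           followed by name characters (minus ':'); no whitespace anywhere -->
      <xs:pattern value="[\i-[:]][\c-[:]]*"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Usage: an id attribute declared with this type rejects padded values -->
  <xs:attribute name="id" type="TokenOnlyDatatype"/>
</xs:schema>
```

With a type like this, `id="   control-1   "` would be reported as invalid while `id="control-1"` still validates, so the XML and the XPath view of the value can no longer disagree.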

We can also broaden the scope of this effort and consider whether we should now scrub our datatypes of problematic DTD-based data formats altogether. (They survive from early versions of OSCAL, when their convenience outweighed their cost, but we have the infrastructure now.) Related to @GaryGapinski's original observation, the wiggliness around these datatypes also affects XML-JSON conversion: in effect, it permits silent bugs on the XML side.

Questions for @david-waltermire-nist:

  • What do you think of the tokenOnly datatype idea?
  • How broadly should we scope this effort? The timing is not bad. Is it limited to NCName?

david-waltermire added the "Scope: Metaschema" and "Scope: Modeling" labels May 17, 2021
wendellpiez added a commit to wendellpiez/metaschema that referenced this issue May 21, 2021
@wendellpiez
Contributor

This should also address usnistgov/metaschema#67 and usnistgov/metaschema#68. Thanks, @GaryGapinski, very timely indeed.

david-waltermire pushed a commit to usnistgov/metaschema that referenced this issue May 21, 2021
* Addressing datatype validation issues: whitespace collapsing; non-empty values; ncname-workalike in JSON Schema - see usnistgov/OSCAL#911  usnistgov/OSCAL#805 also #33 #67 #68
* Improvements to XSD production; fully aligning 'token' datatype across XSD and JSON Schema implementations.
david-waltermire linked a pull request May 21, 2021 that will close this issue
david-waltermire pushed a commit that referenced this issue May 21, 2021
* Adjusted metaschemas: new 'version'; json-base-uri
* Added 'complete' metaschema
* Changes to OSCAL metaschemas in view of enhancements addressing #805, #911, #67, #868.
david-waltermire pushed a commit that referenced this issue May 27, 2021
* Adjusted metaschemas: new 'version'; json-base-uri
* Added 'complete' metaschema
* Changes to OSCAL metaschemas in view of enhancements addressing #805, #911, #67, #68
david-waltermire pushed a commit to david-waltermire/OSCAL that referenced this issue May 27, 2021
* Adjusted metaschemas: new 'version'; json-base-uri
* Added 'complete' metaschema
* Changes to OSCAL metaschemas in view of enhancements addressing usnistgov#805, usnistgov#911, usnistgov#67, usnistgov#868.
david-waltermire pushed a commit to david-waltermire/OSCAL that referenced this issue May 27, 2021
* Adjusted metaschemas: new 'version'; json-base-uri
* Added 'complete' metaschema
* Changes to OSCAL metaschemas in view of enhancements addressing usnistgov#805, usnistgov#911, usnistgov#67, usnistgov#68
david-waltermire pushed a commit that referenced this issue May 27, 2021
* Adjusted metaschemas: new 'version'; json-base-uri
* Added 'complete' metaschema
* Changes to OSCAL metaschemas in view of enhancements addressing #805, #911, #67, #868.
david-waltermire added a commit to usnistgov/metaschema that referenced this issue Jun 6, 2021
* Rework of docs focusing on JSON docs and model pipeline
* Improvements to composition toolchain
* Fixed a few small bugs in the metaschema-check. Improved performance of the compose pruning using an accumulator.
* Moved edge-case samples into testing directory
* Made shadowing warning a warning
* Initial commit of an Oxygen Metaschema framework.
* Creation of new compose schematron unit tests.
* Cross-linking XML and JSON syntax pages and other improvements to links
* Now building XML and JSON indexes to reference pages, with links to steps
* Reconfigured docs pipeline (XSLT entry points); adding new files including pipeline steps
* Migrating schema generation tools to new/improved composition pipeline
* Addressing usnistgov/OSCAL#902 thanks for finding this bug
* Enhancements to JSON Schema definition (with better performance too)
* Adding support for json-base-uri as a metaschema property
* Updated JSON schema $id; factoring out common docs XSLT
* Fixing IDs in JSON schema per issue usnistgov/OSCAL#933.
* Addressing datatype validation issues: whitespace collapsing; non-empty values; ncname-workalike in JSON Schema - see usnistgov/OSCAL#911  usnistgov/OSCAL#805 also #33 #67 #68
* Improvements to XSD production; fully aligning 'token' datatype across XSD and JSON Schema implementations.
* Updating bidirectional XML/JSON converter generators (#143)
* Committing a version that handles test data correctly (so far) from rebuilt metaschema composition addressing #51 #53 #76
* Now displaying constraints in documentation at point of definition;
* Docs generation revamp Reworked reference and other pages to sketch - #128 and others

Co-authored-by: Wendell Piez <wendell.piez@nist.gov>
david-waltermire pushed a commit that referenced this issue Jun 7, 2021
* Adjusted metaschemas: new 'version'; json-base-uri
* Added 'complete' metaschema
* Changes to OSCAL metaschemas in view of enhancements addressing #805, #911, #67, #868.
@david-waltermire
Contributor

Changes to address this were integrated in PR #948. These changes will be released with OSCAL 1.0.0.

nikitawootten-nist pushed a commit to nikitawootten-nist/metaschema-xslt that referenced this issue Jul 21, 2023
* Rework of docs focusing on JSON docs and model pipeline
* Improvements to composition toolchain
* Fixed a few small bugs in the metaschema-check. Improved performance of the compose pruning using an accumulator.
* Moved edge-case samples into testing directory
* Made shadowing warning a warning
* Initial commit of an Oxygen Metaschema framework.
* Creation of new compose schematron unit tests.
* Cross-linking XML and JSON syntax pages and other improvements to links
* Now building XML and JSON indexes to reference pages, with links to steps
* Reconfigured docs pipeline (XSLT entry points); adding new files including pipeline steps
* Migrating schema generation tools to new/improved composition pipeline
* Addressing usnistgov/OSCAL#902 thanks for finding this bug
* Enhancements to JSON Schema definition (with better performance too)
* Adding support for json-base-uri as a metaschema property
* Updated JSON schema $id; factoring out common docs XSLT
* Fixing IDs in JSON schema per issue usnistgov/OSCAL#933.
* Addressing datatype validation issues: whitespace collapsing; non-empty values; ncname-workalike in JSON Schema - see usnistgov/OSCAL#911  usnistgov/OSCAL#805 also usnistgov#33 usnistgov#67 usnistgov#68
* Improvements to XSD production; fully aligning 'token' datatype across XSD and JSON Schema implementations.
* Updating bidirectional XML/JSON converter generators (#143)
* Committing a version that handles test data correctly (so far) from rebuilt metaschema composition addressing usnistgov#51 usnistgov#53 usnistgov#76
* Now displaying constraints in documentation at point of definition;
* Docs generation revamp Reworked reference and other pages to sketch - #128 and others

Co-authored-by: Wendell Piez <wendell.piez@nist.gov>
3 participants