Automatic transformation of XML namespaces/RDF resource format

RDF resource format

This chapter describes the RDF resource format, which I call an asset.

An RDF file is valid when it conforms to the grammar forest whose roots are :Namespace and :Transformer.

rdfs:seeAlso predicates

When reading an RDF file, the processor should process triples of the forms:

:transform rdfs:seeAlso (IRI1 IRI2 ...) .
:validate rdfs:seeAlso (IRI1 IRI2 ...) .

This should add the IRIs to the list of RDF files to be downloaded (in the order of recursive retrieval described elsewhere in this specification).

Obviously :transform is for transformation and :validate is for validation.

WARNING: Recursive downloading (and thus rdfs:seeAlso) for validation may be removed in a future version.

TODO: Various forms of seeAlso for processing either after or before the current asset.

Scripts

A script is something which accepts an input (some XML text, in this specification) and generates an output (a text and/or a program exit status). (A script may be a Unix command, Web service, etc.)

A script is represented as an RDF node with certain properties.

This specification provides the following classes of scripts:

command line
script in a specified programming language
A Web service

Validator kind (see below) is either entire document (:entire) or by parts (:parts).

A script should not have both :transformerKind and :validatorKind (see below).

It is up to implementation what to do if a single node has several types (such as both :Command and :WebService). Rationale: ease of programming and efficiency.

Script in a specified programming language

Script for a named programming language (see below):

:Command
- {1..1} :language (IRI) (programming language)
- {0..1} :minVersion (minimum version)
- {0..1} :maxVersion (maximum version)
- {0..1} :scriptURL (URL of the script)
- {0..1} :commandString (command string)
- {0..1} :params (as in the example below)
- {0..1} :okResult (result denoting OK)
- {0..1} :preservance (preservance, float 0..1, 1.0 by default)
- {0..1} :stability (stability, float 0..1, 1.0 by default)
- {0..1} :preference (preference, float 0..1, 1.0 by default)
- {0..1} :transformerKind (transformer kind)
- {0..1} :validatorKind (validator kind)

Either :scriptURL or :commandString (but not both) should be provided. Not every programming language may support both. :params can be present only if there is :scriptURL.
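The constraints above on a :Command node can be checked mechanically. The following is only a sketch: the function name check_command_node and the representation of a node as a Python dict keyed by property name (without the leading colon) are assumptions made for illustration and are not defined by this specification.

```python
def check_command_node(props):
    """Return a list of constraint violations for a :Command node.

    `props` is a hypothetical dict mapping property names (without the
    leading colon) to values; this representation is an assumption.
    """
    errors = []
    has_url = "scriptURL" in props
    has_cmd = "commandString" in props
    if "language" not in props:
        errors.append(":language is required")
    if has_url == has_cmd:
        # Either both are present or both are missing.
        errors.append("exactly one of :scriptURL and :commandString should be provided")
    if "params" in props and not has_url:
        errors.append(":params can be present only together with :scriptURL")
    if "transformerKind" in props and "validatorKind" in props:
        errors.append(":transformerKind and :validatorKind should not both be present")
    return errors
```

An empty list means that none of these constraints is violated. Since the text uses "should", an implementation may prefer to report the messages as warnings rather than reject the node.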

Validation passes if the exit status of the command indicates success and, when an :okResult predicate is present, the output is equal to its value. Remark: Turtle supports the \n escape; don't forget to use it at the end of the expected output when needed. (TODO: What about different newline indicators in different OSes?)
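The pass condition can be sketched as follows. The function name validation_passed is hypothetical, and the sketch assumes that "success" means exit status 0, as on Unix; the expected OK result is passed as None when the predicate is absent.

```python
def validation_passed(exit_status, output, ok_result=None):
    """Decide whether a command-based validator accepted the document.

    Assumes exit status 0 means success (Unix convention).
    `ok_result` is None when the node has no expected OK result.
    """
    if exit_status != 0:
        return False
    if ok_result is None:
        # Without an expected result, the exit status alone decides.
        return True
    # The comparison is exact, so a trailing newline matters.
    return output == ok_result
```

Because the comparison is exact, a validator that prints "OK" followed by a newline matches only an expected result that also ends with \n.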

A Web service

:WebService
- {1..1} :action (IRI) (request IRI)
- {1..1} :method (HTTP method)
- {1..1} :xmlField [xsd:string] (field for XML)
- {0..1} :okResult (result denoting OK)
- {0..1} :transformerKind (transformer kind)
- {0..1} :validatorKind (validator kind)
- {0..1} :preservance (preservance)
- {0..1} :stability (stability)
- {0..1} :preference (preference)

Validity constraint: :validatorKind must be present for validators and only for validators. :transformerKind must be present for transformers and only for transformers.

RDF describing a namespace

Namespaces are described as instances of :Namespace class.

Their format tree:

:Namespace
- {0..*} :validator (validator)
  - _ (script node)

Example: (TODO: Explain that :attribute node gets an attribute from source XML. Also say that it does not work for :entire and simple sequential transformations.)

 <http://purl.org/dc/terms/> 
   a :Namespace ;
   dc:description <http://...> ;
   # Other Dublin Core metadata.
 
   :link [
     :url <http://www.rddl.org/> ;
     :role <http://www.rddl.org/> ;
     :nature <http://www.w3.org/1999/xhtml> ;
     :purpose <http://www.rddl.org/purposes#schema-validation>
   ] ;
   
   :validator [
     a :Command ;
     :language lang:Python ;
     :minVersion "2.1" ;
     :maxVersion "3.2" ;
     :scriptURL <http://example.org/script.py> ;
     :params ([ :name "name1"; :value "value1" ] [ :name "name2"; :value "value2" ]
              [ :name "lexer" ;
                :value [ :attribute [ :NS <http://portonvictor.org/ns/comment> ; :name "format" ] ]
              ]) ;
     :okResult "OK" ;
     :preservance 0.9 ;
     :stability 0.9 ;
     :preference 0.9 ;
     :validatorKind :entire
   ] .

A :validator is specified in the same way as :script-data (see below), except that the :transformerKind parameter is ignored. The validator may have :okResult to specify which output of the validator signifies a valid document. In the absence of :okResult, for a named script and :CommandLine a valid document is signified by a successful command return value (0 on Unix), and for :WebService the value of :okResult defaults to the empty string.

In :validator the property :language may also refer to the namespace URL of some XML schema language (such as http://www.w3.org/2001/XMLSchema). In this case :okResult is ignored.

A human readable description of a namespace should be specified with Dublin Core parameters.

The :link nodes are like resources in RDDL (but with our namespace instead of RDDL namespace).

A namespace description may provide a :validate parameter to specify how to validate documents whose root element is in our namespace. The :validate parameter has a subparameter :nature, which should be understood in accordance with the RDDL specification.

There may be multiple :validate parameters, to allow schemas of different natures to be used.

The :link parameter with subparameters :role and :nature is backward compatible with RDDL and should be understood in accordance with the RDDL specification.

Also a namespace may be a member of the following classes: :NotGrouped, :GroupedWithDescendants, :GroupedAll. See grouping examples.

RDF describing a transformer

Note: Transformers should be run in a secure sandbox, so that they are unable to damage or read the user's files. The time of the entire operation should also be limited. (Rationale: If we limited particular parts of the process rather than the process as a whole, we would be unable to limit the parts of operations done by the sandboxed application, and the whole scheme would make no sense.) We may also limit the total amount of data transferred through the network, if the operating system supports it. (We can't limit a specific operation inside the sandbox.)

Implementation note: Such sandboxing can be implemented, for example, with SELinux on Linux. It is tempting to use the Java security manager, but as of early 2014 Java security is too buggy and therefore should not be used.

Their formal tree:

:Transformer
- {0..*} :sourceNamespace (source namespaces) (not every transformer is associated with a namespace)
- {0..*} :targetNamespace (target namespaces)
- {0..1} :universal [xsd:boolean, default false] (ignore target) [TODO: better name]
- {0..1} :inward [xsd:boolean, default true] (process XML from outward to inward or inward to outward)
- {0..1} :precedence (precedence)
- {0..*} :script (script)
  - _ (script node)

If the :universal option is true, then the target namespace is ignored for the purpose of determining the next transformation script. (In this case, is it also recommended to skip the :targetNamespace option and give a warning if it is present?)

Rationale: Consider converting XInclude to some other "inclusion" framework. Which of the transformations apply can be decided by the order of loading RDF files. This is the simplest way. TODO: Another option: "black list" some transformers and/or scripts.

Here is an example of an XSLT transformer:

 <...>
   a :Transformer ;
   dc:description <http://...> ;
   # Other Dublin Core metadata.
   :sourceNamespace <...> ;
   :targetNamespace <...> ;
   :precedence <...> ;
   :script [
     a :Command ;
     :language lang:XSLT ;
     :minVersion "2.0" ;
     :scriptURL <http://example.org/scripts/foo.xslt> ;
     :transformerKind :entire ;
     :argument [
       :name "debug" ;
       :value false
     ] ;
     :argument [
       :name "other" ;
       :value 123
     ] ;
     #:initial-context-node ... ; # See XSLT 2.0 spec.
     :initial-template "first" ;
     :initial-mode "first" ;
     :preservance 0.9 ;
     :stability 0.9 ;
     :preference 0.9
   ] .

Neither the :sourceNamespace nor the :targetNamespace parameter is required.

It is recommended but not required that objects of predicates :sourceNamespace and :targetNamespace are of :Namespace class.

A transformer may have no target NS. Example: XInclude. In this case every NS in consideration can act as the target.

We need to define precedences for different kinds of transformers. For example, we would probably have the precedence “include” for XInclude and other cross-document facilities, “macro” for macros, and “formatting” for a transformer generating XSL formatting objects or SVG.

Common arguments

All transformers are subclasses of the class :Transformer. All transformers accept the following parameters:

:transformerKind may be :entire, :simpleSequential, :subdocumentSequential, :downUp, :plainText. It is used as described in the section “Order kinds of document transformers”.
:preservance, :stability and :preference each specify a number in the range 0..1. :preservance describes how much of the XML meaning is preserved (that is, not lost during conversion). :stability describes how reliable the transformer is (that is, whether it is likely to crash or produce meaningless results). :preference denotes other factors for calculating priority (see below).

The priority of a chain of transformations is calculated using the preservance, stability, and preference of the links of the chain. The recommended algorithm is to multiply all preservances, stabilities, and preferences of all links and then sum them.
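The wording of the recommended algorithm admits more than one reading. The sketch below implements one possible reading, stated here as an assumption: each of the three attributes is multiplied across all links of the chain, and the three products are then summed. The names Link and chain_priority are hypothetical.

```python
from collections import namedtuple
from math import prod

# One link of a transformation chain; each field is a float in 0..1.
Link = namedtuple("Link", ["preservance", "stability", "preference"])


def chain_priority(links):
    """Priority of a chain under the assumed reading.

    Assumption: multiply each attribute over all links, then sum the
    three products. Other readings of the text are possible.
    """
    return (
        prod(link.preservance for link in links)
        + prod(link.stability for link in links)
        + prod(link.preference for link in links)
    )
```

Under this reading every additional lossy or unstable link can only lower the priority, so a chain never outranks a prefix of itself.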

All validators are subclasses of the class :validator. All validators accept the following parameters:

:validatorKind may be :entire or :parts. It is used as described in the Validation chapter.

Particular types of transformers

A language transformer (as below) has either :scriptURL or :scriptText predicate (but not both).

XSLT, Java, Python, Ruby, et al

 :script [
   a :Command ;
   :language lang:Python ;
   :minVersion "2.1" ;
   :maxVersion "3.2" ;
   :scriptURL <http://example.org/script.py>
 ]

This example means that the script http://example.org/script.py is run by a Python interpreter of version at least 2.1 and at most 3.2.

Max version may be of the form X.* to denote all subversions of X. TODO: Describe the grammar and comparison order of versions. http://www.dmitry-kazakov.de/ada/strings_edit.htm#11 and https://groups.google.com/forum/#!topic/comp.lang.ada/GRM4ZDi4H6M

named script
- {0..1} :minVersion "2.1" (xsd:string) (minimum version)
- {0..1} :maxVersion "3.2" (xsd:string) (maximum version)
- {1..1} :scriptURL (script URL)
- {0..1} :arguments (only for XSLT) (script arguments)
- {0..1} :initialTemplate (only for XSLT) (the initial template for XSLT)
- {0..1} :initialMode (only for XSLT) (specifies the initial mode for XSLT)

Recommendation: If several suitable versions of the interpreter are available, use the maximal allowed version.
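The recommendation can be sketched as follows. Since the grammar and comparison order of versions are still a TODO in this specification, the sketch makes its own assumptions: versions are dot-separated non-negative integers compared component by component, and a maximum of the form X.* admits every version whose leading components do not exceed X. All function names are hypothetical.

```python
def parse_version(text):
    """Assumed grammar: dot-separated non-negative integers."""
    return tuple(int(part) for part in text.split("."))


def version_allowed(version, min_version=None, max_version=None):
    """Check a version against optional minimum and maximum bounds."""
    v = parse_version(version)
    if min_version is not None and v < parse_version(min_version):
        return False
    if max_version is not None:
        if max_version.endswith(".*"):
            # "X.*" covers all subversions of X (assumed semantics).
            prefix = parse_version(max_version[:-2])
            return v[:len(prefix)] <= prefix
        return v <= parse_version(max_version)
    return True


def pick_interpreter(available, min_version=None, max_version=None):
    """Return the maximal allowed version, or None if none is suitable."""
    allowed = [v for v in available if version_allowed(v, min_version, max_version)]
    return max(allowed, key=parse_version, default=None)
```

For instance, pick_interpreter(["2.0", "2.7", "3.1", "3.3"], "2.1", "3.2") selects "3.1" under these assumptions, because "3.3" exceeds the maximum and "2.0" is below the minimum.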

The following languages should be available:

XSLT
Python
Java
Ruby
Perl
TODO

Web service.

 :script [
   a :WebService ;
   :action <http://example.org/form> ;
   :method "post" ; # or "get"
   :xmlField "text"
 ]

This sends a POST request to http://example.org/form, which should return an XML document.
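A request for such a Web service could be built as in the following sketch. It assumes that the XML text is sent as a form-encoded field whose name is the value of :xmlField; the specification does not state the encoding, so this is an assumption, and the function name build_request is hypothetical. The sketch only constructs the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(action, method, xml_field, xml_text):
    """Build (but do not send) the HTTP request for a :WebService script.

    Assumption: the XML is passed as a form-encoded field named by
    :xmlField, in the query string for GET and in the body otherwise.
    """
    query = urlencode({xml_field: xml_text})
    if method.lower() == "get":
        separator = "&" if "?" in action else "?"
        return Request(action + separator + query, method="GET")
    return Request(
        action,
        data=query.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method=method.upper(),
    )
```

The resulting object can be passed to urllib.request.urlopen, ideally from inside the sandbox discussed earlier in this chapter.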

Describing precedences

:Precedence is an RDF-S class, whose members are RDF-S classes.

It is required that precedences are members of :Precedence class.

rdfs:subClassOf and :higherThan for precedences work only if both the left and the right side are declared as precedences (a :Precedence) in the same RDF file.

 <http://example.org/precedences/macro>
   a :Precedence ;
   rdfs:subClassOf <...> ;
   :higherThan <...> ;
   :higherThan <...> ;
   :lowerThan <...> .

The predicate :higherThan can apply to precedences.

The "subclass" relation is the smallest partial order for the given rdfs:subClassOf relations of all loaded assets. It is an error if there is no such partial order (that is, if there are cycles).

The following rules (see also https://math.stackexchange.com/q/2593701/4876) are used to deduce which entities have “higher than” precedence relative to another entity:

Every precedence is higher than itself.
If a :higherThan parameter is specified inside a :Precedence description, then the described entity is of higher precedence than the referred-to entity.
If a class A has higher precedence than an entity B and the entity B has higher precedence than a class C, then the class A has higher precedence than the class C.
If A has strictly higher precedence than B, then the same holds for all of their respective subclasses A1 and B1.

The entities are related by “higher than” relation if and only if this relation can be deduced from the above rules (for all currently loaded RDF resources). In other words, higher than is the smallest partial order conforming to the above.

If a cycle of precedences is encountered, this is a fatal error.
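The deduction rules above amount to computing a closure. The sketch below is one straightforward way to do it, not necessarily the most efficient; the function name higher_than_closure and the representation of the input as sets of name pairs are assumptions made for illustration.

```python
def higher_than_closure(precedences, higher_than, subclass_of=()):
    """Compute the "higher than" relation as a set of (a, b) pairs.

    precedences: iterable of precedence names
    higher_than: (a, b) pairs, meaning a is declared :higherThan b
    subclass_of: (sub, sup) pairs taken from rdfs:subClassOf
    Raises ValueError if a cycle of precedences is found.
    """
    names = set(precedences)
    reflexive = {(p, p) for p in names}
    sub = reflexive | set(subclass_of)
    # Self-declarations add nothing beyond rule 1, so drop them.
    strict = {(a, b) for a, b in higher_than if a != b}
    changed = True
    while changed:
        # Transitive closure step for the subclass relation.
        new_sub = {(a, d) for a, b in sub for c, d in sub if b == c} - sub
        # Rule 4: propagate to subclasses A1 of A and B1 of B.
        new_strict = {
            (a1, b1)
            for a, b in strict
            for a1, sup_a in sub if sup_a == a
            for b1, sup_b in sub if sup_b == b
        }
        # Rule 3: transitivity.
        new_strict |= {(a, d) for a, b in strict for c, d in strict if b == c}
        new_strict -= strict
        changed = bool(new_sub or new_strict)
        sub |= new_sub
        strict |= new_strict
    if any(a == b for a, b in strict):
        raise ValueError("cycle of precedences")
    # Rule 1: every precedence is higher than itself.
    return strict | reflexive
```

A cycle shows up as some precedence being strictly higher than itself after closure, which is why the final check on pairs (a, a) is sufficient to detect it.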

A precedence is a singleton when it is either declared to be a member of the :Singleton class, as in the following example, or is a direct or indirect subclass of a singleton:

<http://example.org/MyPrecedence> a :Singleton .

See https://math.stackexchange.com/a/2606958/4876 about calculating the "higher than" relation.

Implementation notes
