Automatic transformation of XML namespaces/Future directions

Add some support for the http://www.w3.org/2000/xmlns/ namespace (as specified by the DOM specification) and for non-namespaced elements.

There should be a means for a user to provide an asset to be loaded before the main loop. There are a few issues with this, however:

  • It should be a local file (not an Internet URL) until we provide reliable sandboxing.
  • Notwithstanding the above, we should be able to provide either a file or a URL; should we have two command line options, one for a file and one for a URL?
  • Should we provide a single file or the namespace URL of an asset? (More than one asset may be identified by the same URL, for example a local file and the actual content at the URL.)
  • If we provide a single file (not the URL of a namespace), should we record that this file was accessed, so that it is not downloaded again?
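
The two-option variant from the list above could look roughly like the following sketch, written with Python's standard argparse module. The program name and the option names --preload-file and --preload-url are hypothetical, and making the two options mutually exclusive is only one possible answer to the question raised above.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    # "transform" is a hypothetical program name.
    parser = argparse.ArgumentParser(prog="transform")
    # Assumption: at most one preloaded asset, given either as a file or as a URL.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--preload-file", metavar="PATH",
                       help="local asset file to load before the main loop")
    group.add_argument("--preload-url", metavar="URL",
                       help="asset URL to load before the main loop")
    return parser


def preload_source(argv: List[str]) -> Optional[Tuple[str, str]]:
    """Return ("file", path), ("url", url), or None if nothing was requested."""
    args = build_parser().parse_args(argv)
    if args.preload_file is not None:
        return ("file", args.preload_file)
    if args.preload_url is not None:
        return ("url", args.preload_url)
    return None
```

Keeping the two kinds of source distinct in the return value would let the caller refuse URLs until sandboxing is available.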

There are three "next script" algorithms, five transformer kinds, etc. We should probably simplify this specification by removing components that are not really necessary.

Instead of summing preferences, take their minimum. This would avoid transformation paths that contain a low-preference step, such as HTML -> text -> HTML (which would just strip tags). For performance, would it be better to sum the inverses? Idea: for performance, specify actual milliseconds (in a typical case), although this depends on the CPU.
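
The difference between the two rules can be illustrated with a small sketch. The preference numbers and the direct "HTML -> HTML" alternative are made up for illustration, and the sketch assumes that a higher preference is better.

```python
from typing import Callable, Dict, List

# Hypothetical per-step preferences (higher is better); the numbers are made up.
PATHS: Dict[str, List[int]] = {
    "HTML -> text -> HTML": [2, 9],  # the HTML -> text step loses all tags
    "HTML -> HTML": [8],             # a single, direct transformer
}


def best_path(paths: Dict[str, List[int]],
              score: Callable[[List[int]], int]) -> str:
    """Return the name of the path whose score is highest."""
    return max(paths, key=lambda name: score(paths[name]))


by_sum = best_path(PATHS, sum)  # a long path can outweigh one weak step
by_min = best_path(PATHS, min)  # a path is only as good as its weakest step
```

With these numbers the sum rule picks the tag-stripping detour (2 + 9 > 8), while the minimum rule picks the direct path (2 < 8).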

Add "plain text" output compatible with every destination namespace. Should <x>...</x> transform to ..., or to <x>...</x> without the enclosed tags? How do we specify these two distinct transformations?
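
To make the two candidate transformations concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The function names are hypothetical and the sketch ignores namespaces; it only shows that the two results differ.

```python
import xml.etree.ElementTree as ET


def to_bare_text(xml: str) -> str:
    """<x>...</x> becomes ...: drop every tag, including the outer one."""
    root = ET.fromstring(xml)
    return "".join(root.itertext())


def to_flat_element(xml: str) -> str:
    """<x>...</x> stays <x>...</x>, but the enclosed tags are removed."""
    root = ET.fromstring(xml)
    flat = ET.Element(root.tag, root.attrib)  # keep the outer tag and its attributes
    flat.text = "".join(root.itertext())
    return ET.tostring(flat, encoding="unicode")
```

For the input <x>a<b>b</b>c</x> the first function returns abc and the second returns <x>abc</x>.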

Use the Collections Ontology?

Make parts of RDF files optional based on a user-provided set of URI options. (Use RDF graphs?)

Specify a transformer whose source or target is several namespaces treated as if they were one namespace. (Example: the Dublin Core and dcterms: namespaces.)

Ability to restrict a transformer to certain elements/attributes of a namespace instead of specifying the whole namespace.
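
One way to treat several namespaces as one is to map each of them to a single canonical namespace before matching transformers. The following is only a sketch of that idea; the table and function names are hypothetical, and which namespace is chosen as canonical is arbitrary here.

```python
DC_ELEMENTS = "http://purl.org/dc/elements/1.1/"
DC_TERMS = "http://purl.org/dc/terms/"

# Hypothetical table: alias namespace -> canonical namespace of its group.
NAMESPACE_GROUPS = {DC_ELEMENTS: DC_TERMS}


def canonical_namespace(ns: str) -> str:
    """Map a namespace to the canonical member of its group (itself by default)."""
    return NAMESPACE_GROUPS.get(ns, ns)


def same_group(a: str, b: str) -> bool:
    """True if a transformer declared for one namespace also covers the other."""
    return canonical_namespace(a) == canonical_namespace(b)
```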

Non-XML output formats. (For these only entire document transformers can be applied.) We can also use non-XML input formats. Also note that XSLT 3.0 supports JSON.

XProc: An XML Pipeline Language

Can we process several namespaces at once when the transformations (with different source namespaces) have exactly equal precedence? (Hm, this cannot be done if the transformers have different kinds. Should we enable concurrent processing of several namespaces when both their precedence and their order kind are the same?) Should we specify one or several processors (one for each NS) for these multiple-namespace transformers? (The transformers should be of the same order kind.)

An option to choose the order of transformers interactively.

There should be a (finite or infinite) mapping from a URL to several URLs when we download them.

Should we introduce “composite” scripts (consisting of several transformations applied sequentially)? First, they would interact badly with the search for a transformation path. Second, it looks like putting the cart before the horse, because in this case we define a script through transformations (not vice versa). That said, are there weighty enough arguments to add such a construct?

We should formally describe and use XML Grouping. Some combinations of grouping and order kinds of transformers make no sense. A warning should be required in such situations. How should grouping limit the arbitrary choice of one of several enriched scripts of the same singular precedence? (It is a rather difficult problem. Please leave your comments.) What should we do with namespaced attributes?

We can associate a precedence with each namespace and make it the default precedence of the transformers associated with that namespace.
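
The lookup this implies is a simple fallback, sketched below. The Transformer class, its fields, and the representation of a precedence as a string are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Transformer:
    # Hypothetical, simplified model of a transformer description.
    name: str
    source_namespace: str
    precedence: Optional[str] = None  # None means "not specified explicitly"


def effective_precedence(transformer: Transformer,
                         namespace_defaults: Dict[str, str]) -> Optional[str]:
    """Use the explicit precedence if present, else the namespace's default."""
    if transformer.precedence is not None:
        return transformer.precedence
    return namespace_defaults.get(transformer.source_namespace)
```

A transformer with neither an explicit precedence nor a namespace default yields None here; what should happen in that case is left open.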

Option to stop transformation if a document does not validate.

We should publish in SoftwareX.

What about multiple output files (as when splitting an XHTML file into chapters)?
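
As an illustration of a transformation with several outputs, the sketch below splits the body of an XHTML document at every h1 element, using Python's standard xml.etree.ElementTree. Treating h1 as the chapter boundary and wrapping each chapter in a div are assumptions; writing each resulting element to its own file is left out.

```python
import xml.etree.ElementTree as ET
from typing import List

XHTML = "http://www.w3.org/1999/xhtml"


def split_chapters(xhtml: str) -> List[ET.Element]:
    """Return one <div> per chapter; a new chapter starts at every <h1>."""
    root = ET.fromstring(xhtml)
    body = root.find(f"{{{XHTML}}}body")
    if body is None:
        return []
    chapters: List[ET.Element] = []
    for child in body:
        # Start a new chapter at each <h1>, and also for content before the first <h1>.
        if child.tag == f"{{{XHTML}}}h1" or not chapters:
            chapters.append(ET.Element(f"{{{XHTML}}}div"))
        chapters[-1].append(child)
    return chapters
```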

Group transformations by "topics" or "themes". For example, in a "color" theme the XHTML tag <strong> may be transformed into red text. It also makes sense to define sub-themes: for example, "background color" would be a sub-theme of "color" and would transform <strong> into an item with a red background. This seems impossible to implement coherently: for example, one would need to specify that XInclude processing is unrelated to colors, so that XInclude is not filtered out when we are on a color theme. We could, however, apply the restriction to the color theme only to those transformations/scripts that belong to a "visual" theme. Is this a good idea?

Support loading an archive (such as .tar.gz or .zip) with source code (for transformation or validation).
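
Both archive formats can be read with Python's standard zipfile and tarfile modules. The sketch below is one possible approach, not a design decision: it reads the members into memory rather than extracting them to disk, which sidesteps the question of unsafe paths inside an archive. The function name is hypothetical.

```python
import io
import tarfile
import zipfile
from typing import Dict


def read_archive(data: bytes) -> Dict[str, bytes]:
    """Return {member name: content} for the regular files in a .zip or tar archive."""
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)
        with zipfile.ZipFile(buffer) as archive:
            return {name: archive.read(name)
                    for name in archive.namelist()
                    if not name.endswith("/")}  # skip directory entries
    buffer.seek(0)
    files: Dict[str, bytes] = {}
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz or none).
    with tarfile.open(fileobj=buffer, mode="r:*") as archive:
        for member in archive.getmembers():
            if member.isfile():
                files[member.name] = archive.extractfile(member).read()
    return files
```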

We should support asynchronous downloading of several files at once. Or shall we (for simplicity) download strictly in sequence?
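
The two strategies can be compared in a short asyncio sketch. The fetch coroutine is passed in as a parameter and is an assumption (any function that takes a URL and returns its content would do); the function names are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable

Fetch = Callable[[str], Awaitable[bytes]]


async def download_concurrently(urls: Iterable[str], fetch: Fetch) -> Dict[str, bytes]:
    """Start all downloads at once and wait until every one has finished."""
    url_list = list(urls)
    bodies = await asyncio.gather(*(fetch(url) for url in url_list))
    return dict(zip(url_list, bodies))


async def download_sequentially(urls: Iterable[str], fetch: Fetch) -> Dict[str, bytes]:
    """Download strictly one file after another."""
    result: Dict[str, bytes] = {}
    for url in urls:
        result[url] = await fetch(url)
    return result
```

Both functions return the same mapping, so the choice between them could be hidden behind a single option. In the concurrent version a failure of any one download raises from gather; how such errors should be handled is not addressed here.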

What should we do if there are multiple rdf:type triples per node?

What about the datatype for string values? Should it be xsd:string or arbitrary?

Plugins to support non-XML formats.

Support of "pipelines" (like Unix pipelines) of several scripts for a transformation.
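
In the simplest model, where a script is just a function from a document to a document, a pipeline is function composition. The sketch below assumes that model (documents as strings, scripts as plain Python callables); the names are hypothetical.

```python
from functools import reduce
from typing import Callable

Script = Callable[[str], str]  # assumption: a script maps a document to a document


def pipeline(*scripts: Script) -> Script:
    """Combine scripts so that the output of each one feeds the next, left to right."""
    def run(document: str) -> str:
        return reduce(lambda doc, script: script(doc), scripts, document)
    return run
```

For example, pipeline(str.strip, str.upper)(" <x/> ") first strips the whitespace and then upper-cases the result, giving "<X/>".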

User options to differentiate between several different semantics assigned to the same namespace.