Abstract Syntax Tree

This learning resources about the generation of an abstract syntax tree from a given document. The learning resource will address the syntactical analysis MediaWiki documents, e.g. in Wikiversity.

Flow of data in a typical parser
Flow of data in a typical parser

History of Learning Resouce

edit

Learning resource was generated in the context of passes for MarkDown in Markup languages like HTML. Abstract Syntax Trees (AST) are generated when parsers analyse the source code in a specific programming language. The tree structures represent nested environments in a programming language. As an introduction to Compilers Theory we will create an Abstract Syntax Tree (AST) for text document.

Decomposition of a Document

edit
 
Abstract Syntax Tree Document

In a first step we decompose a document into tree nodes. A tree consists of connected tree nodes. A tree is a specific graph in which die nodes (vertices) may have attributes. In our context is

  • (Document Node) the root node of the tree is the "document" node. The document node contains the information about the document which was parsed. If we parse the document in Wiki markdown, we would add references and time stamps and version id of the source document, so that JSON file contains the information about the origin of the document. This can be compared with a citation in scientific paper where a certain part of the text in an article refers to source of the cited content.
  • (Content Element List) is basically a array of content elements. An array is required because the order of the content elements must be preserved.
  • (Content Element) Can be anything from text block to media elements like video, audio or images.

Nested structures can be created by using a "Content Element List" as "Content Element".

    • "Sections" a specific "Content Element Lists" that contain a "Title" and "Content Element List" of "paragrapha",
    • "Paragraph"
    • "Images"
    • "Audio"
    • "Video"
    • "Quiz"
    • "Mathematical Expression" - Type-Attribute: inline/block
    • ...

Learning Tasks

edit
  • Define a JSON structure for Wikiversity Quizzes?
  • Try to define a grammar and create your own parser. A good starting point is aPRODUCT grammar for basic mathematical expressions like (4+3)*(8/2) and decompose that into tree. Root node of the tree will be for example a PRODUCT token, that decomposes in to child nodes that are
    • an ADDITION and
    • a FRACTION
    • the addition and fractions decompose into two numbers each. To learn about grammars and parsers you might want to start with online experiments with PEG.JS[1]
  • How can you export a Wiki source for a quiz from a JSON?
  • What are the benefits and drawbacks of a JSON format for a Wikiversity quiz instead of having the source code available directly as a text file?
  • Analyze the concept of a JSON Editor and create Quiz editor that is able to generate
  • (Parser Generator) A parser generator gets as input a Grammar and returns parser ready to use for the defined grammar.
    • Use the Open Source parser generator PEG and create a parser in Javascript that decompose the HTML text into the sections.
    • Create a JSON of the following structure
  [
      {
         title:"Introduction",
         level: 1,
         text: "My text of the introducton"  
      },
      {
         title:"Definition",
         level: 1,
         text: "provide a definition of the topic."  
      }
  ]

See also

edit

References

edit
  1. Wakita, K., Homizu, K., & Sasaki, A. (2014, August). Hygienic macro system for JavaScript and Its light-weight implementation framework. In Proceedings of ILC 2014 on 8th International Lisp Conference (pp. 12-21).
  2. SurveyJS GitHub Respository (2021) URL: https://github.com/surveyjs/survey-library (Accessed 2021/01/22)