WikiJournal Preprints/Functional Tuple Store

WikiJournal Preprints
Open access • Publication charge free • Public peer review

This article is an unpublished pre-print not yet undergoing peer review.

To submit this article for peer review, please:
Submit authorship declaration form

Article information

Abstract

The advancement in ordered key-value store tools, make them a good candidate to implement higher level database abstractions. In fact, it is possible to store tuples of n-items in a Direct-Acyclic-Graph history. That can allow to implement a workflow similar to git in the context of bigger than memory data.

Introduction

Nowadays, there is a lot of ordered key-value store available. Some of those are distributed like TiKV or FoundationDB. Some are embedded like WiredTiger or RocksDB. It is nice abstraction that has some success in the industry, among others there is Google Spanner^[1].

This work goes beyond the work done on OSTRICH^[2] in sense that fstore handles tuple of $n$ items and tuples are versioned using branches like in git.

The first section describe how to implement a generic $n$ tuple store, the second part dive into how to use the previous abstraction to implement an efficient versioned n-tuple store. The paper ends with a conclusion and future work.

Triple store

The generic tuple store is a database abstraction that store tuple of $n$ items in set. Every tuple contains $n$ items and every tuple is unique.

The problem is: how to query 3-tuples in an ordered key-value store (okvs).

We model the ordered key-value store using a table with two columns that stores bytes where keys are ordered using lexicographic order (dictionary order). Here is an example:

key	value
01	01
01 02	02
02	03
10 01	04

There exists an algorithm that allows to translate any tuple of integer and string into bytes that preserve their natural order. The lexicographic packing algorithm will order integers before strings. That is, we can consider that the ordered key-store can host tuples of integers and strings and that every (key, value) appears sorted by the key. Here is an example:

key	value
("hello", "world", 42)	0
("hello", "world", 1337)	0
("hello", "world", 2019)	0

As you can see this forms already some kind of tuple store where (subject, predicate, object) are stored as key and the value is a filler.

One of the property of ordered key-value store is that they allow to query by key range and key prefix. For instance, in any ordered key-value store it would be very easy to query for every tuple that starts with "hello". We also use that property to allow to query any pattern. A pattern is a tuple (subject, predicate, object) where one or more item as variable. Here are all the patterns for 3-tuples where variables are in bold:

(subject, predicate, object)
(subject, predicate, object)
(subject, predicate, object)
(subject, predicate, object)
(subject, predicate, object)
(subject, predicate, object)

To be able to query any pattern, one could use every permutations of (subject, predicate, object) and query by pattern prefix. For instance, the following pattern:

((variable 'one) "world" 1337)

Can be queried by constructing a subspace of the key-value store where (predicate, object) or (object, predicate) is prefix. For every tuple the database contains, it must permute each of the tuple 6 times to be able to query every pattern. That being said, we can recognize a given permutation can allow to query multiple pattern. So there is some duplication. Let's say by trial and error, we discover that the minimal set that allow to query any 3-tuple pattern in one hop is the following permutations:

(subject, predicate, object)
(predicate, object, subject)
(object, subject, predicate)

Re-using the above example, we will need to construct the following okvs:

Versioned Generic Tuple Store

There are other solutions. One can find them all by computing all 3-set of permutations of (subject, predicate, value) and checking that the 3 set verify the following property:

(define (permutation-prefix? c o)
  (any (lambda (p) (prefix? p o)) (permutations c)))

(define (ok? combinations candidate)
  (every (lambda (c) (any (lambda (p) (permutation-prefix? c p)) candidate)) combinations))

Conclusion and future work

The functional tuple store rely on an ordered key-value store to deliver a pragmatic versatile ACID-compliant versioned generic tuple store. The fact that it is generic allows to represent many kinds of data and metadata. The fact that it is versioned allows to keep track of what happened through time travelling queries. Hopefully, more generic, possibly versioned, tuple store will be created in other programming languages.

Additional information

Acknowledgements

Any people, organisations, or funding sources that you would like to thank.

Competing interests

Any conflicts of interest that you would like to declare. Otherwise, a statement that the authors have no competing interest.

Ethics statement

An ethics statement, if appropriate, on any animal or human research performed should be included here or in the methods section.

References

↑ "Spanner: Google's Globally-Distributed Database". ai.google. Retrieved 2019-06-16.
↑ Verborgh, Ruben; Mannens, Erik; Herwegen, Joachim Van; Sande, Miel Vander; Taelman, Ruben (2019-01-01). "Triple Storage for Random-Access Versioned Querying of RDF Archives". Journal of Web Semantics 54 (1): 4–28. doi:10.1016/j.websem.2018.08.001. https://rdfostrich.github.io/article-jws2018-ostrich/.

[3] "Spanner: Google's Globally-Distributed Database". ai.google. Retrieved 2019-06-16.

[4] Verborgh, Ruben; Mannens, Erik; Herwegen, Joachim Van; Sande, Miel Vander; Taelman, Ruben (2019-01-01). "Triple Storage for Random-Access Versioned Querying of RDF Archives". Journal of Web Semantics 54 (1): 4–28. doi:10.1016/j.websem.2018.08.001. https://rdfostrich.github.io/article-jws2018-ostrich/.

[1]

[2]