Big Data/CumulusRDF
CumulusRDF is a distributed RDF store that keeps its RDF triples in the key-value store Apache Cassandra.
Each RDF triple consists of a subject (S), a property (P) and an object (O). Several RDF triples form a graph in which P is a labelled directed edge starting at S and leading to O. The graph can be queried with eight triple patterns, the building blocks of basic graph patterns (BGPs): each of the three positions is either bound to a value or left as a variable, written ?, which gives 2 × 2 × 2 = 8 combinations. To answer these queries efficiently, CumulusRDF provides three indices: SPO, POS and OSP. The following table lists each triple pattern together with the indices that can answer it:
Triple Pattern | Index |
---|---|
(spo) | SPO, POS, OSP |
(sp?) | SPO |
(?po) | POS |
(s?o) | OSP |
(?p?) | POS |
(s??) | SPO |
(??o) | OSP |
(???) | SPO, POS, OSP |
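The table can be read as a simple lookup from the shape of a triple pattern to an index. The following is a minimal Python sketch of that lookup, not CumulusRDF's actual code: the names `INDEX_FOR_PATTERN`, `pattern_shape` and `choose_index` are hypothetical, and for the two patterns that any index can answer, SPO is picked arbitrarily.

```python
# Pattern shape -> one index that can answer it, following the table above.
# For (spo) and (???) any of the three indices works; SPO is an arbitrary pick.
INDEX_FOR_PATTERN = {
    "spo": "SPO",
    "sp?": "SPO",
    "?po": "POS",
    "s?o": "OSP",
    "?p?": "POS",
    "s??": "SPO",
    "??o": "OSP",
    "???": "SPO",
}


def pattern_shape(s=None, p=None, o=None):
    """Return the shape of a triple pattern, e.g. 's?o'; None marks a variable."""
    return "".join(
        letter if term is not None else "?"
        for letter, term in zip("spo", (s, p, o))
    )


def choose_index(s=None, p=None, o=None):
    """Pick an index for the pattern whose bound positions are given."""
    return INDEX_FOR_PATTERN[pattern_shape(s, p, o)]
```

For example, `choose_index(p="rdf:type")` corresponds to the pattern (?p?) and selects the POS index.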
CumulusRDF supports two storage representations for RDF triples of the form (s, p, o):
- Hierarchical Layout
  - { s : { p : { o : - } } }, { o : { s : { p : - } } } and { p : { o : { s : - } } } are inserted.
- Flat Layout
  - { s : { po : - } }, { o : { sp : - } }, { po : { s : - } } and { po : { 'p' : p } } are inserted.
  - The third key-key-value entry uses the property concatenated with the object (po) as its key rather than the property alone, because Apache Cassandra stores all entries with the same key on the same data node. Since some properties, such as rdf:type, are used very often, keying by the property alone would lead to an unbalanced load distribution.
  - The fourth entry is required to find all triples with the same property, i.e., to answer (?p?). It feeds a secondary index that maps each value p to all keys (po) in which this value occurs.
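The two layouts can be illustrated with nested Python dictionaries standing in for Cassandra rows and columns. This is only a sketch under stated assumptions, not the CumulusRDF implementation: the class names, the `P_COLUMN` marker and the `p_index` dictionary (which plays the role of the secondary index) are all hypothetical, and `None` stands for the empty value written as - above.

```python
from collections import defaultdict

# Hypothetical stand-in for the column name 'p'; a tuple cannot collide
# with subject names, which are plain strings in this sketch.
P_COLUMN = ("meta", "p")


class HierarchicalLayout:
    """Three nested maps, one per index; the innermost value is empty."""

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(dict))
        self.osp = defaultdict(lambda: defaultdict(dict))
        self.pos = defaultdict(lambda: defaultdict(dict))

    def insert(self, s, p, o):
        self.spo[s][p][o] = None  # { s : { p : { o : - } } }
        self.osp[o][s][p] = None  # { o : { s : { p : - } } }
        self.pos[p][o][s] = None  # { p : { o : { s : - } } }


class FlatLayout:
    """Row key -> column -> value, with a secondary index on the 'p' column."""

    def __init__(self):
        self.spo = defaultdict(dict)
        self.osp = defaultdict(dict)
        self.pos = defaultdict(dict)
        self.p_index = defaultdict(set)  # p -> set of row keys (p, o)

    def insert(self, s, p, o):
        self.spo[s][(p, o)] = None        # { s : { po : - } }
        self.osp[o][(s, p)] = None        # { o : { sp : - } }
        self.pos[(p, o)][s] = None        # { po : { s : - } }
        self.pos[(p, o)][P_COLUMN] = p    # { po : { 'p' : p } }
        self.p_index[p].add((p, o))       # keep the secondary index in sync

    def match_predicate(self, p):
        """Answer (?p?): look up all row keys (p, o), then read their subjects."""
        result = []
        for row_key in self.p_index.get(p, ()):
            _, o = row_key
            for column in self.pos[row_key]:
                if column != P_COLUMN:  # skip the marker column
                    result.append((column, p, o))
        return sorted(result)
```

In the flat layout, triples that share a frequent property such as rdf:type end up under several row keys, one per distinct object, instead of a single row keyed by the property.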
References
- CumulusRDF - official web site
- G. Ladwig and A. Harth "CumulusRDF: Linked Data Management on Nested Key-Value Stores". Proceedings of the 7th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS2011) at the 10th International Semantic Web Conference (ISWC2011), 2011.