A Universal RDF Store Interface

This document attempts to summarize some fundamental components of an RDF store. The motivation is to outline a standard set of interfaces for providing the support needed to persist an RDF Graph in a way that is universal and not tied to any specific implementation.

For the most part, the interfaces adhere to the core RDF model and use terminology that is consistent with the RDF Model specifications. However, these suggested interfaces also extends an RDF store with additional requirements necessary to facilitate those aspects of Notation 3 that go beyond the RDF model to provide a framework for First Order Predicate Logic processing and persistence.

Terminology

Interpreting Syntax

The following Notation 3 document:

{ ?x a :N3Programmer } => { ?x :has [a :Migraine] }

Could cause the following statements to be asserted in the store:

_:a log:implies _:b

This statement would be asserted in the partition associated with quoted statements (in a formula named _:a)

?x rdf:type :N3Programmer

Finally, these statements would be asserted in the same partition (in a formula named _:b)

?x :has _:c

_:c rdf:type :Migraine

Formulae and Variables as Terms

Formulae and variables are distinguishable from URI references, Literals, and BNodes by the following syntax:

{ .. } - Formula ?x - Variable

They must also be distinguishable in persistence to ensure they can be round-tripped.

Note

There are a number of other issues regarding the Persisting Notation 3 Terms.

Database Management

An RDF store should provide standard interfaces for the management of database connections. Such interfaces are standard to most database management systems (Oracle, MySQL, Berkeley DB, Postgres, etc..)

The following methods are defined to provide this capability (see below for description of the configuration string):

Store.open(configuration, create=False)[source]

Opens the store specified by the configuration string. If create is True a store will be created if it does not already exist. If create is False and a store does not already exist an exception is raised. An exception is also raised if a store exists, but there is insufficient permissions to open the store. This should return one of: VALID_STORE, CORRUPTED_STORE, or NO_STORE

Store.close(commit_pending_transaction=False)[source]

This closes the database connection. The commit_pending_transaction parameter specifies whether to commit all pending transactions before closing (if the store is transactional).

Store.destroy(configuration)[source]

This destroys the instance of the store identified by the configuration string.

The configuration string is understood by the store implementation and represents all the parameters needed to locate an individual instance of a store. This could be similar to an ODBC string or in fact be an ODBC string, if the connection protocol to the underlying database is ODBC.

The open() function needs to fail intelligently in order to clearly express that a store (identified by the given configuration string) already exists or that there is no store (at the location specified by the configuration string) depending on the value of create.

Triple Interfaces

An RDF store could provide a standard set of interfaces for the manipulation, management, and/or retrieval of its contained triples (asserted or quoted):

Store.add(xxx_todo_changeme, context, quoted=False)[source]

Adds the given statement to a specific context or to the model. The quoted argument is interpreted by formula-aware stores to indicate this statement is quoted/hypothetical It should be an error to not specify a context and have the quoted argument be True. It should also be an error for the quoted argument to be True when the store is not formula-aware.

Store.remove(xxx_todo_changeme1, context=None)[source]

Remove the set of triples matching the pattern from the store

Store.triples(triple_pattern, context=None)[source]

A generator over all the triples matching the pattern. Pattern can include any objects for used for comparing against nodes in the store, for example, REGEXTerm, URIRef, Literal, BNode, Variable, Graph, QuotedGraph, Date? DateRange?

Parameters:

context – A conjunctive query can be indicated by either

providing a value of None, or a specific context can be queries by passing a Graph instance (if store is context aware).

Note

The triples() method can be thought of as the primary mechanism for producing triples with nodes that match the corresponding terms in the (s, p, o) term pattern provided. The term pattern (None, None, None) matches all nodes.

Store.__len__(context=None)[source]

Number of statements in the store. This should only account for non- quoted (asserted) statements if the context is not specified, otherwise it should return the number of statements in the formula or context given.

Parameters:

context – a graph instance to query or None

Formula / Context Interfaces

These interfaces work on contexts and formulae (for stores that are formula-aware) interchangeably.

ConjunctiveGraph.contexts(triple=None)[source]

Iterate over all contexts in the graph

If triple is specified, iterate over all contexts the triple is in.

ConjunctiveGraph.remove_context(context)[source]

Removes the given context from the graph

Interface Test Cases

Basic

Tests parsing, triple patterns, triple pattern removes, size, contextual removes

Source Graph

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://test/> .
{:a :b :c; a :foo} => {:a :d :c} .
_:foo a rdfs:Class .
:a :d :c .

Test code

implies = URIRef("http://www.w3.org/2000/10/swap/log#implies")
a = URIRef('http://test/a')
b = URIRef('http://test/b')
c = URIRef('http://test/c')
d = URIRef('http://test/d')
for s,p,o in g.triples((None,implies,None)):
    formulaA = s
    formulaB = o

    #contexts test
    assert len(list(g.contexts()))==3

    #contexts (with triple) test
    assert len(list(g.contexts((a,d,c))))==2

    #triples test cases
    assert type(list(g.triples((None,RDF.type,RDFS.Class)))[0][0]) == BNode
    assert len(list(g.triples((None,implies,None))))==1
    assert len(list(g.triples((None,RDF.type,None))))==3
    assert len(list(g.triples((None,RDF.type,None),formulaA)))==1
    assert len(list(g.triples((None,None,None),formulaA)))==2
    assert len(list(g.triples((None,None,None),formulaB)))==1
    assert len(list(g.triples((None,None,None))))==5
    assert len(list(g.triples((None,URIRef('http://test/d'),None),formulaB)))==1
    assert len(list(g.triples((None,URIRef('http://test/d'),None))))==1

    #Remove test cases
    g.remove((None,implies,None))
    assert len(list(g.triples((None,implies,None))))==0
    assert len(list(g.triples((None,None,None),formulaA)))==2
    assert len(list(g.triples((None,None,None),formulaB)))==1
    g.remove((None,b,None),formulaA)
    assert len(list(g.triples((None,None,None),formulaA)))==1
    g.remove((None,RDF.type,None),formulaA)
    assert len(list(g.triples((None,None,None),formulaA)))==0
    g.remove((None,RDF.type,RDFS.Class))

    #remove_context tests
    formulaBContext=Context(g,formulaB)
    g.remove_context(formulaB)
    assert len(list(g.triples((None,RDF.type,None))))==2
    assert len(g)==3 assert len(formulaBContext)==0
    g.remove((None,None,None))
    assert len(g)==0

Formula and Variables Test

Source Graph

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://test/> .
{?x a rdfs:Class} => {?x a :Klass} .

Test Code

implies = URIRef("http://www.w3.org/2000/10/swap/log#implies")
klass = URIRef('http://test/Klass')
for s,p,o in g.triples((None,implies,None)):
    formulaA = s
    formulaB = o
    assert type(formulaA) == Formula
    assert type(formulaB) == Formula
    for s,p,o in g.triples((None,RDF.type,RDFS.Class)),formulaA):
        assert type(s) == Variable
    for s,p,o in g.triples((None,RDF.type,klass)),formulaB):
        assert type(s) == Variable

Transactional Tests

To be instantiated.

Additional Terms to Model

These are a list of additional kinds of RDF terms (all of which are special Literals)

  • rdflib.plugins.store.regexmatching.REGEXTerm - a REGEX string which can be used in any term slot in order to match by applying the Regular Expression to statements in the underlying graph.

  • Date (could provide some utility functions for date manipulation / serialization, etc..)

  • DateRange

Namespace Management Interfaces

The following namespace management interfaces (defined in Graph) could be implemented in the RDF store. Currently, they exist as stub methods of Store and are defined in the store subclasses (e.g. IOMemory):

Store.bind(prefix, namespace)[source]
Store.prefix(namespace)[source]
Store.namespace(prefix)[source]
Store.namespaces()[source]

Open issues

Does the Store interface need to have an identifier property or can we keep that at the Graph level?

The Store implementation needs a mechanism to distinguish between triples (quoted or asserted) in ConjunctiveGraphs (which are mutually exclusive universes in systems that make closed world assumptions - and queried separately). This is the separation that the store identifier provides. This is different from the name of a context within a ConjunctiveGraph (or the default context of a conjunctive graph). I tried to diagram the logical separation of ConjunctiveGraphs, SubGraphs and QuotedGraphs in this diagram

_images/ContextHierarchy.png

An identifier of None can be used to indicate the store (aka all contexts) in methods such as triples(), __len__(), etc. This works as long as we’re only dealing with one Conjunctive Graph at a time – which may not always be the case.

Is there any value in persisting terms that lie outside N3 (rdflib.plugins.store.regexmatching.REGEXTerm, Date, etc..)?

Potentially, not sure yet.

Should a conjunctive query always return quads instead of triples? It would seem so, since knowing the context that produced a triple match is an essential aspect of query construction / optimization. Or if having the triples function yield/produce different length tuples is problematic, could an additional - and slightly redundant - interface be introduced?:

ConjunctiveGraph.quads(triple_or_quad=None)[source]

Iterate over all the quads in the entire conjunctive graph

Stores that weren’t context-aware could simply return None as the 4th item in the produced/yielded tuples or simply not support this interface.