Eclipse RDF4J documentation

Programming with RDF4J

1. Setup

Before you can get started programming with RDF4J, you will need to set up your development environment, download the necessary libraries, and so on. This chapter gives you some pointers on how to install the RDF4J libraries and how to initialize your project.

1.1. Using Apache Maven

By far the most flexible and useful way to include RDF4J in your project is to use Maven. Apache Maven is a software management tool that helps you by offering things like library version management and dependency management (which is very useful because it means that once you decide you need a particular RDF4J library, Maven automatically downloads all the libraries that your library of choice requires in turn). For details on how to start using Maven, we advise you to take a look at the Apache Maven website. If you are familiar with Maven, here are a few pointers to help set up your Maven project.

1.1.1. Maven Repository

RDF4J is available from the Central Repository, which means you don’t need to add an additional repository configuration to your project.

1.1.2. Maven Artifacts

The groupId for all RDF4J core artifacts is org.eclipse.rdf4j. To include a maven dependency in your project that automatically gets you the entire RDF4J core framework, use artifactId rdf4j-runtime:

    <dependency>
      <groupId>org.eclipse.rdf4j</groupId>
      <artifactId>rdf4j-runtime</artifactId>
      <version>${rdf4j.version}</version>
    </dependency>

For many projects you will not need the entire RDF4J framework, however. You can fine-tune your dependencies so that you don’t include more than you need. Here are some typical scenarios and the dependencies that go with them. Of course, it’s up to you to vary on these basic scenarios and figure out exactly which components you need (and if you don’t want to bother you can always just use the ‘everything and the kitchen sink’ rdf4j-runtime dependency).

1.1.3. Simple local storage and querying of RDF

If you require functionality for quick in-memory storage and querying of RDF, you will need to include dependencies on the SAIL repository module (artifactId rdf4j-repository-sail) and the in-memory storage backend module (artifactId rdf4j-sail-memory):

    <dependency>
      <groupId>org.eclipse.rdf4j</groupId>
      <artifactId>rdf4j-repository-sail</artifactId>
      <version>${rdf4j.version}</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.rdf4j</groupId>
      <artifactId>rdf4j-sail-memory</artifactId>
      <version>${rdf4j.version}</version>
    </dependency>

A straightforward variation on this scenario is of course if you decide you need a more scalable persistent storage instead of (or alongside) simple in-memory storage. In this case, you can include the native store:

    <dependency>
      <groupId>org.eclipse.rdf4j</groupId>
      <artifactId>rdf4j-sail-nativerdf</artifactId>
      <version>${rdf4j.version}</version>
    </dependency>

1.1.4. Parsing / writing RDF files

The RDF4J parser toolkit is called Rio, and it is split into several modules: one for its main API (rdf4j-rio-api), and one for each specific syntax format. If you require functionality to parse or write an RDF file, you will need to include a dependency for each parser or writer that you want to use. For example, if you expect to need an RDF/XML syntax parser and a Turtle syntax writer, include the following two dependencies (you do not need to include the API dependency explicitly since each parser implementation depends on it already):

<dependency>
  <groupId>org.eclipse.rdf4j</groupId>
  <artifactId>rdf4j-rio-rdfxml</artifactId>
  <version>${rdf4j.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.rdf4j</groupId>
  <artifactId>rdf4j-rio-turtle</artifactId>
  <version>${rdf4j.version}</version>
</dependency>

1.1.5. Accessing a remote RDF4J Server

If your project only needs functionality to query/manipulate a remotely running RDF4J Server, you can stick to just including the HTTPRepository module (rdf4j-repository-http):

<dependency>
  <groupId>org.eclipse.rdf4j</groupId>
  <artifactId>rdf4j-repository-http</artifactId>
  <version>${rdf4j.version}</version>
</dependency>

1.1.6. Accessing a SPARQL endpoint

If you want to have functionality to query a remote SPARQL endpoint, such as DBPedia, you can use the SPARQLRepository module (rdf4j-repository-sparql):

<dependency>
  <groupId>org.eclipse.rdf4j</groupId>
  <artifactId>rdf4j-repository-sparql</artifactId>
  <version>${rdf4j.version}</version>
</dependency>

1.1.7. Using the BOM (Bill Of Materials)

A problem in larger projects is a thing called ‘version mismatch’: one part of your project uses version 1.0 of a particular RDF4J artifact, and another part uses 1.0.2 of the same (or a slightly different) artifact, and because they share dependencies you get duplicate libraries on your classpath.

To help simplify this, RDF4J provides a BOM (Bill Of Materials) for you to include in your project. A BOM is basically a list of related artifacts and their versions. The advantage of including a BOM in your project is that you declare the version of RDF4J only once, and then can rely on all specific RDF4J artifact dependencies to use the correct version.

To include the BOM in your project, add the following to your project root pom:

<dependencyManagement>
	<dependencies>
		<dependency>
			<groupId>org.eclipse.rdf4j</groupId>
			<artifactId>rdf4j-bom</artifactId>
			<version>2.0</version>
			<type>pom</type>
			<scope>import</scope>
		</dependency>
	</dependencies>
</dependencyManagement>

After you have done this, you can simply include any RDF4J artifact as a normal dependency, but you can leave out the version number in that dependency. The included BOM ensures that all included RDF4J artifacts throughout your project will use version 2.0.
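
For example, once the BOM is imported, you can declare a specific RDF4J dependency without a version element:

<dependency>
  <groupId>org.eclipse.rdf4j</groupId>
  <artifactId>rdf4j-repository-sail</artifactId>
</dependency>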

1.2. Using the onejar or SDK distribution

If you are not familiar with Apache Maven, an alternative way to get started with using the RDF4J libraries is to download the RDF4J onejar library and include it in your classpath.

The RDF4J onejar contains all of RDF4J’s own functionality. However, it does not contain any of the third-party libraries on which RDF4J depends, which means that if you use the onejar, you will, in addition, need to download and install these third-party libraries (if your project does not already use them, as most of these libraries are pretty common).

It is important to note that the RDF4J framework consists of a set of libraries: RDF4J is not a monolithic piece of software; you can pick and choose which parts you want and which ones you don’t. In those cases where you don’t care about picking and choosing and just want to get on with it, the onejar is a good choice.

If, however, you want a little more control over what is included, you can download the complete SDK and select (from the lib directory) those libraries that you require. The SDK distribution contains all RDF4J libraries as individual jar files, and in addition it also contains all the third-party libraries you need to work with RDF4J.

1.3. Logging: SLF4J initialization

Before you begin using any of the RDF4J libraries, one important configuration step needs to be taken: the initialization and configuration of a logging framework.

RDF4J uses the Simple Logging Facade for Java (SLF4J), which is a framework for abstracting from the actual logging implementation. SLF4J allows you, as a user of the RDF4J framework, to plug in your own favorite logging implementation at deployment time. SLF4J supports the most popular logging implementations such as Java Logging, Apache Commons Logging, Logback, log4j, etc. See the SLF4J website for more info.

What you need to do is to decide which logging implementation you are going to use and include the appropriate SLF4J logger adapter in your classpath. For example, if you decide to use Apache log4j, you need to include the SLF4J-Log4J adapter in your classpath. The SLF4J release package includes adapters for various logging implementations; just download the SLF4J release package and include the appropriate adapter in your classpath (or, when using Maven, set the appropriate dependency); slf4j-log4j12-(version).jar, for example.

One thing to keep in mind when configuring logging is that SLF4J expects only a single logger implementation on the classpath. Thus, you should choose only a single logger. In addition, if parts of your code depend on projects that use other logging frameworks directly, you can include a Legacy Bridge which makes sure calls to the legacy logger get redirected to SLF4J (and from there on, to your logger of choice).

In particular, when working with RDF4J’s HTTPRepository or SPARQLRepository libraries, you may want to include the jcl-over-slf4j legacy bridge. This is because RDF4J internally uses the Apache Commons HttpClient, which relies on JCL – Jakarta Commons Logging. You can do without this if your own app is a webapp, to be deployed in e.g. Tomcat, but otherwise, your application will probably show a lot of debug log messages on standard output, starting with something like:

DEBUG httpclient.wire.header

When you set this up correctly, you can have a single logger configuration for your entire project, and you will be able to control this kind of logging by third-party libraries, as well as logging by RDF4J itself, using this single config.
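
As an illustrative sketch (the artifact versions are placeholders for you to fill in), a Maven setup that uses Logback as the logger of choice and redirects JCL calls to SLF4J could look like this:

<dependency>
  <groupId>ch.qos.logback</groupId>
  <artifactId>logback-classic</artifactId>
  <version>${logback.version}</version>
</dependency>
<!-- redirects Jakarta Commons Logging calls (e.g. from Apache HttpClient) to SLF4J -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>jcl-over-slf4j</artifactId>
  <version>${slf4j.version}</version>
</dependency>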

The RDF4J framework itself does not prescribe a particular logger implementation (after all, that’s the whole point of SLF4J, that you get to choose your preferred logger). However, several of the applications included in RDF4J (such as RDF4J Server, Workbench, and the command line console) do use a logger implementation. The server and console application both use logback, which is the successor to log4j and a native implementation of SLF4J. The Workbench uses java.util.logging instead.

2. The RDF Model API

The RDF Model API is the core of the RDF4J framework. It provides the basic building blocks for manipulating RDF data in Java. In this chapter, we introduce these basic building blocks and show some examples on how to use them.

2.1. RDF Building Blocks: IRIs, literals, blank nodes and statements

The core of the RDF4J framework is the RDF Model API (see the Model API Javadoc), defined in package org.eclipse.rdf4j.model. This API defines how the building blocks of RDF (statements, IRIs, blank nodes, literals, and models) are represented.

RDF statements are represented by the Statement interface. Each Statement has a subject, predicate, object and (optionally) a context (more about contexts below, in the section about the Repository API). Each of these 4 items is a Value. The Value interface is further specialized into Resource, and Literal. Resource represents any RDF value that is either a blank node or an IRI (in fact, it specializes further into IRI and BNode). Literal represents RDF literal values (strings, dates, integer numbers, and so on).

To create new values and statements, we can use a ValueFactory. You can use a default ValueFactory implementation called SimpleValueFactory:

import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.model.impl.SimpleValueFactory;

ValueFactory factory = SimpleValueFactory.getInstance();

You can also obtain a ValueFactory from the Repository you are working with, and in fact, this is the recommended approach. More about that in the next section.

Regardless of how you obtain your ValueFactory, once you have it, you can use it to create new IRIs, Literals, and Statements:

IRI bob = factory.createIRI("http://example.org/bob");
IRI name = factory.createIRI("http://example.org/name");
Literal bobsName = factory.createLiteral("Bob");
Statement nameStatement = factory.createStatement(bob, name, bobsName);

The Model API also provides pre-defined IRIs for several well-known vocabularies, such as RDF, RDFS, OWL, DC (Dublin Core), FOAF (Friend-of-a-Friend), and more. These constants can all be found in the org.eclipse.rdf4j.model.vocabulary package, and can be quite handy in quick creation of RDF statements (or in querying a Repository, as we shall see later):

Statement typeStatement = factory.createStatement(bob, RDF.TYPE, FOAF.PERSON);

2.2. The Model interface

The above interfaces and classes show how we can create the individual building blocks that make up an RDF model. However, an actual collection of RDF data is just that: a collection. In order to deal with collections of RDF statements, we can use the org.eclipse.rdf4j.model.Model interface.

Model is an extension of the standard Java collection interface java.util.Set<Statement>. This means that you can use a Model like any other Java collection in your code:

// create a new Model to put statements in
Model model = new LinkedHashModel();

// add an RDF statement
model.add(typeStatement);

// add another RDF statement by simply providing subject, predicate, and object.
model.add(bob, name, bobsName);

// iterate over every statement in the Model
for (Statement statement: model) {
    ...
}

In addition, however, Model offers a number of useful methods to quickly get subsets of statements and otherwise search/filter your collection of statements. For example, to quickly iterate over all statements that make a resource an instance of the class foaf:Person, you can do:

for (Statement typeStatement: model.filter(null, RDF.TYPE, FOAF.PERSON)) {
    // ...
}

Even more convenient is that you can quickly retrieve the building blocks that make up the statements. For example, to immediately iterate over all subject-resources that are of type foaf:Person and then retrieve each person’s name, you can do something like the following:

for (Resource person: model.filter(null, RDF.TYPE, FOAF.PERSON).subjects()) {
    // get the name of the person (if it exists)
    Optional<Literal> name = Models.objectLiteral(model.filter(person, FOAF.NAME, null));
}

The filter() method returns a Model again. However, the Model returned by this method is still backed by the original Model. Thus, changes that you make to this returned Model will automatically be reflected in the original Model as well.
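
For example, because the filtered view is backed by the original Model, clearing the view also removes the matching statements from the original. A brief sketch, reusing the bob and name values from earlier:

// removes every name statement about bob, both from the view and from the underlying model
model.filter(bob, name, null).clear();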

RDF4J provides two default implementations of the Model interface: org.eclipse.rdf4j.model.impl.LinkedHashModel, and org.eclipse.rdf4j.model.impl.TreeModel. The difference between the two is in their performance for different kinds of lookups and insertion patterns (see their respective javadoc entries for details). These differences are only really noticeable when dealing with quite large collections of statements, however.

2.3. Building RDF Models with the ModelBuilder

Since version 2.1, RDF4J provides a ModelBuilder utility. The ModelBuilder provides a fluent API to quickly and efficiently create RDF models programmatically.

Here’s a simple code example that demonstrates how to quickly create an RDF graph with some FOAF data:

ModelBuilder builder = new ModelBuilder();

// set some namespaces
builder.setNamespace("ex", "http://example.org/").setNamespace(FOAF.NS);

builder.namedGraph("ex:graph1")      // add a new named graph to the model
       .subject("ex:john")           // add several statements about resource ex:john
           .add(FOAF.NAME, "John")   // add the triple (ex:john, foaf:name "John") to the named graph
           .add(FOAF.AGE, 42)
           .add(FOAF.MBOX, "john@example.org");

// add a triple to the default graph
builder.defaultGraph().add("ex:graph1", RDF.TYPE, "ex:Graph");

// return the Model object
Model m = builder.build();

The ModelBuilder offers several conveniences:

  • you can specify a subject/predicate IRI as a prefixed name string (for example “ex:john”), so you don’t have to use a ValueFactory to create an IRI object first.

  • you can add a literal object as a String, an int, or several other supported Java primitive types.

  • the subject() method makes it easier to take a resource-centric view when building an RDF Model.

2.4. Quickly accessing data with the Models utility

The Models utility class offers a number of useful methods for convenient access and manipulation of data in a Model object. We have already shown some examples of its use in previous sections. For example, to retrieve the value of the foaf:name properties for all resources of type foaf:Person:

for (Resource person: model.filter(null, RDF.TYPE, FOAF.PERSON).subjects()) {
    // get the name of the person (if it exists)
    Optional<Literal> name = Models.objectLiteral(model.filter(person, FOAF.NAME, null));
}

The Models.objectLiteral method retrieves an arbitrary object literal value from the statements in the supplied Model. Since the supplied Model is filtered to only contain the foaf:name statements for the given person, the resulting object literal value is the name value for this person. Note that if the model happens to contain more than one name value for this person, this will just return an arbitrary one.

The Models utility provides variants for retrieving different types of object values: Models.object() retrieves a Value, Models.objectResource() a Resource, and Models.objectIRI() an IRI.
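
For instance, if a property value is expected to be an IRI rather than a literal, a sketch of retrieving it could look like this (assuming the model stores mailboxes as foaf:mbox IRIs):

Optional<IRI> mbox = Models.objectIRI(model.filter(person, FOAF.MBOX, null));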

2.4.1. Property-centric access

To provide quicker access to a property’s value(s), the Models class offers some further shortcuts that bypass the need to first filter the Model. For example, to retrieve the name literal, we can replace the objectLiteral call from the previous example like so:

for (Resource person: model.filter(null, RDF.TYPE, FOAF.PERSON).subjects()) {
    // get the name of the person (if it exists)
    Optional<Literal> name = Models.getPropertyLiteral(model, person, FOAF.NAME);
}

Models also provides methods that allow retrieving all values, instead of one arbitrary one:

for (Resource person: model.filter(null, RDF.TYPE, FOAF.PERSON).subjects()) {
    // get all name-values of the person
    Set<Literal> names = Models.getPropertyLiterals(model, person, FOAF.NAME);
}

For both retrieval types, Models also provides variants that retrieve other value types such as IRIs. The Models javadoc is worth exploring for a complete overview of all methods.

In addition to retrieving values in a property-centric manner, Models also provides a setProperty method, which can be used to quickly give a resource’s property a new value. For example:

Literal newName = vf.createLiteral("John");
Models.setProperty(model, person, FOAF.NAME, newName);

This will remove any existing name-properties for the given person, and set it to the single new value "John".

2.5. RDF Collections

To model closed lists of items, RDF provides a Collection vocabulary. RDF Collections are represented as a list of items using a Lisp-like structure. The list starts with a head resource (typically a blank node), which is connected to the first collection member via the rdf:first relation. The head resource is then connected to the rest of the list via an rdf:rest relation. The last resource in the list is marked using the rdf:nil node.

As an example, a list containing three values, “A”, “B”, and “C” looks like this as an RDF Collection:

Figure 1. An RDF Collection containing three items

Here, the blank node _:n1 is the head resource of the list. In this example it is declared an instance of rdf:List, however this is not required for the collection to be considered well-formed. For each collection member, a new node is added (linked to the previous node via the rdf:rest property), and the actual member value is linked to this node via the rdf:first property. The last member of the list is marked by the fact that the value of its rdf:rest property is set to rdf:nil.
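
In Turtle syntax, the structure shown in the figure corresponds to something like the following sketch (the blank node labels are arbitrary):

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

_:n1 rdf:first "A" ; rdf:rest _:n2 .
_:n2 rdf:first "B" ; rdf:rest _:n3 .
_:n3 rdf:first "C" ; rdf:rest rdf:nil .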

Working with this kind of structure directly is rather cumbersome. To make life a little easier, the RDF4J API provides several utilities to convert between Java Collections and RDF Collections.

2.5.1. Converting to/from Java Collections

As an example, suppose we wish to add the above list of three string literals as a property value for the property ex:favoriteLetters of ex:John.

The RDFCollections utility allows us to do this, as follows:

String ns = "http://example.org/";
ValueFactory vf = SimpleValueFactory.getInstance();

// IRI for ex:favoriteLetters
IRI favoriteLetters = vf.createIRI(ns, "favoriteLetters");

// IRI for ex:John
IRI john = vf.createIRI(ns, "John");

// create a list of letters
List<Literal> letters = Arrays.asList(new Literal[] { vf.createLiteral("A"), vf.createLiteral("B"), vf.createLiteral("C") });

// create a head resource for our list
Resource head = vf.createBNode();

// convert our list and add it to a newly-created Model
Model aboutJohn = RDFCollections.asRDF(letters, head, new LinkedHashModel());

// set the ex:favoriteLetters property to link to the head of the list
aboutJohn.add(john, favoriteLetters, head);

Of course, we can also convert back:

Model aboutJohn = ... ; // our Model about John

// get the value of the ex:favoriteLetters property
Resource node = Models.objectResource(aboutJohn.filter(john, favoriteLetters, null)).orElse(null);

// Convert its collection back to an ArrayList of values
if (node != null) {
    List<Value> values = RDFCollections.asValues(aboutJohn, node, new ArrayList<Value>());

    // you may need to cast back to Literal.
    Literal a = (Literal)values.get(0);
}

2.5.2. Extracting, copying, or deleting an RDF Collection

To extract an RDF Collection from the model which contains it, we can do the following:

Model aboutJohn = ...; // our model

// get the value of the ex:favoriteLetters property
Resource node = Models.objectResource(aboutJohn.filter(john, favoriteLetters, null)).orElse(null);

// get the RDF Collection in a separate model
if (node != null) {
    Model rdfList = RDFCollections.getCollection(aboutJohn, node, new LinkedHashModel());
}

As you can see, instead of converting the RDF Collection to a Java List of values, we get back another Model object from this, containing a copy of the RDF statements that together form the RDF Collection. This is useful in cases where your original Model contains more data than just the RDF Collection, and you want to isolate the collection.

Once you have this copy of your Collection, you can add it somewhere else, or remove the collection from your Model:

// remove the collection from our model about John
aboutJohn.removeAll(rdfList);

// finally remove the triple that linked John to the collection
aboutJohn.remove(john, favoriteLetters, node);

Actually, deleting can be done more efficiently than this. Rather than first creating a completely new copy of the RDF Collection only to then delete it, we can use a streaming approach instead:

// extract the collection from our model in streaming fashion and remove each statement from the model
RDFCollections.extract(aboutJohn, node, st -> aboutJohn.remove(st));

// remove the statement that linked john to the collection
aboutJohn.remove(john, favoriteLetters, node);

3. The Repository API

The Repository API is the central access point for RDF4J-compatible RDF databases (a.k.a. triplestores), as well as for SPARQL endpoints. Its purpose is to give a developer-friendly access point to RDF repositories, offering various methods for querying and updating the data, while hiding a lot of the nitty gritty details of the underlying machinery.

The interfaces for the Repository API can be found in package org.eclipse.rdf4j.repository. Several implementations for these interfaces exist in various sub-packages.

3.1. Creating a Repository object

The first step in any action that involves repositories is to create a Repository for it.

The central interface of the repository API is the Repository interface. There are several implementations available of this interface. The three main ones are:

  • SailRepository is a Repository that operates directly on top of a Sail (that is, a particular database). This is the class most commonly used when accessing/creating a local RDF4J repository. SailRepository operates on a (stack of) Sail object(s) for storage and retrieval of RDF data. An important thing to remember is that the behaviour of a repository is determined by the Sail(s) that it operates on; for example, the repository will only support RDF Schema or OWL semantics if the Sail stack includes an inferencer for this.

  • HTTPRepository is, as the name implies, a Repository implementation that acts as a proxy to a repository available on a remote RDF4J Server, accessible through HTTP.

  • SPARQLRepository is a Repository implementation that acts as a proxy to any remote SPARQL endpoint (whether that endpoint is implemented using RDF4J or not).

Creating Repository objects can be done in multiple ways. We will first show an easy way to quickly create such an object ‘on the fly’. In the section about The RepositoryManager and RepositoryProvider, we show some more advanced patterns, which are particularly useful in larger applications which have to handle and share references to multiple repositories.

We will first take a look at the use of the SailRepository class in order to create and use a local repository.

3.1.1. Creating a main memory RDF Repository

One of the simplest configurations is a repository that just stores RDF data in main memory without applying any inferencing. This is also by far the fastest type of repository that can be used. The following code creates and initializes a non-inferencing main-memory repository:

import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.sail.memory.MemoryStore;
...
Repository repo = new SailRepository(new MemoryStore());
repo.initialize();

The constructor of the SailRepository class accepts any object of type Sail, so we simply pass it a new main-memory store object (which is, of course, a Sail implementation). Following this, the repository needs to be initialized to prepare the Sail(s) that it operates on.

The repository that is created by the above code is volatile: its contents are lost when the object is garbage collected or when your Java program is shut down. This is fine for cases where, for example, the repository is used as a means for manipulating an RDF model in memory.

Different types of Sail objects take parameters in their constructor that change their behaviour. The MemoryStore for example takes a data directory parameter that specifies a data directory for persistent storage. If specified, the MemoryStore will write its contents to this directory so that it can restore it when it is re-initialized in a future session:

File dataDir = new File("C:\\temp\\myRepository\\");
Repository repo = new SailRepository( new MemoryStore(dataDir) );
repo.initialize();

As you can see, we can fine-tune the configuration of our repository by passing parameters to the constructor of the Sail object. Some Sail types may offer additional configuration methods, all of which need to be called before the repository is initialized. The MemoryStore currently has one such method: setSyncDelay(long), which can be used to control the strategy that is used for writing to the data file, e.g.:

File dataDir = new File("C:\\temp\\myRepository\\");
MemoryStore memStore = new MemoryStore(dataDir);
memStore.setSyncDelay(1000L);
Repository repo = new SailRepository(memStore);
repo.initialize();

3.1.2. Creating a Native RDF Repository

A Native RDF Repository does not keep its data in main memory, but instead stores it directly to disk (in a binary format optimized for compact storage and fast retrieval). It is an efficient, scalable and fast solution for RDF storage of datasets that are too large to keep entirely in memory.

The code for creation of a Native RDF repository is almost identical to that of a main memory repository:

import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.sail.nativerdf.NativeStore;
...
File dataDir = new File("/path/to/datadir/");
Repository repo = new SailRepository(new NativeStore(dataDir));
repo.initialize();

By default, the Native store creates a set of two indexes (see section-native-store-config). To configure which indexes it should create, we can either use the NativeStore.setTripleIndexes(String) method, or we can directly supply an index configuration string to the constructor:

import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.sail.nativerdf.NativeStore;
...
File dataDir = new File("/path/to/datadir/");
String indexes = "spoc,posc,cosp";
Repository repo = new SailRepository(new NativeStore(dataDir, indexes));
repo.initialize();

3.1.3. Creating a repository with RDF Schema inferencing

As we have seen, we can create Repository objects for any kind of back-end store by passing them a reference to the appropriate Sail object. We can pass any stack of Sails this way, allowing all kinds of repository configurations to be created quite easily. For example, to stack an RDF Schema inferencer on top of a memory store, we simply create a repository like so:

import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.sail.memory.MemoryStore;
import org.eclipse.rdf4j.sail.inferencer.fc.ForwardChainingRDFSInferencer;
...
Repository repo = new SailRepository(
        new ForwardChainingRDFSInferencer(
        new MemoryStore()));
repo.initialize();

Each layer in the Sail stack is created by a constructor that takes the underlying Sail as a parameter. Finally, we create the SailRepository object as a functional wrapper around the Sail stack.

The ForwardChainingRDFSInferencer that is used in this example is a generic RDF Schema inferencer; it can be used on top of any Sail that supports the methods it requires. Both MemoryStore and NativeStore support these methods. However, a word of warning: the RDF4J inferencers add a significant performance overhead when adding data to or removing data from a repository, an overhead that gets progressively worse as the total size of the repository increases. For small to medium-sized datasets it performs fine, but for larger datasets you are advised not to use it and to switch to alternatives.

3.1.4. Creating a Repository with a Custom Inferencing Rule

The previous subsection showed how to use the built-in RDF schema inferencer. This subsection will briefly show how to create a repository capable of performing inferences according to a custom rule that you provide.

import org.eclipse.rdf4j.query.QueryLanguage;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.sail.memory.MemoryStore;
import org.eclipse.rdf4j.sail.inferencer.fc.CustomGraphQueryInferencer;
...
String pre = "PREFIX : <http://foo.org/bar#>\n";
String rule = pre + "CONSTRUCT { ?p :relatesTo :Cryptography } WHERE "
        + "{ { :Bob ?p :Alice } UNION { :Alice ?p :Bob } }";
String match = pre + "CONSTRUCT { ?p :relatesTo :Cryptography } "
        + "WHERE { ?p :relatesTo :Cryptography }";
Repository repo = new SailRepository(new CustomGraphQueryInferencer(
        new MemoryStore(), QueryLanguage.SPARQL, rule, match));

Here is a data sample (given in the popular Turtle format) that serves to illustrate this example:

@prefix : <http://foo.org/bar#> .
:Bob   :exchangesKeysWith :Alice .
:Alice :sendsMessageTo    :Bob .

If the above data is loaded into the repository, the repository will also automatically have the following inferred statements:

@prefix : <http://foo.org/bar#> .
:exchangesKeysWith :relatesTo :Cryptography .
:sendsMessageTo    :relatesTo :Cryptography .

The SPARQL graph query in ‘rule’ defines a pattern to search on, and the inferred statements to add to the repository.

The graph query in ‘match’ is needed to decide what inferred statements already exist that may need to be removed when the normal repository contents change. For example, if the first sample data statement was removed, then the inference layer will automatically remove the inferred statement regarding :exchangesKeysWith.

In simple rule cases, such as this one, an empty string could have been provided for ‘match’ instead, and the correct matcher query would have been deduced.

3.1.5. Accessing a server-side repository

Working with remote repositories is just as easy as working with local ones. We can simply use a different Repository object, the HTTPRepository, instead of the SailRepository class.

A requirement is of course that there is an RDF4J Server running on some remote system, which is accessible over HTTP. For example, suppose that an RDF4J Server is running at http://example.org/rdf4j-server/, which has a repository with the identification ‘example-db’. We can access this repository in our code as follows:

import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.http.HTTPRepository;
...
String rdf4jServer = "http://example.org/rdf4j-server/";
String repositoryID = "example-db";
Repository repo = new HTTPRepository(rdf4jServer, repositoryID);
repo.initialize();

3.1.6. Accessing a SPARQL endpoint

We can use the Repository interface to access any SPARQL endpoint as well. This is done as follows:

import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;
...
String sparqlEndpoint = "http://example.org/sparql";
Repository repo = new SPARQLRepository(sparqlEndpoint);
repo.initialize();

After you have done this, you can query the SPARQL endpoint just as you would any other type of Repository.

3.1.7. The RepositoryManager and RepositoryProvider

Using what we’ve seen in the previous section, we can easily create and use various different types of repositories. However, when developing an application in which you have to keep track of several repositories, sharing references to these repositories between different parts of your code can quickly become complex. Ideally, there would be one central location where all information on the repositories in use (including id, type, directory for persistent data storage, etc.) is kept. This is the role of the RepositoryManager and RepositoryProvider.

Using the RepositoryManager for handling repository creation and administration offers a number of advantages, including:

  • a single RepositoryManager object can be more easily shared throughout your application than a host of static references to individual repositories;

  • you can more easily create and manage repositories ‘on-the-fly’, for example if your application requires creation of new repositories on user input;

  • the RepositoryManager stores your configuration, including all repository data, in one central spot on the file system.

The RepositoryManager comes in two flavours: the LocalRepositoryManager and the RemoteRepositoryManager.

A LocalRepositoryManager manages repository handling for you locally, and is always created using a (local) directory. This directory is where all repositories handled by the manager store their data, and also where the LocalRepositoryManager itself stores its configuration data.

You create a new LocalRepositoryManager as follows:

import java.io.File;
import org.eclipse.rdf4j.repository.manager.LocalRepositoryManager;

File baseDir = new File("/path/to/storage/dir/");
LocalRepositoryManager manager = new LocalRepositoryManager(baseDir);
manager.initialize();

Using a LocalRepositoryManager to create and manage repositories works slightly differently from what we’ve seen before about creating repositories. The LocalRepositoryManager works with RepositoryConfig objects, which are declarative specifications of the repository you want. You add a RepositoryConfig object for your new repository, and then request the actual Repository back from the LocalRepositoryManager:

import org.eclipse.rdf4j.repository.config.RepositoryConfig;

String repositoryId = "test-db";
RepositoryConfig repConfig = new RepositoryConfig(repositoryId, repositoryTypeSpec);
manager.addRepositoryConfig(repConfig);

Repository repository = manager.getRepository(repositoryId);

In the above bit of code, you may have noticed that we provide an innocuous-looking variable called repositoryTypeSpec to the constructor of our RepositoryConfig. This variable is an instance of a class called RepositoryImplConfig, and this specifies the actual configuration of our new repository: what backends to use, whether or not to use inferencing, and so on.

Creating a RepositoryImplConfig object can be done in two ways: programmatically, or by reading a (RDF) config file. Here, we will show the programmatic way.

import org.eclipse.rdf4j.sail.config.SailImplConfig;
import org.eclipse.rdf4j.sail.memory.config.MemoryStoreConfig;
import org.eclipse.rdf4j.repository.config.RepositoryImplConfig;
import org.eclipse.rdf4j.repository.sail.config.SailRepositoryConfig;

// create a configuration for the SAIL stack
SailImplConfig backendConfig = new MemoryStoreConfig();

// create a configuration for the repository implementation
RepositoryImplConfig repositoryTypeSpec = new SailRepositoryConfig(backendConfig);

As you can see, we use a class called MemoryStoreConfig for specifying the type of storage backend we want. This class resides in a config sub-package of the memory store package (org.eclipse.rdf4j.sail.memory). Each particular type of SAIL in RDF4J has such a config class.

As a second example, we create a slightly more complex type of store: still in-memory, but this time we want it to use the memory store’s persistence option, and we also want to add RDFS inferencing. In RDF4J, RDFS inferencing is provided by a separate SAIL implementation, which can be ‘stacked’ on top of another SAIL. We follow that pattern in the creation of our config object:

import org.eclipse.rdf4j.sail.inferencer.fc.config.ForwardChainingRDFSInferencerConfig;

// create a configuration for the SAIL stack
boolean persist = true;
SailImplConfig backendConfig = new MemoryStoreConfig(persist);

// stack an inferencer config on top of our backend-config
backendConfig = new ForwardChainingRDFSInferencerConfig(backendConfig);

// create a configuration for the repository implementation
SailRepositoryConfig repositoryTypeSpec = new SailRepositoryConfig(backendConfig);

3.1.8. The RemoteRepositoryManager

A useful feature of RDF4J is that most of its APIs are transparent with respect to whether you are working locally or remotely. This is the case for the RDF4J repositories, but also for the RepositoryManager. In the above examples, we have used a LocalRepositoryManager, creating repositories for local use. However, it is also possible to use a RemoteRepositoryManager, using it to create and manage repositories residing on a remotely running RDF4J Server.

A RemoteRepositoryManager is initialized as follows:

import org.eclipse.rdf4j.repository.manager.RemoteRepositoryManager;

// URL of the remote RDF4J Server we want to access
String serverUrl = "http://localhost:8080/rdf4j-server";
RemoteRepositoryManager manager = new RemoteRepositoryManager(serverUrl);
manager.initialize();

Once initialized, the RemoteRepositoryManager can be used in the same fashion as the LocalRepositoryManager: creating new repositories, requesting references to existing repositories, and so on.
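
For example, adding a new repository on the remote server follows the same RepositoryConfig pattern shown above for the LocalRepositoryManager. A brief sketch, reusing the repositoryTypeSpec object from the previous section:

RepositoryConfig repConfig = new RepositoryConfig("remote-test-db", repositoryTypeSpec);
manager.addRepositoryConfig(repConfig);
Repository remoteRepo = manager.getRepository("remote-test-db");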

3.1.9. Sharing Managers with the RepositoryProvider

Finally, RDF4J also includes a RepositoryProvider class. This is a utility class that holds static references to RepositoryManagers, making it easy to share Managers (and the repositories they contain) across your application. In addition, the RepositoryProvider also has a built-in shutdown hook, which makes sure all repositories managed by it are shut down when the JVM exits.

To obtain a RepositoryManager from a RepositoryProvider you invoke it with the location you want a RepositoryManager for. If you provide an HTTP URL, it will automatically return a RemoteRepositoryManager, and if you provide a local file URL, it will be a LocalRepositoryManager.

import org.eclipse.rdf4j.repository.manager.RepositoryProvider;

String url = "http://localhost:8080/rdf4j-server";
RepositoryManager manager = RepositoryProvider.getRepositoryManager(url);

The RepositoryProvider creates and keeps a singleton instance of RepositoryManager for each distinct location you specify, which means that you can invoke the above call in several places in your code without having to worry about creating duplicate manager objects.

3.1.10. Creating a Federation

It is possible to create a virtual repository that is a federation of existing repositories. The following code illustrates how to use the RepositoryManagerFederator class to create a federation. It assumes you already have a reference to a RepositoryManager instance, and is a simplified form of what the RDF4J Console runs when its federate command is invoked:

void federate(RepositoryManager manager, String fedID, String description,
        Collection<String> memberIDs, boolean readonly, boolean distinct)
        throws MalformedURLException, RDF4JException {
    if (manager.hasRepositoryConfig(fedID)) {
        System.err.println(fedID + " already exists.");
    } else if (validateMembers(manager, readonly, memberIDs)) {
        RepositoryManagerFederator rmf = new RepositoryManagerFederator(manager);
        rmf.addFed(fedID, description, memberIDs, readonly, distinct);
        System.out.println("Federation created.");
    }
}

boolean validateMembers(RepositoryManager manager, boolean readonly,
        Collection<String> memberIDs) throws RDF4JException {
    boolean result = true;
    for (String memberID : memberIDs) {
        if (manager.hasRepositoryConfig(memberID)) {
            if (!readonly) {
                if (!manager.getRepository(memberID).isWritable()) {
                    result = false;
                    System.err.println(memberID + " is read-only.");
                }
            }
        } else {
            result = false;
            System.err.println(memberID + " does not exist.");
        }
    }
    return result;
}

3.2. Using a repository: RepositoryConnections

Now that we have created a Repository, we want to do something with it. In RDF4J, this is achieved through the use of RepositoryConnection objects, which can be created by the Repository.

A RepositoryConnection represents – as the name suggests – a connection to the actual store. We can issue operations over this connection, and close it when we are done to make sure we are not keeping resources unnecessarily occupied.

In the following sections, we will show some examples of basic operations.

3.2.1. Adding RDF to a repository

The Repository API offers various methods for adding data to a repository. Data can be added by specifying the location of a file that contains RDF data, and statements can be added individually or in collections.

We perform operations on a repository by requesting a RepositoryConnection from the repository. On this RepositoryConnection object we can perform various operations, such as query evaluation, getting, adding, or removing statements, etc.

The following example code adds two files, one local and one available through HTTP, to a repository:

import org.eclipse.rdf4j.RDF4JException;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.rio.RDFFormat;
import java.io.File;
import java.net.URL;
...
File file = new File("/path/to/example.rdf");
String baseURI = "http://example.org/example/local";

try {
    RepositoryConnection con = repo.getConnection();
    try {
        con.add(file, baseURI, RDFFormat.RDFXML);

        URL url = new URL("http://example.org/example/remote.rdf");
        con.add(url, url.toString(), RDFFormat.RDFXML);
    } finally {
        con.close();
    }
} catch (RDF4JException e) {
    // handle exception
} catch (java.io.IOException e) {
    // handle io exception
}

As you can see, the above code does very explicit exception handling and makes sure resources are properly closed when we are done. A lot of this can be simplified. RepositoryConnection implements AutoCloseable, so a first simple change is to use a try-with-resources construction for handling proper opening and closing of the RepositoryConnection:

File file = new File("/path/to/example.rdf");
String baseURI = "http://example.org/example/local";

try (RepositoryConnection con = repo.getConnection()) {
    con.add(file, baseURI, RDFFormat.RDFXML);

    URL url = new URL("http://example.org/example/remote.rdf");
    con.add(url, url.toString(), RDFFormat.RDFXML);
} catch (RDF4JException e) {
    // handle exception. This catch-clause is
    // optional since RDF4JException is an unchecked exception
} catch (java.io.IOException e) {
    // handle io exception
}

More information on other available methods can be found in the javadoc reference of the RepositoryConnection interface.
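
Besides loading files and URLs, a RepositoryConnection can also add statements directly, for example an entire Model built in memory (a brief sketch):

Model model = ...; // e.g. created with the ModelBuilder

try (RepositoryConnection con = repo.getConnection()) {
    con.add(model);
}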

3.2.2. Querying a repository

The Repository API has a number of methods for creating and evaluating queries. Three types of queries are distinguished: tuple queries, graph queries and boolean queries. The query types differ in the type of results that they produce.

The result of a tuple query is a set of tuples (or variable bindings), where each tuple represents a solution of a query. This type of query is commonly used to get specific values (URIs, blank nodes, literals) from the stored RDF data. SPARQL SELECT queries are tuple queries.

The result of graph queries is an RDF graph (or set of statements). This type of query is very useful for extracting sub-graphs from the stored RDF data, which can then be queried further, serialized to an RDF document, etc. SPARQL CONSTRUCT and DESCRIBE queries are graph queries.

The result of boolean queries is a simple boolean value, i.e. true or false. This type of query can be used to check if a repository contains specific information. SPARQL ASK queries are boolean queries.
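
For example, a minimal sketch of evaluating a SPARQL ASK query, assuming an open RepositoryConnection con:

// returns true if the repository contains at least one statement
boolean hasData = con.prepareBooleanQuery("ASK WHERE { ?s ?p ?o }").evaluate();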

3.2.3. Evaluating a tuple query

To evaluate a tuple query we can do the following:

import java.util.List;
import org.eclipse.rdf4j.RDF4JException;
import org.eclipse.rdf4j.model.Value;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.query.TupleQuery;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.query.BindingSet;
import org.eclipse.rdf4j.query.QueryLanguage;
...
try (RepositoryConnection conn = repo.getConnection()) {
    String queryString = "SELECT ?x ?y WHERE { ?x ?p ?y } ";
    TupleQuery tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
    try (TupleQueryResult result = tupleQuery.evaluate()) {
        while (result.hasNext()) { // iterate over the result
            BindingSet bindingSet = result.next();
            Value valueOfX = bindingSet.getValue("x");
            Value valueOfY = bindingSet.getValue("y");
            // do something interesting with the values here...
        }
    }
}

This evaluates a SPARQL SELECT query and returns a TupleQueryResult, which consists of a sequence of BindingSet objects. Each BindingSet contains a set of Binding objects. A binding is a pair relating a variable name (as used in the query’s SELECT clause) with a value.

As you can see, we use the TupleQueryResult to iterate over all results and get each individual result for x and y. We retrieve values by name rather than by an index. The names used should be the names of variables as specified in your query (note that we leave out the ‘?’ or ‘$’ prefixes used in SPARQL). The TupleQueryResult.getBindingNames() method returns a list of binding names, in the order in which they were specified in the query. To process the bindings in each binding set in the order specified by the projection, you can do the following:

List<String> bindingNames = result.getBindingNames();
while (result.hasNext()) {
    BindingSet bindingSet = result.next();
    Value firstValue = bindingSet.getValue(bindingNames.get(0));
    Value secondValue = bindingSet.getValue(bindingNames.get(1));
    // do something interesting with the values here...
}

Finally, it is important to make sure that both the TupleQueryResult and the RepositoryConnection are properly closed after we are done with them. A TupleQueryResult evaluates lazily and keeps resources (such as connections to the underlying database) open. Closing the TupleQueryResult frees up these resources. You can either explicitly invoke close() in the finally clause, or use a try-with-resources construction (as shown in the above examples) to let Java itself handle proper closing for you. In the following code examples, we will use both ways to handle both result and connection closure interchangeably.

As said: a TupleQueryResult evaluates lazily, and keeps an open connection to the data source while being processed. If you wish to quickly materialize the full query result (for example, convert it to a Java List) and then close the TupleQueryResult, you can do something like this:

List<BindingSet> resultList;
try (TupleQueryResult result = tupleQuery.evaluate()) {
    resultList = QueryResults.asList(result);
}

3.2.4. Doing a tuple query in a single line of code: the Repositories utility

RDF4J provides a convenience utility class org.eclipse.rdf4j.repository.util.Repositories, which allows us to significantly shorten our boilerplate code. In particular, the Repositories utility allows us to do away with opening/closing a RepositoryConnection completely. For example, to open a connection, create and evaluate a SPARQL SELECT query, and then put that query’s result in a list, we can do the following:

List<BindingSet> results = Repositories.tupleQuery(rep,
        "SELECT * WHERE {?s ?p ?o }", r -> QueryResults.asList(r));

As you can see, we make use of so-called Lambda expressions to process the result. In this particular example, the only processing we do is to convert the TupleQueryResult object into a List. However, you can supply any kind of function to this interface to fully customize the processing that you do on the result.

3.2.5. Using TupleQueryResultHandlers

You can also directly process the query result by supplying a TupleQueryResultHandler to the query’s evaluate() method. The main difference is that when using a return object, the caller has control over when the next answer is retrieved (namely, whenever next() is called), whereas with the use of a handler, the connection pushes answers to the handler object as soon as it has them available.

As an example we will use SPARQLResultsCSVWriter to directly write the query result to the console. SPARQLResultsCSVWriter is a TupleQueryResultHandler implementation that writes SPARQL Results as comma-separated values.

String queryString = "SELECT * WHERE {?x ?p ?y }";
con.prepareTupleQuery(queryString).evaluate(new SPARQLResultsCSVWriter(System.out));

RDF4J provides a number of standard implementations of TupleQueryResultHandler, and of course you can also supply your own application-specific implementation. Have a look in the Javadoc for more details.

3.2.6. Evaluating a graph query

The following code evaluates a graph query on a repository:

import org.eclipse.rdf4j.query.GraphQueryResult;

GraphQueryResult graphResult = con.prepareGraphQuery("CONSTRUCT { ?s ?p ?o } WHERE {?s ?p ?o }").evaluate();

A GraphQueryResult is similar to TupleQueryResult in that it is an object that iterates over the query solutions. However, for graph queries the query solutions are RDF statements, so a GraphQueryResult iterates over Statement objects:

while (graphResult.hasNext()) {
    Statement st = graphResult.next();
    // ... do something with the resulting statement here.
}

You can also quickly turn a GraphQueryResult into a Model (that is, a Java Collection of statements), by using the org.eclipse.rdf4j.query.QueryResults utility class:

Model resultModel = QueryResults.asModel(graphQueryResult);

3.2.7. Doing a graph query in a single line of code

Similarly to how we do this with SELECT queries, we can use the Repositories utility to obtain a result from a SPARQL CONSTRUCT (or DESCRIBE) query in a single line of Java code:

Model m = Repositories.graphQuery(rep, "CONSTRUCT WHERE {?s ?p ?o}", r -> QueryResults.asModel(r));

3.2.8. Using RDFHandlers

For graph queries, we can supply an org.eclipse.rdf4j.rio.RDFHandler to the evaluate() method. Again, this is a generic interface; each object implementing it can process the reported RDF statements in any way it wants.

All Rio writers (such as the RDFXMLWriter, TurtleWriter, TriXWriter, etc.) implement the RDFHandler interface. This allows them to be used in combination with querying quite easily. In the following example, we use a TurtleWriter to write the result of a SPARQL graph query to standard output in Turtle format:

import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.RDFWriter;

try (RepositoryConnection conn = repo.getConnection()) {
    RDFWriter writer = Rio.createWriter(RDFFormat.TURTLE, System.out);
    conn.prepareGraphQuery(QueryLanguage.SPARQL,
            "CONSTRUCT {?s ?p ?o } WHERE {?s ?p ?o } ").evaluate(writer);
}

Note that in the above code we use the org.eclipse.rdf4j.rio.Rio utility to quickly create a writer of the desired format. The Rio utility offers a lot of useful functions to quickly create writers and parsers for various formats.

3.2.9. Preparing and Reusing Queries

In the previous sections we have simply created a query from a string and immediately evaluated it. However, the prepareTupleQuery and prepareGraphQuery methods return objects of type Query, specifically TupleQuery and GraphQuery.

A Query object, once created, can be (re)used. For example, we can evaluate a Query object, then add some data to our repository, and evaluate the same query again.
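
A brief sketch of this reuse pattern, assuming an open connection con and a previously created Statement st:

TupleQuery query = con.prepareTupleQuery("SELECT ?x ?y WHERE { ?x ?p ?y }");
try (TupleQueryResult result = query.evaluate()) {
    // process the first result...
}

con.add(st); // add more data to the repository

try (TupleQueryResult result = query.evaluate()) {
    // ... then evaluate the very same query object again
}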

The Query object also has a setBinding() method, which can be used to specify specific values for query variables. As a simple example, suppose we have a repository containing names and e-mail addresses of people, and we want to retrieve the e-mail address of each person, doing a separate query per person. This can be achieved using the setBinding() functionality, as follows:

try (RepositoryConnection con = repo.getConnection()) {
    // First, prepare a query that retrieves all names of persons
    TupleQuery nameQuery = con.prepareTupleQuery("SELECT ?name WHERE { ?person ex:name ?name . }");

    // Then, prepare another query that retrieves all e-mail addresses of persons:
    TupleQuery mailQuery = con.prepareTupleQuery("SELECT ?mail WHERE { ?person ex:mail ?mail ; ex:name ?name . }");

    // Evaluate the first query to get all names
    try (TupleQueryResult nameResult = nameQuery.evaluate()) {
        // Loop over all names, and retrieve the corresponding e-mail address.
        while (nameResult.hasNext()) {
            BindingSet bindingSet = nameResult.next();
            Value name = bindingSet.getValue("name");

            // Retrieve the matching mailbox, by setting the binding for
            // the variable 'name' to the retrieved value. Note that we
            // can set the same binding name again for each iteration, it will
            // overwrite the previous setting.
            mailQuery.setBinding("name", name);
            try (TupleQueryResult mailResult = mailQuery.evaluate()) {
                // mailResult now contains the e-mail addresses for one particular person
                ...
            }
        }
    }
}

The values with which you perform the setBinding operation of course do not necessarily have to come from a previous query result (as they do in the above example). Using a ValueFactory you can create your own value objects. You can use this functionality to, for example, query for a particular keyword that is given by user input:

ValueFactory factory = myRepository.getValueFactory();

// In this example, we specify the keyword string. Of course, this
// could just as easily be obtained by user input, or by reading from
// a file, or...
String keyword = "foobar";

// We prepare a query that retrieves all documents for a keyword.
// Notice that in this query the 'keyword' variable is not bound to
// any specific value yet.
TupleQuery keywordQuery = con.prepareTupleQuery("SELECT ?document WHERE { ?document ex:keyword ?keyword . }");

// Then we set the binding to a literal representation of our keyword.
// Evaluation of the query object will now effectively be the same as
// if we had specified the query as follows:
// SELECT ?document WHERE { ?document ex:keyword "foobar". }
keywordQuery.setBinding("keyword", factory.createLiteral(keyword));

// We then evaluate the prepared query and can process the result:
TupleQueryResult keywordQueryResult = keywordQuery.evaluate();

3.2.10. Creating, retrieving, removing individual statements

The RepositoryConnection can also be used for adding, retrieving, removing or otherwise manipulating individual statements, or sets of statements.

To be able to add new statements, we can use a ValueFactory to create the Values out of which the statements consist. For example, suppose we want to add a few statements about two resources, Alice and Bob:

import org.eclipse.rdf4j.model.vocabulary.RDF;
import org.eclipse.rdf4j.model.vocabulary.RDFS;
...
ValueFactory f = myRepository.getValueFactory();

// create some resources and literals to make statements out of
IRI alice = f.createIRI("http://example.org/people/alice");
IRI bob = f.createIRI("http://example.org/people/bob");
IRI name = f.createIRI("http://example.org/ontology/name");
IRI person = f.createIRI("http://example.org/ontology/Person");
Literal bobsName = f.createLiteral("Bob");
Literal alicesName = f.createLiteral("Alice");

try (RepositoryConnection con = myRepository.getConnection()) {
   // alice is a person
   con.add(alice, RDF.TYPE, person);
   // alice's name is "Alice"
   con.add(alice, name, alicesName);
   // bob is a person
   con.add(bob, RDF.TYPE, person);
   // bob's name is "Bob"
   con.add(bob, name, bobsName);
}

Of course, it will not always be necessary to use a ValueFactory to create IRIs. In practice, you will find that you quite often retrieve existing IRIs from the repository (for example, by evaluating a query) and then use those values to add new statements. Also, for several well-known vocabularies we can simply reuse the predefined constants found in the org.eclipse.rdf4j.model.vocabulary package, and using the ModelBuilder utility you can very quickly create collections of statements without ever touching a ValueFactory.
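For example, here is a brief sketch of how the ModelBuilder can be used to create a small graph about Alice (the ex: namespace and the property names are just for illustration):

ModelBuilder builder = new ModelBuilder();
Model aliceData = builder
      .setNamespace("ex", "http://example.org/ontology/")
      .subject("http://example.org/people/alice")
         .add(RDF.TYPE, "ex:Person") // prefixed names are resolved against the declared namespace
         .add("ex:name", "Alice")    // plain strings become literals
      .build();

The resulting Model can then be added to a repository in a single call to RepositoryConnection.add().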

Retrieving statements works in a very similar way. We have actually already seen one way of retrieving statements: we can get a GraphQueryResult containing statements by evaluating a graph query. However, we can also use direct method calls to retrieve (sets of) statements. For example, to retrieve all statements about Alice, we could do:

RepositoryResult<Statement> statements = con.getStatements(alice, null, null);

Similarly to the TupleQueryResult object and other types of query results, the RepositoryResult is an iterator-like object that lazily retrieves each matching statement from the repository when its next() method is called. Note that, as is the case with QueryResult objects, iterating over a RepositoryResult may result in exceptions, which you should catch to make sure that the RepositoryResult is always properly closed after use:

RepositoryResult<Statement> statements = con.getStatements(alice, null, null, true);
try {
   while (statements.hasNext()) {
      Statement st = statements.next();
      ... // do something with the statement
   }
}
finally {
   statements.close(); // make sure the result object is closed properly
}

Or alternatively, using try-with-resources:

try (RepositoryResult<Statement> statements = con.getStatements(alice, null, null, true)) {
   while (statements.hasNext()) {
      Statement st = statements.next();
      ... // do something with the statement
   }
}

In the above getStatements() invocation, we see four parameters being passed. The first three represent the subject, predicate and object of the RDF statements which should be retrieved. A null value indicates a wildcard, so the above method call retrieves all statements which have Alice as their subject, and any kind of predicate and object. The optional fourth parameter indicates whether inferred statements should be included (you can leave this parameter out, in which case it defaults to ‘true’).
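For example, to retrieve only the explicitly asserted statements about Alice, excluding anything derived by a reasoner, we can pass false for this parameter:

// retrieve only explicit statements about Alice, ignoring inferred ones
RepositoryResult<Statement> explicitOnly = con.getStatements(alice, null, null, false);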

Removing statements again works in a very similar fashion. Suppose we want to retract the statement that the name of Alice is “Alice”:

con.remove(alice, name, alicesName);

Or, if we want to erase all statements about Alice completely, we can do:

con.remove(alice, null, null);

3.2.11. Using named graphs/context

RDF4J supports the notion of context, which you can think of as a way to group sets of statements together through a single group identifier (this identifier can be a blank node or an IRI).

A very typical way to use context is tracking provenance of the statements in a repository, that is, which file these statements originate from. For example, consider an application where you add RDF data from different files to a repository, and then one of those files is updated. You would then like to replace the data from that single file in the repository, and to be able to do this you need a way to figure out which statements need to be removed. The context mechanism gives you a way to do that.

Another typical use case is to support named graphs: in the SPARQL query language, named graphs can be queried as subsets of the dataset over which the query is evaluated. In RDF4J, named graphs are implemented via the context mechanism. This means that if you put data in RDF4J in a context, you can query that context as a named graph in SPARQL.
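For example, assuming data was added in a context identified by http://example.org/context1 (an identifier we also use in the examples below), a minimal sketch of a query addressing that context as a named graph looks like this:

// query the data in context1 as a SPARQL named graph
TupleQuery query = con.prepareTupleQuery(
      "SELECT ?s ?p ?o WHERE { GRAPH <http://example.org/context1> { ?s ?p ?o } }");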

We will start by showing some simple examples of using context in the API. In the following example, we add an RDF document from the Web to our repository, in a context. In the example, we make the context identifier equal to the Web location of the file being uploaded.

String location = "http://example.org/example/example.rdf";
String baseURI = location;
URL url = new URL(location);
IRI context = f.createIRI(location);
conn.add(url, baseURI, RDFFormat.RDFXML, context);

We can now use the context mechanism to specifically address these statements in the repository in retrieval and removal operations:

// Get all statements in the context
try (RepositoryResult<Statement> result = conn.getStatements(null, null, null, context)) {
   while (result.hasNext()) {
      Statement st = result.next();
      ... // do something interesting with the result
   }
}

// Export all statements in the context to System.out, in RDF/XML format
RDFHandler writer = Rio.createWriter(RDFFormat.RDFXML, System.out);
conn.export(context, writer);

// Remove all statements in the context from the repository
conn.clear(context);

In most methods in the Repository API, the context parameter is a vararg, meaning that you can specify an arbitrary number (zero, one, or more) of context identifiers. This way, you can combine different contexts together. For example, we can very easily retrieve statements that appear in either ‘context1’ or ‘context2’.

In the following example we add information about Bob and Alice again, but this time each has their own context. We also create a new property called ‘creator’ that has as its value the name of the person who is the creator of a particular context. We do not add the knowledge about the creators of the contexts to any particular context, however:

IRI context1 = f.createIRI("http://example.org/context1");
IRI context2 = f.createIRI("http://example.org/context2");
IRI creator = f.createIRI("http://example.org/ontology/creator");

// Add stuff about Alice to context1
conn.add(alice, RDF.TYPE, person, context1);
conn.add(alice, name, alicesName, context1);

// Alice is the creator of context1
conn.add(context1, creator, alicesName);

// Add stuff about Bob to context2
conn.add(bob, RDF.TYPE, person, context2);
conn.add(bob, name, bobsName, context2);

// Bob is the creator of context2
conn.add(context2, creator, bobsName);

Once we have this information in our repository, we can retrieve all statements about either Alice or Bob by using the context vararg:

// Get all statements in either context1 or context2
RepositoryResult<Statement> result = conn.getStatements(null, null, null, context1, context2);

You should observe that the above RepositoryResult will not contain the information that context1 was created by Alice and context2 by Bob. This is because those statements were added without any context, and thus they do not appear in context1 or context2 themselves.

To explicitly retrieve statements that do not have an associated context, we do the following:

// Get all statements that do not have an associated context
RepositoryResult<Statement> result = conn.getStatements(null, null, null, (Resource)null);

This will give us only the statements about the creators of the contexts, because those are the only statements that do not have an associated context. Note that we have to explicitly cast the null argument to Resource, because otherwise it is ambiguous whether we are specifying a single value or an entire array that is null (a vararg is internally treated as an array). Simply invoking getStatements(s, p, o, null) without an explicit cast will result in an IllegalArgumentException.

We can also get everything that either has no context or is in context1:

// Get all statements that do not have an associated context, or that are in context1
RepositoryResult<Statement> result = conn.getStatements(null, null, null, (Resource)null, context1);

So as you can see, you can freely combine contexts in this fashion. Note, however, that:

getStatements(null, null, null);

is not the same as:

getStatements(null, null, null, (Resource)null);

The former (without any context id parameter) retrieves all statements in the repository, ignoring any context information. The latter, however, only retrieves statements that explicitly do not have any associated context.

3.2.12. Working with Models, Collections and Iterations

Most of the examples so far have operated on the level of individual statements. However, the Repository API offers several methods that work with Java Collections of statements, allowing more batch-like update operations.

For example, in the following bit of code, we first retrieve all statements about Alice, put them in a Model (which, as we have seen in the previous sections, is an implementation of java.util.Collection) and then remove them:

import org.eclipse.rdf4j.query.QueryResults;
import org.eclipse.rdf4j.model.Model;

// Retrieve all statements about Alice and put them in a Model
RepositoryResult<Statement> statements = con.getStatements(alice, null, null);
Model aboutAlice = QueryResults.asModel(statements);

// Then, remove them from the repository
con.remove(aboutAlice);

As you can see, the QueryResults class provides a convenient method that takes a CloseableIteration (of which RepositoryResult is a subclass) as input, and returns the Model with the contents of the iterator added to it. It also automatically closes the Result object for you.

In the above code, you first retrieve all statements, put them in a Model, and then remove them. Although this works fine, it can be done in an easier fashion, by simply supplying the resulting object directly:

con.remove(con.getStatements(alice, null, null));

The RepositoryConnection interface has several variations of add, retrieve and remove operations. See the Javadoc for a full overview of the options.

3.2.13. RDF Collections and RepositoryConnections

In a previous section we have already seen how we can use the RDFCollections utility on top of a Model. This makes it very easy to insert any RDF Collection into your Repository: after all, a Model can simply be added as follows:

Model rdfList = ...;
try (RepositoryConnection conn = repo.getConnection()) {
   conn.add(rdfList);
}

In addition to this, the Repository API offers the Connections utility class, which contains some useful functions specifically for retrieving RDF Collections from a Repository.

For example, to retrieve all statements corresponding to an RDF Collection identified by the resource node from our Repository, we can do the following:

// retrieve all statements forming our RDF Collection from the Repository and put
// them in a Model
try (RepositoryConnection conn = repo.getConnection()) {
   Model rdfList = Connections.getRDFCollection(conn, node, new LinkedHashModel());
}

Alternatively, you can retrieve them in streaming fashion:

try (RepositoryConnection conn = repo.getConnection()) {
   Connections.consumeRDFCollection(conn, node, st -> {
      // ... do something with the triples forming the collection
   });
}

3.3. Transactions

So far, we have shown individual operations on repositories: adding statements, removing them, etc. By default, each operation on a RepositoryConnection is immediately sent to the store and committed.

The RepositoryConnection interface supports a full transactional mechanism that allows one to group modification operations together and treat them as a single update: before the transaction is committed, none of the operations in the transaction has taken effect, and after, they all take effect. If something goes wrong at any point during a transaction, it can be rolled back so that the state of the repository is the same as before the transaction started. Bundling update operations in a single transaction often also improves update performance compared to multiple smaller transactions.

We can indicate that we want to begin a transaction by using the RepositoryConnection.begin() method. In the following example, we use a connection to bundle two file addition operations in a single transaction:

File inputFile1 = new File("/path/to/example1.rdf");
String baseURI1 = "http://example.org/example1/";
File inputFile2 = new File("/path/to/example2.rdf");
String baseURI2 = "http://example.org/example2/";

try (RepositoryConnection con = myRepository.getConnection()) {
   // start a transaction
   con.begin();
   try {
      // Add the first file
      con.add(inputFile1, baseURI1, RDFFormat.RDFXML);
      // Add the second file
      con.add(inputFile2, baseURI2, RDFFormat.RDFXML);

      // If everything went as planned, we can commit the result
      con.commit();
   }
   catch (RepositoryException e) {
      // Something went wrong during the transaction, so we roll it back
      con.rollback();
   }
}

In the above example, we use a transaction to add two files to the repository. Only if both files can be successfully added will the repository change. If one of the files can not be added (for example because it can not be read), then the entire transaction is cancelled and none of the files is added to the repository.

As you can see, we open a new try block after calling the begin() method. The purpose of this is to be able to catch any errors that happen during transaction execution, so that we can explicitly call rollback() on the transaction in the catch clause. If you prefer your code shorter, you can leave this out, and just do this:

try (RepositoryConnection con = myRepository.getConnection()) {
   // start a transaction
   con.begin();
   // Add the first file
   con.add(inputFile1, baseURI1, RDFFormat.RDFXML);
   // Add the second file
   con.add(inputFile2, baseURI2, RDFFormat.RDFXML);
   // If everything went as planned, we can commit the result
   con.commit();
}

The close() method, which is automatically invoked by Java when the try-with-resources block ends, will also ensure that an unfinished transaction is rolled back (it will also log a warning about this).

A RepositoryConnection only supports one active transaction at a time. You can check at any time whether a transaction is active on your connection by using the isActive() method. If you need concurrent transactions, you will need to use several separate RepositoryConnections.
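For example, here is a minimal sketch showing how isActive() reflects the transaction state on a single connection:

try (RepositoryConnection con = myRepository.getConnection()) {
   con.begin();
   System.out.println(con.isActive()); // prints 'true': a transaction is active
   con.commit();
   System.out.println(con.isActive()); // prints 'false': the transaction has ended
}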

3.3.1. Transaction Isolation Levels

Any transaction operates according to a certain transaction isolation level. A transaction isolation level dictates who can ‘see’ the updates that are performed as part of the transaction while that transaction is active, as well as how concurrent transactions interact with each other.

The following transaction isolation levels are available:

  • NONE The lowest isolation level; transactions can see their own changes, but may not be able to roll them back, and no isolation among concurrent transactions is guaranteed. This isolation level is typically used for things like bulk data upload operations.

  • READ_UNCOMMITTED Transactions can be rolled back, but are not necessarily isolated: concurrent transactions may be able to see each other’s uncommitted data (so-called ‘dirty reads’).

  • READ_COMMITTED In this transaction isolation level, only data from concurrent transactions that has been committed can be seen by the current transaction. However, consecutive reads within the same transaction may see different results. This isolation level is typically used for long-lived operations.

  • SNAPSHOT_READ In addition to being READ_COMMITTED, query results in this isolation level will observe a consistent snapshot. Changes occurring to the data while a query is evaluated will not affect that query’s result. This isolation level is typically used in scenarios where multiple concurrent transactions do not conflict with each other.

  • SNAPSHOT In addition to being SNAPSHOT_READ, successful transactions in this isolation level will operate against a particular dataset snapshot. Transactions in this isolation level will either see the complete effects of other transactions (consistently throughout) or not at all. This isolation level is typically used in scenarios where a write operation depends on the result of a previous read operation.

  • SERIALIZABLE In addition to SNAPSHOT, this isolation level requires that all other transactions must appear to occur either completely before or completely after a successful serializable transaction. This isolation level is typically used when multiple concurrent transactions are likely to conflict.

Which transaction isolation levels are available depends on the actual store on which the transaction is performed. Moreover, not all of the isolation levels listed above are necessarily supported by every store.

By default, both the memory store and the native store use the SNAPSHOT_READ transaction isolation level. In addition, both of them support the NONE, READ_COMMITTED, SNAPSHOT, and SERIALIZABLE levels.

The native and memory store use an optimistic locking scheme. This means that these stores allow multiple concurrent write operations, and set transaction locks ‘optimistically’, that is, they assume that no conflicts will occur. If a conflict does occur, an exception is thrown on commit, and the calling user has the option to replay the same transaction with the updated state of the store. This setup significantly reduces the risk of deadlocks, and makes a far greater degree of parallel processing possible, with the downside of having to deal with possible errors thrown to prevent inconsistencies. In cases where concurrent transactions are likely to conflict, the user is advised to use the SERIALIZABLE isolation level.
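The following sketch shows what such a replay loop could look like. It assumes that a conflict surfaces as a RepositoryException on commit (as described above), that the work inside the transaction can safely be re-executed, and it uses the begin() overload shown directly below to request a specific isolation level:

boolean committed = false;
while (!committed) {
   try (RepositoryConnection con = myRepository.getConnection()) {
      con.begin(IsolationLevels.SNAPSHOT);
      // ... perform the read and write operations of the transaction ...
      con.commit();
      committed = true;
   }
   catch (RepositoryException e) {
      // a conflict was detected on commit and the transaction was rolled back;
      // we loop to replay it against the updated state of the store.
      // Production code should cap the number of retries.
   }
}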

You can specify the transaction isolation level by means of an optional parameter on the begin() method. For example, to start a transaction that uses SERIALIZABLE isolation:

try (RepositoryConnection conn = rep.getConnection()) {
   conn.begin(IsolationLevels.SERIALIZABLE);
   ...
   conn.commit();
}

A transaction isolation level is a sort of contract, that is, a set of guarantees of what will minimally happen while the transaction is active. As such, a store will make a best effort to honor the guarantees of the requested isolation level. If it does not support the specific isolation level being requested, it will attempt to use a supported level that offers at least the same guarantees.

3.3.2. Automated transaction handling

Although transactions are a convenient mechanism, having to always call begin() and commit() to explicitly start and stop your transactions can be tedious. RDF4J offers a number of convenience utility functions to automate this part of transaction handling, using the Repositories utility class.

As an example, consider this bit of transactional code. It opens a connection, starts a transaction, adds two RDF statements, and then commits. It also makes sure that it rolls back the transaction if something went wrong, and it ensures that once we’re done, the connection is closed.

ValueFactory f = myRepository.getValueFactory();
IRI bob = f.createIRI("urn:bob");

RepositoryConnection conn = myRepository.getConnection();
try {
   conn.begin();
   conn.add(bob, RDF.TYPE, FOAF.PERSON);
   conn.add(bob, FOAF.NAME, f.createLiteral("Bob"));
   conn.commit();
}
catch (RepositoryException e) {
   conn.rollback();
}
finally {
   conn.close();
}

That’s an awful lot of code for just inserting two triples. The same thing can be achieved with far less boilerplate code, as follows:

ValueFactory f = myRepository.getValueFactory();
IRI bob = f.createIRI("urn:bob");

Repositories.consume(myRepository, conn -> {
   conn.add(bob, RDF.TYPE, FOAF.PERSON);
   conn.add(bob, FOAF.NAME, f.createLiteral("Bob"));
});

As you can see, using Repositories.consume(), we do not explicitly begin or commit a transaction. We don’t even open and close a connection explicitly – this is all handled internally. The method also ensures that the transaction is rolled back if an exception occurs.

This pattern is useful for simple transactions, however as we’ve seen above, we sometimes do need to explicitly call begin(), especially if we want to modify the transaction isolation level.
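The Repositories class offers similar shortcuts for read operations. For example (a sketch, assuming the repository contains FOAF data), Repositories.tupleQuery evaluates a query on an internally managed connection and lets you process the result before that connection is closed:

List<BindingSet> persons = Repositories.tupleQuery(myRepository,
      "SELECT ?s WHERE { ?s a <http://xmlns.com/foaf/0.1/Person> }",
      result -> QueryResults.asList(result));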

3.4. Multithreaded Repository Access

The Repository API supports multithreaded access to a store: multiple concurrent threads can obtain connections to a Repository and query and perform operations on it simultaneously (though, depending on the transaction isolation level, access may occasionally block when a thread needs exclusive access).

The Repository object is thread-safe, and can be safely shared and reused across multiple threads (a good way to do this is via a RepositoryProvider).

RepositoryConnection is not thread-safe. This means that you should not try to share a single RepositoryConnection over multiple threads. Instead, ensure that each thread obtains its own RepositoryConnection from a shared Repository object. You can use transaction isolation levels to control visibility of concurrent updates between threads.
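For example, here is a minimal sketch in which several worker threads share a single Repository object (repo) but each obtain their own connection; it uses java.util.concurrent for the thread pool:

ExecutorService executor = Executors.newFixedThreadPool(4);
for (int i = 0; i < 4; i++) {
   executor.submit(() -> {
      // each task obtains its own connection from the shared Repository
      try (RepositoryConnection con = repo.getConnection()) {
         try (RepositoryResult<Statement> result = con.getStatements(null, RDF.TYPE, null)) {
            while (result.hasNext()) {
               Statement st = result.next();
               ... // process the statement
            }
         }
      }
   });
}
executor.shutdown();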

4. Parsing and Writing RDF with Rio

The RDF4J framework includes a set of parsers and writers called Rio. Rio (“RDF I/O”) is a toolkit that can be used independently from the rest of RDF4J. In this chapter, we will take a look at various ways to use Rio to parse from or write to an RDF document. We will show how to do a simple parse and collect the results, how to count the number of triples in a file, how to convert a file from one syntax format to another, and how to dynamically create a parser for the correct syntax format.

If you use RDF4J via the Repository API, then typically you will not need to use the parsers directly: you simply supply the document (either via a URL, or as a File, InputStream or Reader object) to the RepositoryConnection and the parsing is all handled internally. However, sometimes you may want to parse an RDF document without immediately storing it in a triplestore. For those cases, you can use Rio directly.

4.1. Listening to the parser

The Rio parsers all work with a set of Listener interfaces that they report results to: ParseErrorListener, ParseLocationListener, and RDFHandler. Of these three, RDFHandler is the most useful one: this is the listener that receives parsed RDF triples. So we will concentrate on this interface here.

The RDFHandler interface is quite simple; it contains just five methods: startRDF, handleNamespace, handleComment, handleStatement, and endRDF. Rio also provides a number of default implementations of RDFHandler, such as StatementCollector, which stores all received RDF triples in a Java Collection. Depending on what you want to do with parsed statements, you can either reuse one of the existing RDFHandlers, or, if you have a specific task in mind, simply write your own implementation of RDFHandler. Below, we show some simple examples of things you can do with RDFHandlers.

4.2. Parsing a file and collecting all triples

As a simple example of how to use Rio, we parse an RDF document and collect all the parsed statements in a Java Collection object (specifically, in a Model object).

Let’s say we have a Turtle file, available at http://example.org/example.ttl:

java.net.URL documentUrl = new URL("http://example.org/example.ttl");
InputStream inputStream = documentUrl.openStream();

We now have an open InputStream to our RDF file. Now we need an RDFParser object that reads this InputStream and creates RDF statements out of it. Since we are reading a Turtle file, we create an RDFParser object for the RDFFormat.TURTLE syntax format:

RDFParser rdfParser = Rio.createParser(RDFFormat.TURTLE);

Note that all Rio classes and interfaces are in package org.eclipse.rdf4j.rio or one of its subpackages.

We also need an RDFHandler which can receive RDF statements from the parser. Since we just want to create a collection of Statements for now, we’ll just use Rio’s StatementCollector:

Model model = new LinkedHashModel();
rdfParser.setRDFHandler(new StatementCollector(model));

Note, by the way, that you can use any standard Java Collection class (such as java.util.ArrayList or java.util.HashSet) in place of the Model object, if you prefer.
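For example, this minimal variant collects the parsed statements into a plain java.util.List instead:

List<Statement> statements = new ArrayList<>();
rdfParser.setRDFHandler(new StatementCollector(statements));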

Finally, we need to set the parser to work:

try {
   rdfParser.parse(inputStream, documentUrl.toString());
}
catch (IOException e) {
   // handle IO problems (e.g. the file could not be read)
}
catch (RDFParseException e) {
   // handle unrecoverable parse error
}
catch (RDFHandlerException e) {
   // handle a problem encountered by the RDFHandler
}

After the parse() method has executed (and provided no exception has occurred), the collection model will have been filled by the StatementCollector. As an aside: you do not have to provide the StatementCollector with a collection in advance; you can also use its empty constructor and retrieve the collection afterwards, using StatementCollector.getStatements().
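In code, that variant looks like this (a minimal sketch, with exception handling omitted):

StatementCollector collector = new StatementCollector();
rdfParser.setRDFHandler(collector);
rdfParser.parse(inputStream, documentUrl.toString());

// afterwards, retrieve the collected statements
Collection<Statement> statements = collector.getStatements();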

The Rio utility class provides additional helper methods, to make parsing to a Model a single API call:

Model results = Rio.parse(inputStream, documentUrl.toString(), RDFFormat.TURTLE);

4.3. Using your own RDFHandler: counting statements

Suppose you want to count the number of triples in an RDF file. You could of course parse the file, add all triples to a Collection, and then check the size of that Collection. However, this will get you into trouble when you are parsing very large RDF files: you might run out of memory. And in any case: creating and storing all these Statement objects just to be able to count them seems a bit of a waste. So instead, we will create our own RDFHandler implementation, which just counts the parsed RDF statements and then immediately throws them away.

To create your own handler, you can of course create a class that implements the RDFHandler interface, but a useful shortcut is to instead create a subclass of AbstractRDFHandler. This base class provides dummy implementations of all interface methods. The advantage is that you only have to override the methods in which you need to do something. Since all we want to do is count statements, we only need to override the handleStatement method. Additionally, we need a way to get back the total number of statements found by our counter:

class StatementCounter extends AbstractRDFHandler {

   private int countedStatements = 0;

   @Override
   public void handleStatement(Statement st) {
      countedStatements++;
   }

   public int getCountedStatements() {
      return countedStatements;
   }
}

Once we have our custom RDFHandler class, we can supply that to the parser instead of the StatementCollector we saw earlier:

StatementCounter myCounter = new StatementCounter();
rdfParser.setRDFHandler(myCounter);
try {
   rdfParser.parse(inputStream, documentUrl.toString());
}
catch (Exception e) {
   // oh no!
}
int numberOfStatements = myCounter.getCountedStatements();

4.4. Detecting the file format

In the examples so far, we have always assumed that you know the syntax format of your input file: we assumed Turtle syntax and created a new parser using RDFFormat.TURTLE. However, you may not always know in advance what exact format an RDF file is in. What then? Fortunately, Rio has a couple of useful features to help you.

The Rio utility class has a couple of methods for guessing the correct format, given either a filename or a MIME-type. For example, to get back the RDF format for our Turtle file, we could do the following:

RDFFormat format = Rio.getParserFormatForFileName(documentUrl.toString()).orElse(RDFFormat.RDFXML);

This will guess, based on the name of the file, that it is a Turtle file and return the correct format. We can then use that with the Rio class to create the correct parser dynamically.

Note the .orElse(RDFFormat.RDFXML) bit at the end: if Rio can not guess the parser format based on the file name, it will simply return RDFFormat.RDFXML as a default value. Of course if setting a default value makes no sense, you could also choose to return null or even to throw an exception - that’s up to you.
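For example, a variant that fails fast instead of falling back to a default format (the exception type and message here are just an illustration):

RDFFormat format = Rio.getParserFormatForFileName(documentUrl.toString())
      .orElseThrow(() -> new IllegalArgumentException("could not determine RDF format from file name"));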

Once we have the format determined, we can create a parser for it like so:

RDFParser rdfParser = Rio.createParser(format);

As you can see, we still have the same result: we have created an RDFParser object which we can use to parse our file, but now we have not made the explicit assumption that the input file is in Turtle format: if we later use the same code with a different file (say, a .owl file, which is in RDF/XML format), our program will detect the format at runtime and create the correct parser for it.

4.5. Writing RDF

So far, we’ve seen how to read RDF, but Rio of course also allows you to write RDF, using RDFWriters. An RDFWriter is an extension of the RDFHandler interface, intended for writing RDF in a specific syntax format.

As an example, we start with a Model containing several RDF statements, and we want to write these statements to a file. In this example, we’ll write our statements to a file in RDF/XML syntax:

Model model; // a collection of several RDF statements
FileOutputStream out = new FileOutputStream("/path/to/file.rdf");
RDFWriter writer = Rio.createWriter(RDFFormat.RDFXML, out);

try {
   writer.startRDF();
   for (Statement st : model) {
      writer.handleStatement(st);
   }
   writer.endRDF();
}
catch (RDFHandlerException e) {
   // oh no, do something!
}

Again, the Rio helper class provides convenience methods which you can use to make this a one-step process. If the collection is a Model and the desired output format supports namespaces, then the namespaces from the model will also be serialised.

Model model; // a collection of several RDF statements
FileOutputStream out = new FileOutputStream("/path/to/file.rdf");
Rio.write(model, out, RDFFormat.RDFXML);

Since we have now seen how to read RDF using a parser and how to write using a writer, we can convert RDF files from one syntax to another, simply by using a parser for the input syntax, collecting the statements, and then writing them again using a writer for the intended output syntax. However, this approach may be problematic for very large files: we are collecting all statements into main memory (in a Model object).

Fortunately, there is a shortcut. We can eliminate the need for using a Model altogether. If you’ve paid attention, you might have spotted it already: RDFWriters are also RDFHandlers. So instead of first using a StatementCollector to collect our RDF data and then writing that to our RDFWriter, we can simply use the RDFWriter directly. So if we want to convert our input RDF file from Turtle syntax to RDF/XML syntax, we can do that, like so:

// open our input document
java.net.URL documentUrl = new URL("http://example.org/example.ttl");
InputStream inputStream = documentUrl.openStream();

// create a parser for Turtle and a writer for RDF/XML
RDFParser rdfParser = Rio.createParser(RDFFormat.TURTLE);
RDFWriter rdfWriter = Rio.createWriter(RDFFormat.RDFXML,
      new FileOutputStream("/path/to/example-output.rdf"));

// link our parser to our writer...
rdfParser.setRDFHandler(rdfWriter);
// ...and start the conversion!
try {
   rdfParser.parse(inputStream, documentUrl.toString());
}
catch (IOException e) {
   // handle IO problems (e.g. the file could not be read)
}
catch (RDFParseException e) {
   // handle unrecoverable parse error
}
catch (RDFHandlerException e) {
   // handle a problem encountered by the RDFHandler
}

5. Customization SAILs

In addition to some of the SAIL implementations we have already seen, RDF4J offers a number of additional SAILs that allow customization of your RDF database in various ways. For example, they allow improved full-text search, or custom rule-based inference and validation. In this chapter, we discuss these customization SAILs in more detail.

5.1. Full text indexing with the Lucene SAIL

The LuceneSail enables you to add full-text search of RDF literals (in order to find matching subject resources) to any Sail stack. It provides querying support for the following statement patterns:

PREFIX search: <http://www.openrdf.org/contrib/lucenesail#>
?subj search:matches [
              search:query "search terms...";
              search:property my:property;
              search:score ?score;
              search:snippet ?snippet ] .

The ‘virtual’ properties in the search: namespace have the following meaning:

  • search:matches – links the resource to be found with the following query statements (required)

  • search:query – specifies the Lucene query (required)

  • search:property – specifies the property to search. If omitted all properties are searched (optional)

  • search:score – specifies a variable for the score (optional)

  • search:snippet – specifies a variable for a highlighted snippet (optional)

5.1.1. Configuration

The LuceneSail is a stacked Sail: to use it, simply wrap your base Sail with it:

Sail baseSail = new NativeStore(new File("."));
LuceneSail lucenesail = new LuceneSail();
// set any parameters, this one stores the Lucene index files into memory
lucenesail.setParameter(LuceneSail.LUCENE_RAMDIR_KEY, "true");
...
// wrap base sail
lucenesail.setBaseSail(baseSail);

Search is case-insensitive; wildcards and other modifiers can be used to broaden the search. For example, to search for all literals containing words starting with "alic" (e.g. persons named "Alice"):

....
Repository repo = new SailRepository(lucenesail);
repo.initialize();

// Get the subjects and a highlighted snippet
String qry = "PREFIX search: <http://www.openrdf.org/contrib/lucenesail#> " +
			"SELECT ?subj ?text " +
			"WHERE { ?subj search:matches [" +
					" search:query ?term ; " +
					" search:snippet ?text ] } ";

List<BindingSet> results;
try (RepositoryConnection con = repo.getConnection()) {
	ValueFactory fac = con.getValueFactory();

	TupleQuery tq = con.prepareTupleQuery(QueryLanguage.SPARQL, qry);
	// add wildcard '*' to perform wildcard search
	tq.setBinding("term", fac.createLiteral("alic" + "*"));

	// copy the results and process them after the connection is closed
	results = QueryResults.asList(tq.evaluate());
}

results.forEach(res -> {
		System.out.println(res.getValue("subj").stringValue());
		System.out.println(res.getValue("text").stringValue());
});

5.1.3. SearchIndex implementations

The LuceneSail can currently be used with three SearchIndex implementations:

  • Apache Lucene 5: org.eclipse.rdf4j.sail.lucene.LuceneIndex (Maven module: org.eclipse.rdf4j:rdf4j-sail-lucene)

  • ElasticSearch: org.eclipse.rdf4j.sail.elasticsearch.ElasticSearchIndex (Maven module: org.eclipse.rdf4j:rdf4j-sail-elasticsearch)

  • Apache Solr: org.eclipse.rdf4j.sail.solr.SolrIndex (Maven module: org.eclipse.rdf4j:rdf4j-sail-solr)

Each SearchIndex implementation can easily be extended if you need to add extra features or store/access data with a different schema.

5.2. Reasoning and Validation support with SPIN

The SPARQL Inferencing Notation (SPIN) is a way to represent a wide range of business rules on top of an RDF dataset. These rules can be anything from constraint validation to inferred property value calculation.

Configuration

The SpinSail (currently in beta) is a stacked Sail component that adds a forward-chaining SPIN rule engine on top of any store. In its most basic form it can be used directly on top of a Sail, like so:

// create a basic Sail Stack with a simple Memory Store and SPIN inferencing support
SpinSail spinSail = new SpinSail();
spinSail.setBaseSail(new MemoryStore());
// create a repository with the Sail stack:
Repository rep = new SailRepository(spinSail);
rep.initialize();

Alternatively, a SpinSail can be configured via the RepositoryManager, like so:

// create the config for the sail stack
SailImplConfig spinSailConfig = new SpinSailConfig(new MemoryStoreConfig());
RepositoryImplConfig repositoryTypeSpec = new SailRepositoryConfig(spinSailConfig);
// create the config for the actual repository
String repositoryId = "spin-test";
RepositoryConfig repConfig = new RepositoryConfig(repositoryId, repositoryTypeSpec);
manager.addRepositoryConfig(repConfig);
// get the Repository from the manager
Repository repository = manager.getRepository(repositoryId);

While this configuration already allows you to do many useful things, it does not do complete SPIN reasoning: the SpinSail relies on basic RDFS inferencing to be supplied by the underlying Sail stack. This means that for use cases where you need to rely on things like transitivity of rdfs:subClassOf relations, you should configure a Sail stack that includes the ForwardChainingRDFSInferencer. In addition, a DedupingInferencer is supplied which is a small optimization for both reasoners: it takes care to filter out potential duplicate results – though at the cost of an increase in memory usage. The full configuration with both additional inferencers looks like this:

// create a basic Sail Stack with a simple Memory Store, full RDFS reasoning,
// and SPIN inferencing support
SpinSail spinSail = new SpinSail();
spinSail.setBaseSail(
        new ForwardChainingRDFSInferencer(
               new DedupingInferencer(new MemoryStore())
        )
);
// create a repository with the Sail stack:
Repository rep = new SailRepository(spinSail);
rep.init();

or using configuration via the RepositoryManager:

// create the config for the sail stack
SailImplConfig spinSailConfig = new SpinSailConfig(
           new ForwardChainingRDFSInferencerConfig(
                 new DedupingInferencerConfig(new MemoryStoreConfig())
           )
);
RepositoryImplConfig repositoryTypeSpec = new SailRepositoryConfig(spinSailConfig);
// create the config for the actual repository
String repositoryId = "spin-test";
RepositoryConfig repConfig = new RepositoryConfig(repositoryId, repositoryTypeSpec);
manager.addRepositoryConfig(repConfig);
// get the Repository from the manager
Repository repository = manager.getRepository(repositoryId);

5.2.1. Adding rules

Once your repository is set up with SPIN support, you can add rules by simply uploading an RDF document containing SPIN rules (which are expressed in RDF using the SPIN vocabulary). The SpinSail will automatically execute these rules on the data.

As an example, consider the following data:

@prefix ex: <http://example.org/>.
ex:John a ex:Father ;
        ex:parentOf ex:Lucy .
ex:Lucy a ex:Person .

Now assume we wish to introduce a rule that defines persons who are the object of the ex:parentOf relation to be subject of an ex:childOf relation (in other words, we want to infer the inverse relationship for the parent-child relation). In SPIN, this could be done with the following rule:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix sp: <http://spinrdf.org/sp#>.
@prefix spin: <http://spinrdf.org/spin#>.
@prefix ex: <http://example.org/>.
# every person who has a parent is a child of that parent.
ex:Person a rdfs:Class ;
	spin:rule [
		a sp:Construct ;
	sp:text """PREFIX ex: <http://example.org/>
	           CONSTRUCT { ?this ex:childOf ?parent . }
	           WHERE { ?parent ex:parentOf ?this . }"""
] .

To get the SpinSail to execute this rule, all you need to do is upload both of the above RDF documents to the Repository. The relation will be automatically inferred at data upload time, so the query:

PREFIX ex: <http://example.org/>
SELECT ?child WHERE { ?child ex:childOf ?parent }

will give this result:

child
-------
ex:Lucy

5.2.2. Further reading

Here are some useful links to learn more about SPIN: