Enabling the Distributed Family Tree

This is the official research blog for the Distributed Family Tree, an open network of genealogical data and metadata.  In a nutshell, the big idea is that we can combine all available genealogical information on the Internet into a single distributed network.  The foundation for this network is the substance of the Master's Thesis that I am currently working on.

Optimized Index Structures for Querying RDF from the Web

This morning I read Optimized Index Structures for Querying RDF from the Web, a paper written by Andreas Harth and Stefan Decker of the Digital Enterprise Research Institute (DERI) at the National University of Galway, Ireland (I would get so lost on those streets!). The paper laments the scarcity of fast RDF database implementations and describes an optimized index structure.

The big take-away for this paper is that I need more indexes, but not too many. In a previous post I mentioned adding an index for the graph, subject, predicate, and object columns of the quads table. This paper points out that there are actually sixteen access patterns, or combinations of known and unknown values, each one of which could benefit from an index over the known columns (for example, if the subject and predicate are known, use a combined subject-predicate index):

?  ? ? ?
?  ? ? o
?  ? p ?
?  ? p o
?  s ? ?
? s ? o
?  s p ?
?  s p o
g  ? ? ?
g  ? ? o
g  ? p ?
g  ? p o
g  s ? ?
g  s ? o
g  s p ?
g  s  p o

Of course, sixteen indexes would take a lot of space. Fortunately, the same result can be achieved with just six indexes. For example, a combined subject-predicate index could be used not only to lookup (? s p ?) queries, but (? s ? ?) queries as well. From what I’ve read in the documentation for HSQLDB, these results won’t translate perfectly for me. It seems that HSQLDB will use a combined index when all of the columns of the index are present, or when only the first column of the index is present. This means I will actually need eleven indexes:

graph-subject
graph-predicate
graph-object
subject-predicate
subject-object
predicate-object
graph-subject-predicate
graph-subject-object
graph-predicate-object
subject-predicate-object
graph-subject-predicate-object

The rest of the paper described a lightweight Java implementation of the indexes and a corresponding query processing system. I take it as an axiom that I’ll never have enough time to produce a database system that can rival HSQLDB, let alone MySQL, Oracle, or any other database that I can use a JDBC driver to connect to, so I didn’t get too much into it.

    Trackbacks/Pingbacks


  1. […] A month ago I wrote about a paper I’d read on “optimized index structures for querying RDF data”. The main idea I took from that paper was the use of combined indexes to optimize queries.  I sadly noted at the time that, because of a deficiency in HSQLDB, I would have to create eleven indexes (rather than the prescribed six) to achieve the promised optimization. […]

Leave a Reply

cheap generic kamagra kamagra uk viagra, Viagra Buy generic viagra levitra and cialis pills