Oh, I Almost Forgot…

This is the official research blog for the Distributed Family Tree, an open network of genealogical data and metadata. In a nutshell, the big idea is that we can combine all available genealogical information on the Internet into a single distributed network. The foundation for this network is the subject of the master’s thesis that I am currently working on.
Samuel Martin’s ancestors on one Web site:

Priscilla Layton’s ancestors on another Web site:

The two pedigrees seamlessly linked:

In a completely unanticipated reversal, Genesis is now shedding its database. “What!?”, you ask in blinking disbelief (as I seem to have attracted a particularly quiet readership, I get to put words in your mouth). An excellent question, I’m glad you asked. Permit me, if you will, to entertain it.
Way, way back in the beginning, long before I’d ever even heard of this semantic web thing, I was planning on Genesis being nothing more than a really good record manager (like PAF, only usable). This of course necessitated a database for storing all the data on the user’s computer. I always assumed that one would be there, even though the whole concept later evolved. It never occurred to me that a database wasn’t really necessary anymore.
The flash of insight came yesterday morning as I was contemplating the next step in the replumbing/resurfacing effort. I don’t recall the exact circumstances, but I do remember asking myself what would happen if I stopped caching data in the database. Well, performance would go through the roof, for starters! Startup and shutdown time would become negligible. Disk space usage would fall dramatically. And perhaps most important of all, I could take advantage of the OWL inference support in Jena!
This last point bears explanation. A major part of this project is the ability for the user to indicate that Person A and Person B are in fact the same person. This is done by creating an owl:sameAs relationship between the two. Given this fact, Genesis should infer (using the OWL inference rules) that anything said about Person A is also true of Person B, and vice versa; the two are effectively one. With some tricks, this particular inference could be done efficiently using a database. However, anything more complex would be next to impossible without bloating the size (and reducing the speed) of the database by several orders of magnitude, because all inferences would need to be precomputed each time new data is added. But inferences like this can be done in memory (without a database) on demand!
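To make the owl:sameAs idea concrete, here is a minimal sketch in plain Python of merging two resources on demand, in memory. It is not Jena's reasoner and not the Genesis code: the `SameAsGraph` class, the `site1:`/`site2:` identifiers, the `gc:name` and `gc:birthYear` properties, and the sample values are all made up for illustration. It also handles only owl:sameAs between subjects, which is a small slice of what the OWL inference rules cover.

```python
SAME_AS = "owl:sameAs"


class SameAsGraph:
    """Toy in-memory triple store that treats owl:sameAs subjects as one resource."""

    def __init__(self):
        self._parent = {}      # union-find forest over resource identifiers
        self._triples = set()  # ordinary (subject, predicate, object) statements

    def _find(self, node):
        """Return the representative of the sameAs group containing node."""
        self._parent.setdefault(node, node)
        root = node
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the walk straight at the root.
        while self._parent[node] != root:
            self._parent[node], node = root, self._parent[node]
        return root

    def add(self, s, p, o):
        """Store a statement; a sameAs statement merges the two groups instead."""
        if p == SAME_AS:
            root_s, root_o = self._find(s), self._find(o)
            if root_s != root_o:
                self._parent[root_o] = root_s
        else:
            self._triples.add((s, p, o))

    def statements_about(self, subject):
        """Compute, at query time, everything said about subject or its equivalents."""
        root = self._find(subject)
        return sorted((p, o) for s, p, o in self._triples if self._find(s) == root)


# Hypothetical data: two sites describe what the user says is the same person.
g = SameAsGraph()
g.add("site1:samuel", "gc:name", "Samuel Martin")
g.add("site2:sam", "gc:birthYear", "1842")
g.add("site1:samuel", SAME_AS, "site2:sam")

print(g.statements_about("site2:sam"))
# [('gc:birthYear', '1842'), ('gc:name', 'Samuel Martin')]
```

Nothing is precomputed when the sameAs link is added; the merged view is worked out only when someone asks, which is the property that makes the in-memory approach attractive here.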
Well, there are obviously many positive aspects, but are there any downsides? The most obvious drawback is that it takes PGVAgent a long time to search each PhpGedView website one by one. Having a database means that these search results can be cached for future searches. If the database goes, so does the persistent caching. In fact, this is why I had never considered dropping the database before. Which raises the question: why did I suddenly start considering it now?
After the last few posts on the new search, I think it’s appropriate to mention that the purpose of all this is not to create the ultimate genealogy search engine. Others are tackling that beast, and more power to them. If I wanted to get in on that action, I would have …
A week-and-a-half ago I promised further details on a resurfacing project I wanted to do after I finish replumbing. Last Friday I alluded to this when I wrote about upcoming changes in search. Today I’d like to outline how the new search interface and experience will work.
Search in Genesis is …
The replumbing effort is coming along very nicely. I only have time to actually write code in short spurts, but that gives me time to think out the issues I’m tackling and address them carefully.
So far I’ve only had one problem with Jena SDB, and it’s already been fixed. The results of SPARQL …
I should probably be coding right now, but perhaps my sanity will be best preserved if I take a break to update any readers I may still have on what’s going on.
When I embarked on this project, the plan was to use Jena and NG4J for the data plumbing. When I started …
I updated Genesis today with the ability to choose and rearrange search results columns. With this also comes the ability to add custom columns. You can find an example of creating your own column on the wiki (I haven’t had time to write a full tutorial yet).
As if I wasn’t quiet …
The ability to stop searches is back, along with many other small tweaks that you may or may not notice:
Next week I’ll finish up some extension point code I’m writing to allow custom search result columns.
The latest cadre of Genesis plug-ins (revision 37) is now available for your downloading pleasure. As mentioned previously, this latest version now supports media and sources. Media support is limited to showing the first image associated with each individual. Source support is limited to showing only the website that the data …
Here’s that screenshot of media and sources that I promised:
It will take a little more work before I can release it though. Oh, and I’m going incognito through late next week. Until then!
[Permission to use the photo was granted by the respective website administrator.]
After my going MIA for the last week you may all think that I really did leave or quit. Not so, I just got swamped. There’s a lot for me to juggle, including but not limited to:
Deciding where I want to work when I graduate this December
Fixing all the bugs …
No, I’m not leaving, and no, I’m not quitting. I just finished writing the second scripted plug-in tutorial, “Goodbye, Cruel World!”, which introduces the activator extension. Plug-ins can take advantage of this extension to initialize and clean up after themselves. It also shows a simple example of scripting UI:…
There’s now a quick tutorial for writing a “Hello World!” plug-in for Genesis in Javascript on the wiki. The tutorial showcases the startup extension, which allows you to write some code that will run when Genesis starts up (in this case it just prints “Hello World!” to the console). Over the next …
As promised, the latest release of Genesis includes a Cache Manager. It can be reached from the Genesis menu.
What’s really great about the cache manager is that plug-ins can define their own categories of data which can then be selectively deleted. This is particularly useful if, …
Genesis/PGVAgent 0.0.35 are now available (if you updated to PGVAgent 0.0.33/34, this latest version will fix any problems you may be having). The latest and greatest is interpreted, sortable dates:
I suppose I should mention that only new data will be interpreted; anything already in the cache won’t …
I just saw an interesting message on the PhpGedView Help forum. It reads:
If any of you are like me and would prefer people visit your site rather than just assume they can take your data without your knowledge with PVGAgent, please read this
http://www.dftproject.org/wiki/PGVAgent
I’m not really sure how …
I’ve now got basic date interpretation working. Whenever a date in “standard form” is inserted into the datastore, an interpreted version of the date is also stored. For example, given:
:original {
:event gc:date "06 APR 2007" ;
…
When I originally set out to enable scriptable plug-ins, my intention was to expose special extension points that would accept scripts. It wasn’t long before it occurred to me, however, that with some fancy footwork I should be able to write a standard Eclipse plug-in with script! This will be a …
I made a few small fixes and improvements to the GEDCOM importing code which will show up in Genesis/PGVAgent 0.0.31. These changes won’t impact data that has already been imported though. If you’d like to clear the old data to make way for the new, you can delete the cache by finding …
Date Interpretation
I want to write a plug-in that will interpret dates in string form as they are added to the cache and add the appropriate date form as well. In other words, when it sees dates like these:
“7 JUN 1873”
“abt 22 dec 1906”
“between 1817 and 1819”
It will interpret them and add a consistent, …
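As a rough illustration of what interpreting the three example strings above could involve, here is a small Python sketch. It is not the plug-in's actual code, and the output shape (a dict with `kind` plus `(year, month, day)` tuples) is an assumption made for this example, since the excerpt does not show the real interpreted form. The `interpret` and `sort_key` names are hypothetical, and only exact, "about", and "between … and …" dates are handled.

```python
import re

MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

# Optional day, optional three-letter month, then a 3- or 4-digit year.
_SIMPLE = re.compile(r"^(?:(\d{1,2})\s+)?(?:([A-Z]{3})\s+)?(\d{3,4})$")


def _parse_simple(text):
    """Parse 'D MON YYYY', 'MON YYYY' or 'YYYY' into (year, month, day), else None."""
    match = _SIMPLE.match(text.strip().upper())
    if not match:
        return None
    day, month, year = match.groups()
    if month is not None and month not in MONTHS:
        return None
    if day is not None and month is None:
        return None  # a day without a month is ambiguous, so give up
    return (int(year), MONTHS.get(month), int(day) if day else None)


def interpret(text):
    """Return an interpreted form of a date string, or None if it is not understood."""
    normalized = " ".join(text.split()).upper()

    match = re.match(r"^(?:BETWEEN|BET)\s+(.+?)\s+AND\s+(.+)$", normalized)
    if match:
        start, end = _parse_simple(match.group(1)), _parse_simple(match.group(2))
        if start and end:
            return {"kind": "range", "start": start, "end": end}
        return None

    match = re.match(r"^(?:ABT|ABOUT)\.?\s+(.+)$", normalized)
    if match:
        date = _parse_simple(match.group(1))
        return {"kind": "about", "date": date} if date else None

    date = _parse_simple(normalized)
    return {"kind": "exact", "date": date} if date else None


def sort_key(interpreted):
    """Order interpreted dates chronologically; missing month or day sorts first."""
    year, month, day = interpreted.get("date") or interpreted["start"]
    return (year, month or 0, day or 0)


print(interpret("7 JUN 1873"))              # {'kind': 'exact', 'date': (1873, 6, 7)}
print(interpret("abt 22 dec 1906"))         # {'kind': 'about', 'date': (1906, 12, 22)}
print(interpret("between 1817 and 1819"))
# {'kind': 'range', 'start': (1817, None, None), 'end': (1819, None, None)}
```

Keeping the original string alongside an interpreted form like this means nothing is lost when a date cannot be parsed: `interpret` simply returns `None` and the raw text remains available.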
The search view now includes a query builder, which can be shown by clicking the chevron next to the “Go” button:
The query builder is patterned after the Microsoft Outlook 2007 search interface, the idea being that you can learn the query syntax as you use the builder. …