Enabling the Distributed Family Tree

This is the official research blog for the Distributed Family Tree, an open network of genealogical data and metadata.  In a nutshell, the big idea is that we can combine all available genealogical information on the Internet into a single distributed network.  The foundation for this network is the subject of the master's thesis that I am currently working on.

Query Builder

The search view now includes a query builder, which can be shown by clicking the chevron next to the “Go” button:

[Screenshot: Query Builder]

The query builder is patterned after the Microsoft Outlook 2007 search interface, the idea being that you can learn the query syntax as you use the builder.  Currently it only provides two fields, name and gender.  I’m beginning work on proper dates now (with year, month, and day parts, as opposed to monolithic strings), so fields for dates should arrive soon.
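The difference between a monolithic date string and a date with separate parts can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Genesis implementation: the `PartialDate` class, its field names, and the `parse_gedcom_date` helper are hypothetical, and the parser handles only the plain `DAY MON YEAR`, `MON YEAR`, and `YEAR` forms of a GEDCOM-style date.

```python
from dataclasses import dataclass
from typing import Optional

# GEDCOM-style three-letter month abbreviations (English), mapped to 1..12.
MONTHS = {name: i for i, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


@dataclass(frozen=True)
class PartialDate:
    """A date whose month and day may be unknown (hypothetical type)."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def sort_key(self):
        # Unknown parts sort as 0, i.e. before any known month or day.
        return (self.year, self.month or 0, self.day or 0)


def parse_gedcom_date(text: str) -> PartialDate:
    """Parse 'DAY MON YEAR', 'MON YEAR', or 'YEAR' (simple forms only)."""
    parts = text.upper().split()
    if len(parts) == 3:
        return PartialDate(int(parts[2]), MONTHS[parts[1]], int(parts[0]))
    if len(parts) == 2:
        return PartialDate(int(parts[1]), MONTHS[parts[0]])
    if len(parts) == 1:
        return PartialDate(int(parts[0]))
    raise ValueError(f"unsupported date: {text!r}")


# Sorting by parts gives chronological order, which sorting the raw
# strings alphabetically would not.
dates = [parse_gedcom_date(s) for s in ["12 MAR 1850", "1849", "JAN 1850"]]
ordered = sorted(dates, key=PartialDate.sort_key)
```

Keeping the month and day optional lets a record that gives only a year still be stored, searched, and sorted alongside fully specified dates.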

Comments

  1. crex

    Nice, but I was hoping it would include Places, not just Gender …

  2. Hilton

    It now includes places.

  3. crex

    Very useful. Thanx.

    I see there are some problems with records from Herlin Family Tree (e.g. http://familj.herlin.se/individual.php?ged=familj.ged&pid=I242:1). The names don’t show in Genesis, and in Born there are four squares, of which three are filled with dates (two birth dates and one death date). I now see that there are problems with the dates from other sites as well … I suspect you missed something in the coding.

    I like the look of the personal information. Nice icons and cute colors :)

  4. Hilton

    Thanks for the tip! I looked into the specific case you mentioned, and it appears that the GEDCOM file has a NAME tag without a value, with GIVN and SURN subtags holding the given name and surname respectively. I’ve also noticed that death records are being stored as birth records in general, which was indeed an oversight. I’ll fix both issues as soon as I can and release an update.

  5. Anders Berg

    Hi there Hilton, this is a very interesting project you’re working on. I have some questions though.

    The update and search are EXTREMELY slow, which makes it unworkable. After several minutes, PGVUpdate is just 14% through. The search for persons finds a new item about once a minute…

    It feels like you actually collect the data from the source sites in real time! This seems very inefficient. Why don’t you index the sites with key information and store it in a database and let the user search that database? It would of course include the link to the source site data. I mean, Google doesn’t visit all the sites in the world when you hit their search button.

    As for the search, I have now found the Toggle Query Builder (it could be more obvious). You should be able to search for first and last names separately, and also for birth and death dates. The results should be sortable by both first and last name. The dates column should have the year, month and day in that order, to give sorting any meaning.

    It looks good otherwise, but my main concern is the update and speed. PGVAgent Update is now 22% ready… ;)

  6. Jesper Zedlitz

    That is where the Valhalla server comes into play. It stores the data that has been retrieved from the various remote sites. Is that your idea, Hilton? There will probably be two modes for a Valhalla server (like a DNS server): a recursive mode, where the server connects to peers and queries them for data, and a local mode that only searches local data.

  7. Hilton

    Anders Berg:

    I’m really surprised that the PGVAgent update is running so slowly. It could be that you’re running a search at the same time, which would tend to slow it down. Subsequent updates will go a lot faster as all the information is already in the cache.

    Genesis does indeed collect all the data in real time, which is on the one hand really cool, but on the other hand really unworkable as you say. I agree that the solution is to index it on a central server, but I don’t have the time or resources to do that right now. I’m simply working with what’s there. I would strongly encourage anyone who has the inclination to produce such a server, and I’m working hard to offer an architecture in Genesis which will make it easy to write plug-ins for any data source out there. I just don’t have the time to do it all myself.

    The query builder could be a bit more obvious, so in the next revision I’ll have it visible by default. In a future release I’ll add additional fields and make it easy to script your own fields. Once I have dates in a machine-understandable format I’ll fix the sorting as well.

    Jesper:

    You’re exactly right, as usual :).

  8. Anders Berg

    Thanks for your reply. Sounds like we agree on the best solution.

    I wonder whether my slow update is because I’m pretty low on disk space. I saw your latest entry about checking the user directory and the cache, and, oh my, there is 100 MB in the Genesis directory.
