Democracy in Genealogy
With a twenty-page limit on the thesis proposal document, I was unable to go into much depth on many of the more interesting aspects of what I want to accomplish. One of those aspects is how a distributed network of genealogy data can accommodate conflicting information.
To begin, let’s consider the common case. Today, most genealogists organize all of their research results in a single private database (whether it be a stack of papers, a spreadsheet, or a record manager). This database usually houses only those results that the genealogist regards as true. This state of affairs is simple and efficient, and it makes good sense. When two genealogists collaborate, however, sparks can and often do fly. Much genealogical data is subject to interpretation, especially where primary sources disagree on the facts. Just imagine the heated debates that would surround a universally shared family tree!
The difficulty, then, is how to accommodate incompatible facts in a distributed network of genealogy without losing the simplicity of a private database.
My solution to this problem is to use mediated views.
In this system, genealogists will have the ability to tag genealogical statements with a confidence rating. Internally this rating is a real number that ranges from 0 to 1, and can be thought of as a probability. Different record managers can present this rating in many different ways, however. For example, one record manager might present three options: Correct = 1, Indifferent = 0.5, and Incorrect = 0. Another might use a different scale: Sure = 1, Mostly Sure = 0.75, Don’t Know = 0.5, Not Convinced = 0.25, and Disagree = 0. Yet another might present a slider with labels Agree and Disagree at the ends.
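Here is a rough sketch in Python of how a record manager might translate between its own labels and the internal rating. The two scales come straight from the examples above; the function names (`to_rating`, `nearest_label`, `slider_to_rating`) and the nearest-match rule for showing a rating made in a different record manager are my own assumptions for illustration, not part of any existing software.

```python
# Scales taken from the examples in the text. All function names below are
# hypothetical and exist only for this illustration.
THREE_WAY = {"Correct": 1.0, "Indifferent": 0.5, "Incorrect": 0.0}
FIVE_WAY = {
    "Sure": 1.0,
    "Mostly Sure": 0.75,
    "Don't Know": 0.5,
    "Not Convinced": 0.25,
    "Disagree": 0.0,
}


def to_rating(label, scale):
    """Convert a label chosen in the user interface to the internal rating."""
    return scale[label]


def nearest_label(rating, scale):
    """Pick the label whose value is closest to an internal rating.

    Assumption: a rating entered in one record manager is displayed in
    another by choosing the closest label on that manager's scale.
    """
    if not 0.0 <= rating <= 1.0:
        raise ValueError("rating must be between 0 and 1")
    return min(scale, key=lambda label: abs(scale[label] - rating))


def slider_to_rating(position, width):
    """Map a slider position (0 = Disagree end, width = Agree end) to a rating."""
    if width <= 0 or not 0 <= position <= width:
        raise ValueError("position must lie within the slider")
    return position / width
```

With these in place, a rating entered as "Mostly Sure" in one record manager is stored as 0.75, and a three-option record manager can still display a stored rating by picking its closest label.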
Along with this rating, genealogists will be able to cite their reasons. Think of it as a message in a forum. The user might say, “I know that this information is correct because I have in my possession this person’s original birth certificate,” or “He can’t possibly have died before I was born because I remember meeting him when I was a little girl.” Other users may disagree and post a follow-up message, possibly starting a lengthy exchange. These threads of conversation become part of the permanent record on the relevant data and help inform others’ confidence ratings.
Now, software can’t do much with the messages other than display them, but it can use the rating to decide what information to show the genealogist. The software will have a threshold, say 0.4 initially (I just pulled that number out of my hat), which is used to filter out information that is not sufficiently trusted by the user. This creates a personalized perspective for that user. The software may also unobtrusively indicate that there is additional filtered information and allow the user to access it.
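The filtering step might look something like the sketch below. The `Statement` class and the `personalized_view` function are hypothetical names I made up for this example, and treating a rating exactly equal to the threshold as "trusted enough" is an assumption on my part. The count of hidden statements is there so the software can hint that filtered information exists.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """A genealogical statement with the user's confidence rating (0 to 1)."""
    text: str
    rating: float


def personalized_view(statements, threshold=0.4):
    """Split statements into those shown and a count of those filtered out.

    Assumption: a statement is shown when its rating is at or above the
    threshold. The 0.4 default is the placeholder value from the text.
    """
    shown = [s for s in statements if s.rating >= threshold]
    hidden_count = len(statements) - len(shown)
    return shown, hidden_count


if __name__ == "__main__":
    # Made-up example statements, purely for illustration.
    statements = [
        Statement("Born in 1850", 0.9),
        Statement("Born in 1852", 0.2),
    ]
    shown, hidden_count = personalized_view(statements)
    for s in shown:
        print(s.text)
    if hidden_count:
        print(f"({hidden_count} more filtered out)")
```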
Where the user hasn’t specified a confidence rating, the software can use other factors to decide what to show. In the simplest case it may choose to just show everything. Alternatively, it may average all ratings from other genealogists to derive a “consensus” and then use that. It may even weight each rating by the confidence rating the user has given to the genealogist who made it, so that people the user respects count for more and those known to be cavalier count for less. There are a lot of possibilities in this area. I hope to just scratch the surface.
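One way these fallbacks could fit together is sketched below. Everything here is an assumption for illustration: the function name `effective_rating`, the use of a weighted average, and the neutral default weight of 0.5 for genealogists the user hasn't rated. Returning `None` stands for "no opinion available", which the caller could treat as "show it anyway".

```python
def effective_rating(own, others, trust=None, default_trust=0.5):
    """Decide which rating to use for one statement.

    own:    the user's own rating, or None if they haven't given one
    others: dict mapping genealogist name -> that person's rating
    trust:  dict mapping genealogist name -> the user's confidence in them
            (hypothetical; unrated genealogists get default_trust)
    """
    if own is not None:
        return own  # the user's own judgment always wins
    if not others:
        return None  # nobody has rated this statement
    trust = trust or {}
    weights = {name: trust.get(name, default_trust) for name in others}
    total = sum(weights.values())
    if total == 0:
        return None  # every rater is fully distrusted
    # Weighted average; with equal weights this is the plain "consensus".
    return sum(weights[name] * others[name] for name in others) / total
```

If the user hasn't rated anyone, all the weights are equal and the result is just the plain average. If the user trusts one genealogist completely and another not at all, only the trusted person's rating counts.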

