Web Information Quality Assessment Framework
In a recent post I discussed one of the implementation issues surrounding the DFT belief model. Richard Cyganiak made a particularly potent comment which has sent me on a completely new path. It’s things like this that make me glad I chose to develop this project in the open, rather than keep it under wraps until complete.
Originally I was planning to just create special database views that would filter out any data the user doesn’t trust. The filtering policy would be written in code, and any mechanism for explaining why specific data was or was not filtered out would also have to be written in code, on a case-by-case basis.
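As a rough illustration of that original plan, here is a minimal sketch using Python's built-in sqlite3 module. The table and view names (statements, trusted_sources, trusted_statements) and the explain helper are made up for this example and are not taken from the actual project; the point is only that both the policy and its explanation end up hard-coded.

```python
import sqlite3

# Hypothetical schema: one table of statements tagged with their source,
# and one table listing the sources the current user trusts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT, source TEXT);
    CREATE TABLE trusted_sources (source TEXT PRIMARY KEY);

    -- The filtering policy is fixed in the view definition itself.
    CREATE VIEW trusted_statements AS
        SELECT s.subject, s.predicate, s.object, s.source
        FROM statements s
        JOIN trusted_sources t ON t.source = s.source;
""")
conn.executemany(
    "INSERT INTO statements VALUES (?, ?, ?, ?)",
    [
        ("ex:alice", "foaf:name", "Alice", "http://example.org/a"),
        ("ex:alice", "foaf:age", "200", "http://example.org/b"),
    ],
)
conn.execute("INSERT INTO trusted_sources VALUES (?)", ("http://example.org/a",))


def explain(source):
    """Hand-written explanation that only fits this one policy."""
    row = conn.execute(
        "SELECT 1 FROM trusted_sources WHERE source = ?", (source,)
    ).fetchone()
    return "kept: source is trusted" if row else "filtered: source is not trusted"


# Only statements from trusted sources are visible through the view.
trusted = conn.execute(
    "SELECT subject, predicate, object FROM trusted_statements"
).fetchall()
```

Every new policy would need its own view and its own version of explain, which is exactly the case-by-case work described above.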
It turns out, though, that the folks over at the Freie Universität Berlin have already implemented a generic framework which can handle pretty much any filtering policy, along with a powerful explanation mechanism: the Web Information Quality Assessment Framework (WIQA). And it plugs right into Jena and NG4J. I can reuse this framework by simply writing my filtering policies and explanation needs in the WIQA-PL policy language (a derivative of SPARQL). All of a sudden the problem becomes much more manageable.
This doesn’t mean I won’t have my work cut out for me, though. WIQA works by translating a policy into an ARQ query, which is then used to retrieve the filtered data. That query should end up running through the sparql2sql query engine, so it should be fast. Subsequent operations, however, will work on the result data rather than in the database, and will probably be very slow with the amount of data I’m working with. I think I can get around this by rewriting some of the WIQA classes so that instead of producing a results iterator, they produce a database view, which can then be used by the sparql2sql engine in future queries. Again, this is a deviation from the work I really want to do, but I think it will give me more flexibility, and it should benefit the semantic web community at large.
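To make the iterator-versus-view idea concrete, here is a rough sketch in Python with sqlite3. It does not use WIQA, ARQ, or sparql2sql at all; the quads and accepted_graphs tables, the filtered_quads view, and both function names are hypothetical stand-ins chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE quads (graph TEXT, subject TEXT, predicate TEXT, object TEXT);
    CREATE TABLE accepted_graphs (graph TEXT PRIMARY KEY);
""")
conn.executemany(
    "INSERT INTO quads VALUES (?, ?, ?, ?)",
    [
        ("g1", "ex:alice", "foaf:name", "Alice"),
        ("g1", "ex:bob", "foaf:name", "Bob"),
        ("g1", "ex:bob", "foaf:mbox", "mailto:bob@example.org"),
        ("g2", "ex:mallory", "foaf:name", "Mallory"),
    ],
)
conn.execute("INSERT INTO accepted_graphs VALUES (?)", ("g1",))

# Stand-in for the SQL that a translated filtering policy might produce.
FILTER_SQL = (
    "SELECT q.graph, q.subject, q.predicate, q.object "
    "FROM quads q JOIN accepted_graphs a ON a.graph = q.graph"
)


def names_via_iterator(conn):
    # Results-iterator style: every filtered row is pulled into the
    # application, and the follow-up selection happens in Python.
    rows = conn.execute(FILTER_SQL)
    return sorted(obj for (_g, _s, pred, obj) in rows if pred == "foaf:name")


def names_via_view(conn):
    # View style: the filter stays in the database, so the follow-up
    # query is evaluated there and only matching rows come back.
    conn.execute("CREATE VIEW IF NOT EXISTS filtered_quads AS " + FILTER_SQL)
    rows = conn.execute(
        "SELECT object FROM filtered_quads WHERE predicate = ? ORDER BY object",
        ("foaf:name",),
    )
    return [obj for (obj,) in rows]
```

Both functions return the same names here, but only the second lets the database combine the filter and the follow-up query, which is where I would expect the savings on large data sets to come from.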


“I think I can get around this by rewriting some of the WIQA classes so that instead of producing a results iterator, they produce a database view, which can then be used by the sparql2sql engine in future queries.”
This sounds very interesting, as we have not tested the WIQA framework with really huge data sources, and moving in this direction is clearly the way to go. Please let us know about any results and discoveries.
Cheers
Chris
Will do. I don’t know how soon I’ll be getting to it, but I’ll keep you updated.