Improved HTTP Connections
The website was unavailable most of the day because of a technical issue that the outsourced customer support representative was unable to explain to me. Though it greatly irked me, this, like most adversity, was for the best in the end.
PGVAgent has an “update remote site registry automatically at startup” feature, which is enabled by default. It retrieves the list of sites from the DFT registry first, then from the PhpGedView registry. You can see this in action in the lower right-hand corner of Genesis when it first starts up, where the progress of the retrieval is displayed. Today I noticed that it was getting stuck on the DFT registry (which, as part of this website, was of course unavailable).
A little investigation revealed severe deficiencies in the standard Java connection class, particularly the inability to set a timeout. A little more investigation revealed that, once again, the Apache Software Foundation has an excellent library to fill the gap: Apache Commons HttpClient. I’ve now replaced every use of the standard connection class with HttpClient and set appropriate timeouts. Not only does this solve the site retrieval problem, it also speeds up searching, because unavailable sites no longer tie up a connection that could be used for searching elsewhere.
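To illustrate the idea of bounding each registry request with a timeout, here is a minimal sketch. It is not PGVAgent's code and it does not use Commons HttpClient. It relies on `HttpURLConnection.setConnectTimeout` and `setReadTimeout`, which the JDK has offered since Java 5, plus `InputStream.readAllBytes` from Java 9. The class name `RegistryFetch`, its method names, and the timeout values are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; not taken from PGVAgent.
final class RegistryFetch {

    /**
     * Returns the response body, or empty if the URL is malformed,
     * the server reports an error, or either timeout expires.
     */
    static Optional<String> fetch(String url, int timeoutMillis) {
        HttpURLConnection http = null;
        try {
            URLConnection conn = URI.create(url).toURL().openConnection();
            if (!(conn instanceof HttpURLConnection)) {
                return Optional.empty(); // only http/https are handled here
            }
            http = (HttpURLConnection) conn;
            http.setConnectTimeout(timeoutMillis); // limit on opening the connection
            http.setReadTimeout(timeoutMillis);    // limit on each blocking read
            try (InputStream in = http.getInputStream()) {
                return Optional.of(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        } catch (IOException | IllegalArgumentException e) {
            // SocketTimeoutException is an IOException, so timeouts land here too.
            return Optional.empty();
        } finally {
            if (http != null) {
                http.disconnect();
            }
        }
    }

    /**
     * Tries each registry in the given order and keeps only the ones that
     * answered in time, so one dead registry cannot stall the rest.
     */
    static Map<String, String> fetchAll(List<String> urls, int timeoutMillis) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String url : urls) {
            fetch(url, timeoutMillis).ifPresent(body -> results.put(url, body));
        }
        return results;
    }
}
```

A caller might pass the two registry addresses in order, for example `RegistryFetch.fetchAll(List.of(dftRegistryUrl, pgvRegistryUrl), 5000)`, where both variables and the five-second limit are placeholders. An unreachable registry then costs at most roughly the timeout before the next one is tried.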


Not directly related to your post, but I wonder how much bandwidth Genesis consumes from the sites you are searching? I am a little hesitant to submit my site because of this.
I don’t have any numbers, but I can describe the situation for you. You can expect some increase in bandwidth consumption: searching all PGV sites with Genesis is a lot easier than visiting them one by one with a web browser, so you will have more “visitors”. At the same time, however, a search of your site by Genesis consumes much less bandwidth than a search through a web browser (Genesis doesn’t download the images or HTML markup, just raw data). Also, because the order of the sites is randomized with each search in Genesis, and because users probably don’t wait for every site to be searched, your site won’t get hit as often as it might otherwise seem.
If you do list your site and then find that too much bandwidth is being consumed, you can block it (instructions are available at http://www.dftproject.org/wiki/PGVAgent).
Also, if you search the Internet for “PGVAgent” you can find a few statistics pages to get an idea. For example, here’s one PGV site’s March 2007 report: http://webstat3.mediacenter.hu/kovach_hu/usage_200703.html. It seems that only 0.83% of this site’s traffic comes from Genesis, with a total of 164 Kb consumed by “genservice.php” requests (which is how Genesis communicates with it, though other sites may also contribute). I don’t know if this is particularly meaningful, however.