

Web Crawling, Caching, and APIs

Published in Programming, Technology, Web


Several of the projects I’m currently working on involve harvesting (mashing, grazing, retrieving, etc.) data from other websites. Over the last 2-3 years, the web has made vast amounts of data readily available via APIs or simple spiders/harvesters. This has recently been made apparent by the huge number of mashups that have shown up (including one of my own).

With so many mashups around, some even seeking venture capital funding, one has to wonder about the real value involved. As a ZDNet article points out in Mashups: who’s really in control?, the developers of these mashups haven’t necessarily added unique value; rather, they have cleared a path for the original data providers or other larger entities to move in and provide a better solution.

Then, of course, you see the purchase of del.icio.us, GeoBloggers developer Daniel Catt, Platial, et al., and you realize that mashups are actually viable. But, as pointed out on a recent Wired podcast as well as in a Joel Spolsky talk: “Do what you love.”

ProgrammableWeb is an excellent resource listing 192 available APIs (and it reports approximately 2.78 new mashups per day). Some of these services, such as Vast, are created purely as data providers (via TechCrunch). They provide great REST interfaces, and many have wrappers written in your favorite language (and if they don’t, perhaps it’s time you got a new favorite language).

While many services provide excellent APIs, sometimes you just need to go old-fashioned and farm the data yourself. In that case, find yourself something like a Ruby Spider.
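
To give a rough idea of what "farming the data yourself" involves, here is a minimal sketch of a spider with a simple in-memory cache. It is written in Python using only the standard library, not the Ruby Spider mentioned above, and the names `LinkParser`, `extract_links`, `fetch`, and `crawl` are my own for illustration. It assumes pages are plain HTML decodable as UTF-8, and it skips things a polite crawler needs, such as robots.txt handling and rate limiting.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute, fragment-free http(s) links found in html."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https") and absolute not in links:
            links.append(absolute)
    return links


def fetch(url):
    """Download a page as text (illustrative helper: no retries, no robots.txt)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=10):
    """Breadth-first crawl limited to the start URL's host.

    Returns a dict mapping each visited URL to its HTML. The dict doubles
    as a cache, so no page is fetched twice during the crawl.
    """
    host = urlparse(start_url).netloc
    cache = {}
    queue = deque([start_url])
    while queue and len(cache) < max_pages:
        url = queue.popleft()
        if url in cache:
            continue  # already fetched; reuse the cached copy
        cache[url] = fetch_page(url)
        for link in extract_links(cache[url], url):
            if urlparse(link).netloc == host and link not in cache:
                queue.append(link)
    return cache
```

Calling `crawl("http://example.com/", max_pages=5)` would return up to five pages from that host, keyed by URL. The `fetch_page` parameter is there so the download step can be swapped out, for example with a version that reads from disk or sleeps between requests.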
