
Six Degrees of Separation Web Service

11 December 2006

I’ve created a new experiment using ClearForest’s Content Analysis services. It’s called SixDegrees and is located here:

http://www.francisshanahan.com/sixdegrees

SixDegrees is a Semantic Web experiment using Ajax, RSS and RDF combined creatively with the Content Analysis services of ClearForest. It’s a mashup but not in the typical sense.

SixDegrees builds on the notion that any two people in the world are separated by at most six degrees. My idea takes this a step further by trying to figure out the degrees of separation between everything rather than everyone.

Here’s how it works.

A repository of RSS feeds is stored in my database. These feeds are polled periodically for new content. The latest content is then parsed by ClearForest, which classifies it and returns a set of relevant tags organized by type, e.g. person, company, etc. These are then stored along with sundry other metadata. For example, a given story on Windows Vista might return the tags "Seattle", "Microsoft" and "Bill Gates" depending on the content.
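
To make that step concrete, here is a rough Python sketch of the poll-and-tag loop. The service itself isn’t written in Python, and the `extract_tags` stub, the table layout and the feed handling below are placeholders standing in for the real ClearForest call and my database, so treat this as an illustration of the idea rather than the actual code.

```python
import sqlite3

import feedparser  # third-party RSS/Atom parser (pip install feedparser)


def extract_tags(text):
    """Stand-in for the ClearForest Content Analysis call, which returns
    typed tags (person, company, place, ...) for a block of text."""
    raise NotImplementedError("swap in the real Content Analysis request here")


def poll_feeds(feed_urls, db_path="sixdegrees.db"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS stories "
                "(id INTEGER PRIMARY KEY, link TEXT UNIQUE, title TEXT)")
    con.execute("CREATE TABLE IF NOT EXISTS tags "
                "(story_id INTEGER, tag TEXT, tag_type TEXT)")
    for url in feed_urls:
        for entry in feedparser.parse(url).entries:
            cur = con.execute(
                "INSERT OR IGNORE INTO stories (link, title) VALUES (?, ?)",
                (entry.link, entry.title))
            if cur.rowcount == 0:   # link already seen, skip re-tagging
                continue
            story_id = cur.lastrowid
            for tag, tag_type in extract_tags(entry.get("summary", entry.title)):
                con.execute("INSERT INTO tags VALUES (?, ?, ?)",
                            (story_id, tag, tag_type))
    con.commit()
    con.close()
```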

Hundreds of blog entries are processed in this manner and over time a repository is established. This is where the SixDegrees service comes in.

The front-end website allows you to choose a start and an end entity. These are essentially tags queried from the database. The web service then determines whether these entities are linked in any way through common references within the database.

For example, if the term Bill Gates appears in story AAA and also in story BBB, then those two stories can be thought of as linked to one another through the common reference of Bill Gates. If those stories then contain other key terms, for example Windows and Steve Ballmer, a link between Steve Ballmer and Windows is established through Bill Gates. The more references, the more confident we can be that this link makes sense semantically.
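
In code, the co-occurrence idea might look something like the sketch below: every pair of tags that appears in the same story gets an edge, and the number of shared stories acts as a rough confidence score. This is a simplified Python illustration under assumed data structures, not the service itself.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(stories):
    """stories: mapping of story id -> set of tags found in that story.
    Returns undirected edge weights: how many stories mention both tags."""
    weights = defaultdict(int)
    for tags in stories.values():
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1   # more shared stories = more confidence
    return weights


stories = {
    "AAA": {"Bill Gates", "Windows"},
    "BBB": {"Bill Gates", "Steve Ballmer"},
}
weights = build_cooccurrence(stories)
# Both ("Bill Gates", "Windows") and ("Bill Gates", "Steve Ballmer") get a
# count of 1, so Windows and Steve Ballmer end up two hops apart via Bill Gates.
```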

As you can see, the data forms an undirected graph. My service processes this graph efficiently enough to return results in real time. I also use Ajax techniques to improve the user interface.
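
The path-finding itself can be as simple as a breadth-first search over that graph. The sketch below is an illustrative Python version built on the co-occurrence map above (again, an assumption-laden sketch rather than the production code, which also has the database and caching to worry about):

```python
from collections import deque


def find_connection(edges, start, end, max_degrees=6):
    """Breadth-first search over the undirected tag graph.
    edges maps each tag to the set of tags it co-occurs with.
    Returns the shortest chain of tags from start to end, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        if len(path) > max_degrees:      # give up past six degrees
            continue
        for neighbour in edges.get(path[-1], ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


edges = {
    "Windows": {"Bill Gates"},
    "Bill Gates": {"Windows", "Steve Ballmer"},
    "Steve Ballmer": {"Bill Gates"},
}
print(find_connection(edges, "Windows", "Steve Ballmer"))
# ['Windows', 'Bill Gates', 'Steve Ballmer']
```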

All data is user-generated, essentially from blogs. By processing data from the blogosphere in this manner and combining it with the services of ClearForest, the semantics of the content can be determined. This is a very small step towards the Semantic Web.

Once a connection has been found, the resulting link is documented in RDF. The RDF essentially describes the triples within the link. Developers will be glad to learn I have validated the RDF using the W3C RDF validator located here [LINK].
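
To give a rough feel for what "documenting the link in RDF" means, here is a small Python sketch using the rdflib library. The namespace and the `connectedTo` predicate are placeholders of mine for this example, not necessarily the vocabulary the service actually emits.

```python
from rdflib import Graph, Namespace

# Made-up namespace standing in for whatever vocabulary the service uses.
SIX = Namespace("http://www.francisshanahan.com/sixdegrees/schema#")


def path_to_rdf(path):
    """Turn a chain of tags, e.g. ["Steve Ballmer", "Bill Gates", "Windows"],
    into triples of the form (tagA, six:connectedTo, tagB)."""
    g = Graph()
    g.bind("six", SIX)
    for a, b in zip(path, path[1:]):
        g.add((SIX[a.replace(" ", "_")],
               SIX.connectedTo,
               SIX[b.replace(" ", "_")]))
    return g.serialize(format="xml")   # RDF/XML, suitable for the W3C validator


print(path_to_rdf(["Steve Ballmer", "Bill Gates", "Windows"]))
```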

The RDF validator can also generate a graph representing your RDF triples. As an example of the output, and proof that this works, check out the graph generated between the country "Australia" and the company "Dell". [LINK]

Lastly, should you so desire, I have exposed the capabilities of the service through SOAP and REST interfaces so that developers can build on top of the data collected. I need to document these better, but for now here are a few sample queries (there’s a small client sketch after the list):

http://www.francisshanahan.com/sixdegrees/rdf.aspx?&startTag=Australia&endTag=Dell
http://www.francisshanahan.com/sixdegrees/rest.aspx?api=getTagsByType&tagType=Product
http://www.francisshanahan.com/sixdegrees/rest.aspx?api=findConnection&startTag=Microsoft&endTag=Amazon
http://www.francisshanahan.com/sixdegrees/rest.aspx?api=getStoriesByTag&startTag=United States
http://www.francisshanahan.com/sixdegrees/rest.aspx?api=getStoryById&storyId=632
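
If you want to poke at the REST interface from code, something like this Python snippet should do it. The parameter names are taken from the sample queries above; I’m assuming a plain GET returning XML, since the response format isn’t documented yet.

```python
import requests  # third-party HTTP client (pip install requests)

BASE = "http://www.francisshanahan.com/sixdegrees/rest.aspx"

# Ask the service whether two tags are connected.
resp = requests.get(BASE, params={
    "api": "findConnection",
    "startTag": "Microsoft",
    "endTag": "Amazon",
})
resp.raise_for_status()
print(resp.text)   # raw response body; assumed to be XML
```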

And of course, the connection WSDL is located here [LINK].

This is an experiment, and I have already thought of a number of ways to improve it, time permitting. I hope you find this tool as interesting as I do.

4 Comments »

  • Baker said:

    Wow! Conceptually this is a pretty nice idea. I haven’t seen something like it. I think the graphic design could be improved though. Do you charge for the web service?

  • Suresh said:

    Excellent example. I hadn’t thought of using blogs like this. I suppose the more links you find the more likely there is to be a semantic connection. You can’t assume that just because one article mentions two words that there is a connection. But if there were many different blogs mentioning the same two words then it’s reasonable to assume a connection. Is that it?

  • Venky said:

Good job. This is a nice change from the standard mashup-type application. I like it. Great site too. I check regularly. Keep it up.

  • Tom said:

    I’ve been trying to wrap my head around the Semantic Web for a while. This example has tied it all together. Thanks. I see now how if things were described by RDF a spider might scan this data and allow for inferences to be made. If that’s it, it seems a little tenuous of an idea. What if the RDF is inaccurate to begin with?
