In 1999, Tim Berners-Lee first described the semantic web in this way: "I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize."
Since that time, there has been significant progress towards making such an idea a reality (note Radar Networks' Twine, or Metaweb's Freebase). The idea has also become more tightly constrained and defined (e.g. Wikipedia's current definition: "The Semantic Web is an evolving extension of the World Wide Web in which web content can be expressed not only in natural language, but also in a format that can be read and used by software agents, thus permitting them to find, share and integrate information more easily.").
Looking beyond RDF-related technologies, OWL, and other ontology frameworks, however, we may be approaching a post-semantic web phase in the development of the Internet. It's not that the "semantic web", as Tim Berners-Lee dreamed it or as Wikipedia defines it, has fully appeared. In fact, I suspect that in either form it may never appear and function the way its proponents envision. For one, there is still deep disagreement over standards: for all its semanticness, the community can't even agree on the semantics!
By post-semantic web, I do not mean that the semantic web has become irrelevant, but that it is beginning to show signs of turning out far differently than anyone could have imagined.
We are now seeing advanced machine learning combined with natural language processing, social graph analysis, and data mining techniques that few could have imagined half a decade ago. These technologies are being run on incredibly powerful compute resources (particularly those in mesh or p2p networks) to pick up and analyze a tremendous array of "signals". By signals, I mean not just those most in vogue in "web 2.0", like tags or networks of friends, although these are new and valuable sources from which machines can learn to serve people more effectively. I also mean "digital gestures": small signals that convey meaning to others, but differently than "natural language" typically does; examples might include symbology or avatars. We are becoming more expressive digitally, and we are only now beginning to be able to harvest these expressions and have machines learn from them in order to adapt to us.
The artificial intelligence field has for many years been fascinated with the idea of autonomous agents - semi-stupid digital servants that can act on our behalf under certain circumstances. The recent push into probabilistic reasoning and advances in a particular subfield of AI called machine learning (a characteristically poor name for a field of inquiry, but oh well) have begun to produce something better than semi-stupid in terms of serving us users.
The promise of a post-semantic web goes beyond just a language and representation framework (the techno-wonk vision of the semantic web) or a series of agents that do things for people. It's really a combination of 1) the power of distributed computing, 2) the growing expressivity of digital life and the signals such a life leaves behind, and 3) a way for software to learn and adapt itself to better serve users and the human communities they belong to. The implication of such powerful applications is not necessarily that they do things for us (although that would be a useful side effect), but rather that they give us new cognitive, and perhaps social, capabilities that let us do what we humans already do - just more and better.
AAAI Symposium on Social Information Processing - http://www.aaai.org/Symposia/Spring/sss08symposia.php#ss06
iLink KDD - http://www.ai.sri.com/pub_list/1523
Radar Networks' Twine - http://www.radarnetworks.com/
Metaweb - http://www.metaweb.com/
Comments
Web 3.0 and scientific information
At a 2006 conference that IFTF organized on the future of scientific publishing, several experts talked about an emerging "Web 3.0" that would be dominated by semi-intelligent applications that could create new content and documents. The consensus was that the agents would work first with well-tagged content (i.e., the building blocks of the semantic web), because much of the hard interpretive work of identifying meaning in documents, and identifying it in a way that can be read by machines, was already done; and that we'd see the first successful agents in the sciences, since scientific publications are among the best-structured.
Alex Soojung-Kim Pang
Rich Media and the Semantic Web
Alex and David:
Does rich media pose any challenge to the efforts to build the semantic web? It would seem that video production of scientific content, such as the NASA HD podcast or projects like SCIVEE, might be a challenge. To what extent can our semi-intelligent agents actually allow us to extract meaning from non-text-based content?
Jerry Sheehan