Recently, ClearForest (a division of Reuters) launched OpenCalais - a web service which reads the news for you:
The Calais web service automatically attaches rich semantic metadata to the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais categorizes and links your document with entities (people, places, organizations, etc.), facts (person ‘x’ works for company ‘y’), and events (person ‘z’ was appointed chairman of company ‘y’ on date ‘x’).
In other words, it annotates text and marks up four of the journalistic 5 Ws - who, what, where and when, hopefully making it easier for journalists to join the dots and supply the why and how.
This seems only tangentially relevant to chemistry, at first glance, but chemistry's in a sense just a special case - we want to pull the whats (chemicals) and the hows (experimental methods) out of free text - papers, theses, and journal articles. That means chemical named entity recognition, and conveniently:
Oscar3 is a system for chemical natural language processing, focussing on chemical named entity recognition.
The Royal Society of Chemistry use OSCAR3 to annotate journals as part of Project Prospect; it lets them build, automatically, a searchable index of molecular structures (and substructures) published in their journals.
So, in that, there's some overlap with the service the Chemical Abstracts Service provides; but the CAS model is based on a small army of editors indexing papers by hand, and as such can only scale up so far. On the other hand, there's no reason OSCAR-like robot annotators, even if less accurate than the humans, can't be turned loose on much more - patent applications, university thesis repositories, scientific blogs, mainstream scientific journalism, or anywhere there's chemical text to be read. That has the potential to open up a lot of 'hidden', informal or otherwise unpublished research to chemically-meaningful indexing and search.
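To make the "annotate, then index" idea concrete, here is a toy sketch in Python. It is emphatically not how OSCAR3 works: it swaps real chemical named entity recognition for a plain dictionary lookup, and the three-entry lexicon, the function names `annotate` and `build_index`, and the document ids are all invented for illustration. It only shows the shape of the pipeline: find chemical names in free text, map each to a structure (here a SMILES string), and record which documents mention which structure.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon (an assumption for this sketch): name -> SMILES.
LEXICON = {
    "ethanol": "CCO",
    "acetic acid": "CC(=O)O",
    "benzene": "c1ccccc1",
}

# Try longer names first so multi-word entries are matched whole.
_NAMES = sorted(LEXICON, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in _NAMES) + r")\b",
    re.IGNORECASE,
)


def annotate(text):
    """Return (start, end, name, smiles) for each lexicon name found in text."""
    hits = []
    for match in _PATTERN.finditer(text):
        name = match.group(1).lower()
        hits.append((match.start(), match.end(), name, LEXICON[name]))
    return hits


def build_index(documents):
    """Map each SMILES string to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for _start, _end, _name, smiles in annotate(text):
            index[smiles].add(doc_id)
    return index


if __name__ == "__main__":
    docs = {
        "blog-1": "We dissolved the product in ethanol.",
        "thesis-2": "Benzene and acetic acid were used as solvents.",
    }
    for smiles, doc_ids in sorted(build_index(docs).items()):
        print(smiles, sorted(doc_ids))
```

A fixed lexicon like this misses anything it hasn't seen before (systematic names, abbreviations, misspellings), which is exactly the gap a trained recogniser is meant to fill; the indexing step downstream would look much the same either way.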