Aggregating crystallography: overlay journals and new databases

Andrew Walkingshaw

A lot of science is published every year, and much of it is only available by subscription. That has pumped up one of the favourite political footballs among scientists: the whole Open Access debate. The debate mostly concentrates on the copyright and access status of the journal articles themselves, because they're often perceived to be the major part of scientific output. But that neglects the data behind the articles, which is often just as valuable. In crystallography, this data is usually posted on journal websites alongside the papers, but unlike the papers, the raw data isn't copyrightable; it's "just" a collection of facts. So it isn't subject to the same restrictions as the articles, and you can build new databases by aggregating it.

One example of this kind of thing is CrystalEye, which brings together the latest small-molecule crystallography data, converts it to CML (Chemical Markup Language), and puts it up on the web in a more searchable and browsable form. It also gives us a resource we can mine: it publishes its data as Atom feeds, making it easier for informaticists to write new tools that run analyses over these streams of crystallographic data. In other words, it makes the data more amenable to programming, whether that's machine learning, visualization, social filtering or something new.
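
As a rough sketch of what consuming such a feed might look like, the Python below parses an Atom document with the standard library's `xml.etree.ElementTree` and pulls out each entry's title, id, update time and link. The feed content, entry titles and example.org URLs are hypothetical placeholders, not real CrystalEye output; a real client would fetch the feed over HTTP and would need to check what CrystalEye's entries actually contain.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Atom Syndication Format (RFC 4287)
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed; titles, ids and example.org URLs are placeholders
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example crystallography feed</title>
  <id>urn:example:feed</id>
  <updated>2008-01-02T00:00:00Z</updated>
  <entry>
    <title>Structure A</title>
    <id>urn:example:structure-a</id>
    <updated>2008-01-01T00:00:00Z</updated>
    <link rel="alternate" href="http://example.org/structures/a.cml"/>
  </entry>
  <entry>
    <title>Structure B</title>
    <id>urn:example:structure-b</id>
    <updated>2008-01-02T00:00:00Z</updated>
    <link rel="alternate" href="http://example.org/structures/b.cml"/>
  </entry>
</feed>"""


def parse_atom_entries(xml_text):
    """Return a list of dicts (title, id, updated, href) for each Atom <entry>."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        # Take the first <link>, if any; real feeds may carry several rel types
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            "href": link.get("href") if link is not None else None,
        })
    return entries


for e in parse_atom_entries(SAMPLE_FEED):
    print(e["updated"], e["title"], e["href"])
```

Each dict could then be handed to whatever analysis or visualization step comes next, for example fetching the linked CML file for that structure.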

The big noise in the Web world, when it comes to open data and the Semantic Web, is the Linking Open Data project. It uses RDF to make very large open datasets, and just as importantly the links between them, accessible. Through those links each dataset builds on the others, and resources like CrystalEye can be pulled into the cloud. That lets us begin to build new analyses and visualizations of the data, such as a map of the global distribution of crystallography papers. As we get more data, and more connected data, more subtle and complex relationships will be thrown up, and through that we'll get to new science.
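
To illustrate why the links matter, here is a toy sketch in plain Python rather than a real RDF toolkit. Two hypothetical datasets are represented as (subject, predicate, object) triples; because both use the same URI for each institution, papers in one can be joined to countries in the other to produce per-country counts, the kind of aggregate that could sit behind a map of where crystallography papers come from. All URIs, predicate names and data here are invented for illustration; a real pipeline would load actual RDF and would more likely query it with SPARQL.

```python
from collections import Counter

EX = "http://example.org/"  # hypothetical namespace for all URIs below

# Hypothetical dataset 1: crystallography papers and their authors' institutions
PAPER_TRIPLES = [
    (EX + "paper/1", EX + "affiliation", EX + "institution/a"),
    (EX + "paper/2", EX + "affiliation", EX + "institution/a"),
    (EX + "paper/3", EX + "affiliation", EX + "institution/b"),
]

# Hypothetical dataset 2, published separately: institutions and their countries
GEO_TRIPLES = [
    (EX + "institution/a", EX + "country", "GB"),
    (EX + "institution/b", EX + "country", "JP"),
]


def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def papers_per_country(paper_triples, geo_triples):
    """Follow paper -> institution -> country links and count papers per country."""
    counts = Counter()
    for paper, predicate, institution in paper_triples:
        if predicate != EX + "affiliation":
            continue
        # The institution URI is the shared identifier that links the two datasets
        for country in objects(geo_triples, institution, EX + "country"):
            counts[country] += 1
    return counts


print(papers_per_country(PAPER_TRIPLES, GEO_TRIPLES))
```

The join step is the point: neither dataset alone knows where the papers come from geographically, but the shared identifier lets the two be combined without either publisher coordinating with the other.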
