Open Access Methodology

The ChEMBL database as linked open data

Egon L Willighagen1*, Andra Waagmeester1, Ola Spjuth2, Peter Ansell3, Antony J Williams4, Valery Tkachenko4, Janna Hastings5, Bin Chen6 and David J Wild6

Author Affiliations

1 Department of Bioinformatics - BiGCaT, Maastricht University, P.O. Box 616, UNS50 Box 19, NL-6200 MD, Maastricht, The Netherlands

2 Department of Pharmaceutical Biosciences, Uppsala University, PO Box 591, SE-751 24, Uppsala, Sweden

3 School of Information Technology and Electronic Engineering, University of Queensland, St Lucia, Qld 4072, Australia

4 Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, NC 27587, USA

5 Cheminformatics and Metabolism, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK

6 School of Informatics and Computing, Indiana University, Bloomington, IN, USA


Journal of Cheminformatics 2013, 5:23  doi:10.1186/1758-2946-5-23

Published: 8 May 2013

Abstract

Background

Making data available as Linked Data using Resource Description Framework (RDF) promotes integration with other web resources. RDF documents can natively link to related data, and others can link back using Uniform Resource Identifiers (URIs). RDF makes the data machine-readable and uses extensible vocabularies for additional information, making it easier to scale up inference and data analysis.
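As a minimal sketch of the linking idea described above, the snippet below represents RDF statements as subject-predicate-object triples and serializes them in N-Triples syntax. The `example.org` URIs, the label, and the helper names (`uri`, `literal`, `to_ntriples`) are hypothetical placeholders for illustration, not identifiers from ChEMBL-RDF; only the `rdfs:label` and `owl:sameAs` predicates are standard vocabulary terms.

```python
# Minimal sketch: RDF statements as (subject, predicate, object) triples.
# All example.org URIs below are hypothetical, not real ChEMBL-RDF identifiers.

# Standard W3C vocabulary terms.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal (escapes only backslashes and quotes)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize triples as one 'subject predicate object .' line each."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# The same (hypothetical) molecule as identified in two different datasets.
molecule = "http://example.org/dataset-a/molecule/m1"
same_molecule_elsewhere = "http://example.org/dataset-b/compound/42"

triples = [
    # Attach a human-readable label to the resource.
    (uri(molecule), uri(RDFS_LABEL), literal("aspirin")),
    # Link out to another dataset's URI for the same entity.
    (uri(molecule), uri(OWL_SAME_AS), uri(same_molecule_elsewhere)),
]

ntriples = to_ntriples(triples)
print(ntriples)
```

Because the link is itself just another triple whose object is a URI, a second dataset can point back at the first in the same way, without either side changing its schema.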

Results

This paper describes recent developments in an ongoing project converting data from the ChEMBL database into RDF triples. Relative to earlier versions, this updated version of ChEMBL-RDF uses recently introduced ontologies, including CHEMINF and CiTO; exposes more information from the database; and is now available as dereferenceable Linked Data. To demonstrate these new features, we present novel use cases showing further integration with other web resources, including Bio2RDF, Chem2Bio2RDF, and ChemSpider, and showing the use of standard ontologies for querying.

Conclusions

We have illustrated the advantages of using open standards and ontologies to link the ChEMBL database to other databases. Using those links and the knowledge encoded in standards and ontologies, the ChEMBL-RDF resource creates a foundation for integrated semantic web cheminformatics applications, such as the decision support application presented here.

Keywords:
ChEMBL; Bioactivity; Semantic web; Resource Description Framework; Linked Data
