Improving integrative searching of systems chemical biology data using semantic annotation
1 School of Informatics and Computing, Indiana University, Bloomington, IN, USA
2 School of Library and Information Science, Indiana University, Bloomington, IN, USA
Journal of Cheminformatics 2012, 4:6 doi:10.1186/1758-2946-4-6Published: 8 March 2012
Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued.
We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i) simplifies the process of building SPARQL queries, (ii) enables useful new kinds of queries on the data and (iii) makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology.