This article is part of the supplement: 8th German Conference on Chemoinformatics: 26 CIC-Workshop

Open Access oral presentation

Toxicological knowledge discovery by mining emerging patterns from toxicity data

Richard Sherhod1*, Valerie J Gillet2, Thierry Hanser2, Philip N Judson2 and Jonathan D Vessey2

Author Affiliations

1 Information School, The University of Sheffield, Sheffield, S1 4DP, UK

2 Lhasa Limited, Leeds, LS2 9HD, UK

Journal of Cheminformatics 2013, 5(Suppl 1):O9  doi:10.1186/1758-2946-5-S1-O9

Published: 22 March 2013

© 2013 Sherhod et al.; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Oral presentation

Predicting the risk of toxic and environmental effects of chemical compounds is of great importance to all chemical industries [1]. Expert systems have been successful at predicting toxic risk by applying established toxicological knowledge encoded as a knowledge base of structural alerts and a reasoning model. A disadvantage of expert systems is that developing new structural alerts requires considerable time and effort from domain experts. To expedite this process, we have developed a software tool that automatically mines representations of activating features directly from toxicity datasets and presents them in an interpretable form.

Our knowledge discovery tool applies emerging pattern (EP) mining [2], a form of association rule mining [3] that is well known in computer science but relatively new to chemistry [4]. The EP mining algorithm accepts any data that are expressed as binary properties and divided into two classes, and extracts patterns (combinations) of those properties that are frequent within the data and more frequent in one class than in the other. By mining EPs from toxicity datasets encoded as fingerprints of binary descriptors, the tool generates patterns of features that distinguish toxicants from innocuous compounds. These patterns represent potentially activating features of the toxic compounds, which may then be used to define new alerts.
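The mining step described above can be illustrated with a minimal, exhaustive Python sketch. It assumes each fingerprint is represented as the set of descriptor indices whose bit is set, and it is not the border-based algorithm of Dong and Li [2] or the authors' tool. Following the usual EP definitions, the support of a pattern in a class is the fraction of compounds containing all of its bits, and the growth rate is the ratio of its support in the target class to its support in the other class. The function names and the parameters `min_support`, `min_growth` and `max_size` are hypothetical choices for illustration.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical representation: a fingerprint is the set of descriptor indices whose bit is 1.
Fingerprint = FrozenSet[int]


def support(pattern: FrozenSet[int], data: Sequence[Fingerprint]) -> float:
    """Fraction of fingerprints in `data` that contain every bit in `pattern`."""
    if not data:
        return 0.0
    return sum(1 for fp in data if pattern <= fp) / len(data)


def mine_emerging_patterns(
    positives: Sequence[Fingerprint],
    negatives: Sequence[Fingerprint],
    min_support: float = 0.1,
    min_growth: float = 2.0,
    max_size: int = 3,
) -> List[Tuple[FrozenSet[int], float, float]]:
    """Exhaustively enumerate bit combinations of up to `max_size` bits and keep those
    that are frequent in `positives` and at least `min_growth` times more frequent
    there than in `negatives`. Returns (pattern, positive support, growth rate)."""
    items = sorted(set().union(*positives))
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            sup_pos = support(pattern, positives)
            if sup_pos < min_support:
                continue  # not frequent in the positive (e.g. toxic) class
            sup_neg = support(pattern, negatives)
            # Growth rate; a pattern absent from the negative class grows without bound
            growth = float("inf") if sup_neg == 0 else sup_pos / sup_neg
            if growth >= min_growth:
                results.append((pattern, sup_pos, growth))
    return results


if __name__ == "__main__":
    # Toy fingerprints over four hypothetical descriptors (bits 0-3)
    mutagens = [frozenset({0, 1, 2}), frozenset({0, 1}), frozenset({0, 3})]
    non_mutagens = [frozenset({1, 3}), frozenset({2, 3}), frozenset({3})]
    for pattern, sup, growth in mine_emerging_patterns(
        mutagens, non_mutagens, min_support=0.5, min_growth=3.0
    ):
        print(sorted(pattern), round(sup, 2), growth)
```

On the toy data this prints `[0] 1.0 inf` and `[0, 1] 0.67 inf`: both patterns occur only among the mutagens, which the EP literature calls jumping emerging patterns. Enumerating every combination is only feasible for tiny examples; with around 2000 descriptors a practical miner must prune the search, for instance by never extending a pattern that is already infrequent, since support can only fall as bits are added.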

The knowledge discovery tool was tested on a public dataset of 3489 mutagens and 2981 non-mutagens, encoded as fingerprints of approximately 2000 functional group and ring descriptors. The resulting EPs were grouped into hierarchical families. Six EPs representing distinct chemical classes were selected for manual inspection by a toxicology expert, and the relevant literature was analysed to find a mechanistic rationale for the mined features. This resulted in four new structural alerts for in vitro mutagenicity.
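The abstract does not say how the EPs were organised into hierarchical families. One plausible criterion, assumed here purely for illustration, is the subset relation: a mined pattern that adds bits to another mined pattern is treated as a more specific child of it, and each pattern with no mined subset roots a family. The Python sketch below, with hypothetical functions `build_hierarchy` and `group_into_families`, builds that hierarchy and collects each family.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical representation: a pattern is a set of descriptor bit indices.
Pattern = FrozenSet[int]


def build_hierarchy(patterns: Iterable[Pattern]) -> Dict[Pattern, List[Pattern]]:
    """Link each pattern to its direct children: mined patterns that strictly
    contain it with no other mined pattern in between (assumed criterion)."""
    pats = sorted(set(patterns), key=len)
    children: Dict[Pattern, List[Pattern]] = {p: [] for p in pats}
    for child in pats:
        ancestors = [p for p in pats if p < child]  # proper subsets of `child`
        # Direct parents are ancestors not contained in another ancestor
        parents = [a for a in ancestors if not any(a < b for b in ancestors)]
        for parent in parents:
            children[parent].append(child)
    return children


def group_into_families(patterns: Iterable[Pattern]) -> Dict[Pattern, Set[Pattern]]:
    """Map each root (a pattern with no mined subset) to itself plus all descendants."""
    children = build_hierarchy(patterns)
    has_parent = {c for kids in children.values() for c in kids}
    families: Dict[Pattern, Set[Pattern]] = {}
    for root in (p for p in children if p not in has_parent):
        members: Set[Pattern] = set()
        stack = [root]
        while stack:  # depth-first walk down the hierarchy
            node = stack.pop()
            if node not in members:
                members.add(node)
                stack.extend(children[node])
        families[root] = members
    return families


if __name__ == "__main__":
    eps = [frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2}),
           frozenset({3}), frozenset({3, 4})]
    for root, members in group_into_families(eps).items():
        print(sorted(root), [sorted(m) for m in sorted(members, key=len)])
```

Because the hierarchy is a directed acyclic graph rather than a tree, a pattern that contains two unrelated roots will appear in both families. This is one reason an expert still has to choose representative patterns, as in the six EPs inspected here.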


  1. Cronin MTD, Madden JC: In Silico Toxicology: Principles and Applications. Royal Society of Chemistry; 2010.

  2. Dong G, Li J: Efficient Mining of Emerging Patterns: Discovering Trends and Differences. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: 15-18 August 1999; San Diego. ACM Press; 1999:43-52.

  3. Agrawal R, Imieliński T, Swami A: Mining Association Rules between Sets of Items in Large Databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data: 26-28 May 1993; Washington DC. ACM Press; 1993:207-216.

  4. Auer J, Bajorath J: Emerging Chemical Patterns: A New Methodology for Molecular Classification and Compound Selection. J Chem Inf Model 2006, 46:2502-2514.