Large scale chemical patent mining with UIMA and UNICORE

Klenner, Alexander; Bergmann, Sandra; Zimmermann, Marc; Romberg, Mathilde

doi:10.1186/1758-2946-4-S1-P19

Volume 4 Supplement 1

7th German Conference on Chemoinformatics: 25 CIC-Workshop

Poster presentation
Open access
Published: 01 May 2012

Alexander Klenner¹,
Sandra Bergmann²,
Marc Zimmermann¹ &
…
Mathilde Romberg²

Journal of Cheminformatics volume 4, Article number: P19 (2012)

Finding information about annotated chemical reactions for drugs and small compounds is a crucial step for the pharmaceutical industry. These data are often presented in the form of unstructured documents (especially patents), and manual extraction of this information is time-consuming and costly.

In our UIMA-HPC project [1], we describe the combined use of the Unstructured Information Management Architecture (UIMA) and the Uniform Interface to Computing Resources (UNICORE) for large-scale chemical patent mining. Our approach will incorporate existing software such as chemoCR for image processing (image-to-structure) and OCR for text reconstruction. All components are wrapped inside the UIMA framework pipeline. Using the UIMA framework ensures compatibility between the different components of the pipeline and makes it possible to connect arbitrary annotation modules to the system. Scale-out to large document collections is achieved with the UNICORE framework on high-performance computing clusters, which enables parallelization of all UIMA nodes. The aim is a fully annotated PDF collection in which all biomedical entities (compound names, reaction schemes, etc.) are connected by references and can thus be easily browsed and searched by the user. The planned schematic workflow is shown in Figure 1.
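The pipeline idea described above can be illustrated with a minimal Python sketch. It does not use the UIMA or UNICORE APIs, and it is not the project's implementation: the `Document` and `Annotation` classes, the toy dictionary, and both annotator functions are hypothetical stand-ins, and a local thread pool stands in for jobs distributed on a cluster. The sketch only shows the two properties the text relies on: annotators stay interchangeable because they share one document structure, and documents are independent, so a collection can be processed in parallel.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "compound" or "reaction_verb"
    begin: int  # character offset where the annotated span starts
    end: int    # character offset where the annotated span ends
    text: str   # the covered text


@dataclass
class Document:
    """Shared container that every annotator reads from and writes to."""
    doc_id: str
    text: str
    annotations: list = field(default_factory=list)


# Toy dictionary; an assumption standing in for a real chemical name recognizer.
COMPOUNDS = ("aspirin", "ibuprofen", "acetic anhydride")


def annotate_compounds(doc):
    """Hypothetical annotator: mark dictionary compound names in the text."""
    for name in COMPOUNDS:
        for m in re.finditer(re.escape(name), doc.text, flags=re.IGNORECASE):
            doc.annotations.append(
                Annotation("compound", m.start(), m.end(), m.group())
            )


def annotate_reaction_verbs(doc):
    """Hypothetical annotator: mark a few verbs that often signal a reaction."""
    for m in re.finditer(r"\b(?:reacted|treated|heated)\b", doc.text):
        doc.annotations.append(
            Annotation("reaction_verb", m.start(), m.end(), m.group())
        )


# Any function that takes a Document can be added to the pipeline.
PIPELINE = (annotate_compounds, annotate_reaction_verbs)


def run_pipeline(doc):
    """Apply every annotator in order to one document."""
    for annotator in PIPELINE:
        annotator(doc)
    return doc


def process_collection(docs, workers=4):
    """Run the pipeline on each document in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_pipeline, docs))
```

Calling `process_collection` with a list of `Document` objects returns the same documents with their `annotations` lists filled in. Adding a further module, such as an image-to-structure step, would only require another function that accepts a `Document`.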

Funding

BMBF grant 01IH1101.

References

1. http://www.uima-hpc.org

Author information

Authors and Affiliations

Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, 53754, Germany
Alexander Klenner & Marc Zimmermann
Forschungszentrum Juelich GmbH, Juelich, 52425, Germany
Sandra Bergmann & Mathilde Romberg

Corresponding author

Correspondence to Alexander Klenner.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Klenner, A., Bergmann, S., Zimmermann, M. et al. Large scale chemical patent mining with UIMA and UNICORE. J Cheminform 4 (Suppl 1), P19 (2012). https://doi.org/10.1186/1758-2946-4-S1-P19
