<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/rss.css" type="text/css"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
    xmlns:cc="http://web.resource.org/cc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:extra="http://www.w3.org/1999/xhtml"
    xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <channel rdf:about="http://www.jcheminf.com/feeds/mostaccessed/journal?quantity=&amp;format=rss&amp;version=">
        <title>Journal of Cheminformatics - Most accessed articles</title>
        <link>http://www.jcheminf.com</link>
        <description>The most accessed research articles published by Journal of Cheminformatics</description>
        <dc:date>2012-02-02T00:00:00Z</dc:date>
        <items>
            <rdf:Seq>
                                <rdf:li rdf:resource="http://www.jcheminf.com/content/4/1/1" />
                                <rdf:li rdf:resource="http://www.jcheminf.com/content/4/1/2" />
                                <rdf:li rdf:resource="http://www.jcheminf.com/content/3/1/33" />
                                <rdf:li rdf:resource="http://www.jcheminf.com/content/3/1/54" />
                                <rdf:li rdf:resource="http://www.jcheminf.com/content/1/1/8" />
                                <rdf:li rdf:resource="http://www.jcheminf.com/content/4/1/3" />
                                <rdf:li rdf:resource="http://www.jcheminf.com/content/3/1/37" />
                                <rdf:li rdf:resource="http://www.jcheminf.com/content/2/1/1" />
                                <rdf:li rdf:resource="http://www.jcheminf.com/content/1/1/21" />
                                <rdf:li rdf:resource="http://www.jcheminf.com/content/3/1/19" />
                            </rdf:Seq>
        </items>
                 <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </channel>
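    <!--
The channel's rdf:Seq fixes the ranking of the items, and each item element repeats its article URI in rdf:about. The following minimal Python sketch reads a feed with this structure using only the standard library and returns the items in Seq order. It assumes the namespace URIs declared on the rdf:RDF root element of this feed. The function name list_items is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared on the rdf:RDF root element of this feed.
NS = {
    "rss": "http://purl.org/rss/1.0/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


def list_items(xml_text):
    """Return (uri, title, doi) tuples in the order given by the channel's rdf:Seq."""
    root = ET.fromstring(xml_text)
    # Index every item element by its rdf:about URI.
    items = {}
    for item in root.findall("rss:item", NS):
        uri = item.get(RDF_ABOUT)
        title = item.findtext("rss:title", default="", namespaces=NS)
        doi = item.findtext("dc:identifier", default="", namespaces=NS)
        items[uri] = (uri, title, doi)
    # The rdf:li entries inside channel/items/rdf:Seq define the ranking order.
    order = [li.get(RDF_RESOURCE)
             for li in root.findall("rss:channel/rss:items/rdf:Seq/rdf:li", NS)]
    return [items[uri] for uri in order if uri in items]
```
    -->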
        <item rdf:about="http://www.jcheminf.com/content/4/1/1">
        <title>Making SharePoint(R) Chemically Aware(TM)</title>
        <description>Background:
The use of SharePoint(R) collaboration software for content management has become a critical part of today&apos;s drug discovery process. SharePoint 2010 software has laid a foundation that enables researchers to collaborate on and search various kinds of content. The amount of data generated as a single compound moves from preclinical discovery to commercialization can easily reach terabytes, so there is greater demand for a chemically aware search algorithm that supplements SharePoint and enables researchers to query for information in a more intuitive and effective way. Supplementing SharePoint with Chemically Aware(TM) features therefore provides great value to pharmaceutical and biotech companies and makes drug discovery more efficient. Using several tools, we have integrated SharePoint with chemical, compound, and reaction databases, thereby improving the capability of the traditional search engine and enhancing the user experience.
Results:
This paper describes the implementation of a Chemically Aware(TM) system to supplement SharePoint. A Chemically Aware SharePoint (CASP) allows users to tag documents by drawing a structure and associating it with the related content. It also allows users to search SharePoint content and internal/external databases by carrying out substructure, similarity, SMILES, and IUPAC name searches. Building on traditional search, CASP takes SharePoint one step further by providing researchers with an intuitive GUI that lets them base their searches on their knowledge of chemistry rather than on text alone. CASP also provides a way to integrate with other systems; for example, a researcher can perform a substructure search on PDF documents with embedded molecular entities.
Conclusion:
A Chemically Aware(TM) system supplementing SharePoint is a step towards making the drug discovery process more efficient, and it helps researchers search for information in a more intuitive way. It also helps researchers find information that was once difficult to find, by allowing them to tag documents with molecular entities and by integrating with image recognition software to extract information from PDF documents.</description>
        <link>http://www.jcheminf.com/content/4/1/1</link>
                <dc:creator>Kartik Tallapragada</dc:creator>
                <dc:creator>Joseph Chewning</dc:creator>
                <dc:creator>David Kombo</dc:creator>
                <dc:creator>Beverly Ludwick</dc:creator>
                <dc:source>Journal of Cheminformatics 2012, 4:1</dc:source>
        <dc:date>2012-01-12T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1758-2946-4-1</dc:identifier>
                                <prism:require>/content/figures/1758-2946-4-1-toc.gif</prism:require>
                <prism:publicationName>Journal of Cheminformatics</prism:publicationName>
        <prism:issn>1758-2946</prism:issn>
        <prism:volume>4</prism:volume>
        <prism:startingPage>1</prism:startingPage>
        <prism:publicationDate>2012-01-12T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
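    <!--
The CASP abstract lists similarity among its search modes without naming the measure. One common choice, assumed here rather than taken from the paper, is the Tanimoto coefficient on binary fingerprints. In this minimal Python sketch, fingerprints are assumed to be precomputed sets of on-bit indices; the names tanimoto and similarity_search, and the 0.7 default threshold, are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # assumed convention: two empty fingerprints are not considered similar
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query_fp, library, threshold=0.7):
    """Return (name, score) pairs with score >= threshold, best match first.

    library maps a document or compound name to its fingerprint set.
    """
    hits = []
    for name, fp in library.items():
        score = tanimoto(query_fp, fp)
        if score >= threshold:
            hits.append((name, score))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits
```
    -->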
        <item rdf:about="http://www.jcheminf.com/content/4/1/2">
        <title>Predicting the mechanism of phospholipidosis</title>
        <description>The mechanism of phospholipidosis is still not well understood. Numerous different mechanisms have been proposed, varying from direct inhibition of the breakdown of phospholipids to the binding of a drug compound to the phospholipid, preventing breakdown. We have used a probabilistic method, the Parzen-Rosenblatt Window approach, to build a model from the ChEMBL dataset which can predict from a compound&apos;s structure both its primary pharmaceutical target and other targets with which it forms off-target, usually weaker, interactions. Using a small dataset of 182 phospholipidosis-inducing and non-inducing compounds, we predict their off-target activity against targets which could relate to phospholipidosis as a side-effect of a drug. We link these targets to specific mechanisms of inducing this lysosomal build-up of phospholipids in cells. Thus, we show that the induction of phospholipidosis is likely to occur by separate mechanisms when triggered by different cationic amphiphilic drugs. We find that both inhibition of phospholipase activity and enhanced cholesterol biosynthesis are likely to be important mechanisms. Furthermore, we provide evidence suggesting four specific protein targets. Sphingomyelin phosphodiesterase, phospholipase A2 and lysosomal phospholipase A1 are shown to be likely targets for the induction of phospholipidosis by inhibition of phospholipase activity, while lanosterol synthase is predicted to be associated with phospholipidosis being induced by enhanced cholesterol biosynthesis. This analysis provides the impetus for further experimental tests of these hypotheses.</description>
        <link>http://www.jcheminf.com/content/4/1/2</link>
                <dc:creator>Robert Lowe</dc:creator>
                <dc:creator>Hamse Mussa</dc:creator>
                <dc:creator>Florian Nigsch</dc:creator>
                <dc:creator>Robert Glen</dc:creator>
                <dc:creator>John Mitchell</dc:creator>
                <dc:source>Journal of Cheminformatics 2012, 4:2</dc:source>
        <dc:date>2012-01-26T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1758-2946-4-2</dc:identifier>
                            <dc:title>Predicting the mechanism of phospholipidosis</dc:title>
                            <dc:description>An in silico approach was used to predict targets for phospholipidosis, a lysosomal disorder characterized by accumulation of phospholipids in tissues. Predicting targets for a database of compounds allows the compounds to be ranked by their potential to cause phospholipidosis</dc:description>
                <prism:require>/content/figures/1758-2946-4-2-toc.gif</prism:require>
                <prism:publicationName>Journal of Cheminformatics</prism:publicationName>
        <prism:issn>1758-2946</prism:issn>
        <prism:volume>4</prism:volume>
        <prism:startingPage>2</prism:startingPage>
        <prism:publicationDate>2012-01-26T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
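    <!--
The Parzen-Rosenblatt window approach estimates, for each target, how densely that target's known ligands populate the region of descriptor space around a query compound, and combines this with a prior to rank targets. The minimal Python sketch below illustrates the general technique under stated assumptions: compounds are fixed-length numeric descriptor vectors, the window is an isotropic Gaussian of hypothetical width h, and the prior is each target's share of training ligands. The descriptors, kernel and parameters of the published model are not given in the abstract, so this is not the authors' implementation; rank_targets and gaussian_window are hypothetical names.

```python
import math


def gaussian_window(x, y, h):
    """Isotropic Gaussian kernel on the squared Euclidean distance between x and y."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * h * h))


def rank_targets(query, training, h=1.0):
    """Rank targets by estimated posterior P(target | query).

    training maps each target name to a list of descriptor vectors of its known ligands.
    The class-conditional density is the mean kernel value over that target's ligands
    (the Gaussian normalising constant cancels because h is shared by all targets);
    the prior is the target's share of all training ligands.
    """
    total = sum(len(ligands) for ligands in training.values())
    scores = {}
    for target, ligands in training.items():
        density = sum(gaussian_window(query, lig, h) for lig in ligands) / len(ligands)
        prior = len(ligands) / total
        scores[target] = density * prior
    norm = sum(scores.values())
    if norm == 0.0:
        return []  # the query lies far outside every window
    return sorted(((t, s / norm) for t, s in scores.items()),
                  key=lambda pair: pair[1], reverse=True)
```
    -->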
        <item rdf:about="http://www.jcheminf.com/content/3/1/33">
        <title>Open Babel: An open chemical toolbox</title>
        <description>Background:
A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats.
Results:
We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion.
Conclusions:
Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license from http://openbabel.org.</description>
        <link>http://www.jcheminf.com/content/3/1/33</link>
                <dc:creator>Noel O'Boyle</dc:creator>
                <dc:creator>Michael Banck</dc:creator>
                <dc:creator>Craig James</dc:creator>
                <dc:creator>Chris Morley</dc:creator>
                <dc:creator>Tim Vandermeersch</dc:creator>
                <dc:creator>Geoffrey Hutchison</dc:creator>
                <dc:source>Journal of Cheminformatics 2011, 3:33</dc:source>
        <dc:date>2011-10-07T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1758-2946-3-33</dc:identifier>
                            <dc:title>Open Babel: An open chemical toolbox</dc:title>
                            <dc:description>The features, implementation and validation of the open source chemical toolbox Open Babel are described for the first time, including a summary of key advances in the 2.3 release</dc:description>
                <prism:require>/content/figures/1758-2946-3-33-toc.gif</prism:require>
                <prism:publicationName>Journal of Cheminformatics</prism:publicationName>
        <prism:issn>1758-2946</prism:issn>
        <prism:volume>3</prism:volume>
        <prism:startingPage>33</prism:startingPage>
        <prism:publicationDate>2011-10-07T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.jcheminf.com/content/3/1/54">
        <title>New developments on the cheminformatics open workflow environment CDK-Taverna</title>
        <description>Background:
The computational processing and analysis of small molecules is at the heart of cheminformatics and structural bioinformatics and their application in, e.g., metabolomics or drug discovery. Pipelining or workflow tools allow for the Lego(TM)-like, graphical assembly of I/O modules and algorithms into a complex workflow which can be easily deployed, modified and tested without the hassle of implementing it as a monolithic application. The CDK-Taverna project aims at building a free open-source cheminformatics pipelining solution through a combination of different open-source projects such as Taverna, the Chemistry Development Kit (CDK) or the Waikato Environment for Knowledge Analysis (WEKA). A first integrated version 1.0 of CDK-Taverna was recently released to the public.
Results:
The CDK-Taverna project was migrated to the most up-to-date versions of its foundational software libraries with a complete re-engineering of its worker architecture (version 2.0). 64-bit computing and multi-core usage by parallel threads are now supported to allow for fast in-memory processing and analysis of large sets of molecules. Earlier deficiencies, such as workarounds for iterative data reading, have been removed. The reaction enumeration features related to combinatorial chemistry have been considerably enhanced. Additional functionality for calculating a natural product likeness score for small molecules has been implemented to identify possible drug candidates. Finally, the data analysis capabilities are extended with new workers that provide access to the open-source WEKA library for clustering and machine learning as well as training and test set partitioning. The new features are outlined with usage scenarios.
Conclusions:
As an open-source cheminformatics workflow solution, CDK-Taverna 2.0 has matured into a freely available and increasingly powerful tool for the biosciences. The combination of the new CDK-Taverna worker family with the already available workflows developed by a lively Taverna community and published on myexperiment.org enables molecular scientists to quickly calculate, process and analyse molecular data as typically found in, e.g., today&apos;s systems biology scenarios.</description>
        <link>http://www.jcheminf.com/content/3/1/54</link>
                <dc:creator>Andreas Truszkowski</dc:creator>
                <dc:creator>Kalai Vanii Jayaseelan</dc:creator>
                <dc:creator>Stefan Neumann</dc:creator>
                <dc:creator>Egon Willighagen</dc:creator>
                <dc:creator>Achim Zielesny</dc:creator>
                <dc:creator>Christoph Steinbeck</dc:creator>
                <dc:source>Journal of Cheminformatics 2011, 3:54</dc:source>
        <dc:date>2011-12-13T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1758-2946-3-54</dc:identifier>
                            <dc:title>New developments on CDK-Taverna</dc:title>
                            <dc:description>The most up to date version of the CDK-Taverna project is described, which aims at building a free open-source cheminformatics pipelining solution through a combination of different open-source projects</dc:description>
                <prism:require>/content/figures/1758-2946-3-54-toc.gif</prism:require>
                <prism:publicationName>Journal of Cheminformatics</prism:publicationName>
        <prism:issn>1758-2946</prism:issn>
        <prism:volume>3</prism:volume>
        <prism:startingPage>54</prism:startingPage>
        <prism:publicationDate>2011-12-13T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.jcheminf.com/content/1/1/8">
        <title>Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions</title>
        <description>Background:
A method to estimate ease of synthesis (synthetic accessibility) of drug-like molecules is needed in many areas of the drug discovery process. This article describes the development and validation of such a method, which characterizes the synthetic accessibility of a molecule as a score between 1 (easy to make) and 10 (very difficult to make).
Results:
The method for estimation of the synthetic accessibility score (SAscore) described here is based on a combination of fragment contributions and a complexity penalty. Fragment contributions have been calculated based on the analysis of one million representative molecules from PubChem, and can therefore be said to capture the historical synthetic knowledge stored in this database. The molecular complexity score takes into account the presence of non-standard structural features, such as large rings, non-standard ring fusions, stereocomplexity and molecule size. The method has been validated by comparing calculated SAscores with ease of synthesis as estimated by experienced medicinal chemists for a set of 40 molecules. The agreement between calculated and manually estimated synthetic accessibility is very good, with r² = 0.89.
Conclusion:
A novel method to estimate synthetic accessibility of molecules has been developed. This method uses historical synthetic knowledge obtained by analyzing information from millions of already synthesized chemicals and also considers molecular complexity. The method is sufficiently fast and provides results consistent with estimation of ease of synthesis by experienced medicinal chemists. The calculated SAscore may be used to support various drug discovery processes where a large number of molecules needs to be ranked based on their synthetic accessibility, for example when purchasing samples for screening, selecting hits from high-throughput screening for follow-up, or ranking molecules generated by various de novo design approaches.</description>
        <link>http://www.jcheminf.com/content/1/1/8</link>
                <dc:creator>Peter Ertl</dc:creator>
                <dc:creator>Ansgar Schuffenhauer</dc:creator>
                <dc:source>Journal of Cheminformatics 2009, 1:8</dc:source>
        <dc:date>2009-06-10T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1758-2946-1-8</dc:identifier>
                                <prism:require>/content/figures/1758-2946-1-8-toc.gif</prism:require>
                <prism:publicationName>Journal of Cheminformatics</prism:publicationName>
        <prism:issn>1758-2946</prism:issn>
        <prism:volume>1</prism:volume>
        <prism:startingPage>8</prism:startingPage>
        <prism:publicationDate>2009-06-10T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
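    <!--
The SAscore abstract combines a fragment score with a complexity penalty and reports the result on a scale from 1 (easy to make) to 10 (very difficult to make). The minimal Python sketch below shows only that final combination step, assuming the two terms are combined by subtraction and the raw value is mapped linearly onto 1 to 10 and clamped. The raw-score bounds are illustrative placeholders, not the published calibration, and sa_score is a hypothetical name.

```python
def sa_score(fragment_score, complexity_penalty, raw_min=-5.0, raw_max=3.0):
    """Map (fragment_score - complexity_penalty) onto the 1 (easy) to 10 (hard) scale.

    fragment_score: mean contribution of the molecule's fragments; higher means the
        fragments are common in already synthesised compounds.
    complexity_penalty: non-negative penalty for size, stereocentres, unusual rings etc.
    raw_min, raw_max: assumed range of the raw score (illustrative placeholders).
    """
    raw = fragment_score - complexity_penalty
    # Linear rescaling: raw_max maps to 1 (easy), raw_min maps to 10 (difficult).
    score = 10.0 - 9.0 * (raw - raw_min) / (raw_max - raw_min)
    # Clamp so that out-of-range raw values still land on the 1 to 10 scale.
    return min(10.0, max(1.0, score))
```

Subtracting the penalty means that, for equal fragment scores, a more complex molecule receives a higher (harder) score, matching the direction of the scale.
    -->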
        <item rdf:about="http://www.jcheminf.com/content/4/1/3">
        <title>LICSS - A chemical spreadsheet in Microsoft Excel</title>
        <description>Background:
Representations of chemical datasets in spreadsheet format are important for ready data assimilation and manipulation. In addition to the normal spreadsheet facilities, chemical spreadsheets need to have visualisable chemical structures and data searchable by chemical as well as textual queries. Many such chemical spreadsheet tools are available, some operating in the familiar Microsoft Excel environment. However, within this group, the performance of Excel is often compromised, particularly in terms of the number of compounds which can usefully be stored on a sheet.
Summary:
LICSS is a lightweight chemical spreadsheet within Microsoft Excel for Windows. LICSS stores structures solely as SMILES strings. Chemical operations are carried out by calling Java code modules which use the CDK, JChemPaint and OPSIN libraries to provide cheminformatics functionality. Compounds in sheets or charts may be visualised (individually or en masse), and sheets may be searched by substructure or similarity. All the molecular descriptors available in CDK may be calculated for compounds (in batch or on-the-fly), and various cheminformatic operations such as fingerprint calculation, Sammon mapping, clustering and R group table creation may be carried out.
We detail here the features of LICSS and how they are implemented. We also explain the design criteria, particularly in terms of potential corporate use, which led to this particular implementation.
Conclusions:
LICSS is an Excel-based chemical spreadsheet with a difference:
* It can usefully be used on sheets containing hundreds of thousands of compounds; it doesn&apos;t compromise the normal performance of Microsoft Excel
* It is designed to be installed and run in environments in which users do not have admin privileges; installation involves merely file copying, and sharing of LICSS sheets invokes automatic installation
* It is free and extensible
LICSS is open source software and we hope sufficient detail is provided here to enable developers to add their own features and share with the community.</description>
        <link>http://www.jcheminf.com/content/4/1/3</link>
                <dc:creator>Kevin Lawson</dc:creator>
                <dc:creator>Jonty Lawson</dc:creator>
                <dc:source>Journal of Cheminformatics 2012, 4:3</dc:source>
        <dc:date>2012-02-02T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1758-2946-4-3</dc:identifier>
                            <dc:title>A chemical spreadsheet in Excel</dc:title>
                            <dc:description>A lightweight open source chemical spreadsheet has been developed that runs within Microsoft Excel and can be used on sheets containing hundreds of thousands of compounds without compromising normal performance</dc:description>
                <prism:require>/content/figures/1758-2946-4-3-toc.gif</prism:require>
                <prism:publicationName>Journal of Cheminformatics</prism:publicationName>
        <prism:issn>1758-2946</prism:issn>
        <prism:volume>4</prism:volume>
        <prism:startingPage>3</prism:startingPage>
        <prism:publicationDate>2012-02-02T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
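    <!--
Sammon mapping, one of the LICSS operations listed above, places compounds in two dimensions so that distances in the map match distances in descriptor or fingerprint space, with errors on small distances weighted more heavily. The quantity it minimises is Sammon's stress. The minimal Python sketch below computes that stress for given original points and map positions; how LICSS optimises it is not described in the abstract, Euclidean distance is assumed, and sammon_stress is a hypothetical name.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def sammon_stress(high_dim, low_dim):
    """Sammon stress between original points and their map positions.

    E = (1 / sum d*_ij) * sum (d*_ij - d_ij)^2 / d*_ij over pairs i < j,
    where d* are distances between original points and d between map points.
    Pairs of identical original points (d* = 0) are skipped to avoid division by zero.
    """
    total = 0.0
    weighted_error = 0.0
    for i, j in combinations(range(len(high_dim)), 2):
        d_star = euclidean(high_dim[i], high_dim[j])
        if d_star == 0.0:
            continue
        d = euclidean(low_dim[i], low_dim[j])
        total += d_star
        weighted_error += (d_star - d) ** 2 / d_star
    return weighted_error / total if total > 0.0 else 0.0
```

A stress of 0 means the map preserves every pairwise distance exactly; the division by d*_ij is what makes errors between close neighbours count more than errors between distant compounds.
    -->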
        <item rdf:about="http://www.jcheminf.com/content/3/1/37">
        <title>Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on</title>
        <description>Background:
The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards.
Results:
This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry.
Conclusions:
We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to development of many useful resources freely available to the chemistry community.</description>
        <link>http://www.jcheminf.com/content/3/1/37</link>
                <dc:creator>Noel O'Boyle</dc:creator>
                <dc:creator>Rajarshi Guha</dc:creator>
                <dc:creator>Egon Willighagen</dc:creator>
                <dc:creator>Samuel Adams</dc:creator>
                <dc:creator>Jonathan Alvarsson</dc:creator>
                <dc:creator>Jean-Claude Bradley</dc:creator>
                <dc:creator>Igor Filippov</dc:creator>
                <dc:creator>Robert Hanson</dc:creator>
                <dc:creator>Marcus Hanwell</dc:creator>
                <dc:creator>Geoffrey Hutchison</dc:creator>
                <dc:creator>Craig James</dc:creator>
                <dc:creator>Nina Jeliazkova</dc:creator>
                <dc:creator>Andrew Lang</dc:creator>
                <dc:creator>Karol Langner</dc:creator>
                <dc:creator>David Lonie</dc:creator>
                <dc:creator>Daniel Lowe</dc:creator>
                <dc:creator>Jerome Pansanel</dc:creator>
                <dc:creator>Dmitry Pavlov</dc:creator>
                <dc:creator>Ola Spjuth</dc:creator>
                <dc:creator>Christoph Steinbeck</dc:creator>
                <dc:creator>Adam Tenderholt</dc:creator>
                <dc:creator>Kevin Theisen</dc:creator>
                <dc:creator>Peter Murray-Rust</dc:creator>
                <dc:source>Journal of Cheminformatics 2011, 3:37</dc:source>
        <dc:date>2011-10-14T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1758-2946-3-37</dc:identifier>
                            <dc:title>The Blue Obelisk five years on</dc:title>
                            <dc:description>The work carried out by the Blue Obelisk movement over the last five years is described, with a discussion of the current progress and future challenges in Open Data, Open Standards, and Open Source in chemistry</dc:description>
                <prism:require>/content/figures/1758-2946-3-37-toc.gif</prism:require>
                <prism:publicationName>Journal of Cheminformatics</prism:publicationName>
        <prism:issn>1758-2946</prism:issn>
        <prism:volume>3</prism:volume>
        <prism:startingPage>37</prism:startingPage>
        <prism:publicationDate>2011-10-14T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.jcheminf.com/content/2/1/1">
        <title>Molecular structure input on the web</title>
        <description>A molecule editor, that is, a program for the input and editing of molecules, is an indispensable part of every cheminformatics or molecular processing system. This review focuses on a special type of molecule editor, namely those used for molecule structure input on the web. Scientific computing is now moving more and more in the direction of web services and cloud computing, with servers scattered all around the Internet. Thus a web browser has become the universal scientific user interface, and a tool to edit molecules directly within the web browser is essential.
The review covers the history of web-based structure input, starting with simple text entry boxes and early molecule editors based on clickable maps, before moving to the current situation dominated by Java applets. One typical example, the popular JME Molecule Editor, is described in more detail. Modern Ajax server-side molecule editors are also presented. Finally, the possible future direction of web-based molecule editing, based on technologies like JavaScript and Flash, is discussed.</description>
        <link>http://www.jcheminf.com/content/2/1/1</link>
                <dc:creator>Peter Ertl</dc:creator>
                <dc:source>Journal of Cheminformatics 2010, 2:1</dc:source>
        <dc:date>2010-02-02T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1758-2946-2-1</dc:identifier>
                                <prism:require>/content/figures/1758-2946-2-1-toc.gif</prism:require>
                <prism:publicationName>Journal of Cheminformatics</prism:publicationName>
        <prism:issn>1758-2946</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>1</prism:startingPage>
        <prism:publicationDate>2010-02-02T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.jcheminf.com/content/1/1/21">
        <title>Virtual screening of bioassay data</title>
        <description>Background:
There are three main problems associated with the virtual screening of bioassay data. The first is access to freely available curated data, the second is the number of false positives that occur in the physical primary screening process, and the third is that the data is highly imbalanced, with a low ratio of Active to Inactive compounds. This paper first discusses these three problems and then applies a selection of Weka cost-sensitive classifiers (Naive Bayes, SVM, C4.5 and Random Forest) to a variety of bioassay datasets.
Results:
Pharmaceutical bioassay data is not readily available to the academic community. The data held at PubChem is not curated and there is a lack of detailed cross-referencing between Primary and Confirmatory screening assays. Because of this lack of cross-referencing, the analysis of the number of false positives that occur in the primary screening process has been shallow. In the six cases found, the average percentage of false positives from the High-Throughput Primary screen is high, at 64%. For the cost-sensitive classification, Weka&apos;s implementations of the Support Vector Machine and C4.5 decision tree learner have performed relatively well. It was also found that the setting of the Weka cost matrix is dependent on the base classifier used and not solely on the ratio of class imbalance.
Conclusions:
Understandably, pharmaceutical data is hard to obtain. However, it would be beneficial to both the pharmaceutical industry and to academics for curated primary screening and corresponding confirmatory data to be provided. Two benefits could be gained by applying virtual screening techniques to bioassay data: first, reducing the search space of compounds to be screened; and second, analysing the false positives that occur in the primary screening process, which may allow the screening technology to be improved. The number of false positives arising from primary screening raises the question of whether this type of data should be used for virtual screening. Care is needed when using Weka&apos;s cost-sensitive classifiers: across-the-board misclassification costs based on class ratios should not be used when comparing different classifiers on the same dataset.</description>
        <link>http://www.jcheminf.com/content/1/1/21</link>
                <dc:creator>Amanda Schierz</dc:creator>
                <dc:source>Journal of Cheminformatics 2009, 1:21</dc:source>
        <dc:date>2009-12-22T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1758-2946-1-21</dc:identifier>
                                <prism:require>/content/figures/1758-2946-1-21-toc.gif</prism:require>
                <prism:publicationName>Journal of Cheminformatics</prism:publicationName>
        <prism:issn>1758-2946</prism:issn>
        <prism:volume>1</prism:volume>
        <prism:startingPage>21</prism:startingPage>
        <prism:publicationDate>2009-12-22T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
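    <!--
Cost-sensitive classification, as discussed in the abstract above, replaces "predict the most probable class" with "predict the class with the lowest expected misclassification cost". The minimal Python sketch below shows that decision rule for the Active/Inactive case, assuming a classifier already supplies class probabilities and costs are indexed as cost[actual][predicted]. It illustrates the general rule, not the exact internals of Weka's cost-sensitive wrappers; min_expected_cost and ratio_cost_matrix are hypothetical names.

```python
def min_expected_cost(probs, cost):
    """Return the class label with the lowest expected misclassification cost.

    probs: dict mapping actual class to predicted probability.
    cost: dict of dicts, cost[actual][predicted] = cost of that outcome (0 on the diagonal).
    """
    labels = list(cost)

    def expected(predicted):
        # Cost of predicting `predicted`, weighted by the probability of each actual class.
        return sum(p * cost[actual][predicted] for actual, p in probs.items())

    return min(labels, key=expected)


def ratio_cost_matrix(n_active, n_inactive):
    """Cost matrix that charges a missed Active the Inactive:Active class ratio.

    This is the across-the-board, ratio-based setting that the study cautions
    against using unchanged when comparing different base classifiers.
    """
    ratio = n_inactive / n_active
    return {"Active": {"Active": 0.0, "Inactive": ratio},
            "Inactive": {"Active": 1.0, "Inactive": 0.0}}
```

Because the decision depends on the probabilities each base classifier produces, one plausible reason the same ratio-based matrix suits some classifiers better than others is that their probability estimates are calibrated differently.
    -->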
        <item rdf:about="http://www.jcheminf.com/content/3/1/19">
        <title>Linked open drug data for pharmaceutical research and development</title>
        <description>There is an abundance of information about drugs available on the Web. Data sources range from medicinal chemistry results, through the impact of drugs on gene expression, to the outcomes of drugs in clinical trials. These data are typically not connected together, which makes insights harder to gain. Linking Open Drug Data (LODD) is a task force within the World Wide Web Consortium&apos;s (W3C) Health Care and Life Sciences Interest Group (HCLS IG). LODD has surveyed publicly available data about drugs, created Linked Data representations of the data sets, and identified interesting scientific and business questions that can be answered once the data sets are connected. The task force provides recommendations on best practices for exposing data in a Linked Data representation. In this paper, we present past and ongoing work of LODD and discuss the growing importance of Linked Data as a foundation for pharmaceutical R&amp;D data sharing.</description>
        <link>http://www.jcheminf.com/content/3/1/19</link>
                <dc:creator>Matthias Samwald</dc:creator>
                <dc:creator>Anja Jentzsch</dc:creator>
                <dc:creator>Christopher Bouton</dc:creator>
                <dc:creator>Claus Stie Kallesoe</dc:creator>
                <dc:creator>Egon Willighagen</dc:creator>
                <dc:creator>Janos Hajagos</dc:creator>
                <dc:creator>M Scott Marshall</dc:creator>
                <dc:creator>Eric Prud'hommeaux</dc:creator>
                <dc:creator>Oktie Hassenzadeh</dc:creator>
                <dc:creator>Elgar Pichler</dc:creator>
                <dc:creator>Susie Stephens</dc:creator>
                <dc:source>Journal of Cheminformatics 2011, 3:19</dc:source>
        <dc:date>2011-05-16T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1758-2946-3-19</dc:identifier>
                                <prism:require>/content/figures/1758-2946-3-19-toc.gif</prism:require>
                <prism:publicationName>Journal of Cheminformatics</prism:publicationName>
        <prism:issn>1758-2946</prism:issn>
        <prism:volume>3</prism:volume>
        <prism:startingPage>19</prism:startingPage>
        <prism:publicationDate>2011-05-16T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <cc:License rdf:about="http://creativecommons.org/licenses/by/2.0/">
        <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#Distribution" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks" />
    </cc:License>
</rdf:RDF>

