<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1758-2946-2-5</ui>
   <ji>1758-2946</ji>
   <fm>
      <dochead>Methodology</dochead>
      <bibl>
         <title>
            <p>Towards interoperable and reproducible QSAR analyses: Exchange of datasets</p>
         </title>
         <aug>
            <au ca="yes" id="A1">
               <snm>Spjuth</snm>
               <fnm>Ola</fnm>
               <insr iid="I1"/>
               <email>ola.spjuth@farmbio.uu.se</email>
            </au>
            <au id="A2">
               <snm>Willighagen</snm>
               <mi>L</mi>
               <fnm>Egon</fnm>
               <insr iid="I1"/>
               <email>egon.willighagen@farmbio.uu.se</email>
            </au>
            <au id="A3">
               <snm>Guha</snm>
               <fnm>Rajarshi</fnm>
               <insr iid="I2"/>
               <email>guhar@mail.nih.gov</email>
            </au>
            <au id="A4">
               <snm>Eklund</snm>
               <fnm>Martin</fnm>
               <insr iid="I1"/>
               <email>martin.eklund@farmbio.uu.se</email>
            </au>
            <au id="A5">
               <snm>Wikberg</snm>
               <mi>ES</mi>
               <fnm>Jarl</fnm>
               <insr iid="I1"/>
               <email>jarl.wikberg@farmbio.uu.se</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden</p>
            </ins>
            <ins id="I2">
               <p>NIH Chemical Genomics Center, 9800 Medical Center Drive, Rockville, MD 20850, USA</p>
            </ins>
         </insg>
         <source>Journal of Cheminformatics</source>
         <issn>1758-2946</issn>
         <pubdate>2010</pubdate>
         <volume>2</volume>
         <issue>1</issue>
         <fpage>5</fpage>
         <url>http://www.jcheminf.com/content/2/1/5</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/1758-2946-2-5</pubid>
               <pubid idtype="pmpid">20591161</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>19</day>
               <month>3</month>
               <year>2010</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>30</day>
               <month>6</month>
               <year>2010</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>30</day>
               <month>6</month>
               <year>2010</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2010</year>
         <collab>Spjuth et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constraining collaboration and re-use of data.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusion regarding descriptors by defining them crisply. This makes it easy to join, extend, and combine datasets and hence work collectively, but also allows for analyzing the effect descriptors have on the statistical model's performance. The presented Bioclipse plugins equip scientists with graphical tools that make QSAR-ML easily accessible for the community.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Quantitative Structure-Activity Relationship (QSAR) modeling is a ligand-based approach to quantitatively correlate chemical structure with a response, such as biological activity or chemical reactivity. The process is widely adopted and has for example been used to model carcinogenicity <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>, toxicity <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>, and solubility <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. Further, the literature is replete with QSAR studies covering problems in lead optimization <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, fragrance design, and detection of doping in sports <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. In QSAR, chemical structures are expressed as descriptors, which are numerical representations such as calculated properties or enumerated fragments. Descriptors and response values are concatenated into a dataset, and statistical methods are commonly used to build predictive models of these.</p>
         <p>There exist many examples of investigations regarding the resulting statistical models with respect to validity and applicability in QSAR and similar fields <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. However, most of these investigations consider the dataset as fixed, and the choice of descriptors and implementations is left outside the analysis.</p>
         <p>Part of the problem is the lack of a controlled vocabulary regarding descriptors; there is no easy way of defining what descriptors were used, what the underlying algorithms were, and how these were implemented. It is common to use several different software packages with results manually glued together in spreadsheets, sometimes with custom in-house calculated descriptors. The lack of a unifying standard and an exchange format means that QSAR datasets are published in articles without clear rules, usually as data matrices of precalculated descriptors, with chemical structures in a separate file.</p>
         <p>The field of bioinformatics has acknowledged the standardization problem to a much larger extent than cheminformatics. Numerous standards, ontologies, and exchange formats have been proposed and agreed upon in various domains. The Minimum Information standards are examples that specify the minimum amount of metadata and data required to meet a specific aim. The MGED consortium pioneered this in bioinformatics with Minimum Information About Microarray Experiments (MIAME) <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, and it has now become a requirement that data from microarray experiments must be deposited in MIAME-compliant public repositories in the MAGE-ML exchange format <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, in order to be published in most journals. Standardization initiatives in cheminformatics are not as common, even though the problem of incompatible file formats and standards has been frequently discussed <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Gramatica <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> has addressed the issue of QSAR model validation and notes that descriptor versioning as well as precisely defined algorithmic specifications are vital for developing QSAR models that can be considered reliable, robust, and reproducible (in addition to the usual issues of statistical rigor).</p>
         <p>Initiatives that work towards standardizing cheminformatics in general include the Blue Obelisk, an internet group which promotes open data, open source, and open standards in cheminformatics <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, which has proposed dictionaries for algorithms and implementations suitable for QSAR. Distributed Structure-Searchable Toxicity (DSSTox) Database Network has proposed standardized structure-data files (SDF) as a file format for exchanging raw data in toxicological SAR analyses <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. This approach, however, does not include any information regarding descriptors, and SDF is a legacy text format which has many variants. OECD has established rules and formats for how to report QSAR models and QSAR predictions <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, but its intended use is communication, not complete technical coverage. It also lacks an ontology, which makes interpretation and reasoning around results much more complicated and subjective. Public repositories of QSAR datasets are limited to a few internet resources (e.g. <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> and <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>), where datasets are usually not deposited by the original authors but reproduced from articles by others. Due to the lack of an established exchange format and missing raw data, structures are sometimes redrawn, data are manually copied from articles, and in some cases educated guesses are made. QSAR DataBank <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> is a proposal for the electronic organization and archiving of QSAR model information. It is an interesting initiative that builds on other standards, but it also lacks an ontology for descriptors. The OpenTox project is another initiative, developing a framework to share QSAR datasets using REST services <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
         <p>In general it is not uncommon that information about which software package was used for descriptor calculation (and its version) is unavailable, and that custom descriptors have been added manually or results preprocessed. To further complicate matters, many QSAR software packages are proprietary and closed source, and it is a non-trivial task (sometimes impossible) to get insights into how algorithms are implemented. Due to these impracticalities, journals are limited to establishing simple rules for QSAR publications, such as stating that structures should be publicly available <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>.</p>
         <p>A well-defined standard with a corresponding exchange format will have problems getting accepted in the scientific community if user-friendly tools supporting them are not available. This paper introduces a file format for exchanging QSAR datasets, together with tools implemented in the graphical workbench Bioclipse <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp> to facilitate working with QSAR according to the standard.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>QSAR-ML - an exchange format for QSAR</p>
            </st>
            <p>We designed an XML-based exchange format (named QSAR-ML) with the aim to completely cover all aspects of dataset setup, including chemical structures, descriptors, software implementations, and response values. A simplified structure of QSAR-ML can be seen in Figure <figr fid="F1">1</figr>.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>QSAR-ML core structures</p>
               </caption>
               <text>
                  <p><b>QSAR-ML core structures</b>. A simplified diagram of the structure of QSAR-ML using Crow's Foot notation, with references using dotted lines.</p>
               </text>
               <graphic file="1758-2946-2-5-1" hint_layout="double"/>
            </fig>
            <p><it>Structures </it>define the chemical structures in the QSAR dataset and contain InChI <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> to ensure integrity; if a structure changes, then the QSAR-ML file can report this. <it>Structures </it>are referenced parts of a <it>Resource</it>, which is a file referenced by path or URL and also contains a checksum that can be used to verify the integrity of the files. <it>Resources </it>are in turn contained in a <it>StructureList</it>.</p>
            <p><it>Descriptors </it>are uniquely defined by referencing the Blue Obelisk Descriptor Ontology (BODO) <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> and are contained in a <it>DescriptorList</it>. A <it>Descriptor </it>can also have a set of <it>Parameters</it>, which for example can be settings for the descriptor. A <it>DescriptorProvider </it>denotes a versioned software implementation, which provides implementations of descriptor algorithms.</p>
            <p><it>Responses </it>are the measured QSAR endpoints (response variable). They reference a <it>Structure </it>and a <it>ResponseUnit </it>(for example IC<sub>50 </sub>or LD<sub>50</sub>), and are contained in a <it>ResponseList</it>.</p>
            <p><it>DescriptorResults </it>are the results of a descriptor calculation on a structure, and link a <it>DescriptorValue </it>to a <it>Descriptor-Structure </it>pair. <it>DescriptorResults </it>are contained in a <it>DescriptorResultList</it>.</p>
            <p><it>Metadata </it>includes information about authors, license, description, and also contains optional <it>References</it>. The latest version of the QSAR-ML schema and documentation is available from the QSAR-ML website <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>.</p>
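            <p>The relationships between the core elements described above can be illustrated with a short Python sketch that uses only the standard library. It is an illustration under stated assumptions, not the QSAR-ML schema itself: all element and attribute names (for example <it>structureList</it>, <it>checksum</it>, and <it>ontologyId</it>), the provider name, and all numerical values are hypothetical, and SHA-256 is assumed as the checksum algorithm. The authoritative definitions are those of the schema on the QSAR-ML website.</p>
            <p><![CDATA[
```python
import hashlib
import xml.etree.ElementTree as ET

# NOTE: every element name, attribute name, and value below is hypothetical
# and only mirrors the relationships described in the text; the real names
# are defined by the QSAR-ML schema.


def sha256_hex(data: bytes) -> str:
    """Return a hex checksum of a resource's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def build_dataset(resource_name: str, resource_bytes: bytes) -> ET.Element:
    """Build a toy dataset tree with one structure, descriptor, and response."""
    root = ET.Element("qsar")

    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "description").text = "Toy dataset for illustration"

    # StructureList -> Resource (file + checksum) -> Structure (with InChI)
    structure_list = ET.SubElement(root, "structureList")
    resource = ET.SubElement(
        structure_list, "resource",
        id="r1", file=resource_name, checksum=sha256_hex(resource_bytes),
    )
    ET.SubElement(resource, "structure", id="s1", inchi="InChI=1S/CH4/h1H4")

    # A versioned software implementation that provides descriptor algorithms
    ET.SubElement(root, "descriptorProvider",
                  id="p1", name="ExampleToolkit", version="0.0.1")

    # A descriptor identified by an ontology entry, with one parameter
    descriptor_list = ET.SubElement(root, "descriptorList")
    descriptor = ET.SubElement(
        descriptor_list, "descriptor",
        id="d1", ontologyId="bodo:exampleDescriptor", provider="p1",
    )
    ET.SubElement(descriptor, "parameter", key="exampleSetting", value="true")

    # A response references a structure and a unit
    ET.SubElement(root, "responseUnit", id="u1", name="IC50")
    response_list = ET.SubElement(root, "responseList")
    ET.SubElement(response_list, "response",
                  structure="s1", unit="u1", value="5.2")

    # A descriptor result links a value to a descriptor-structure pair
    result_list = ET.SubElement(root, "descriptorResultList")
    result = ET.SubElement(result_list, "descriptorResult",
                           descriptor="d1", structure="s1")
    ET.SubElement(result, "descriptorValue", index="0", value="16.04")
    return root


def resource_is_unchanged(root: ET.Element, resource_id: str,
                          current_bytes: bytes) -> bool:
    """Compare the stored checksum with one computed from the current file."""
    for resource in root.iter("resource"):
        if resource.get("id") == resource_id:
            return resource.get("checksum") == sha256_hex(current_bytes)
    raise KeyError(resource_id)
```
]]></p>
            <p>In this sketch, <it>resource_is_unchanged</it> returns false as soon as a single byte of the referenced structure file differs from the file that was used when the dataset was set up, which is the kind of integrity check that a checksum on a <it>Resource </it>makes possible.</p>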
         </sec>
         <sec>
            <st>
               <p>Reference Implementation</p>
            </st>
            <p>While QSAR-ML is technology neutral, a reference implementation of tools to set up QSAR datasets complying with QSAR-ML was constructed as a set of plugins for Bioclipse <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. The implementation allows for straightforward creation, loading, saving, editing, and export of QSAR-ML files (see Figure <figr fid="F2">2</figr>).</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Overview of the Bioclipse QSAR-ML implementation</p>
               </caption>
               <text>
                  <p><b>Overview of the Bioclipse QSAR-ML implementation</b>. The reference implementation of QSAR-ML is constructed as a set of plugins for Bioclipse and allows for graphical setup of datasets. Chemical structures can be imported via drag and drop or a graphical wizard. Descriptors can be selected from the descriptor ontology. Local and remote descriptor providers contribute descriptor implementations which can run on the local computer or be accessed via Web services. It is also possible to add biological responses and metadata, and export the complete dataset in QSAR-ML as well as in a comma-separated file.</p>
               </text>
               <graphic file="1758-2946-2-5-2" hint_layout="double"/>
            </fig>
            <p>Using graphical wizards and drag and drop, users can set up new QSAR analyses, add molecules, select descriptors and implementations with optional parameters, and import or add response values, while the calculations are carried out in the background (see Figure <figr fid="F3">3</figr>). The resulting dataset can be exported as QSAR-ML for import into other QSAR-ML compliant software.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Selecting descriptors from the Blue Obelisk Descriptor Ontology in Bioclipse</p>
               </caption>
               <text>
                  <p><b>Selecting descriptors from the Blue Obelisk Descriptor Ontology in Bioclipse</b>. Screenshot from Bioclipse showing selection of descriptors (lower middle), the generated dataset in a spreadsheet (top middle), the Help View (right) showing interactive help for descriptors, and the Progress View indicating the progress of the descriptor calculations (bottom).</p>
               </text>
               <graphic file="1758-2946-2-5-3" hint_layout="double"/>
            </fig>
            <p>The Bioclipse-QSAR feature supports multiple descriptor providers; the only requirement is that the software must be able to accept one or many chemical structures, and deliver descriptors in a deterministic fashion that can be accessed either programmatically or via a batch job (e.g. shell script). Bioclipse-QSAR also supports calling descriptor calculations deployed as W3C Web services <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, REST <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, and XMPP cloud services <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. To add new descriptors to Bioclipse, the descriptor should preferably be registered in the Blue Obelisk Descriptor Ontology, but it could also be added to Bioclipse via a separate file.</p>
            <p>Bioclipse-QSAR comes with the Chemistry Development Kit (CDK) <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> and JOELib <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> integrated as local descriptor providers, supplying descriptor implementations with optional parameters that are run on the same computer and hence do not require a network connection. Remote Web services of the CDK descriptors are available as REST <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> and XMPP services (see Methods section). It is also possible to use the QSAR feature in the Bioclipse Scripting Language <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> for setting up datasets.</p>
            <p>The Bioclipse-QSAR feature is available via the software update menu option in Bioclipse, from the main Bioclipse Update Site. Bioclipse and the Bioclipse-QSAR feature are released under Eclipse Public License <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> plus an exception to allow GPL-licensed Bioclipse plugins (see <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>). EPL is a flexible open source license that can be extended by both open source and commercially licensed plugins.</p>
         </sec>
         <sec>
            <st>
               <p>Sample datasets</p>
            </st>
            <p>For demonstration purposes, the chemical structures for a subset of the Sutherland datasets <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> were subjected to descriptor calculations for selected CDK descriptors and are available in QSAR-ML and archived Bioclipse projects at the QSAR-ML website <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>The QSAR-ML exchange format together with the Blue Obelisk Descriptor Ontology has many implications. To the best of our knowledge it is the first initiative which encompasses completely reproducible definition of QSAR datasets, including descriptor definitions and implementations. QSAR-ML is equipped with built-in properties to ensure integrity and consistency of included resources. For example, molecular resources are annotated with generated InChI, which can be used to detect accidental changes to the chemical structures or errors introduced when transmitting data over networks. That descriptors in QSAR-ML are defined in the Blue Obelisk Descriptor Ontology means that they have a formalized and clear meaning, and are uniquely referenced. Defining descriptor implementations by software name, version and identifier, and connecting this information with an entry in the descriptor ontology, uniquely defines descriptor calculations and makes it possible to reproduce them accurately. Open standards, a defined terminology, and reproducible results allow people to trust publicly available datasets and reconstructions of such datasets, and hence improve the reliability of the subsequent statistical analysis. Much research has been done on various aspects of QSAR modeling, such as validation, robustness, and domain applicability of models <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. This is not covered here as it is a research topic of its own, but we stress that the handling of the original chemical structures as well as the choice and implementation of descriptors are of great importance <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. This is a neglected topic, and QSAR-ML sets new standards for the field. A reproducible dataset setup not only enables validation of the resulting datasets, but also allows for inclusion of e.g. chemical variability and descriptor selections with respect to model robustness and performance inside a cross validation loop <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. There is a large number of descriptors available, and people continuously improve existing descriptors and develop new ones. An exchange format capable of harmonizing this requires an extensible architecture in order to be successful, and also intuitive tools that make this easily available for scientists. QSAR-ML, implemented as an XML Schema, and BODO, implemented in the Web Ontology Language <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, fulfill this demand of extensibility. We would like to point out that this is a proposal for an open standard and that we welcome suggestions to improve the specification further.</p>
         <p>The Bioclipse-QSAR feature turns Bioclipse into a workbench which greatly simplifies the setup of QSAR datasets, with full support for the QSAR-ML exchange format. Bioclipse also supports many other features which are common in QSAR projects, such as conversion, editing, and visualization of chemical structures. Rich clients are software applications that take full advantage of today's modern desktop computers, but also leverage new e-Science tools such as online (Web) services. Bioclipse-QSAR is a good example of this: descriptors can be calculated on the local computer while, if connected to a network, remote services can provide additional descriptors or offer high performance computers for speeding up demanding calculations.</p>
         <p>There would be great rewards if QSAR-ML were widely adopted by the scientific community. For example, users could download entire QSAR datasets and reuse them together with in-house data, extend existing models, join different models, search for overlap between datasets, collaborate, reproduce, and validate results. Further, QSAR-ML enables the establishment of public repositories of QSAR datasets. We envision that deposition of QSAR models in such repositories will become a standard operating procedure prior to future publication of QSAR models and results, similar to microarray experiments in bioinformatics, and that QSAR-ML is a strong candidate for such a format.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>We describe a new exchange format for QSAR datasets, named QSAR-ML, which relies on the Blue Obelisk Descriptor Ontology for uniquely defining descriptors, and supports any implementation of these. QSAR-ML comprises all data and metadata required to reproduce the setup of QSAR datasets, enabling validation of chemical structures and descriptor calculations. Sharing QSAR datasets in an open, standardized format has profound implications for collaboration and information validity and reuse. We also describe a QSAR plugin for Bioclipse with full support for QSAR-ML, which greatly simplifies setting up QSAR datasets using graphical user interfaces. The implementation integrates with other cheminformatics components that are valuable in dataset preparation, such as database searching as well as editing and visualization of chemical structures.</p>
         <p>Future plans include addition of subsequent statistical analysis into the QSAR-ML format and hence not only support dataset setup but also model fitting and prediction. We also aim at setting up a public repository with means for publishing QSAR-ML datasets, which is a first step towards public repositories for sharing QSAR on a global level, and could provide the basis for supplemental data in future QSAR publications.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>XML Schema</p>
            </st>
            <p>XML is an extensible markup language that is widely used in bioinformatics as an easy to use and standardized way to store self-describing data <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. W3C XML Schema <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> is used in this work to define rules in QSAR-ML, such as required elements and data types. It can also be used to validate an XML document to ensure that it conforms to the rules. The latest version of the QSAR-ML schema together with documentation is available on the QSAR-ML website <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>.</p>
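            <p>To give a feeling for what rule-based checking of a document involves, the following Python sketch tests that certain elements carry certain attributes. It is a much weaker stand-in for W3C XML Schema validation, which requires a schema-aware library and the actual QSAR-ML schema; the rule table and all element and attribute names in it are hypothetical.</p>
            <p><![CDATA[
```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical rules: element tag -> attributes that must be present.
# The real constraints are expressed in the QSAR-ML XML Schema.
REQUIRED_ATTRIBUTES = {
    "structure": ("id", "inchi"),
    "descriptor": ("id", "ontologyId"),
    "descriptorProvider": ("id", "name", "version"),
}


def find_rule_violations(xml_text: str) -> List[str]:
    """Return one message per required attribute missing from the document."""
    root = ET.fromstring(xml_text)
    problems = []
    for element in root.iter():
        for attribute in REQUIRED_ATTRIBUTES.get(element.tag, ()):
            if element.get(attribute) is None:
                problems.append(
                    f"{element.tag}: missing attribute '{attribute}'"
                )
    return problems
```
]]></p>
            <p>A document for which the function returns an empty list satisfies these toy rules only; conformance to QSAR-ML still has to be established against the published schema.</p>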
         </sec>
         <sec>
            <st>
               <p>Bioclipse</p>
            </st>
            <p>Bioclipse <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> is a graphical workbench for life science which is equipped with features required for many common cheminformatics tasks, such as loading and converting between file formats, editing of chemical structures, interactive visualisation in 2D/3D, and editing of compound collections. Bioclipse is implemented as a Rich Client based on Eclipse <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, and is equipped with an advanced plugin architecture which makes it easy to add new descriptor providers (for example third party software or custom implementations), and allows users to cherry-pick descriptors and implementations for the current analysis (see Figure <figr fid="F3">3</figr>).</p>
         </sec>
         <sec>
            <st>
               <p>CDK descriptors</p>
            </st>
            <p>The Chemistry Development Kit (CDK) <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> aims to provide a comprehensive collection of descriptors <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. In contrast to many other packages, the CDK provides descriptors for molecules, bonds and atoms. While most QSAR analyses make use of molecular descriptors, the presence of the other descriptor types allows for novel approaches to QSAR modeling. Given that many thousands of descriptors have been described in the literature <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, CDK is focused on descriptors that have been used in numerous studies. Many of these descriptors derive from the ADAPT package <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. Broadly, the descriptors can be categorized into four main groups: constitutional (which consider various atom and bond counts), topological (which consider 2D connectivity), geometric (which consider the 3D spatial arrangement of a molecule) and electronic (which consider electronic properties of the molecule). There are a total of 44 descriptor classes. It should be noted that each descriptor class may generate multiple values. Thus the total number of descriptor values that can be calculated is much higher (on the order of 280 values).</p>
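            <p>The distinction between descriptor classes and descriptor values can be made concrete with a small Python sketch. It does not use the CDK: the two toy descriptors, their names, and the representation of a structure as a dictionary of element counts are assumptions made purely for illustration. Two descriptor classes are shown to yield four columns in a dataset row.</p>
            <p><![CDATA[
```python
from typing import Callable, Dict

# A toy "structure": element symbol -> number of atoms (an assumption made
# for illustration; real descriptors operate on full molecular graphs).
Formula = Dict[str, int]

# Standard atomic masses for the few elements used here.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def atom_counts(formula: Formula) -> Dict[str, float]:
    """Constitutional toy descriptor: one class, three values."""
    return {f"n{el}": float(formula.get(el, 0)) for el in ("C", "N", "O")}


def molecular_weight(formula: Formula) -> Dict[str, float]:
    """Constitutional toy descriptor: one class, one value."""
    return {"MW": sum(ATOMIC_MASS[el] * n for el, n in formula.items())}


# Hypothetical registry of descriptor classes, keyed by a made-up name.
DESCRIPTORS: Dict[str, Callable[[Formula], Dict[str, float]]] = {
    "atomCount": atom_counts,
    "weight": molecular_weight,
}


def descriptor_row(formula: Formula) -> Dict[str, float]:
    """Flatten all values of all descriptor classes into one dataset row."""
    row = {}
    for class_name, compute in DESCRIPTORS.items():
        for value_name, value in compute(formula).items():
            row[f"{class_name}.{value_name}"] = value
    return row
```
]]></p>
            <p>Calling <it>descriptor_row</it> on the element counts of ethanol (two carbon, six hydrogen, and one oxygen atom) gives a row with four entries, three from the first class and one from the second.</p>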
         </sec>
         <sec>
            <st>
               <p>JOELib descriptors</p>
            </st>
            <p>JOELib is an open source Java cheminformatics library <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. A Bioclipse plugin for JOELib was constructed and provides ten QSAR descriptors, some of which overlap with the CDK descriptors. However, JOELib also provides a few unique descriptors, including a LogP descriptor implementing an atomic contribution algorithm <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> and two SMARTS-based fragment count descriptors counting the number of acidic and basic groups.</p>
         </sec>
         <sec>
            <st>
               <p>Remote REST and XMPP services</p>
            </st>
            <p>REST services for CDK descriptors <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> are available from <url>http://ws1.bmc.uu.se:8182/cdk/descriptors</url>, and conform to REST principles <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. The return values from the REST services are in a minimal custom XML format, which makes extraction of descriptor values trivial. The REST-based services result in much simpler programmatic access and fewer dependencies in client code than, for example, SOAP services <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. XMPP cloud services with IO-DATA <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> is a novel technology that allows for discoverable, asynchronous Web services. XMPP services for calculating several CDK descriptors are available from the XMPP server <url>http://ws1.bmc.uu.se</url>.</p>
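            <p>How little client code a minimal XML response requires can be sketched in Python with the standard library alone. The response document below is invented for illustration and is not the format returned by the CDK REST services; its element and attribute names are hypothetical, and no network request is made.</p>
            <p><![CDATA[
```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical response body; the real service's format may differ.
SAMPLE_RESPONSE = """<descriptorResult descriptor="exampleDescriptor">
  <value name="v1">1.5</value>
  <value name="v2">3.0</value>
</descriptorResult>"""


def extract_values(xml_text: str) -> Dict[str, float]:
    """Map each named value element in the response to a float."""
    root = ET.fromstring(xml_text)
    return {v.get("name"): float(v.text) for v in root.iter("value")}
```
]]></p>
            <p>In a client, the text passed to <it>extract_values</it> would be the body of the HTTP response; here it is the sample string, so the sketch runs without access to the service.</p>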
         </sec>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>OS and ME designed the QSAR-ML. EW and RG designed and implemented the BODO. OS implemented the Bioclipse-QSAR plugins. EW implemented the JOELib plugin. JW supervised the project. All authors read and approved the final manuscript.</p>
      </sec>
      <sec>
         <st>
            <p>Acknowledgements</p>
         </st>
         <p>The authors would like to thank all the people who have contributed to the Blue Obelisk Descriptor Ontology and the Bioclipse project, as well as Anders L&#246;vgren at the computing department at Uppsala Biomedical Center (BMC) for hosting the CDK REST and XMPP services.</p>
         <p>This work was supported by the Swedish VR (04X-05957) and Uppsala University (KoF 07).</p>
      </sec>
   </bdy>
   <bm>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Lazy Structure-Activity Relationships (LAZAR) for the Prediction of Rodent Carcinogenicity and Salmonella Mutagenicity</p>
            </title>
            <aug>
               <au>
                  <snm>Helma</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Molecular Diversity</source>
            <pubdate>2006</pubdate>
            <volume>10</volume>
            <fpage>147</fpage>
            <lpage>158</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s11030-005-9001-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">16721629</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Quantitative Structure-Carcinogenicity Relationship for Detecting Structural Alerts in Nitroso Compounds: Species, Rat; Sex, Female; Route of Administration, Gavage</p>
            </title>
            <aug>
               <au>
                  <snm>Helguera</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Gonzalez</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Dias Soeiro Cordeiro</snm>
                  <fnm>MN</fnm>
               </au>
               <au>
                  <snm>Cabrera Perez</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Chem Res Toxicol</source>
            <pubdate>2008</pubdate>
            <volume>21</volume>
            <issue>3</issue>
            <fpage>633</fpage>
            <lpage>642</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/tx700336n</pubid>
                  <pubid idtype="pmpid" link="fulltext">18293904</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Toward a Class-Independent Quantitative Structure-Activity Relationship Model for Uncouplers of Oxidative Phosphorylation</p>
            </title>
            <aug>
               <au>
                  <snm>Spycher</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Smejtek</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Netzeva</snm>
                  <fnm>TI</fnm>
               </au>
               <au>
                  <snm>Escher</snm>
                  <fnm>BI</fnm>
               </au>
            </aug>
            <source>Chem Res Toxicol</source>
            <pubdate>2008</pubdate>
            <volume>21</volume>
            <issue>4</issue>
            <fpage>911</fpage>
            <lpage>927</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/tx700391f</pubid>
                  <pubid idtype="pmpid" link="fulltext">18358007</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Utilizing High Throughput Screening Data for Predictive Toxicology Models: Protocols and Application to MLSCN Assays</p>
            </title>
            <aug>
               <au>
                  <snm>Guha</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sch&#252;rer</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Comp Aid Molec Des</source>
            <pubdate>2008</pubdate>
            <volume>22</volume>
            <issue>6-7</issue>
            <fpage>367</fpage>
            <lpage>384</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/s10822-008-9192-9</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>A Computational Model for the Prediction of Aqueous Solubility That Includes Crystal Packing, Intrinsic Solubility, and Ionization Effects</p>
            </title>
            <aug>
               <au>
                  <snm>Johnson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Murphy</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Gudmundsson</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Mol Pharmaceutics</source>
            <pubdate>2007</pubdate>
            <volume>4</volume>
            <issue>4</issue>
            <fpage>513</fpage>
            <lpage>523</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1021/mp070030+</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Prediction of Aqueous Solubility of Organic Compounds Based on 3D Structure Representation</p>
            </title>
            <aug>
               <au>
                  <snm>Yan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gasteiger</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Chem Inf Comput Sci</source>
            <pubdate>2003</pubdate>
            <volume>43</volume>
            <fpage>429</fpage>
            <lpage>434</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12653505</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Exploiting QSAR models in lead optimization</p>
            </title>
            <aug>
               <au>
                  <snm>Gedeck</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Curr Opin Drug Discov Devel</source>
            <pubdate>2008</pubdate>
            <volume>11</volume>
            <issue>4</issue>
            <fpage>569</fpage>
            <lpage>575</lpage>
            <xrefbib>
               <pubid idtype="pmpid">18600573</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Chemoinformatics-Based Classification of Prohibited Substances Employed for Doping in Sport</p>
            </title>
            <aug>
               <au>
                  <snm>Cannon</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bender</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Palmer</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Mitchell</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Chem Inf Model</source>
            <pubdate>2006</pubdate>
            <volume>46</volume>
            <issue>6</issue>
            <fpage>2369</fpage>
            <lpage>2380</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/ci0601160</pubid>
                  <pubid idtype="pmpid" link="fulltext">17125180</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Beware of q<sup>2</sup>!</p>
            </title>
            <aug>
               <au>
                  <snm>Golbraikh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tropsha</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>J Mol Graph Model</source>
            <pubdate>2002</pubdate>
            <volume>20</volume>
            <issue>4</issue>
            <fpage>269</fpage>
            <lpage>276</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1093-3263(01)00123-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">11858635</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The C1C2: a framework for simultaneous model selection and assessment</p>
            </title>
            <aug>
               <au>
                  <snm>Eklund</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Spjuth</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Wikberg</snm>
                  <fnm>JE</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>360</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-9-360</pubid>
                  <pubid idtype="pmcid">2556350</pubid>
                  <pubid idtype="pmpid" link="fulltext">18761753</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Minimum information about a microarray experiment (MIAME) - toward standards for microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Brazma</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hingamp</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sherlock</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Spellman</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Stoeckert</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Aach</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ansorge</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Causton</snm>
                  <fnm>HC</fnm>
               </au>
               <au>
                  <snm>Gaasterland</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Glenisson</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Holstege</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>IF</fnm>
               </au>
               <au>
                  <snm>Markowitz</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Matese</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Parkinson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Robinson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sarkans</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Schulze-Kremer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Vilo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vingron</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>4</issue>
            <fpage>365</fpage>
            <lpage>371</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1201-365</pubid>
                  <pubid idtype="pmpid" link="fulltext">11726920</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Design and implementation of microarray gene expression markup language (MAGE-ML)</p>
            </title>
            <aug>
               <au>
                  <snm>Spellman</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Troup</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sarkans</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Chervitz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bernhart</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Sherlock</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lepage</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Swiatek</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marks</snm>
                  <fnm>WL</fnm>
               </au>
               <au>
                  <snm>Goncalves</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Markel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Iordan</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Shojatalab</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pizarro</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hubley</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Deutsch</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Senger</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Aronow</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Robinson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bassett</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Stoeckert</snm>
                  <fnm>CJJ</fnm>
               </au>
               <au>
                  <snm>Brazma</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <issue>9</issue>
            <fpage>RESEARCH0046</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2002-3-9-research0046</pubid>
                  <pubid idtype="pmcid">126871</pubid>
                  <pubid idtype="pmpid" link="fulltext">12225585</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Chemoinformatics - a new name for an old problem?</p>
            </title>
            <aug>
               <au>
                  <snm>Hann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Curr Opin Chem Biol</source>
            <pubdate>1999</pubdate>
            <volume>3</volume>
            <issue>4</issue>
            <fpage>379</fpage>
            <lpage>383</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1367-5931(99)80057-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">10419846</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Principles of QSAR Models Validation: Internal and External</p>
            </title>
            <aug>
               <au>
                  <snm>Gramatica</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>QSAR Comb Sci</source>
            <pubdate>2007</pubdate>
            <volume>26</volume>
            <issue>5</issue>
            <fpage>694</fpage>
            <lpage>701</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/qsar.200610151</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>The Blue Obelisk - interoperability in chemical informatics</p>
            </title>
            <aug>
               <au>
                  <snm>Guha</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Howard</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Hutchison</snm>
                  <fnm>GR</fnm>
               </au>
               <au>
                  <snm>Murray-Rust</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rzepa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Steinbeck</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wegner</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Willighagen</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>J Chem Inf Model</source>
            <pubdate>2006</pubdate>
            <volume>46</volume>
            <issue>3</issue>
            <fpage>991</fpage>
            <lpage>998</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/ci050400b</pubid>
                  <pubid idtype="pmpid" link="fulltext">16711717</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Distributed structure-searchable toxicity (DSSTox) public database network: a proposal</p>
            </title>
            <aug>
               <au>
                  <snm>Richard</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>CR</fnm>
               </au>
            </aug>
            <source>Mutat Res</source>
            <pubdate>2002</pubdate>
            <volume>499</volume>
            <fpage>27</fpage>
            <lpage>52</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11804603</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>QSAR Reporting Formats and JRC QSAR Model Database</p>
            </title>
            <url>http://ecb.jrc.ec.europa.eu/qsar/qsar-tools/index.php?c=QRF</url>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Cheminformatics.org</p>
            </title>
            <url>http://cheminformatics.org/datasets/</url>
         </bibl>
         <bibl id="B19">
            <title>
               <p>QSAR World Data Sets</p>
            </title>
            <url>http://www.qsarworld.com/qsar-datasets.php</url>
         </bibl>
         <bibl id="B20">
            <title>
               <p>QSAR DataBank</p>
            </title>
            <url>http://qsardb.org</url>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Collaborative Development of Predictive Toxicology Applications</p>
            </title>
            <aug>
               <au>
                  <snm>Hardy</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Douglas</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Helma</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rautenberg</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jeliazkova</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Jeliazkov</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Nikolova</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Benigni</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Tcheremenskaia</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Kramer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Girschick</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Buchwald</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Wicker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Karwath</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>G&#252;tlein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Maunz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sarimveis</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Melagraki</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Afantitis</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sopasakis</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gallagher</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Poroikov</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Filimonov</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zakharov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lagunin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gloriozova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Novikov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Skvortsova</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Druzhilovsky</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Chawla</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ghosh</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Ray</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Patel</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Escher</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Cheminformatics</source>
            <inpress/>
         </bibl>
         <bibl id="B22">
            <title>
               <p>QSAR/QSPR and Proprietary Data</p>
            </title>
            <aug>
               <au>
                  <snm>Jorgensen</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>J Chem Inf Model</source>
            <pubdate>2006</pubdate>
            <volume>46</volume>
            <issue>3</issue>
            <fpage>937</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1021/ci0680079</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Bioclipse: an open source workbench for chemo- and bioinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Spjuth</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Helmus</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Willighagen</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Kuhn</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Eklund</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wagener</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Murray-Rust</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Steinbeck</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wikberg</snm>
                  <fnm>JES</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>59</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-8-59</pubid>
                  <pubid idtype="pmcid">1808478</pubid>
                  <pubid idtype="pmpid" link="fulltext">17316423</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Bioclipse: Integration of Data and Software in the Life Sciences</p>
            </title>
            <aug>
               <au>
                  <snm>Spjuth</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>PhD thesis</source>
            <publisher>Uppsala University</publisher>
            <pubdate>2009</pubdate>
         </bibl>
         <bibl id="B25">
            <title>
               <p>An Open Standard for Chemical Structure Representation - The IUPAC Chemical Identifier</p>
            </title>
            <aug>
               <au>
                  <snm>Stein</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Heller</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Tchekhovski</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nimes International Chemical Information Conference Proceedings</source>
            <pubdate>2003</pubdate>
            <fpage>131</fpage>
            <lpage>143</lpage>
            <url>http://www.iupac.org/inchi/Stein-2003-ref1.html</url>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The Blue Obelisk Descriptor Ontology</p>
            </title>
            <aug>
               <au>
                  <snm>Floris</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Willighagen</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Guha</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rojas</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hoppe</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Tech rep, The Blue Obelisk</source>
            <pubdate>2010</pubdate>
            <volume>218</volume>
         </bibl>
         <bibl id="B27">
            <title>
               <p>QSAR-ML</p>
            </title>
            <url>http://pele.farmbio.uu.se/qsar-ml</url>
         </bibl>
         <bibl id="B28">
            <title>
               <p>W3C Web Services</p>
            </title>
            <url>http://www.w3.org/2002/ws/</url>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Architectural Styles and the Design of Network-based Software Architectures</p>
            </title>
            <aug>
               <au>
                  <snm>Fielding</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>PhD thesis</source>
            <publisher>University of California, Irvine</publisher>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B30">
            <title>
               <p>XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous Web services</p>
            </title>
            <aug>
               <au>
                  <snm>Wagener</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Spjuth</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Willighagen</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Wikberg</snm>
                  <fnm>JES</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <fpage>279</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-10-279</pubid>
                  <pubid idtype="pmcid">2755485</pubid>
                  <pubid idtype="pmpid" link="fulltext">19732427</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Steinbeck</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kuhn</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Horlacher</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Luttmann</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Willighagen</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>J Chem Inf Comput Sci</source>
            <pubdate>2003</pubdate>
            <volume>43</volume>
            <issue>2</issue>
            <fpage>493</fpage>
            <lpage>500</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12653513</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Data Mining und Graph Mining auf molekularen Graphen - Cheminformatik und molekulare Kodierungen f&#252;r ADME/Tox-QSAR-Analysen</p>
            </title>
            <aug>
               <au>
                  <snm>Wegner</snm>
                  <fnm>JK</fnm>
               </au>
            </aug>
            <source>PhD thesis</source>
            <publisher>Eberhard-Karls-Universit&#228;t T&#252;bingen, T&#252;bingen, Germany</publisher>
            <pubdate>2006</pubdate>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Web service infrastructure for chemoinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Dong</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Guha</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Heiland</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pierce</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Fox</snm>
                  <fnm>GC</fnm>
               </au>
               <au>
                  <snm>Wild</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Chem Inf Model</source>
            <pubdate>2007</pubdate>
            <volume>47</volume>
            <issue>4</issue>
            <fpage>1303</fpage>
            <lpage>1307</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/ci6004349</pubid>
                  <pubid idtype="pmpid" link="fulltext">17602467</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Bioclipse 2: A scriptable integration platform for the life sciences</p>
            </title>
            <aug>
               <au>
                  <snm>Spjuth</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Alvarsson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Berg</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Eklund</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kuhn</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>M&#228;sak</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Torrance</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wagener</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Willighagen</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Steinbeck</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wikberg</snm>
                  <fnm>JES</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <fpage>397</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-10-397</pubid>
                  <pubid idtype="pmcid">2799422</pubid>
                  <pubid idtype="pmpid" link="fulltext">19958528</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Eclipse Public License</p>
            </title>
            <url>http://www.eclipse.org/legal/epl-v10.html</url>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Spline-Fitting with a Genetic Algorithm: A Method for Developing Classification Structure-Activity Relationships</p>
            </title>
            <aug>
               <au>
                  <snm>Sutherland</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>O'Brien</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Weaver</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Chem Inf Comput Sci</source>
            <pubdate>2003</pubdate>
            <volume>43</volume>
            <fpage>1906</fpage>
            <lpage>1915</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14632439</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Are the Chemical Structures in Your QSAR Correct?</p>
            </title>
            <aug>
               <au>
                  <snm>Young</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Venkatapathy</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Harten</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>QSAR Comb Sci</source>
            <pubdate>2008</pubdate>
            <volume>27</volume>
            <issue>11-12</issue>
            <fpage>1337</fpage>
            <lpage>1345</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/qsar.200810084</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>OWL Web Ontology Language Overview</p>
            </title>
            <aug>
               <au>
                  <snm>McGuinness</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>van Harmelen</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>W3C recommendation, W3C</source>
            <pubdate>2004</pubdate>
            <url>http://www.w3.org/TR/2004/REC-owl-features-20040210/</url>
         </bibl>
         <bibl id="B39">
            <title>
               <p>XML schemas for common bioinformatic data types and their application in workflow systems</p>
            </title>
            <aug>
               <au>
                  <snm>Seibel</snm>
                  <fnm>PN</fnm>
               </au>
               <au>
                  <snm>Kruger</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hartmeier</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Schwarzer</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lowenthal</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mersch</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Giegerich</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>490</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-7-490</pubid>
                  <pubid idtype="pmcid">2001303</pubid>
                  <pubid idtype="pmpid" link="fulltext">17087823</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>XML Schema language</p>
            </title>
            <url>http://www.w3.org/XML/Schema</url>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Eclipse</p>
            </title>
            <url>http://www.eclipse.org</url>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Recent developments of the Chemistry Development Kit (CDK) - an open-source Java library for chemo- and bioinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Steinbeck</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hoppe</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kuhn</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Floris</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Guha</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Willighagen</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>Curr Pharm Des</source>
            <pubdate>2006</pubdate>
            <volume>12</volume>
            <issue>17</issue>
            <fpage>2111</fpage>
            <lpage>2120</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.2174/138161206777585274</pubid>
                  <pubid idtype="pmpid" link="fulltext">16796559</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <aug>
               <au>
                  <snm>Todeschini</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Consonni</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Handbook of Molecular Descriptors</source>
            <publisher>Berlin: Wiley-VCH</publisher>
            <pubdate>2002</pubdate>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Studies of Chemical Structure Biological Activity Relations Using Pattern Recognition</p>
            </title>
            <aug>
               <au>
                  <snm>Jurs</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Chou</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yuan</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Computer Assisted Drug Design</source>
            <publisher>Washington D.C.: American Chemical Society</publisher>
            <editor>Olson E, Christoffersen R</editor>
            <pubdate>1979</pubdate>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Prediction of Physicochemical Parameters by Atomic Contributions</p>
            </title>
            <aug>
               <au>
                  <snm>Wildman</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Crippen</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>J Chem Inf Comput Sci</source>
            <pubdate>1999</pubdate>
            <volume>39</volume>
            <issue>5</issue>
            <fpage>868</fpage>
            <lpage>873</lpage>
         </bibl>
      </refgrp>
   </bm>
</art>

