<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1758-2946-1-18</ui>
   <ji>1758-2946</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Software platform virtualization in chemistry research and university teaching</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Kind</snm>
               <fnm>Tobias</fnm>
               <insr iid="I1"/>
               <email>tkind@ucdavis.edu</email>
            </au>
            <au id="A2">
               <snm>Leamy</snm>
               <fnm>Tim</fnm>
               <insr iid="I2"/>
               <email>tcleamy@ucdavis.edu</email>
            </au>
            <au id="A3">
               <snm>Leary</snm>
               <mi>A</mi>
               <fnm>Julie</fnm>
               <insr iid="I3"/>
               <email>jaleary@ucdavis.edu</email>
            </au>
            <au ca="yes" id="A4">
               <snm>Fiehn</snm>
               <fnm>Oliver</fnm>
               <insr iid="I1"/>
               <email>ofiehn@ucdavis.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>UC Davis Genome Center, Metabolomics, 451 Health Sci Drive, Davis, California, 95616, USA</p>
            </ins>
            <ins id="I2">
               <p>UC Davis IET, Academic Technology Services, Surge II, Hutchison Drive, Davis, California, 95616, USA</p>
            </ins>
            <ins id="I3">
               <p>UC Davis, Department of Molecular and Cellular Biology, 1 Shields Rd, Davis, California, 95616, USA</p>
            </ins>
         </insg>
         <source>Journal of Cheminformatics</source>
         <issn>1758-2946</issn>
         <pubdate>2009</pubdate>
         <volume>1</volume>
         <issue>1</issue>
         <fpage>18</fpage>
         <url>http://www.jcheminf.com/content/1/1/18</url>
         <xrefbib>
            
         <pubidlist><pubid idtype="pmpid">20150997</pubid><pubid idtype="doi">10.1186/1758-2946-1-18</pubid></pubidlist></xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>20</day>
               <month>8</month>
               <year>2009</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>16</day>
               <month>11</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>16</day>
               <month>11</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Kind et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Modern chemistry laboratories operate with a wide range of software applications under different operating systems, such as Windows, LINUX or Mac OS X. Instead of installing software on different computers, it is possible to install those applications on a single computer using virtual machine software. Software platform virtualization allows a single host operating system to execute multiple guest operating systems on the same computer. We apply and discuss the use of virtual machines in chemistry research and teaching laboratories.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Virtual machines are commonly used for cheminformatics software development and testing. By benchmarking multiple chemistry software packages, we have confirmed that the computational speed penalty for using virtual machines is low, at around 5% to 10%. Software virtualization in a teaching environment allows faster deployment and easy use of commercial and open source software in hands-on computer teaching labs.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Software virtualization in chemistry, mass spectrometry and cheminformatics is needed for software testing and for the development of software for different operating systems. To obtain maximum performance, the virtualization software should be multi-core enabled and allow the use of multiprocessor configurations in the virtual machine environment. Server consolidation, by running multiple tasks and operating systems on a single physical machine, can lead to lower maintenance and hardware costs, especially in small research labs. The use of virtual machines can prevent software virus infections and security breaches when used as a sandbox system for internet access and software testing. Complex software setups can be created with virtual machines and are easily deployed later to multiple computers for hands-on teaching classes. We discuss the popularity of bioinformatics compared to cheminformatics as well as the missing cheminformatics education at universities worldwide.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification id="endnote" subtype="user_supplied_xml" type="bmc"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Introduction</p>
         </st>
         <p>"Virtual machines have finally arrived. Dismissed for a number of years as merely academic curiosities, they are now seen as cost-effective techniques for organizing computer systems resources to provide extraordinary system flexibility and support for certain unique applications." This statement from one of the pioneers of virtualization (Goldberg 1974 <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>) is equally true 35 years later. The hype generated by the computer science community and software companies requires special attention, especially in chemistry and the life sciences, because virtual machines have undoubtedly arrived. More than 50 million hits in Google and around 1000 scientific papers (see Figure <figr fid="F1">1</figr>) show the significance of software virtualization. This technology allowed companies like VMware to reach the top five of all software companies within 10 years, reaching a market capitalization of almost 20 billion dollars in 2008. Little is known about applications and use of software virtualization in the life sciences and especially in chemistry. In this paper we discuss basic parts of the technology, investigate the performance of chemistry software, and discuss advantages and disadvantages of virtual machine applications in chemistry, cheminformatics <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, chemometrics <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, mass spectrometry laboratories and university teaching classes.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Number of scientific papers and citations about virtualization and virtual machines</p>
            </caption>
            <text>
               <p><b>Number of scientific papers and citations about virtualization and virtual machines</b>. Source: ISI Web of Science, January 2009.</p>
            </text>
            <graphic file="1758-2946-1-18-1"/>
         </fig>
         <sec>
            <st>
               <p>Software platform virtualization</p>
            </st>
            <p>Software platform virtualization <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> (system virtual machine) allows a single host operating system to execute multiple guest operating systems on the same computer without rebooting. For example, a computer running Microsoft Windows can run an independent LINUX operating system in a separate window, or vice versa. Likewise, a computer running Mac OS X can use virtualization programs to independently run operating systems like LINUX or Microsoft Windows in a separate window that behaves like a native Macintosh application (see Figure <figr fid="F2">2</figr>). The term application virtualization <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> (process virtual machine) refers to products such as the JAVA virtual machine, which allows JAVA applications to run on different operating systems (write once, run anywhere). One reason to use platform virtualization is that programs compiled for a specific operating system can only run under that operating system. Software compiled for Microsoft Windows therefore only runs under Microsoft Windows, unless an emulator or translation layer such as WINE is used <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. For software exchange between the host and guest operating system, a shared folder can be used, or software can be moved with the mouse via drag-and-drop or over additional network connections. Other operational modes include hypervisor server virtualization (Hyper-V) <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> or hardware virtualization concepts for better performance, such as Intel Virtualization Technology (Intel-VT) and AMD Virtualization (AMD-V).</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>The virtual machine software installed on a host operating system allows the use of different operating systems on a single computer system</p>
               </caption>
               <text>
                  <p><b>The virtual machine software installed on a host operating system allows the use of different operating systems on a single computer system</b>. A Macintosh system could run native Windows or LINUX software or even multiple instances of the same operating system. All virtual machines can communicate with each other and are allowed to use all hardware computer resources such as graphic cards, DVD drives and USB ports. (Logo sources: Wikipedia, TUX mascot: Larry Ewing).</p>
               </text>
               <graphic file="1758-2946-1-18-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Virtual machines exist for Windows, LINUX and Mac OS X</p>
            </st>
            <p>Around fifty commercial or free open-source solutions are currently available <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. We discuss some of the commonly used desktop virtual machines (VMs) as shown in Table <tblr tid="T1">1</tblr>. For the Windows platform, VMware Workstation for Windows and the free Microsoft Virtual PC are among the most popular programs. For LINUX as host, VMware Workstation for LINUX and XEN are commonly used. For Mac OS X as host, VMware Fusion, Parallels Desktop and the open source VirtualBox from Sun Microsystems are available. Although virtually every operating system can be used both as host and as guest, running Mac OS X (Tiger) as a guest operating system in a virtual machine is currently prohibited by the software maker Apple Inc. Only the server operating system Mac OS X Server 10.5 (Leopard) allows virtualization, and only if original Apple hardware is used. When different virtual machine software solutions are installed, virtual disk images sometimes have to be converted before they can be used with another virtual machine <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Such converter software tools can convert images from VMware, Microsoft Virtual PC, Citrix XenServer and Virtual Iron, and even backup images from Acronis True Image or Symantec Ghost <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Figure <figr fid="F3">3</figr> shows six different guest operating systems running simultaneously on a Windows Vista host computer workstation using Sun's VirtualBox.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>A Windows Vista host using Sun's VirtualBox runs three UBUNTU Linuxes, one WIN XP, one Windows VISTA and one Windows Server guest operating system simultaneously</p>
               </caption>
               <text>
                  <p><b>A Windows Vista host using Sun's VirtualBox runs three UBUNTU Linuxes, one WIN XP, one Windows VISTA and one Windows Server guest operating system simultaneously</b>. The hardware is an Intel Nehalem Core i7 950 quad core CPU (3 GHz) with 12 GByte RAM and 4 hard disks in RAID10. The system virtualizes a total number of 41 CPUs.</p>
               </text>
               <graphic file="1758-2946-1-18-3"/>
            </fig>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>List of common desktop virtual machines for Windows, LINUX and Mac OS X operating systems.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Host OS</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Virtualization Software</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>WINDOWS as Guest OS</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>LINUX as Guest OS</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Mac OS X as Guest OS</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Windows OS</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>VMware Workstation</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>yes</p>
                     </c>
                     <c ca="center">
                        <p>no</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Microsoft Virtual PC</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>yes</p>
                     </c>
                     <c ca="center">
                        <p>no</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Sun VirtualBox</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>yes</p>
                     </c>
                     <c ca="center">
                        <p>no</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LINUX OS</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>VMware</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>yes</p>
                     </c>
                     <c ca="center">
                        <p>no</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Citrix XEN</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>yes</p>
                     </c>
                     <c ca="center">
                        <p>no</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Virtual Iron</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>yes</p>
                     </c>
                     <c ca="center">
                        <p>no</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mac OS X</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>VMware Fusion</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>yes</p>
                     </c>
                     <c ca="center">
                        <p>yes*</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Parallels Server</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>yes</p>
                     </c>
                     <c ca="center">
                        <p>yes*</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Mac OS X can in principle run on any host, but it is not officially supported. A star (*) denotes license issues (January 2009).</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Server consolidation by using virtual machine software</p>
            </st>
            <p>The term server consolidation refers to the concept of replacing a number of older computers with a single multi-core or multi-CPU system <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. For example, eight computers, each with one GByte of memory and a single CPU, could be replaced by a single powerful computer with a dual quad-core CPU setup and a total of 8 GByte of memory. The initial aim of server consolidation is to save energy as well as hardware and maintenance costs. Energy can be saved by using newly designed processors with a better performance-per-Watt ratio. Operating and management costs can be saved because the systems administrator only has to deal with a single physical computer instead of multiple computers <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. By using hypervisor server virtualization software, a series of different operating systems can be installed into independent virtual machines. The right picture in Figure <figr fid="F4">4</figr> shows a XEN Hypervisor (Virtual Machine Manager) with 17 independent virtual machines across two physical servers. The hardware setup for all virtual machine installations is the same, resulting in fewer problems with different software drivers for components such as graphics cards, network adapters and hard disks.</p>
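            <p>The consolidation example above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration under simplifying assumptions: the names <it>Machine</it> and <it>fits_on_host</it> are hypothetical, CPU cores and memory are not overcommitted, and the optional <it>reserved_ram_gbyte</it> parameter is an assumed allowance for the host system itself that is not part of the original example.</p>
            <p><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical description of a computer by core count and memory."""
    cpu_cores: int
    ram_gbyte: float


def fits_on_host(host, guests, reserved_ram_gbyte=0.0):
    """Return True if all guests fit on the host without overcommitting.

    Assumption: every guest gets dedicated cores and dedicated memory.
    reserved_ram_gbyte is an assumed amount of memory kept for the host.
    """
    total_cores = sum(g.cpu_cores for g in guests)
    total_ram = sum(g.ram_gbyte for g in guests)
    enough_cores = total_cores <= host.cpu_cores
    enough_ram = total_ram + reserved_ram_gbyte <= host.ram_gbyte
    return enough_cores and enough_ram


# Eight old computers, each with one CPU and one GByte of memory.
old_computers = [Machine(cpu_cores=1, ram_gbyte=1.0) for _ in range(8)]

# One new computer with two quad-core CPUs and 8 GByte of memory.
new_host = Machine(cpu_cores=2 * 4, ram_gbyte=8.0)

print(fits_on_host(new_host, old_computers))
print(fits_on_host(new_host, old_computers, reserved_ram_gbyte=1.0))
```
]]></p>
            <p>With no memory reserved for the host, the eight guests fit exactly; once an assumed 1 GByte is set aside for the host, the second check fails, which suggests that a consolidated server may need somewhat more memory than the sum of the machines it replaces.</p>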
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Server consolidation: A single powerful computer runs multiple virtual machines and serves as compute server, backup server and web server</p>
               </caption>
               <text>
                  <p><b>Server consolidation: A single powerful computer runs multiple virtual machines and serves as compute server, backup server and web server</b>. Such a setup improves maintenance efficiency and reduces hardware costs. The right picture shows a production server with a XEN Virtual Machine Monitor and 17 independent running systems (Actual VM names were replaced; Picture source: Zhi-Wei Lu; UC Davis Genome Center Bioinformatics Core).</p>
               </text>
               <graphic file="1758-2946-1-18-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Software virtualization and multiple operating systems in chemistry</p>
            </st>
            <p>The literature coverage of virtualization concepts in chemistry is extremely sparse. One paper discusses the use of virtualization on supercomputers for molecular dynamics calculations utilizing more than 1550 processors <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Another chemistry related publication discusses the use of virtualization software in the pharmaceutical industry for virtual screening and lead optimization in a grid-like environment <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The use of VMs among software designers may be higher and is not fully reported in the peer-reviewed literature. The biggest advantage of installing virtual machine software is the ability to run applications from different operating systems on a single computer. Another advantage is the ability to test software without installing it into a working production environment. A similar approach is used with live-CDs that contain a bootable operating system with pre-installed software, such as the free Vigyaan electronic workbench for bioinformatics, computational biology and computational chemistry <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Such live-CDs can be easily mounted in virtual machines without needing to reboot the original production system. Furthermore, the Microsoft Windows operating system is known to slow down after installation of hundreds of software tools. That can be prevented by installing software into a virtual machine. We investigate possible speed penalties during the use of virtual machines with a series of scientific benchmarks (see Table <tblr tid="T2">2</tblr>). For software development purposes, software virtualization is used to compile native solutions and test software on different operating systems. In addition, data files created with one software version can be incompatible with another version of the same software. In such a case the old software must be uninstalled and the new software installed. With platform virtualization, each software version can be installed into its own independent operating system. Every system change can be completely reversed with the included differential snapshot system. In a university teaching environment, platform virtualization is a fast way to deploy copies of the same installation file to multiple computers in a classroom.</p>
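            <p>As a rough illustration of how the relative performance of a guest VM can be expressed, the following Python sketch computes guest performance as a percentage of host performance. The function names are hypothetical and not part of any benchmark suite; the input values are taken from Table 2. For timed tasks a shorter run time is better, so the ratio is host time over guest time; for score-based benchmarks a higher score is better, so the ratio is guest score over host score.</p>
            <p><![CDATA[
```python
def relative_performance_from_times(host_seconds, guest_seconds):
    """Guest performance in percent of the host for a timed task.

    A shorter run time means better performance, so the ratio is
    host time divided by guest time.
    """
    if host_seconds <= 0 or guest_seconds <= 0:
        raise ValueError("run times must be positive")
    return 100.0 * host_seconds / guest_seconds


def relative_performance_from_scores(host_score, guest_score):
    """Guest performance in percent of the host for a score-based benchmark.

    A higher score means better performance, so the ratio is
    guest score divided by host score.
    """
    if host_score <= 0:
        raise ValueError("host score must be positive")
    return 100.0 * guest_score / host_score


# Values from Table 2 (Windows XP host versus Windows XP guest VM).
molgen = relative_performance_from_times(42.23, 46.20)
scimark = relative_performance_from_scores(661, 621)
marvin = relative_performance_from_times(21, 42)

print(f"Molgen:  {molgen:.0f}%")   # prints "Molgen:  91%"
print(f"SciMark: {scimark:.0f}%")  # prints "SciMark: 94%"
print(f"Marvin:  {marvin:.0f}%")   # prints "Marvin:  50%"
```
]]></p>
            <p>The speed penalty is then simply 100% minus the relative performance, for example about 9% for the Molgen run.</p>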
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>List of system statistics and micro-benchmarks comparing an original Windows XP performance and Windows XP inside a virtual machine (Guest OS).</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>
                           <b>ID</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Task</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>WINDOWS XP Host</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>WINDOWS XP</b>
                        </p>
                        <p>
                           <b>Guest VM</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Performance of Guest VM</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left" cspan="2">
                        <p>
                           <b>System benchmarks</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>Operating system start time</p>
                     </c>
                     <c ca="center">
                        <p>2 min</p>
                     </c>
                     <c ca="center">
                        <p>1 min</p>
                     </c>
                     <c ca="center">
                        <p>50% less time</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>Size of Windows system folder</p>
                     </c>
                     <c ca="center">
                        <p>6.95 GByte</p>
                     </c>
                     <c ca="center">
                        <p>3.01 GByte</p>
                     </c>
                     <c ca="center">
                        <p>57% less space</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>RAM memory requirement (IDLE)</p>
                     </c>
                     <c ca="center">
                        <p>760 MByte</p>
                     </c>
                     <c ca="center">
                        <p>150 MByte</p>
                     </c>
                     <c ca="center">
                        <p>80% less RAM</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>Average hard disk transfer rate</p>
                     </c>
                     <c ca="center">
                        <p>180 MByte/sec</p>
                     </c>
                     <c ca="center">
                        <p>127 MByte/sec</p>
                     </c>
                     <c ca="center">
                        <p>70%</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left" cspan="2">
                        <p>
                           <b>Single CPU core benchmarks</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>NIST SciMark 2.0a (JAVA 1.6 Server)</p>
                     </c>
                     <c ca="center">
                        <p>score of 661</p>
                     </c>
                     <c ca="center">
                        <p>score of 621</p>
                     </c>
                     <c ca="center">
                        <p>94%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>Molgen Demo - count all 23862255 isomers of C12H12</p>
                     </c>
                     <c ca="center">
                        <p>42.23 sec</p>
                     </c>
                     <c ca="center">
                        <p>46.20 sec</p>
                     </c>
                     <c ca="center">
                        <p>91%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>7</p>
                     </c>
                     <c ca="left">
                        <p>CDK Descriptor GUI -- Kier &amp; Hall SMARTS for all C8H16O2 isomers</p>
                     </c>
                     <c ca="center">
                        <p>100 sec</p>
                     </c>
                     <c ca="center">
                        <p>95 sec</p>
                     </c>
                     <c ca="center">
                        <p>95%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>8</p>
                     </c>
                     <c ca="left">
                        <p>Seven Golden Rules -- generate all 28008691 formulas below 1000 Da</p>
                     </c>
                     <c ca="center">
                        <p>42 sec</p>
                     </c>
                     <c ca="center">
                        <p>42 sec</p>
                     </c>
                     <c ca="center">
                        <p>100%</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left" cspan="2">
                        <p>
                           <b>Dual CPU core benchmarks</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>ChemAxon Marvin - calculate all stereoisomers of C8H16O2</p>
                     </c>
                     <c ca="center">
                        <p>21 sec</p>
                     </c>
                     <c ca="center">
                        <p>42 sec</p>
                     </c>
                     <c ca="center">
                        <p>50%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>MZMine2 -- chromatographic alignment of LC-MS runs</p>
                     </c>
                     <c ca="center">
                        <p>70 sec</p>
                     </c>
                     <c ca="center">
                        <p>130 sec</p>
                     </c>
                     <c ca="center">
                        <p>54%</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The comparison is between an aged, two-year-old Windows XP installation (host OS) and a cleanly installed Windows XP system (guest OS) running under Microsoft Virtual PC 2007 on a Dual Opteron 254 (2.8 GHz).</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Mass spectrometry and cheminformatics software</p>
            </st>
            <p>Modern mass spectrometers produce data at such high rates that wet lab work is reduced to roughly 20% of the relative project time, while 80% of the time is spent on computerized data evaluation and the investigation of raw data. Because molecular spectra and molecular structures are strongly interconnected, structure elucidation relies not only on mass spectrometry software but also on a diverse set of programs for handling structures and computing molecular properties <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Such software includes tools for structure elucidation and mass spectrum interpretation, chromatographic peak deconvolution, biomarker identification and alignment, molecular formula determination, mass spectral library search, and chemical structure and descriptor generation (see Table <tblr tid="T3">3</tblr>). We discuss the advantages of software platform virtualization in research and university teaching and show some practical applications, focusing on mass spectrometry and cheminformatics.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Cheminformatics and mass spectrometry software course taught as part of an experimental mass spectrometry class. Some of the software was deployed using Windows XP virtual machines in the computer laboratory.</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>
                           <b>General course</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Topics covered</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>General Introduction</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Fighting computer illiteracy -- bits, bytes, CPUs</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Regular expressions as emergency helpers</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Structures -- resonance forms, stereoisomers, tautomers</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Mass spectrometry publications via Yahoo Pipes</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Mass spectral and molecular data handling</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Mass spectral data formats and conversion of mass spectra</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Open exchange formats for mass spectra (mzData, mzXML, JCAMP-DX, netCDF)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Structure handling software and structure conversion (SMILES/SMARTS, SDF/MOL, InChI/InChIKey, PDB, CML)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Chemical structure handling (Instant-JChem, BioClipse)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Mass spectral and molecular database search</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Mass spectral databases (EI, ESI, APCI) and search algorithms (PBM, dot product, mass spectral trees) and library conversion</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Proteomics data analysis (database search, de-novo sequencing, hybrid methods)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Molecule search (exact search, substructure search, similarity search, Markush search)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Databases (PubChem, SciFinder, Beilstein, BlueObelisk)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Mass Spectrometry Tools &amp; Concepts</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Resolving power, mass accuracy, isotopic pattern, charge states, charge state deconvolution</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Molecular formula space of small molecules</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Isotopic abundances as orthogonal filter for elemental compositions</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Molecular Isomer Generators, substructure predictions, simulation of mass spectra</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Concepts for GC-MS</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Automatic peak detection</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Peak picking and mass spectral deconvolution</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Comprehensive GCxGC-TOF-MS</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Concepts for LC-MS</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Deconvolution and evaluation of LC-MS data</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Adduct removal and detection during ESI-LC-MS</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Seven Golden Rules for generation of possible molecular formulas</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Structural isomer lookup example in ChemSpider</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Prediction and simulation of mass spectra</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Dendral - Artificial intelligence and mass spectrometry</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Prediction of the isomer substructures from a given mass spectrum</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Simulation of mass spectra from given isomer structures</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Experimental</p>
         </st>
         <sec>
            <st>
               <p>Virtual machine hardware for benchmarks and teaching labs</p>
            </st>
            <p>The test system (host computer) for benchmarks was a Dual Opteron 254 (2.8 GHz) equipped with 2.8 GByte RAM and an ARECA-1120 RAID 6 array of WD Raptor hard disks, running 32-bit Windows XP. The forty-one computers used for classroom teaching were Dell Optiplex GX745 machines with 2.4 GHz Intel Core 2 Duo processors, 4 GB of RAM, and a 160 GB hard drive. These computers had the freely available Microsoft Virtual PC 2007 installed and were configured to allow students to log on using their UC Davis computer accounts. The additional test hardware shown in Figure <figr fid="F3">3</figr> and Figure <figr fid="F5">5</figr> was an Intel Core i7 (3 GHz) quad-core system equipped with four hard disks in RAID10 (mirrored stripe) and 12 GByte memory. The hardware shown in Figure <figr fid="F4">4</figr> was a dual Intel Xeon E5430 quad-core CPU (2.66 GHz) system with 32 GByte RAM and 24 &#215; 1TB Seagate Barracuda ES.2 hard disks on a RAID6 SAS hardware controller.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>The Windows Vista Ultimate host with Sun's VirtualBox virtualizes an Ubuntu Linux system with 32 CPU threads (left side) and a Windows Server system with 10 CPU threads (right side)</p>
               </caption>
               <text>
                  <p><b>The Windows Vista Ultimate host with Sun's VirtualBox virtualizes an Ubuntu Linux system with 32 CPU threads (left side) and a Windows Server system with 10 CPU threads (right side)</b>. The underlying host hardware is a quad-core Nehalem Core i7 950 CPU with only 8 threads. Both guest systems work without problems, but fully exhaust all underlying hardware resources when all parallel threads are in use.</p>
               </text>
               <graphic file="1758-2946-1-18-5"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Software installation for virtual machines</p>
            </st>
            <p>The freely available Microsoft Virtual PC 2007 (Version 6.0.156.0) was downloaded from Microsoft and used for all benchmarks and teaching VMs. The memory of the virtual machine was set to one GByte RAM. A virtual machine for Microsoft Virtual PC 2007 was created using Microsoft Windows XP (Service Pack 3) under a university volume licensing agreement. Multiple mass spectrometry and cheminformatics software packages were downloaded from their original websites <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> and installed into the virtual machine either by simple drag-and-drop copying from one window to another or by direct download from within the virtual machine. An appropriate software license was obtained for every package. Multiple topics were covered in the teaching sessions (see Table <tblr tid="T3">3</tblr>), but a discussion of each individual package is beyond the scope of this paper. A free teaching license was obtained for the ChemAxon Marvin <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> and ChemAxon Instant-JChem packages. The freely available Instant-JChem and the open source BioClipse package <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> were used for GUI-driven molecule and spectral handling. The software setup for the appliances shown in Figure <figr fid="F3">3</figr> and Figure <figr fid="F5">5</figr> comprised a Windows Vista Ultimate 64-bit operating system and the freely available Sun VirtualBox 3.0 as virtual machine software.</p>
         </sec>
         <sec>
            <st>
               <p>Benchmark software selection for virtual machine testing</p>
            </st>
            <p>The NIST SciMark 2.0 program <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> was selected because it is cross-platform and freely available. Furthermore, its five computational routines, Fast Fourier Transform (FFT), Jacobi successive over-relaxation (SOR), Monte Carlo integration, sparse matrix multiply and dense LU matrix factorization, represent a fair mix of scientific computing problems. The application is single-threaded and was obtained from <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The benchmark was run with the Sun JAVA 1.6 server compiler in off-line mode. The Molgen program is a molecular isomer generator that takes a molecular formula as input and counts or generates all possible structural isomers <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. It was included because isomer generation is important in analytical chemistry. The demo version 3.5 (single-threaded) was downloaded from <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> and all isomers of C<sub>12</sub>H<sub>12</sub> were counted. The free CDK Descriptor GUI (v0.94) <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> is software for molecular descriptor calculation based on the open source Chemistry Development Kit <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The Kier and Hall descriptors (electrotopological state indices, E-state) <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> were calculated for a dataset of all 13190 C<sub>8</sub>H<sub>16</sub>O<sub>2</sub> isomers generated with the SMOG2 isomer generator software <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. The high-speed molecular formula calculator HR2 was downloaded from <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and used to generate all 28,008,691 elemental compositions below 1000 Da within the element ranges (C:1-78 H:1-126 N:0-20 O:0-27 P:0-9 S:0-14) according to the Seven Golden Rules <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>.
The test conditions are below the maximum element restrictions for this mass range and were chosen solely for performance measurement. The software Marvin 5.1.4 was downloaded from <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> and used to generate all tetrahedral and double bond stereoisomers of C<sub>8</sub>H<sub>16</sub>O<sub>2</sub>. The calculation was invoked from the command line with the cxcalc command, supplying a file with all SMOG2 structural isomers. The Marvin software is multi-threaded and can therefore make use of multiple CPUs. As a final test the multi-threaded and compute cluster-ready MZmine software, a package for LC-MS chromatogram alignment <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, was used; the free MZmine2 (beta 1.92) release was downloaded from <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. The test set was downloaded from <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> and is based on a metabolic profiling study <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. All samples were included and batch processed using a zoom scan filter, a three-step peak detector (local maxima mass detection, score connector chromatogram construction, Savitzky-Golay peak recognition) and the join aligner for alignment. Other software licenses were either purchased or obtained through university software licensing. All benchmarks were run three times, timed with the timethis command from the Windows 2000 Resource Kit Tools, and the average times are reported.</p>
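            <p>The repeat-and-average timing protocol can be approximated on any platform with a short script. The following Python sketch is not the timethis tool used in this study; it only illustrates the idea of running a command three times and averaging the wall-clock time. The function names and the placeholder workload are assumptions made for illustration.</p>

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return the wall-clock times in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def mean_runtime(argv, runs=3):
    """Average wall-clock time over `runs` repetitions of the command."""
    return statistics.mean(time_command(argv, runs))


if __name__ == "__main__":
    # Placeholder workload (assumption): a trivial Python one-liner
    # stands in for a real benchmark program.
    demo = [sys.executable, "-c", "sum(range(100000))"]
    print(f"mean of 3 runs: {mean_runtime(demo):.3f} s")
```

            <p>Running the same script on the host and inside the guest would give comparable averages, provided that the command line and the input files are identical on both systems.</p>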
         </sec>
         <sec>
            <st>
               <p>Popularity comparison of bioinformatics versus cheminformatics</p>
            </st>
            <p>For the comparison of the popularity of cheminformatics versus bioinformatics, a site-specific web search was performed on the websites of all 325 US universities with an associated research chemistry faculty. A domain-specific Google search was used to determine how often the words "cheminformatics", "chemoinformatics" and "bioinformatics" occur on a single university website. For example, <it>cheminformatics site:berkeley.edu </it>returned 93 hits on Google, meaning that the word occurred 93 times in HTML pages, PDF documents and EXCEL sheets across the whole UC Berkeley website. Zero hits meant that the word did not occur on that university website. A JAVA program using the Google web search API was implemented to automate the several thousand searches. The hit counts for all universities are discussed in the Results section. All results are freely available as an EXCEL sheet in Additional file <supplr sid="S1">1</supplr>.</p>
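            <p>The construction of the domain-specific queries can be sketched as follows. This is a minimal Python illustration, not the JAVA program used in this study, and it does not call any search API; it only builds the query strings in the form shown above. The function name is hypothetical and the domain list is a placeholder.</p>

```python
from itertools import product

# Search terms named in the text
TERMS = ["cheminformatics", "chemoinformatics", "bioinformatics"]


def build_queries(domains, terms=TERMS):
    """Return one 'term site:domain' query string per (domain, term) pair."""
    return [f"{term} site:{domain}" for domain, term in product(domains, terms)]


if __name__ == "__main__":
    # Placeholder domain list (assumption): a single example domain
    for query in build_queries(["berkeley.edu"]):
        print(query)
```

            <p>With three search terms, 325 university domains yield 975 queries; each additional search term adds another 325.</p>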
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>Popularity of cheminformatics vs. bioinformatics - complete statistics</b>. Listing of 325 US universities with a chemistry program; domain-specific search and statistics (using the Google API web search) of the occurrences of the words: Cheminformatics, ChemoInformatics, BioInformatics, Chemometrics, Computational chemistry, Chemical Informatics; Format: Microsoft EXCEL 2003; Curator: Tobias Kind; FiehnLab August 2009; <url>http://fiehnlab.ucdavis.edu/staff/kind/</url></p>
               </text>
               <file name="1758-2946-1-18-S1.XLS">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Installation and micro benchmarks in a research environment</p>
            </st>
            <p>The initial size of the virtual machine file with Windows XP SP3 32-bit (guest OS) was 4 GByte. After the installation of multiple software packages the VM file grew to eight GByte, even though the total software installation size was only around 300 MBytes. Possible reasons are the included swap file, or NTFS file system fragmentation and folder compression in the guest OS. It has been reported that a Windows XP system can be as small as 700 MBytes <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. There are in general two virtual disk types: fixed disks and dynamic disks <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. A fixed disk has a constant file size, whereas a dynamic disk can grow up to the maximum available size on the host operating system. For this setup a dynamic disk was selected to leave enough room for installed programs and to allow flexible disk size management. After installation the virtual machine drive has to be defragmented and precompacted with a special precompactor program that zeros out all free space. Additionally the virtual machine has to be stopped and an external defragmentation has to be applied. General system benchmarks can be found in Table <tblr tid="T2">2</tblr>. The memory footprint of the Windows XP guest OS was relatively small at 100 MByte RAM; after installation of the Sophos antivirus software the memory allocation grew to 150 MBytes. In comparison, a two-year-old Windows XP system with hundreds of different software tools installed requires up to 500 MByte memory when idle, due to multiple drivers and resident programs. Saving and restoring a virtual machine is quite fast: saving the virtual machine state takes around 30 seconds and restoring the saved virtual machine takes only 5 seconds.
This short save time is due to the three times faster average transfer rate of the RAID 6 file system on the host computer compared to a common desktop hard disk. The times for the compute-intensive single CPU core benchmarks are listed in Table <tblr tid="T2">2</tblr>; the speed penalty for running programs inside a virtual machine usually varies between 5% and 10%. To highlight the importance of a fast disk system and of multi-core and multi-CPU support in programs and virtualization software, dual core CPU applications were also included. Microsoft Virtual PC does not support symmetric multi-processing (SMP) and hence supports only one CPU. Multi-threaded programs confined to this single virtual CPU run at about 50% of their dual core speed on the host (Table <tblr tid="T2">2</tblr>).</p>
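            <p>The percentages in the last column of Table 2 can be recomputed from the two measured values. The Python sketch below assumes that the first value column of Table 2 is the host and the second is the guest, and that the percentage is host time divided by guest time for run times, and guest score divided by host score for benchmark scores. The function name is hypothetical, and only rows that are consistent with this reading are included.</p>

```python
def relative_performance(host, guest, higher_is_better=False):
    """Guest performance as a rounded percentage of host performance.

    For run times (lower is better) the ratio is host / guest;
    for benchmark scores (higher is better) it is guest / host.
    """
    ratio = guest / host if higher_is_better else host / guest
    return round(100 * ratio)


# (label, host value, guest value, higher_is_better), copied from Table 2
ROWS = [
    ("NIST SciMark 2.0a score", 661, 621, True),
    ("Molgen Demo, seconds", 42.23, 46.20, False),
    ("Seven Golden Rules, seconds", 42, 42, False),
    ("ChemAxon Marvin, seconds", 21, 42, False),
    ("MZMine2, seconds", 70, 130, False),
]

if __name__ == "__main__":
    for label, host, guest, is_score in ROWS:
        print(f"{label}: {relative_performance(host, guest, is_score)}%")
```

            <p>Under these assumptions the sketch reproduces the tabulated values of 94%, 91%, 100%, 50% and 54% for the five rows listed.</p>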
         </sec>
         <sec>
            <st>
               <p>Installation and use of VMs in a teaching environment</p>
            </st>
            <p>The virtual image with all the required mass spectrometry and cheminformatics software was deployed to each computer station. Due to the size of the virtual image (eight GByte) it was copied to each PC during off hours; a real-time deployment over network services to multiple computers was not possible. Students and the teacher log in to the physical workstation using the campus Kerberos authentication system. The Virtual PC is then started from the start menu without any additional authentication. The set of pre-installed programs is then used for learning structure handling techniques and mass spectrometry data handling approaches, including molecular formula generators, charge state deconvolution, isotopic pattern generators, mass spectral database search, gas chromatography and liquid chromatography (GC-MS and LC-MS) deconvolution software and tools for mass spectral interpretation and simulation (see Table <tblr tid="T3">3</tblr>). Additionally the course includes structure handling approaches as well as the exploration of different file formats, structure search techniques and structural isomer generators. The whole set of teaching slides including all software references can be freely downloaded from <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. All virtual machines are identical, which allows the instructor to demonstrate on the large screen while all students follow the instructions in step and take part in discussions.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>Application of VMs in research and system benchmarks</p>
            </st>
            <p>The micro benchmarks were performed to validate the use of virtual machines under heavy computational load. The benchmark programs were selected because they are frequently used in cheminformatics and mass spectrometry laboratories. Microsoft Virtual PC utilizes only a single CPU; therefore all results are based on single CPU speed rather than on the dual core capabilities of the host. The MS Server version and VMWare virtual machines also allow multiple CPU setups but were not used for comparison. The fastest mode in a virtual machine is the direct execution mode <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, in which the machine code runs without intervention from the virtual machine at almost native speed. This explains the very small CPU-based speed penalty (virtual machine overhead) for running programs inside the virtual machine. Certain CPU-specific instructions, however, are prevented from running directly within the virtual machine or generate a CPU exception; the virtual machine then has to emulate such machine code via binary translation, which is much slower. The start time of a freshly installed virtual machine is usually shorter than that of an aged system, because no additional drivers and programs are installed. As seen in Table <tblr tid="T2">2</tblr> the start time of the guest OS is only 50% of that of the host OS. A minimum install of Windows XP usually boots in 30 seconds, but antivirus and network drivers lengthen the boot time. The minimum memory requirements are remarkably low at 150 MBytes, but the real-time antivirus software needs an additional 50 MBytes. In comparison a 3 year old production system needs 750 MByte, with the additional restriction that 32-bit Windows systems can only allocate and use 2.8 GBytes even if more memory is installed. The problem is that many programs and hardware drivers for mass spectrometers are not yet certified for 64-bit operation and would create incompatibilities.
One solution is to install a large memory system with a 64-bit operating system as host OS and run 32-bit systems as guest OS, allowing both 32-bit and 64-bit operation. 64-bit Windows can run 32-bit programs directly in an emulation layer. If, however, 64-bit drivers are required and not yet available, the program cannot be installed in the first place.</p>
         </sec>
         <sec>
            <st>
               <p>Discussion of cheminformatics and mass spectrometry related benchmarks</p>
            </st>
            <p>The NIST SciMark 2.0a, Molgen Demo, CDK Descriptor GUI and Seven Golden Rules benchmarks are all single-threaded. Table <tblr tid="T2">2</tblr> shows that the speed penalty within the virtual machine is at most about 10% for each of these programs. The impact of disk speed was not investigated, but the fast Areca RAID-6 system allows full guest CPU utilization; the penalty on disk use therefore exists but is very small on a hardware RAID system. With a slow single hard disk on the host system, disk performance within the guest system is also lower, because the disk system overheads of guest and host add up and decrease the overall disk speed. Many new desktop computer systems have at least two CPU cores. The new Intel Nehalem Core i7 technology provides fast quad-core CPUs, each with a total of eight working threads. Unfortunately only few chemistry and mass spectrometry desktop applications are multi-threaded or multi-core ready <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Among the tested programs, the JChem calculation routines and MZMine2 for chromatographic alignment can make use of multi-core systems. The speed penalty for running such programs on a single core instead of a dual core setup is severe. The stereoisomer calculation shows that doubled performance can be obtained on a dual processor machine. Unfortunately, Microsoft Virtual PC is a single-threaded application and does not allow the use of multi-core CPUs in the guest virtual environment. Ironically, when Microsoft bought the Virtual PC technology from Connectix in 2003 the software supported symmetric multi-processing (SMP) virtual machines. The free Microsoft Virtual PC 2007 is marketed as a desktop virtualization product, whereas the free Microsoft Virtual Server 2 is marketed as a server product and can utilize multi-core CPUs.
In comparison, the commercial VMWare Workstation and the open source VirtualBox both support virtual symmetric multiprocessing (SMP), and currently up to 32 virtual CPUs can be used in the guest system. The MZMine2 test in particular shows the dependence on disk I/O, because of the large file sizes, and on multi-core CPUs, because the software scales with the number of available threads and therefore runs at double speed on a dual CPU setup.</p>
         </sec>
         <sec>
            <st>
               <p>Virtual machines can diversify operating system choices in chemistry labs</p>
            </st>
            <p>The majority of software that is sold commercially together with mass spectrometers runs under Microsoft Windows. One reason may be that driver software for the analog-digital converters (AD/DA), which are required when connecting mass spectrometers to PCs, is available only for Microsoft Windows. However, there is no obvious reason why LINUX installations cannot be used, because many older instruments ran successfully under different UNIX operating systems. Vendors develop software for a single operating system because of the complexity of the software development tools and the development and support costs. Aiming at a single platform certainly reduces costs for the vendor. Furthermore, hardware-related programming usually requires C or C++ code development. The data evaluation part can be done on multiple platforms including Linux, Mac OS and Windows. Here, cross-platform applications written in JAVA, which can run on many different operating systems, have a clear advantage.</p>
            <p>Modern mass spectrometry labs usually use multiple operating systems for historic reasons. Windows computers are used for operating chromatography equipment and mass spectrometers, LINUX for running software on computer clusters, and Mac OS X for personal workstations and laptops. However, only very few mass spectrometry desktop applications are available for Mac OS <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. That problem can be solved by using a virtual PC application like VMware Fusion or Parallels Desktop for Mac to install Windows-compatible applications. As already mentioned, Apple Inc. currently prohibits running Mac OS X as a guest operating system in a virtual machine on non-Apple hardware.</p>
            <p>The choice between 32-bit and 64-bit operating systems <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> for mass spectrometry based computer systems depends on two major factors. The first is memory: if the computer has more than 4 GByte RAM available and the motherboard and CPU are 64-bit capable, a 64-bit operating system is recommended in order to utilize more than 4 GByte of memory. With less than 4 GByte RAM a 32-bit system is sufficient. The second factor is the availability of programs and drivers that are natively compiled for 64-bit systems. If 64-bit drivers are not available for hardware cards (AD/DA converters, PCI cards), it is impossible to use that hardware on a 64-bit operating system. For 32-bit application software this is not a major problem, because most 64-bit operating systems can execute both 32-bit and 64-bit software. It is recommended, however, to test all system-critical software on a virtual machine before deploying it in a working environment.</p>
         </sec>
         <sec>
            <st>
               <p>Hardware choices for software virtualization in chemistry labs</p>
            </st>
            <p>For server consolidation purposes usually server-grade components are used. That includes a motherboard capable of multi-socket CPU setups and enough memory banks to handle memory from 32 to 512 GByte RAM. As CPUs the quad-core or hex-core Intel XEON (based on Nehalem technology) as well as AMD Opteron (based on Shanghai or Istanbul 45 nm technology) can be recommended. The overall hard disk performance is extremely important for virtualization, therefore a series of 10,000 rpm SAS or SATA hard drives using RAID6 or RAID10 hardware RAID controllers (such as ARECA, LSI, ADAPTEC, 3WARE) should be used. A native hypervisor virtual machine monitor (XEN, VMware ESX Server or Microsoft Hyper-V) can be installed as core software layer. Any LINUX or WINDOWS guest operating system can be installed into the hypervisor.</p>
            <p>For desktop virtualization a dual-core or quad-core processor (Intel Core i7 or AMD Phenom) should be used. The memory can range from 2 to 32 GByte. As operating system, any 64-bit LINUX, MAC or WINDOWS system can be installed. For each virtual machine a minimum of 800 MByte RAM should be considered. Therefore, if the host operating system uses 1 GByte RAM and four virtual machines are to run in parallel, slightly more than 4 GByte RAM is needed (four times 800 MByte for the guests plus 1 GByte for the host). As hard disk system an Intel Matrix software RAID with multiple disks can be used. For even higher performance a Solid State Disk (SSD) setup or server-grade hard disks with an ARECA RAID controller are recommended. Currently only limited support for Direct3D graphics cards and other specialized hardware is provided inside virtual machines. Applications that require such hardware should be run on native systems. Figure <figr fid="F5">5</figr> shows an Intel Nehalem 3 GHz quad-core system equipped with four hard disks in RAID10 (mirrored stripe). The system can be used to virtualize special hardware, like a 32-thread LINUX machine as seen in the screenshot. The high average hard disk transfer rate of around 200 MByte/sec is important because the system has to deal with multiple virtual machines all performing their own disk operations. The most common cause of mediocre virtual machine performance is the use of a single slow hard disk.</p>
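            <p>The memory estimate above can be written as a short calculation. The following Python snippet is only a minimal sketch: the function name, the default values and the convention of 1024 MByte per GByte are assumptions made for illustration and are not part of the benchmark setup.</p>

```python
MBYTE_PER_GBYTE = 1024  # assumption: binary convention for GByte


def required_host_ram_mbyte(vm_count, ram_per_vm_mbyte=800, host_os_ram_mbyte=1024):
    """Return the RAM (in MByte) needed for the host OS plus all guests running in parallel.

    Defaults follow the rule of thumb in the text: 800 MByte per virtual
    machine and 1 GByte for the host operating system.
    """
    return host_os_ram_mbyte + vm_count * ram_per_vm_mbyte


# Example from the text: four virtual machines running in parallel.
total_mbyte = required_host_ram_mbyte(4)
total_gbyte = total_mbyte / MBYTE_PER_GBYTE
print(total_mbyte, "MByte, i.e. about", round(total_gbyte, 1), "GByte")
```

            <p>Under these assumptions four guests need slightly more than 4 GByte in total, so a host with exactly 4 GByte installed would leave no headroom.</p>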
         </sec>
         <sec>
            <st>
               <p>Software licensing issues using virtual machines</p>
            </st>
            <p>Each operating system installation requires its own operating system license. For free desktop LINUX operating systems no licensing issues occur. For Microsoft Windows or LINUX Enterprise versions a separate license must be acquired for each computer processor and for each virtual machine installation. That is also the case for most commercial software if not otherwise stated in the EULA. For universities, academic volume licenses are usually available at a reduced price. Trial licenses mostly prohibit use in production environments. For teaching environments a special agreement must be reached with the software vendor. Some companies like ChemAxon provide three different kinds of license: a) paid commercial licenses, b) free teaching licenses and c) free academic licenses for use in an academic research environment.</p>
         </sec>
         <sec>
            <st>
               <p>Server consolidation in research labs using virtual machines</p>
            </st>
            <p>Server consolidation is very common in larger research laboratories and bioinformatics labs at universities. Figure <figr fid="F4">4</figr> (right side) shows a XEN virtual machine monitor running multiple VMs at the UC Davis Genome Center Bioinformatics core lab. The idea is that a single physical computer with multiple CPUs and a large amount of memory runs different operating systems and provides multiple services at once. These can include different web services, database front-ends or web sites. Additionally, internal computations can be carried out on such a system, and the system can also be used to provide intermediary backup solutions. With small diskless network PCs connected, such a system could even provide simple common services such as Word, EXCEL, PowerPoint and access to statistical services, without the need to purchase individual computers for each student and researcher. Larger virtualization projects including several thousand virtual machines are usually deployed by IT departments at large research universities. The aims are the same: minimizing management and hardware costs.</p>
         </sec>
         <sec>
            <st>
               <p>Use of virtual machines for software testing and distribution and computer upgrades</p>
            </st>
            <p>A common application of virtual environments is application testing and development. Programmers in particular use VM technologies to test the deployment of their software under different operating systems. Existing (open) software packages are usually distributed on live-CDs that contain a series of programs on a bootable LINUX CD <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. The bioinformatics community in particular has a strong history of using live-CDs such as VLinux and Vigyaancd for software distribution <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. Such CDs can be converted to a single ISO file that can be mounted inside a virtual machine, allowing the LINUX system to run without the need to reboot. Although the direct distribution of pre-installed virtual machines is widely used in the computer science community, its use in chemistry is very sparse. Examples include the ECCE (Extensible Computational Chemistry Environment) <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> and the MASPECTRAS platform for management and analysis of proteomics data <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Complex installation processes and web server installations on production systems can easily be avoided by using such pre-configured virtual hard disks (VHDs). That also includes applications for grid computing inside virtual machines <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B44">44</abbr></abbrgrp>. It must be realized that Windows software inside a pre-configured Windows system is not suited for worldwide distribution, or for distribution outside a university with volume licensing, because each installation requires a paid license. The WINE compatibility layer <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, which is capable of running Windows software inside a LINUX virtual machine, could be a possible solution.</p>
            <p>Another application for VMs is legacy software testing. Certain programs may only run under native 32-bit environments. Although Windows has a built-in 32-bit legacy emulation, the only solution may be to install a 32-bit and a 64-bit OS into a virtual machine. Older native Windows 16-bit installer programs will not run on 64-bit Windows computers, making a 16-bit or 32-bit virtual machine the first choice. Software which requires older operating system versions can also be tested without problems. If an installed application shows erroneous behavior or major incompatibilities, a snapshot from an earlier date can be used to restore the virtual machine to its original condition. Another favorable use is to replace old computer infrastructure but retain all current software installations. In such a case a full copy of the hard disk is created and this virtual hard disk file (VHD) is installed into a virtual machine on a faster and newer computer. Popular software tools for this purpose are the freely available VMware Converter and the Microsoft Disk2vhd software. In a teaching environment virtual PCs are useful if no hardware-based hard disk write protection is in use. In such a case the virtual machine installations, including all intentional and unintentional changes, can be discarded after each class, and a fresh original copy is simply restored for the next class.</p>
         </sec>
         <sec>
            <st>
               <p>Use of virtual machines for better computer safety - avoiding viruses, Trojan horses, zombie farms and drive-by-infections</p>
            </st>
            <p>Computer safety in chemistry and life sciences research labs includes not only anti-virus scanners and multiple-stage backups of important scientific data but also requires a more active approach towards virus prevention. Current anti-virus software can detect more than 70,000 threats and viruses. However, that requires the virus to be known to the anti-virus software. A new virus or software exploit cannot be detected, leaving a critical exploitation time window open until virus updates are provided. Trojan horse programs can be installed to steal passwords, log every keystroke or deactivate internal software firewalls. Such compromised systems are used from outside as servers or proxies for illegal material including music files, video files, commercial software or pornographic material. Proxy functions are especially dangerous because a computer outside a research organization or university can then access files or network ranges that are usually only accessible to computers inside the organization. This is because many authentication schemes are based on the IP network address and hence assume that the computer is a registered and clean system belonging to the internal network. Unsecured and unpatched computers can become part of large botnets such as Conficker or Torpig, which have infected hundreds of thousands of Windows PCs <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. The infected PCs (zombies) are controlled by zombie masters, who use such computers to extort money, gain information, steal credit card data or rent subnets to persons who want to perform DDoS (Distributed Denial of Service) attacks or send spam mail.</p>
            <p>The use of LINUX or Mac OS systems can actually prevent virus spread, not because such systems are generally safer but because administrator rights are handled very strictly. LINUX computers are also prone to attacks: a severe vulnerability of LINUX systems in August 2008 allowed the Phalanx2 kernel rootkit to be installed, steal SSH passwords and subsequently gain access to other systems <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. One of the bigger problems in terms of computer safety is that many older programs under Windows XP require administrator rights for installation, and switching back to more restrictive user rights causes the program to fail. Applying only guest user rights and installing all required software updates already reduces the number of possible virus attacks.</p>
            <p>Research centers and universities are commonly protected by multiple hardware firewalls or have internal safe-zones without any internet access, or even computers on which any data exchange is prohibited. Under realistic scenarios such extreme protection without any internet access is counterproductive. Even with hardware firewalls activated, users are allowed to surf the internet (TCP port 80) or to use SSH and SFTP (TCP port 22) for connecting to remote computers or compute clusters. In such a scenario virus or Trojan horse infections can still occur. Another solution would be to surf the internet through proxy software that monitors all incoming traffic with multiple anti-virus scanners and rootkit detection utilities. Such a software solution, which includes web antimalware, inspection of https (secure) and http web traffic, a network inspection system and URL filtering, is available for enterprise customers <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. A simpler solution would be to use online surfing tools like SiteAdvisor <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> or simply to use sandbox technologies <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> as implemented in the Google Chrome browser, which restricts program rights.</p>
            <p>The use of virtual machines for internet connections and surfing is highly recommended. A user who works on a Windows machine as host and surfs the internet using a small LINUX OS as guest would greatly reduce the risk of direct computer virus infection. Usually the computer virus itself cannot escape the virtual environment, therefore the underlying operating system and programs remain virus free. It must be mentioned that programs can detect whether they are executed inside a virtual environment <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> and that there are a few concept studies of ultrathin hypervisors (Blue Pill/Red Pill) which can be exploited as rootkits <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. The second barrier would be the cross-platform barrier, because only few viruses exist that can be executed under both LINUX and Windows. Such a scenario requires that the guest operating system itself is not prone to any network attack from outside. Windows 7 has built-in virtual machine services, so browser sessions could be started automatically within a virtual machine. Using sandbox browsers (Google Chrome) already reduces the risk of virus infections, but such natively running browsers are still prone to vulnerabilities in installed external plugins (Flash, PDF, bitmap graphics, QuickTime). Most viruses cannot escape from a virtualized cross-platform environment, and the virtual image itself can be reset to its original state, thereby removing any virus infection.</p>
         </sec>
         <sec>
            <st>
               <p>Use of virtual machines in teaching environments</p>
            </st>
            <p>Cheminformatics and mass spectrometry teaching requires not only classical chalkboard talks but also laboratory sessions. In such experimental laboratory classes students perform experiments on different mass spectrometry platforms, including time-of-flight (TOF) analyzers, Orbitrap analyzers and Fourier-transform mass spectrometers, and investigate different ionization modes including electrospray, matrix-assisted laser desorption ionization (MALDI) and others. Such experimental classes have to be taught in smaller groups, because not all students can perform the experiment themselves, but they should be involved as much as possible. An estimated 80% of the time will be spent on software work, data evaluation of acquired spectra and the investigation of mass spectra and their associated molecular structures. Therefore, a strong cheminformatics syllabus and software handling courses are needed. We observed in our class that prior to the course students had mainly been exposed to internet search, Microsoft Word and EXCEL and general purpose chemistry drawing programs. Very few individuals had computer programming skills. Individual discussions revealed that the software classes had direct synergistic impacts, with students independently exploring the databases and chemistry programs they had learned about.</p>
            <p>For theoretical teaching, distance learning techniques <abbrgrp><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr></abbrgrp>, platform-independent chemistry web services <abbrgrp><abbr bid="B54">54</abbr><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp> or podcasts of lectures in video and audio MP3 format <abbrgrp><abbr bid="B57">57</abbr></abbrgrp> are commonly used and well accepted. For direct on-site teaching of computer applications and approaches the use of virtual machines is recommended (see Figure <figr fid="F6">6</figr>). Such hands-on classes provide real world experience with software programs. Tasks that are commonly performed in the laboratory can be tested in a non-destructive virtual environment. In case of user input errors, user modifications or program crashes, the virtual disk image can easily be used to restore the original virtual machine. An additional advantage is that everybody uses the same software settings and setups, hence installation and settings problems are avoided. If only open-source or free software installations are used, it would be possible to freely distribute such pre-configured virtual machines for cheminformatics teaching to a broad community.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Hands-on labs: Virtual machines are used for teaching cheminformatics and mass spectrometry software classes</p>
               </caption>
               <text>
                  <p><b>Hands-on labs: Virtual machines are used for teaching cheminformatics and mass spectrometry software classes</b>. The hands-on class provides everybody with the same software and setups hence avoids installation and settings problems. All required software is installed and tested on a single virtual machine and this software image is later deployed to all computers in the class room. Right picture: Screenshot of the teaching VM with WIN XP and the AMDIS and MarvinView software running.</p>
               </text>
               <graphic file="1758-2946-1-18-6"/>
            </fig>
            <p>A possible solution for cheminformatics distance learning, based on remote access to virtual machines, would be a large server with a virtual machine setup of around 48 CPU cores and 256 GByte RAM. Students would then log in via remote desktops or thin clients to virtual machines with teaching material. Existing software technology allows access to such services via JAVA or ActiveX plug-ins for internet browsers. However, the current computer teaching laboratory already provided 42 workstations with a total of 82 cores and 164 GByte of RAM. A virtual machine setup would have required the purchase not only of a powerful but expensive server but also of individual diskless thin client computers and additional software licenses. Because system management was already optimized, the use of individual computers in the lab was the preferred and overall cheaper solution. However, for distance-based learning a total virtualization could be a good solution to reduce administration time.</p>
         </sec>
         <sec>
            <st>
               <p>About the missing cheminformatics education at universities worldwide</p>
            </st>
            <p>A general problem is that, compared to bioinformatics, cheminformatics is taught at only a few universities <abbrgrp><abbr bid="B58">58</abbr></abbrgrp> with strong cheminformatics graduate-level and PhD programs. Related courses are sometimes taught together with computational chemistry, quantum chemistry or theoretical chemistry courses. Among those few universities are Indiana University, UC Irvine, Clarkson University, University of Michigan, University of New Mexico, Louis Pasteur University of Strasbourg (France), University Erlangen Nuremberg (Germany), Beilstein-Stiftungsprofessur Chemieinformatik at the University Frankfurt/Main (Germany), University of Sheffield (UK) <abbrgrp><abbr bid="B59">59</abbr></abbrgrp> and New University of Lisbon (Portugal) <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>.</p>
            <p>Cheminformatics is certainly related to chemometrics and computational chemistry <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>, but all three sciences cover specialized areas of chemistry with large overlaps. A comparison of the relative popularity (based on site-specific Google hit counts) of cheminformatics versus bioinformatics among 325 chemistry universities and institutions across the US can be found in Figure <figr fid="F7">7</figr>. For each of the 325 universities with chemistry faculty a domain-specific Google search was performed to determine how often the word "cheminformatics" or "bioinformatics" occurs across the whole university website. For example, a Google search for <it>cheminformatics site:berkeley.edu </it>returned 93 hits and <it>bioinformatics site:berkeley.edu </it>returned 3620 hits. Therefore more bioinformatics related material (web pages, PDF documents) can be found at the UC Berkeley website, and it can be concluded that bioinformatics is more popular than cheminformatics at UC Berkeley. The figure shows that the term <it>BioInformatics </it>is found almost two orders of magnitude more frequently than <it>ChemInformatics </it>and <it>ChemoInformatics </it>combined across all universities. The total hit counts for bioinformatics (315 institutes and 647082 total hits), cheminformatics (189 institutes and 5905 total hits) and chemoinformatics (153 institutes and 4582 total hits) confirm the popularity of bioinformatics in modern life sciences.</p>
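            <p>The ratios behind this comparison can be recomputed from the hit counts quoted above. The following Python snippet is only an illustrative sketch: the variable and function names are assumptions, and it uses the totals reported in the text rather than querying Google.</p>

```python
import math

# Total hit counts as reported in the text (search date: August 2009).
TOTAL_HITS = {
    "bioinformatics": 647082,
    "cheminformatics": 5905,
    "chemoinformatics": 4582,
}


def hit_ratio(numerator_hits, denominator_hits):
    """Return how many times more often one term was found than another."""
    return numerator_hits / denominator_hits


# Both spellings of the chemistry term are combined, as in the text.
combined_chem_hits = TOTAL_HITS["cheminformatics"] + TOTAL_HITS["chemoinformatics"]
overall_ratio = hit_ratio(TOTAL_HITS["bioinformatics"], combined_chem_hits)
orders_of_magnitude = math.log10(overall_ratio)

# Single-site example from the text: berkeley.edu.
berkeley_ratio = hit_ratio(3620, 93)

print("all universities:", round(overall_ratio, 1), "fold,",
      round(orders_of_magnitude, 2), "orders of magnitude")
print("berkeley.edu:", round(berkeley_ratio, 1), "fold")
```

            <p>With these numbers the overall ratio is roughly 60-fold, which is the basis for describing the difference as almost two orders of magnitude.</p>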
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Popularity of cheminformatics vs. bioinformatics based on site specific Google hit counts across 325 universities (US) with research chemistry faculty</p>
               </caption>
               <text>
                  <p><b>Popularity of cheminformatics vs. bioinformatics based on site specific Google hit counts across 325 universities (US) with research chemistry faculty</b>. For all 325 universities a site-specific search on Google was performed and mapped on the graph, e.g. <it>cheminformatics site:berkeley.edu </it>returned 93 hits and <it>bioinformatics site:berkeley.edu </it>returned 3620 hits. Because UC Berkeley hosts more bioinformatics related material it is safe to assume that bioinformatics is more popular than cheminformatics at UC Berkeley. Around 100 universities had no occurrence of the words <it>cheminformatics </it>or <it>chemoinformatics </it>on their global university websites (scores combined). Search date: August 2009.</p>
               </text>
               <graphic file="1758-2946-1-18-7"/>
            </fig>
            <p>The current fast-paced development of chemistry requires that each chemistry student receives a very early cheminformatics education which goes beyond simple database search and structure drawing <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>. The use of pre-configured virtual machines containing teaching material and cheminformatics software could make complex software setups easier to handle. Distance-based learning techniques could also use real-time remote connections to virtual machines equipped with a wide array of cheminformatics software <abbrgrp><abbr bid="B63">63</abbr><abbr bid="B64">64</abbr></abbrgrp>. Such advanced learning tasks could include in-silico reaction planning with a large computerized reaction database and planning system <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>, the use of molecular descriptors, in-silico de-novo molecular design, structure screening and searching, Quantitative Structure-Activity Relationships (QSAR) and visualization methods <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. The lack of programming skills among chemists must also be regarded as a potential threat to a successful development of the field. The recruitment of computer scientists from outside the field is limited, because chemical structure handling and chemical reaction manipulation require a deep understanding of chemistry in the first place. Wendy Warr, an international expert in chemical information, stated in a 2008 editorial <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>: "<it>The catalog of courses and resources compiled in this paper might suggest that cheminformatics education is flourishing. It is not. Many examples of isolated efforts are cited here but there is no European or international coordination. Cheminformatics practitioners have still not defined their discipline and its impact, let alone successfully made a case to governments and funding agencies</it>."</p>
            <p>In a related commentary about systems chemical biology <abbrgrp><abbr bid="B68">68</abbr></abbrgrp>, the authors discussed cheminformatics tools that can integrate chemical knowledge with biological databases and raised concerns about the cancellation of the National Institutes of Health (NIH/US) funding projects for the "Preapplication for Cheminformatics Research Centers" in 2007 <abbrgrp><abbr bid="B69">69</abbr></abbrgrp>, which would have been the largest funding source for the study of new cheminformatics approaches in the United States. It must be argued that cheminformatics education and research are such a fundamental part of <it>new chemistry</it> that funding in the United States should be provided by the National Science Foundation (NSF) and not by the NIH, which historically receives much higher funding (Budget FY2009 NIH: 30.5 billion US$ and NSF: 6.5 billion US$ <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>). The NSF not only has the mandate of promoting interdisciplinary research but also has a strong interest in chemistry education <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>. Regrettably, U.S. funding of chemistry can barely keep up with inflation <abbrgrp><abbr bid="B72">72</abbr></abbrgrp>, and the FY2009 budget for the NSF Division of Chemistry (CHE) is around $244.67 million and therefore represents only 3.7% of the whole NSF budget. But making the case that cheminformatics is a substantial building block for success in the grand challenges in chemistry <abbrgrp><abbr bid="B73">73</abbr></abbrgrp> is up to the cheminformaticians themselves.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Software virtualization in chemistry, mass spectrometry and cheminformatics is needed for software testing and development under different deployment scenarios and operating systems without the need for multiple standalone computers. We have shown with multiple cheminformatics and mass spectrometry software benchmarks that the computational penalty of using virtual machines is very low, usually around 5% to 10%. In order to obtain maximum performance, the virtualization software must be multi-core enabled and should emulate a multiprocessor configuration in the virtual machine environment. The computational chemistry software should make use of multi-core CPUs, and the computer itself should be equipped with a multi-core CPU as well as a fast SSD or RAID system. Software virtualization in research chemistry labs is useful for keeping the computational infrastructure small and manageable. Multiple operating systems can be used on one multi-core CPU computer providing web services, backup services, computational and data exchange services. Software virtualization in a teaching environment allows faster deployment and easy use of commercial and open source software. Preconfigured virtual machines can be used for worldwide distribution of open source and freely available cheminformatics tools.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JL and TK developed the concept of the study. TL helped implement the study. TK performed the experiments. TK and OF drafted the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank all contributors who provided free or open source software and those companies that provided free teaching licenses. Thanks to Martin Scholz (UC Davis) for help with the JAVA query tool using the Google API. Thanks to Zhi-Wei Lu (UC Davis Genome Center Bioinformatics Core) for the screenshot and hardware information on the XEN hypervisor. Funding was provided by NSF MCB-0820823. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Survey of Virtual Machine Research</p>
            </title>
            <aug>
               <au>
                  <snm>Goldberg</snm>
                  <fnm>RP</fnm>
               </au>
            </aug>
            <source>IEEE Computer</source>
            <pubdate>1974</pubdate>
            <volume>7</volume>
            <issue>6</issue>
            <fpage>34</fpage>
            <lpage>45</lpage>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Chemoinformatics: a new field with a long tradition</p>
            </title>
            <aug>
               <au>
                  <snm>Gasteiger</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Analytical and Bioanalytical Chemistry</source>
            <pubdate>2006</pubdate>
            <volume>384</volume>
            <issue>1</issue>
            <fpage>57</fpage>
            <lpage>64</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00216-005-0065-y</pubid>
                  <pubid idtype="pmpid" link="fulltext">16177914</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Chemometrics; what do we mean with it, and what do we want from it?</p>
            </title>
            <aug>
               <au>
                  <snm>Wold</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Chemometrics and Intelligent Laboratory Systems</source>
            <pubdate>1995</pubdate>
            <volume>30</volume>
            <issue>1</issue>
            <fpage>109</fpage>
            <lpage>115</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0169-7439(95)00042-9</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Software Platform Virtualization</p>
            </title>
            <url>http://en.wikipedia.org/wiki/Platform_virtualization</url>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Virtual Machine Definition</p>
            </title>
            <url>http://en.wikipedia.org/wiki/Virtual_machine</url>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Wine translation layer - for running Windows applications on Linux, BSD and Mac OS X</p>
            </title>
            <url>http://www.winehq.org/</url>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Hypervisor Server Virtualization</p>
            </title>
            <url>http://en.wikipedia.org/wiki/Hypervisor</url>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Comparison of virtual machines</p>
            </title>
            <url>http://en.wikipedia.org/wiki/Comparison_of_virtual_machines</url>
         </bibl>
         <bibl id="B9">
            <title>
               <p>VMware vCenter Converter</p>
            </title>
            <url>http://www.vmware.com/</url>
         </bibl>
         <bibl id="B10">
            <title>
               <p>PlateSpin PowerConvert for virtual machine images</p>
            </title>
            <url>http://www.platespin.com</url>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Virtual hierarchies to support server consolidation</p>
            </title>
            <aug>
               <au>
                  <snm>Marty</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>MD</fnm>
               </au>
            </aug>
            <source>ACM</source>
            <pubdate>2007</pubdate>
            <fpage>56</fpage>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Beyond server consolidation</p>
            </title>
            <aug>
               <au>
                  <snm>Vogels</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Queue</source>
            <pubdate>2008</pubdate>
            <volume>6</volume>
            <issue>1</issue>
            <fpage>20</fpage>
            <lpage>26</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1145/1348583.1348590</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Scalable fine-grained parallelization of plane-wave-based ab initio molecular dynamics for large supercomputers</p>
            </title>
            <aug>
               <au>
                  <snm>Ramkumar</snm>
                  <fnm>VV</fnm>
               </au>
               <au>
                  <snm>Yan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sameer</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Laxmikant</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>Mark</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Glenn</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Journal of Computational Chemistry</source>
            <pubdate>2004</pubdate>
            <volume>25</volume>
            <issue>16</issue>
            <fpage>2006</fpage>
            <lpage>2022</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/jcc.20113</pubid>
                  <pubid idtype="pmpid" link="fulltext">15473008</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Hydra: A Self Regenerating High Performance Computing Grid for Drug Discovery</p>
            </title>
            <aug>
               <au>
                  <snm>Bullard</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Gobbi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lardy</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Perkins</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Little</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Journal of Chemical Information and Modeling</source>
            <pubdate>2008</pubdate>
            <volume>48</volume>
            <issue>4</issue>
            <fpage>811</fpage>
            <lpage>816</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/ci700396b</pubid>
                  <pubid idtype="pmpid" link="fulltext">18338845</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Vigyaan biochemical software workbench</p>
            </title>
            <aug>
               <au>
                  <snm>Agarwal</snm>
                  <fnm>PK</fnm>
               </au>
            </aug>
            <url>http://www.vigyaancd.org</url>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Programs for structure elucidation of small molecules</p>
            </title>
            <aug>
               <au>
                  <snm>Kind</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <url>http://fiehnlab.ucdavis.edu/staff/kind/Metabolomics/Structure_Elucidation/</url>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Software for stereoisomer generation; Marvin 5.1.4; ChemAxon 2008</p>
            </title>
            <url>http://www.chemaxon.com/</url>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Bioclipse: an open source workbench for chemo- and bioinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Spjuth</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Helmus</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Willighagen</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Kuhn</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Eklund</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wagener</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Murray-Rust</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Steinbeck</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wikberg</snm>
                  <fnm>JES</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <issue>1</issue>
            <fpage>59</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-8-59</pubid>
                  <pubid idtype="pmcid">1808478</pubid>
                  <pubid idtype="pmpid" link="fulltext">17316423</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Java and numerical computing</p>
            </title>
            <aug>
               <au>
                  <snm>Boisvert</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Moreira</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Philippsen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pozo</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Comput Sci Eng</source>
            <pubdate>2001</pubdate>
            <volume>3</volume>
            <issue>2</issue>
            <fpage>18</fpage>
            <lpage>24</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/5992.908997</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>NIST SciMark 2.0 benchmark</p>
            </title>
            <aug>
               <au>
                  <snm>Pozo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <url>http://math.nist.gov/scimark2/index.html</url>
         </bibl>
         <bibl id="B21">
            <title>
               <p>The use of MS classifiers and structure generation to assist in the identification of unknowns in effect-directed analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Schymanski</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Meinert</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Meringer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Brack</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Anal Chim Acta</source>
            <pubdate>2008</pubdate>
            <volume>615</volume>
            <issue>2</issue>
            <fpage>136</fpage>
            <lpage>147</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.aca.2008.03.060</pubid>
                  <pubid idtype="pmpid" link="fulltext">18442519</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>MOLGEN 3.5 demo version for Windows</p>
            </title>
            <url>http://www.molgen.de/</url>
         </bibl>
         <bibl id="B23">
            <title>
               <p>CDK Descriptor Calculator GUI</p>
            </title>
            <aug>
               <au>
                  <snm>Guha</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <url>http://www.rguha.net/code/java/cdkdesc.html</url>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Recent developments of the Chemistry Development Kit (CDK) - An open-source Java library for chemo- and bioinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Steinbeck</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hoppe</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kuhn</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Floris</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Guha</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Willighagen</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>Curr Pharm Design</source>
            <pubdate>2006</pubdate>
            <volume>12</volume>
            <issue>17</issue>
            <fpage>2111</fpage>
            <lpage>2120</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2174/138161206777585274</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Performance of Kier-Hall E-state descriptors in quantitative structure activity relationship (QSAR) studies of multifunctional molecules</p>
            </title>
            <aug>
               <au>
                  <snm>Butina</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Molecules</source>
            <pubdate>2004</pubdate>
            <volume>9</volume>
            <issue>12</issue>
            <fpage>1004</fpage>
            <lpage>1009</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.3390/91201004</pubid>
                  <pubid idtype="pmpid">18007500</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Irredundant Generation of Isomeric Molecular Structures with Some Known Fragments</p>
            </title>
            <aug>
               <au>
                  <snm>Molchanova</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Zefirov</snm>
                  <fnm>NS</fnm>
               </au>
            </aug>
            <source>Journal of Chemical Information and Computer Sciences</source>
            <pubdate>1998</pubdate>
            <volume>38</volume>
            <issue>1</issue>
            <fpage>8</fpage>
            <lpage>22</lpage>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Seven Golden Rules Software</p>
            </title>
            <aug>
               <au>
                  <snm>Kind</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fiehn</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <url>http://fiehnlab.ucdavis.edu/projects/Seven_Golden_Rules/</url>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry</p>
            </title>
            <aug>
               <au>
                  <snm>Kind</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fiehn</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>20</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-8-105</pubid>
                  <pubid idtype="pmcid">1790715</pubid>
                  <pubid idtype="pmpid" link="fulltext">17244365</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>MZmine: toolbox for processing and visualization of mass spectrometry based molecular profile data</p>
            </title>
            <aug>
               <au>
                  <snm>Katajamaa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Miettinen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Oresic</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>5</issue>
            <fpage>634</fpage>
            <lpage>636</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btk039</pubid>
                  <pubid idtype="pmpid" link="fulltext">16403790</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>MZmine 2 LC/MS framework</p>
            </title>
            <url>http://sourceforge.net/projects/mzmine/</url>
         </bibl>
         <bibl id="B31">
            <title>
               <p>LC/MS data - FAAH Knockout Dataset</p>
            </title>
            <url>http://metlin.scripps.edu/download/</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Assignment of Endogenous Substrates to Enzymes by Global Metabolite Profiling</p>
            </title>
            <aug>
               <au>
                  <snm>Saghatelian</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Trauger</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Want</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Hawkins</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Siuzdak</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Cravatt</snm>
                  <fnm>BF</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>2004</pubdate>
            <volume>43</volume>
            <issue>45</issue>
            <fpage>14332</fpage>
            <lpage>14339</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi0480335</pubid>
                  <pubid idtype="pmpid" link="fulltext">15533037</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Creating Smaller Virtual Machines</p>
            </title>
            <aug>
               <au>
                  <snm>Atwood</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <url>http://www.codinghorror.com/blog/archives/000639.html</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Virtualization - From the Desktop to the Enterprise</p>
            </title>
            <aug>
               <au>
                  <snm>Wolf</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Halter</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <url>http://dx.doi.org/10.1007/978-1-4302-0027-7</url>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Mass Spectrometry meets Cheminformatics - Teaching Slides</p>
            </title>
            <aug>
               <au>
                  <snm>Kind</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Leary</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <url>http://fiehnlab.ucdavis.edu/staff/kind/Teaching/</url>
         </bibl>
         <bibl id="B36">
            <title>
               <p>A Performance Comparison of Hypervisors - VMWare</p>
            </title>
            <url>http://www.vmware.com/pdf/hypervisor_performance.pdf</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software</p>
            </title>
            <aug>
               <au>
                  <snm>Sutter</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <url>http://www.gotw.ca/publications/concurrency-ddj.htm</url>
         </bibl>
         <bibl id="B38">
            <title>
               <p>MacInChemBlog - Macs in chemistry</p>
            </title>
            <aug>
               <au>
                  <snm>Swain</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <url>http://homepage.mac.com/swain/Sites/Macinchem/default.htm</url>
         </bibl>
         <bibl id="B39">
            <title>
               <p>64-bit</p>
            </title>
            <url>http://en.wikipedia.org/wiki/64-bit</url>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Linux distributions for bioinformatics: an update</p>
            </title>
            <aug>
               <au>
                  <snm>Rana</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Foscarini</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>EMBnet news</source>
            <pubdate>2009</pubdate>
            <volume>15</volume>
            <issue>3</issue>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Open Software for Biologists: from famine to feast</p>
            </title>
            <aug>
               <au>
                  <snm>Field</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tiwari</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Booth</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Houten</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Swan</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bertrand</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Thurston</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nature Biotechnology</source>
            <pubdate>2006</pubdate>
            <volume>24</volume>
            <issue>7</issue>
            <fpage>801</fpage>
            <lpage>804</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt0706-801</pubid>
                  <pubid idtype="pmpid" link="fulltext">16841067</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>ECCE, A Problem Solving Environment for Computational Chemistry</p>
            </title>
            <url>http://ecce.pnl.gov</url>
         </bibl>
         <bibl id="B43">
            <title>
               <p>MASPECTRAS platform for management and analysis of proteomics LC-MS/MS data</p>
            </title>
            <url>http://genome.tugraz.at/maspectras/</url>
         </bibl>
         <bibl id="B44">
            <title>
               <p>A case for grid computing on virtual machines</p>
            </title>
            <aug>
               <au>
                  <snm>Figueiredo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Dinda</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Fortes</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>IEEE Computer Society</source>
            <pubdate>2003</pubdate>
            <volume>2003</volume>
            <fpage>550</fpage>
            <lpage>559</lpage>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Your Botnet is My Botnet: Analysis of a Botnet Takeover</p>
            </title>
            <aug>
               <au>
                  <snm>Stone-Gross</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Cova</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cavallaro</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Szydlowski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kemmerer</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kruegel</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Vigna</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <url>http://www.cs.ucsb.edu/~seclab/projects/torpig/torpig.pdf</url>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Phalanx2 - What Happened?</p>
            </title>
            <aug>
               <au>
                  <snm>Heintz</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <url>http://hep.uchicago.edu/admin/report_072808.html</url>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Microsoft Forefront Security and Forefront Threat Management Gateway</p>
            </title>
            <url>http://www.microsoft.com/forefront/</url>
         </bibl>
         <bibl id="B48">
            <title>
               <p>McAfee SiteAdvisor</p>
            </title>
            <url>http://www.siteadvisor.com/</url>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Chromium Developer Documentation - Sandbox FAQ</p>
            </title>
            <url>http://dev.chromium.org/</url>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Virtualisation as a blackhat tool</p>
            </title>
            <aug>
               <au>
                  <snm>Skapinetz</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Network Security</source>
            <pubdate>2007</pubdate>
            <volume>2007</volume>
            <issue>10</issue>
            <fpage>4</fpage>
            <lpage>7</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S1353-4858(07)70092-2</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Introducing Blue Pill</p>
            </title>
            <aug>
               <au>
                  <snm>Rutkowska</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <url>http://theinvisiblethings.blogspot.com/2006/06/introducing-blue-pill.html</url>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Videoconferencing and Other Distance Education Techniques in Chemoinformatics Teaching and Research at Indiana University</p>
            </title>
            <aug>
               <au>
                  <snm>Wild</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Wiggins</snm>
                  <fnm>GD</fnm>
               </au>
            </aug>
            <source>Journal of Chemical Information and Modeling</source>
            <pubdate>2006</pubdate>
            <volume>46</volume>
            <issue>2</issue>
            <fpage>495</fpage>
            <lpage>502</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/ci050297q</pubid>
                  <pubid idtype="pmpid" link="fulltext">16562977</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Electronic Analytical Reference Library (EARL) - Crawford Scientific</p>
            </title>
            <url>http://www.earl2learn.com/</url>
         </bibl>
         <bibl id="B54">
            <title>
               <p>The Blue Obelisk Interoperability in Chemical Informatics</p>
            </title>
            <aug>
               <au>
                  <snm>Guha</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Howard</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Hutchison</snm>
                  <fnm>GR</fnm>
               </au>
               <au>
                  <snm>Murray-Rust</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rzepa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Steinbeck</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wegner</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Willighagen</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>Journal of Chemical Information and Modeling</source>
            <pubdate>2006</pubdate>
            <volume>46</volume>
            <issue>3</issue>
            <fpage>991</fpage>
            <lpage>998</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/ci050400b</pubid>
                  <pubid idtype="pmpid" link="fulltext">16711717</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Cheminformatics Teaching Tools for the Cheminformatics Virtual Classroom</p>
            </title>
            <url>http://www.chemvc.com/</url>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Web-based cheminformatics tools deployed via corporate Intranets</p>
            </title>
            <aug>
               <au>
                  <snm>Ertl</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Selzer</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>M&#252;hlbacher</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Drug Discovery Today: BIOSILICO</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <issue>5</issue>
            <fpage>201</fpage>
            <lpage>207</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S1741-8364(04)02413-8</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>UC Davis iTunesU - audio, video and podcasts of news, faculty lectures and interviews</p>
            </title>
            <url>http://itunes.ucdavis.edu/</url>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Challenges for chemoinformatics education in drug discovery</p>
            </title>
            <aug>
               <au>
                  <snm>Wild</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Wiggins</snm>
                  <fnm>GD</fnm>
               </au>
            </aug>
            <source>Drug Discovery Today</source>
            <pubdate>2006</pubdate>
            <volume>11</volume>
            <issue>9-10</issue>
            <fpage>436</fpage>
            <lpage>439</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.drudis.2006.03.010</pubid>
                  <pubid idtype="pmpid" link="fulltext">16635806</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Chemoinformatics research at the University of Sheffield: a history and citation analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Bishop</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gillet</snm>
                  <fnm>VJ</fnm>
               </au>
               <au>
                  <snm>Holliday</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Willett</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Journal of Information Science</source>
            <pubdate>2003</pubdate>
            <volume>29</volume>
            <issue>4</issue>
            <fpage>249</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1177/01655515030294003</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Cheminformatics Academic Programs</p>
            </title>
            <url>http://cheminfo.informatics.indiana.edu/cicc/cis/index.php/Cheminformatics_Academic_Programs</url>
         </bibl>
         <bibl id="B61">
            <title>
               <p>Chemoinformatics - a new name for an old problem?</p>
            </title>
            <aug>
               <au>
                  <snm>Hann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Current Opinion in Chemical Biology</source>
            <pubdate>1999</pubdate>
            <volume>3</volume>
            <issue>4</issue>
            <fpage>379</fpage>
            <lpage>383</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1367-5931(99)80057-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">10419846</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>Chemistry plans a structural overhaul</p>
            </title>
            <aug>
               <au>
                  <snm>Russo</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>419</volume>
            <issue>6903</issue>
            <fpage>4</fpage>
            <lpage>7</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nj6903-04a</pubid>
                  <pubid idtype="pmpid" link="fulltext">12226620</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>Creating remotely accessible "virtual networks" on a single PC to teach computer networking and operating systems</p>
            </title>
            <aug>
               <au>
                  <snm>Stockman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <publisher>ACM New York, NY, USA</publisher>
            <pubdate>2003</pubdate>
            <fpage>67</fpage>
            <lpage>71</lpage>
         </bibl>
         <bibl id="B64">
            <title>
               <p>The development and deployment of a multi-user, remote access virtualization system for networking, security, and system administration classes</p>
            </title>
            <aug>
               <au>
                  <snm>Border</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>ACM SIGCSE Bulletin</source>
            <pubdate>2007</pubdate>
            <volume>39</volume>
            <issue>1</issue>
            <fpage>576</fpage>
            <lpage>580</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1145/1227504.1227501</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>Making "Real" Molecules in Virtual Space</p>
            </title>
            <aug>
               <au>
                  <snm>Pirok</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>M&#225;t&#233;</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Varga</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Szegezdi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vargyas</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dorant</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Csizmadia</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Journal of Chemical Information and Modeling</source>
            <pubdate>2006</pubdate>
            <volume>46</volume>
            <issue>2</issue>
            <fpage>563</fpage>
            <lpage>568</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/ci050373p</pubid>
                  <pubid idtype="pmpid" link="fulltext">16562984</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>Chemoinformatics - an introduction for computer scientists</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>ACM Computing Surveys</source>
            <pubdate>2009</pubdate>
            <volume>41</volume>
            <issue>2</issue>
            <fpage>1</fpage>
            <lpage>38</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1145/1459352.1459353</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>Cheminformatics Education</p>
            </title>
            <aug>
               <au>
                  <snm>Warr</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <url>http://www.qsarworld.com/cheminformatics-education.php</url>
         </bibl>
         <bibl id="B68">
            <title>
               <p>Systems chemical biology</p>
            </title>
            <aug>
               <au>
                  <snm>Oprea</snm>
                  <fnm>TI</fnm>
               </au>
               <au>
                  <snm>Tropsha</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Faulon</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Rintoul</snm>
                  <fnm>MD</fnm>
               </au>
            </aug>
            <source>Nature Chemical Biology</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <issue>8</issue>
            <fpage>447</fpage>
            <lpage>450</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nchembio0807-447</pubid>
                  <pubid idtype="pmcid">2734506</pubid>
                  <pubid idtype="pmpid" link="fulltext">17637771</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B69">
            <title>
               <p>Cancellation of PAR-07-353 (Preapplication of Cheminformatics Research Centers [X02])</p>
            </title>
            <url>http://grants.nih.gov/grants/guide/notice-files/NOT-RM-07-010.html</url>
         </bibl>
         <bibl id="B70">
            <title>
               <p>2010 Funding for NIH, NSF - FASEB Office of Public Affairs</p>
            </title>
            <url>http://opa.faseb.org/</url>
         </bibl>
         <bibl id="B71">
            <title>
               <p>Chemical biology at the US National Science Foundation</p>
            </title>
            <aug>
               <au>
                  <snm>Colon</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Chitnis</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Hicks</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tornow</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Nature Chemical Biology</source>
            <pubdate>2008</pubdate>
            <volume>4</volume>
            <issue>9</issue>
            <fpage>511</fpage>
            <lpage>514</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nchembio0908-511</pubid>
                  <pubid idtype="pmpid" link="fulltext">18711373</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B72">
            <title>
               <p>The Future of U.S. Chemistry Research: Benchmarks and Challenges - US National Academy of Sciences 2007</p>
            </title>
            <url>http://www.nap.edu/</url>
         </bibl>
         <bibl id="B73">
            <title>
               <p>Beyond the Molecular Frontier: Challenges for Chemistry and Chemical Engineering - US National Research Council 2003</p>
            </title>
            <url>http://www.nap.edu/</url>
         </bibl>
      </refgrp>
   </bm>
</art>