Table 2

Compounds-per-protein and per-document.

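| Database or subset | Document count | Protein ID type | Total proteins | Human proteins | Compounds per protein | Compounds per document |
|--------------------|----------------|-----------------|----------------|----------------|-----------------------|------------------------|
| GVKBIO             | 87747          | Entrez Gene     | 3292           | 1468           | 604                   | 22                     |
| GVKBIO journals    | 51810          | Entrez Gene     | 2660           | 1146           | 239                   | 12                     |
| GVKBIO patents     | 35937          | Entrez Gene     | 1765           | 952            | 815                   | 40                     |
| GVKBIO DD          | 26825          | Entrez Gene     | 733            | 339            | 5                     | 0.14                   |
| GVKBIO CCD         | 27286          | Entrez Gene     | 1224           | 610            | 7                     | 0.32                   |
| WOMBAT             | 10205          | Swiss-Prot      | 1979           | 1095           | 91                    | 18                     |
| DrugBank           | n/a            | Swiss-Prot      | 1625           | 1356           | 3                     | n/a                    |
| PubChem actives    | n/a            | RefSeq          | 72             | n/a            | 104                   | n/a                    |
| PubChem PDB        | n/a            | RefSeq          | 818            | n/a            | 14                    | n/a                    |
| BindingDB          | 1142           | Swiss-Prot      | 297            | 97             | 112                   | 19                     |
| MDDR               | 137754         | n/a             | n/a            | n/a            | n/a                   | 1.4                    |
| DNP                | 7765           | n/a             | n/a            | n/a            | n/a                   | 18                     |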

Column three gives the type of protein identifier used to count proteins from all species (column four) and human proteins (column five). In columns six and seven, the filtered compound totals are taken from Additional file 1, and the ratios are calculated by dividing those totals by the total protein count and the document count, respectively. Cells labelled n/a indicate information that was either not applicable or not available. For reference, we have included a compounds-per-protein calculation for the PubChem actives subset, even though it has no document-protein links analogous to those in the other sources.
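
The ratio calculation described in the note can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `compound_ratio` and the input numbers are hypothetical, and the real filtered compound totals come from Additional file 1, which is not reproduced here.

```python
from typing import Optional


def compound_ratio(compounds: int, denominator: Optional[int]) -> Optional[float]:
    """Divide a compound total by a protein or document count.

    Returns None where the table would show n/a, i.e. when the
    denominator is missing or zero.
    """
    if denominator is None or denominator == 0:
        return None
    return compounds / denominator


# Hypothetical values for illustration only; they are NOT taken from
# Table 2 or from Additional file 1.
example_compounds = 1000
example_total_proteins = 50
example_documents = None  # a source with no document count

per_protein = compound_ratio(example_compounds, example_total_proteins)
per_document = compound_ratio(example_compounds, example_documents)
```

Returning `None` rather than raising an error mirrors the n/a cells in the table, where a ratio cannot be formed because one of the counts is unavailable.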

Southan et al. Journal of Cheminformatics 2009 1:10   doi:10.1186/1758-2946-1-10
