Financing Biotechnology Databases

A Workshop organised by The Biotechnology Information Strategic Forum, with support from DGXII of the Commission of the European Communities, and held at Purmerend, The Netherlands, May 1997


The workshop was held at the Golden Tulip Hotel, Purmerend, The Netherlands, in May 1997.


Industrial Needs: investment in biotechnology databases, experiences from Pfizer -- Ian D Harrow

Ready access to gene sequence information and the related scientific literature is vital for fuelling the discovery and development of innovative new pharmaceuticals. Pfizer has attained its current competitive position by developing an internal informatics capability based on a global Information Technology (IT) infrastructure. Bioinformatics, the application of IT to problems in biotechnology, provides the critical link between in-house project scientists engaged in therapeutic disease research and a network of strategic collaborators: biotechnology companies and commercial and public database providers.

The company has established a team of IT and bioinformatics experts who both support the research staff and carry out research of their own. The bioinformatics team provides analytical tools, expert advice and a framework for database analysis. These services run within a firewall-protected environment, and communication between staff is facilitated by a single corporate electronic mail system. Pfizer's network of external strategic collaborations includes both public and commercial database providers.

Pfizer recognises that biotechnology databases should be:-

Three example biotechnology databases are described below to illustrate industrial needs.

Pfizer uses Entrez at NCBI (http://www.ncbi.nlm.nih.gov/Entrez/), a collection of databases that includes protein sequences, nucleotide sequences, the full MEDLINE database, and additional files on protein 3D structures and comparative genomics. It has a fast search engine, provides downloadable data and is free to all.

While Entrez is a basic-level database, Pfizer also subscribes to added-value databases to support its in-house effort. One example is the Yeast Protein Database (YPD) from Proteome Inc. (http://www.proteome.com/YPDhome.html). This contains the complete yeast genome, with entries linked to GenBank, SwissProt, SGD, MIPS and MEDLINE. Entries carry expert annotation drawn from reviews of the literature, and the product is free to academics but sold to industry for a subscription fee. The database can be customised for corporate subscribers. YPD has grown rapidly as the yeast genome has been sequenced, and information on protein structure and function is now following. Similar products will clearly be produced for other organisms as their genomes are sequenced.

The Human Gene Sequence and Expression database (LifeSeq™) from Incyte Pharmaceuticals Inc. (http://www.incyte.com/) is a commercial database currently sold to 18 corporate subscribers, including Pfizer. It provides added-value information on more than 2 million fragments of expressed genes cloned from normal and diseased human tissues. It also provides access to the clones themselves as reagents for further exploitation. The database is sold on a non-exclusive subscription basis, with an exclusive satellite option.

Pfizer's strategy is to balance cost against the expected impact on the R & D process. The three example databases, Entrez, YPD and LifeSeq™, span the cost/value spectrum. The cheaper the product, the more variable its quality can be and the more general its information. Some of the more expensive databases are tailored to the company's needs, which provides an added level of exclusivity. The key point for a company like Pfizer is that a database must have demonstrable value to the R & D process and must integrate with the corporate IT infrastructure.

Looking to the future, as genome sequencing projects approach completion, the emphasis is moving from "sequence" to "function". This trend is already evident for yeast and for the nematode C. elegans. It is therefore increasingly important to place genes and their products in biological pathways, an activity now termed "functional genomics". It is equally important, and critical for understanding the causes of human disease, to relate molecular information to the biology of the cell, the tissue and the whole organism. These trends and needs mean that, to succeed in future, biotechnology databases will be expected to interoperate and to be fully integrated.

