A Workshop organised by The Biotechnology Information Strategic Forum, with support from DGXII of the Commission of the European Communities, and held at Purmerend, The Netherlands, May 1997
The workshop, Financing Biotechnology Databases, was held at the Golden Tulip Hotel, Purmerend, The Netherlands, in May 1997.
Introduction -- Jack Franklin
Biotechnology R&D is increasingly dependent upon databases and the infrastructure connecting them. There are already many different biotechnology-relevant databases and, as both biotechnology and information technology advance, there will be many more. New R&D-based products are appearing with great regularity, and primary journals, secondary literature databases, the associated document and alerting services, and combined products encompassing complex collections of different types of information are all evolving in line with the electronic marketplace. There is no doubt that by the next decade we will have a totally electronic environment, with both the ability and the need to navigate among many different forms of scientific data and information.
The databases and infrastructures needed to service biotechnology will cost a great deal of money. At present, most of the "new database money" (such as for nucleotide sequence libraries, and databases for protein sequences and protein structure) actually comes from R&D funds, as the products are designed and built to expedite research and so have their roots in original research problems. They are started by dedicated scientists who need the product for their research, and make it available to their own teams and then to larger groups of colleagues. Ultimately, the database may become part of the R&D information infrastructure.
Probably because bioinformatics has emerged as a "new" discipline, this form of initial funding is also seen as being "new", and there are many who see no reason why it cannot continue. However, it is neither new nor sustainable: a large number of today's established "commercial databases" started in the same way. Excerpta Medica's EMBASE, for example, first saw life as Professor Pierre Vinken's private set of abstracts. He swiftly recognised that he could not maintain the database out of his research funds, and formed a company to build, maintain and market his product. Household names such as ISI's Current Contents have had similar beginnings, and the established databases in use today have almost all had to make that difficult transformation from "experimental" to "established" product.
R&D funding can be justified while the database is being researched and developed. However, a "living database" has to be secure. It is more than just a transient "collection" of data: the content has to be updated, maintained, continually verified and kept available. This last element is as important as any other, as users have to have faith in the longevity of a database. All this costs money and, as databases mature and leave the research sphere to become "products", they cannot be left to the vagaries of competitive grant funding. Furthermore, it is not fair to draw such "maintenance" money from R&D funds.
A database therefore needs stable funding. This can come from a central source or from customers, but there is no doubt that funding will become increasingly problematic. The number of biological disciplines using bioinformatics techniques is growing and the number of databases and techniques in use in the existing disciplines is also increasing. The complexity of the databases, and of the software needed to handle them, will also continue to increase and we are now speaking of a new infrastructure which is essential to the biotechnology world. The chances that this can be funded from the present R&D funds are small, the more so as few R&D granting authorities recognise the need for this sort of maintenance money anyway.
Superimposed on this financial need is the debate as to whether data should be "free" and, if so, which data. There is general agreement that distinct roles exist for "free", "not-for-profit", and "commercial" databases, but the boundaries between these can become blurred. The definitions and rights of the "free", "publicly funded", "sponsored", and "commercial" sectors are therefore under open discussion. But these discussions must take place against the complete backdrop of users; for instance, industry has to patent new products to survive, and yet at the same time has to protect its data in this process. In other words, the need to patent makes it impossible to place working data in anything other than the most protected environment (the patent itself, of course, implies a particularly significant form of data disclosure).
This workshop will explore these questions and discuss how the different players might interact to develop an infrastructure where ALL users can access the data they require at fair and equitable prices commensurate with their funding infrastructure. If an industrial researcher has to look at three literature databases to ensure he/she has covered all the possible sources, then an academic should be able to do likewise; and the same story holds for the numerical and factual databases essential to biotechnology research. Biotechnology R&D is of enormous importance to Europe and it relies upon information for its success. The researchers who best manage to access and manipulate the required data will increasingly win the races for new drugs, foodstuffs and environmental solutions. Bioinformatics is fast becoming an essential "enabling technology" and so it must be underpinned with a stable and secure financial infrastructure.