A Workshop organised by The Biotechnology Information Strategic Forum, with support from DGXII of the Commission of the European Communities, and held at Purmerend, The Netherlands, May 1997
The workshop "Financing Biotechnology Databases" was held at the Golden Tulip Hotel, Purmerend, The Netherlands in May 1997. It was organised by The Biotechnology Information Strategic Forum, with support from DGXII of the Commission of the European Communities.
Discussion
It is clear that the bioinformatics and biotechnology information world is facing many challenges, not the least of which is how to finance the huge growth in number and complexity of the databases in daily use. This will almost certainly require a combination of public and private funding and will demand a well planned infrastructure to ensure that databases can be developed, maintained and exploited for the long term.
The range of relevant biotechnology databases and related products is growing; where before we just dealt with literature and primary sequences, now we have secondary protein structure databases, added value digests and annotated databases, and combinations of data and information and software tools to modify and manage. Furthermore, we have a continued expansion of these products and tools across the biological sciences. There is a clear need to develop an overview of what science requires; information is increasingly important to R&D and the investments made in databases are such that those making those investments have to have a fair chance of obtaining a long term success.
The market is already served by a mixture of publicly funded and commercial products. It would seem logical to continue to use this combination for the future. However, care must be taken to define what public funds can best support, and, given that the profile for such funding will change with politics, extra care must be taken to ensure that alternative long term funding patterns are available to take over if public funding policies change.
This need is accentuated by the present, increasingly aggressive, policy of the American National Library of Medicine with regard to the free distribution of its MEDLINE and related databases over the WWW. By allowing more or less free entry and use of this service, the NLM is seriously distorting the market place by what can more or less be called dumping. While, ironically, many see this availability as a benefit to Europe (the American taxpayer pays), the action really threatens database producers and hosts all over the world, and increasingly restricts the accessible (i.e. abstracted primary literature) information to an American choice (MEDLINE does not cover all the relevant literature). Furthermore, there is a suspicion that the present "free on the WWW" policy may be a predatory move. The NLM has described it as experimental and so could stop it once it has cleared a few competitors from the market. This would affect database hosts such as DIMDI as well as database producers. In such a scenario, the US would be left dominating the market completely and would be able to restrict the flow of information if it so desired (remember the restrictions the US government introduced with regard to distributing software during the Gulf War in 1991).
This policy is at odds with America's claims for free markets and counters the honest competition between commercial and society publishers. In the long term the policy is probably not sustainable but it is a serious problem for the immediate future. The European response should be political and competitive: not to request subsidies but to ask the Americans to look at their own situation, and to prepare better products to compete with. Furthermore, the BTSF members might like to support Commission President Santer's attempts to gain permission for the Union to act for individual member states on issues such as IPR. A common European approach might be better in cases where individual governments prefer to take what is available for the short term gain. Furthermore, the EC should indeed strive to ensure that it is never permissible for one country to stop others accessing information that has been built up in the international market. It is important to get users to support these issues - for instance, the American Chemical Society imposes certain restrictions on the use of its data even though its database was first developed with the help of many British and German scientists and companies; this is often seen as a deviation, to say the least, from the spirit in which the programme was started even though it might, today, make economic sense.
It is also perhaps important to note that MEDLINE is not necessarily the best product; even though it is cheaper than its obvious competitors, it is certainly not the most used. Of the 64 available databases on DIMDI, EMBASE is the second most used and SciSearch and Biosis also beat MEDLINE. Therefore the above free policy might even be a ploy to persuade the American funders that more money is needed to improve MEDLINE! (INSERM also uses ISI's Current Contents twice as much as MEDLINE.)
In any case European producers have competed with MEDLINE for years, and should continue. Thus, questions such as "what databases are relevant?" and "is MEDLINE key?" must be addressed. As there seems little chance of altering American practice, perhaps a better policy is to compete, hard as this has to be given the subsidy situation. The key could lie in the journal market where Europe is leading the US. However, secondary services are essential here too: if an author knows that by publishing in a journal that is abstracted by MEDLINE he will get better recognition, he will do so. This could skew the journal market if the independent literature bases go under to this competition.
Another aspect that should be looked at in the competition situation is the development of software and services. Europe does not appear to put as much support into these areas as America, with the result that we will become increasingly dependent upon American products. Such a dichotomy also lures good programmers and staff away from Europe and reduces our ability to compete. The Member States and Commission should address the need for a good, total, infrastructure which includes training and development programmes. Europe cannot afford to let what happened with the computer industry happen again (here or elsewhere).
Despite, or perhaps because of, the addition of so many new databases, value and quality remain key issues. The primary journal remains central to the information chain and Europe's leading position in the primary sector can be damaged if the publishers do not watch more carefully how things are developing. New ways of exploiting the Primary Sources are essential and Europe will suffer if it does not find them. Already many institutes are demanding that information is put on the university web site. Unless steps are taken to validate and check this material, we will have a weaker system. People also want to do more with data and so require access to, and help with, the sophisticated tools which are being developed to manipulate data and build one's own databases; and there is an increasing need to link primary, factual and other information resources. Copyright and usage rules must allow this. The pharmaceutical industry is about to stop transferring copyright to publishers unless suitable arrangements for the use and re-use of data can be found. The publishers have to be aware of these dangers and a public debate is required.
The most challenging area for growth and novel development is probably the need to produce added value databases which examine the structures and functions of molecules. Such developments cost a great deal of money in terms of expert opinions and other curation skills. There is presently no long term stable funding mechanism for these databases. It seems likely that public funds can be used to start them in their research phase but other, market-led, funding is needed to maintain them. The Americans have succeeded in starting some high-cost databases which industry is buying and using. Europeans are perhaps inclined to stay at the "small is beautiful" stage although there are many examples of excellent added value databases that could be turned into essential success stories (for instance the PROSITE, PROFILES and PRINTS group).
New financing infrastructures will therefore have to furnish the research required to start databases and then cover many different types of database, including those that are derived from the primary data sets and thus require added value - which costs a lot of money. It is also essential that new databases are built so that they can interact technically with others, and be recognised as long-term, stable, additions to the information market. All these points require a central policy.
Europe needs to bundle its efforts and investment and so better alliances between academic and commercial publishers are needed. Some feel that too little is being done on this topic at present and there is some concern that commercial competition will delay rather than accelerate the process towards the development of generic tools to link and interrogate all databases. This might be so in an ideal world but there are many different ways to approach competition. For instance, a publisher like Elsevier feels it can serve the market and its shareholders' needs better by buying a software company like MDL in order to open avenues from chemical structures to the primary literature. Other examples and liaisons will be found in biology.
The role of the central institutes such as the EBI is obviously crucial. There is a clear case for the public funding of databases where the main users are also in the public domain; and there is nothing politically wrong with then allowing this basic data to be used by others in added-value products which can indeed be charged for. However, politicians hate to fund non-voters and so there could be a danger in the global village that (European or American) public funds are threatened if the database use is from outside the supporting constituency. Against this, it is clear that such central services can best benefit from international cooperation and exchange and therefore funders can always make a case that they receive as much as they give. This could be the basis for continued working and cooperation. Furthermore, America needs our information and so, if Europe stopped funding, say, the EMBL Data Library, they would have to sell their information to America for them to use!
The PRINTS case is a classic example of how an R&D project is threatening its own future by its own success. Given the present situation, where no infrastructure funds are available for the maintenance of such projects, the meeting agreed that, eventually, such databases needed a publisher, not a grant. The obvious success of the database in terms of accesses points to the fact that it could probably start to earn the required money but issues of IPR and ownership would have to be solved. This stresses the need for database producers in the academic sector to register their interests and document the ownership. Universities are keen on asserting their ownership of projects but it is essential that this is done in a professional manner. The steps to be taken should be detailed and registered so that other database producers can turn to them for help.
Unfortunately, given that .... users will not pay while they do not have to .... the general conclusion was that while committed scientists continued to support their novel databases as they do, few outsiders would come in and help! It is also apparent that, unless secure funding can be found for such databases, the scientific community will have wasted an enormous amount of money and effort. The investment needed to make a database sufficiently effective for it to be missed if it is closed is huge, in terms of both staff input and money. In the late 1980s and early 1990s, hundreds of thousands of ECUs of public funds were wasted in developing but not maintaining microbiology databases. The money came initially from R&D funds but no policy was in place to move funding onto a maintenance level. This waste must not be repeated in the molecular biology world.
One barrier to establishing such a fund/policy appears to be that a number of today's senior scientists still do not recognise the need for databases: in general they were trained in a previous age and do not see their importance, nor do they recognise the expense and skills needed to produce a secondary database from primary data. At the same time it is surprising that the same senior scientists have recognised the need for large, central, physical services, such as accelerators etc. Bioinformatics can be seen as a "distributed central facility", without which biotechnology R&D will fail; perhaps one way to alert people to this need would be to set aside a percentage of every grant for the purchase of relevant information? Furthermore, databases only sell when they are seen and so a common central policy would help all concerned. Information is not free. Information is a commodity, even if certain categories have to be kept separate: for instance factual data, which comes from the academic community and is used (in the main) by that community on the "Free in, Free out" policy. All other databases will, eventually, have to prove their worth and, once a database has become non-innovative, there is no longer a case for it to receive research money.
In addition to this required culture change, Europe should recognise that it does not, anyway, have a central institute like the NCBI in the USA for supporting such databases. The EBI has to compete for funds just like anyone else, for the same type of product in the same funding climate. Neither the UK nor the EC presently recognises the need for continuing funding for databases. The EBI has done a lot to educate senior scientists about the need for databases; more must be done by all scientists. Industry can play a central role by emphasising the need for such support and by illustrating the use of these resources.
Not all databases will survive in the open market. Some, like BIOREP, are clearly needed for central surveys and so could be funded by infrastructural money. AGREP has been produced for many years to support the Common Agricultural Policy. Useful or not, this is a database that is funded for a political need and so might not need to be kept alive when that political situation changes. If such databases are expected to earn their way, they might be seen as resources from which other commercial products can be developed. BIOREP might therefore be used to generate research reports which could be sold; i.e. use it as a tool to produce a product for sale.
A possible weakness is the fact that BIOREP only covers common data. But industrial users confirm that there is a market for this: it is what a company can do with the information that makes it important. Furthermore, one person's core information is another's fringe and so databases like this can contribute to the general knowledge pool needed for added value activities.
The ETI experience proves that it is possible to persuade academics to deposit data into a central database that can then be exploited. The model clearly exchanges software and a final product and possible royalties for information and could be used in other areas. It is also worth noting that the common perception among academics is that a CD or other electronic product earns money while a journal publication is not seen in the same way; in fact the opposite is true! The ETI also has a successful joint venture with Springer. The CDs sell in differing numbers. Some subsidise the others and all scientists are therefore encouraged to place their material in the data pool.
The ETI and EBI both work with the cooperation and goodwill of the scientist. While the EBI currently feels that this is only possible in a non-profit environment, the ETI sells its products and so regains many of the costs incurred. This latter policy can therefore work given the correct environment and arguments.
Biodiversity will make as many demands on bioinformatics as molecular science has. The political implications of biodiversity are not yet being recognised by database producers and it is possible that as developing countries feel they have to manage their collections and related information in accordance with the Biodiversity Convention they will also claim copyright and other intellectual property.