A Workshop organised by The Biotechnology Information Strategic Forum, with support from DGXII of the Commission of the European Communities, and held at Hotel Val Monte, Berg en Dal, Nijmegen, The Netherlands, April 1994
The workshop, Strategic Issues in Biotechnology Information, was held at Hotel Val Monte, Berg en Dal, Nijmegen, The Netherlands, April 1994. It was organised by The Biotechnology Information Strategic Forum, with support from DGXII of the Commission of the European Communities.
Introduction
The workshop began with a series of short talks designed to "set the scene"; in particular, the two components of this emerging information world - the emerging world on the Internet and the traditional world of literature products - were described and set in their respective environments. At the end of the workshop, a demonstration of the various information technologies that had been discussed was given at the CAOS/CAMM Center at Nijmegen University.
The bulk of the meeting was however spent in discussing a number of strategic issues that had been stated in a "discussion document" circulated previously to the participants. These issues had arisen from discussion with users and producers of biotechnology information and are given again here as the backbone to this report.
This document summarizes some points from these discussions. It does not represent the formal views of the meeting nor of the BTSF, but it will be used by the BTSF Members to further distil priorities and "future actions" before forming the basis of a final publication to be produced later this year.
The Workshop
Theme 1)
A) What problems might arise if the remaining European database hosts are taken over, or removed? All now choose the materials they wish to offer and so act as a filter to the user and offer a variety of selling services to the producer. How can database hosts be encouraged to develop market-niche skills so that quality, and not quantity, is promoted? A monopoly could be disastrous for the user and the producer alike, but too much competition in a small market will force the user to use the secure product and could force the rest to the wall.
What steps should producers take to ensure they have a way to reach their market? How can this market be best served? What guarantees are required, and might be obtained, to ensure that the present situation remains open and free? To whom should such a policy/request be addressed?
There was a general agreement that the present commercial host situation in Europe was not healthy. The economic stability of many European hosts was thought to be insecure (although as none of the invited host organisations attended this was difficult to ascertain) and there appeared to be a real danger that some of the remaining European organisations such as STN (which is thought to be already controlled to some degree by the American Chemical Society), could be "taken over" by their American cousins. The scenario where all data might eventually be stored in America is possible and while some participants felt that this presents no real danger to Europe others are convinced that European R&D industries have to have unfettered access to the basic information they need for their activities and that this can be best guaranteed by locating essential databases on European computers under European management and law.
In terms of accessing information from abroad, and from the USA in particular, it was recognised that while large industrial users can access information sources wherever they may be (for instance through American daughter companies) the plight of the SME is more acute. These companies already lack the infrastructures and support needed to make the best use of all the different information sources and any geographical restrictions might make this task more difficult. At present, few users believe that, in these days of better telecommunications, the major hosts will continue to store data on both sides of the Atlantic and so believe that the best the European market might hope for is that the distinct market wishes of both American and European, and other, user communities will be monitored and served by any (new) host that manages to assume a position of international dominance.
The statement "as long as there are 2 (European) hosts then there is a choice", was felt to be simplistic as 2 hosts would not offer the choice small producers require, nor would the lack of further competition stimulate the remaining hosts to keep abreast of the technical and software advances required to serve the market best. Furthermore, the major biomedical hosts are both national with their prime allegiance being to their national markets - Germany and France.
All the hosts might well do better if they paid more attention to market segments. Several commentators feel that there are clearly going to be difficulties for all the commercial hosts if they all continue to seek to cover large areas of science and business with more or less the same files. Furthermore, although the use of CD ROM and associated technologies has actually increased the use of on-line databases, the danger remains that this will eventually lead to a reduction in searching (evidence presented after the workshop confirmed that some major libraries are indeed now witnessing a reduction of on-line searching due to the use of CD ROM files of archived material).
In any case, there is still a very clear place for the traditional database host who offers the user a variety of added-value services. These hosts need to keep abreast with the technologies and desires of their customers, and must offer the producer well structured markets and good pricing scenarios.
Several critics wonder whether the days of the "shopping mall host" (such as Dialog and Datastar) are numbered, given the increasing ability to search many databases at once in a distributed form (see section 4). Thus, cross database searching is cutting the usage time in the individual databases and therefore royalties, a trend which is threatening some of the smaller database producers. If they have to seek other ways of reaching their markets this trend might mean that the role of the "super-host" changes in the coming years. No definitive conclusion was drawn on this point but a future scenario could well be one built upon a series of larger and smaller hosts, each with a defined and well researched mission statement and market niche, being used by the users via integrated networks and interfaces (although for this to happen the European networks will have to improve a great deal - see later concerns). These hosts might work far more closely with the producers and so we would see an environment of dispersed or federated databases. This would again help the specialist databases. Navigation between the various data resources will then become a greater problem; there will be a greater need for better locating databases not a lesser one; and Europe will have to pay attention to producing the required software for this to take place.
It is quite clear that the whole biotechnology R&D community, industrial and other, would benefit from better coordination between the hosts, the producers and the users. The BTSF will try to establish such links in the coming months and will stress the need for the market demands of the various niches to be specifically addressed.
Information professionals, in academia and the commercial world, require good quality searching services and cannot envisage a world without the "traditional" host. One of their key needs is quality control, and the database hosts must continue to work with the information producers to maintain an emphasis on scientific quality.
This emphasis on value-added services and quality-controlled material is one key difference between the traditional hosts and the new academic cyberspace environment, which offers a large and rapidly increasing amount of un-refereed and unstructured information. Most of this data is available for downloading to the user's own computer for further manipulation. However the emergence of X Mosaic and WWW products is showing clearly that structured, interactive, files are feasible, and the publishing world will surely make use of these technologies in the near future.
These technologies are already being used for interactive products such as catalogues. The biotechnology market has to obtain information and examples on many resources and it is clear that culture collections, media, cell lines, enzymes, probes etc., all offer market opportunities. Europe should make an effort to use these information tools to make known and to market the biotechnology products, services and resources available in Europe and thus ensure that users can be served from local markets.
In addition, business requires better access to information about regulations, patents, and partnerships, research opportunities etc. These at present form a separate line in the information market but must be integrated into the host environment in the coming years.
The gopher/WWW cyberspace world is complementary to that of the structured commercial hosts. Users will need to navigate through both; and quality control is required. How can this new varied environment be described and given quality? The present situation is increasingly un-refereed although the computer/node managers are fulfilling this role to some degree by placing themselves between their customers (users) and the servers - if they feel the service is serious, and offers good quality, then they flag the server so that people use it. Some commercial companies are also doing this by offering gateway packages through to the various free services. This sign-posting is essential to encourage greater use. The BTSF should concentrate upon these needs in a scientific way so as to combine the best from all environments and preserve the flexibility and freedom of the academic environment with the structured quality control of industry. Ultimately, information is essential to problem solving and everyone requires quality - poor quality information is inefficient and thus expensive; even if it is not charged for per byte, it costs a lot to send or collect.
New means of payment will also be required. Site licences will become more common. The BIDS experiment, where users buy access to a database mounted on the Bath University computer in the UK on an "as much as one can eat" basis, is therefore worth further exploration. Furthermore, perhaps an alliance between the academic hosts and the commercial producers could offer a basic package to which the specialist database could be added.
Industrial user groups are already negotiating with database producers on behalf of their users but the major problem is to include everyone - from sporadic users to major copiers of data. Parallels to the software licensing environment will have to be made and the information market will need to look closely at other electronic products to choose the best solutions for their problems. The BTSF should establish a study group to look closely at these problems in the coming weeks and months.
In conclusion the host world is facing many technological challenges. The user is faced with an increasing number of sites where data might be stored. The information producer is faced with the challenges of competition from freely available un-refereed academic material, and a declining number of hosts. Dialogue between all three — user, information producer and database host — is essential if Europe is to maintain a competitive and efficient environment which supports the R&D communities relying upon this data and information.
Theme 2)
B) While (telecommunication) connectivity is usually "acceptable" within national boundaries, Europe still suffers from poor links between countries. This is a political problem but it forms a real barrier for the successful development of an on-line market. What steps can be taken to improve the situation?
The basic question facing the industry (producers and users) is how to obtain an information infrastructure that can be used for everyone's maximum benefit; information is absolutely vital to R&D, especially in biotechnology, and both the academic and commercial communities must be able to rely upon their data and information services if they are to compete. The fundamental items in any such emerging infrastructure are the telecommunications networks, and these require drastic improvement if Europe is to compete internationally.
It is not possible to establish new infrastructures - the information-using communities, including biotechnology, will have to use the existing infrastructures in Europe, and exert pressure for their improvement. Much is said about how hard the European PTTs are working to improve European networks, but the specialised database producers offering value-added products based on the EBI databases admit that they cannot mount and operate the same databases and services in Europe as their counterparts in the US; solely because the European networks are too slow, too narrow and too expensive.
In short, the present European telecommunications infrastructure prevents European information producers carrying out what they need and want to do to keep abreast with market demand; numerous examples of European producers using the American networks to distribute their products were given as were details of users dialling into the US to access services they could not obtain here. It is obvious that the quality of Europe's networks has to be improved if European information producers are to continue to compete with their American competitors.
In addition to capacity limitations, partially overcome until now by improved data compression techniques, a major problem with European networks is reliability. The Dutch SURFnet monitors the connectivity (i.e. the ease of connecting between two sites) between the 24 European EMBnet nodes on a daily basis. This fluctuates enormously and, while in general it can be said to be satisfactory for file transfer (known generally as ftp - file transfer protocol, where a file can be retrieved from a remote computer and be brought back to one's own machine for further manipulation), it is not adequate for guaranteed on-line interrogation. And it is getting worse. This lack of reliability is a major cost problem, as staff are required to maintain and troubleshoot the connections, and it is not surprising that, politically, the US see Japan as being able to compete on a network level but do not appear to regard Europe as a threat.
Many future uses of networks will require the sending of images. Data compression has until now allowed this to take place but the expected increase in traffic will soon overload the existing networks to such an extent that extra bandwidth will be essential. In the bioscience world the expected increase in use of networks in medicine will seriously strain the present bandwidth, but all scientific disciplines will wish to send more graphic and image dependent material and the growth could become exponential in the next few months.
Networking is of such fundamental importance that the Commission must be stimulated to do all it can to improve Europe's position; the PTTs are clearly the main stumbling blocks, but also the main innovators, and the weak international and national attempts to break their monopolies have not basically worked as far as cross border interaction is concerned. The meeting was told that the situation would improve in 4 years but this is too late — Europe has to have superhighways.
The case was made that many private clinical trial and related health research companies, e.g. those running the secure clinical trials for drug companies, have abandoned the wait for new secure public access lines and are using the commercial Tymnet service and other leased-line systems. Tymnet is basic, but secure, and can be accessed and relied upon throughout the world. The other answers presented - leased lines and Europanet for instance - were not felt to be long-term answers to the users' needs.
Finally, governments are also poorly educated about the true situation both within their geographical boundaries and across borders. All too often they are too protective of their PTT above all other considerations. The French PTT is able to exploit its position to the cost of the user while British industry has not even been able to gain access to a secure form of the academic network. Pressure and education, through intelligent lobbying, are urgently required.
Clearer funding structures are also required. The commercial world feels that academia is given unfair advantage in being able to use the academic networks "free". This feeling is enhanced by many academics who appear to be against "paying for anything". This attitude comes naturally from the fact that there is a zero marginal cost at the point of use. However, the academic "forgets" the substantial overheads in staff and equipment universities contribute to such communal services as well as the fact that most institutes also pay considerable sums from their central funds for communal access (thus a medium sized Dutch university might pay around Dfl 1 million for accessing SURFnet and will also have a number of staff available to maintain the gateways and network connections needed to keep that node as part of the system). Such an informal support system is seen by many as being too risky for the long term future; the more so in an environment where national governments might change their national funding for international activities. While few will deny that the system has worked until now, many feel that better and more structured arrangements are required for the future, the more so given the obvious importance these networks will have. However, academics are also wary that these improvements, demanded by industry, will increase the cost of the academic networks to them or their institutes. There is potential for real conflict.
Another problem facing industrial users is security. Despite the claims by some industrial users that they had solved the problems of hackers and data loss, deep worries among the pharmaceutical and other industries remain. A well coordinated attempt to stop piracy and hacking is needed, although before this can be attempted it will be necessary to come to common agreements as to what is computer fraud etc. (there are many differences between European countries' attitudes towards computer fraud - in The Netherlands it is not recognised as a crime - which are causing additional concerns on open international networks).
Science, and the biosciences in particular, are still only small users of the networks and the total importance of the networking world for non-business applications will improve when, for instance, education and other non-research activities can be better supported on the networks. The USA is spending large sums of money to support the linkage of schools and hospitals. Europe needs to build similar infrastructures and the bio, and biotechnology, opinions do not seem to be listened to by Europe's networking powers (for instance no national PTT responded to the CEFIC Report — Bioinformatics in Europe - Strategy for a European Biotechnology Infrastructure). A louder voice is needed to persuade the powers that be, and especially the national PTTs, to improve cross border standards and quality.
The governmental representatives felt that industry was doing too little to alert the politicians to their needs - get organised. The BTSF should act as an industry voice and adopt intelligent lobbying techniques to get their message across to those in a position to implement change. The group must therefore organise; alone they will remain weak voices on a stage dominated by the engineering and physical worlds. Infrastructure development requires international action and the present EU rules and regulations make it impossible to fund and develop new initiatives. This is particularly surprising given the Commission's past and on-going commitment to the establishment of a bioinformatics infrastructure (SEC (91) 629, reproduced in the Bulletin of the EC Supplement 3/91 ....
(ii) the Community will, through its research programmes, information market policy, and international collaboration, contribute to the development of a biotechnology information infrastructure within the Community and world-wide (including databanks, software, and electronic networks and services).
Theme 3)
C) "Could biotechnology database producers make better use of the academic network infrastructure?" Several users have indicated their preference for the Internet based information infrastructure and have suggested that the commercial database producers mount their files on these services. At the same time, many commercial users are anxious to gain better access to the networks. Concerns about hacking and computer security prevent many companies from doing so but increased security from the host side could solve this worry in many cases. What policies are needed to encourage the use of these facilities?
The series of introductory talks set a scene of two worlds: the exciting, unregulated, rapidly expanding free world of the Internet, and a more serious, targeted and conservative world of the traditional value-added databases covering scientific information with good access and indexing tools. These are not mutually exclusive and they do complement each other; however, in general terms at least, neither seems to accept the strong points of the other.
The Internet user, wandering around cyberspace via gopher and World Wide Web, is able to access an amazing amount of information. Some centres (e.g. the Finnish EMBnet node) transfer some 6 Gbyte of data over the network per day and the exchange on a world-wide basis will be many times this.
In parallel with this growth in the amount of data being transported, many users are beginning to use the services of specialised centres which can be reached by remote log-in techniques. Specialised software and associated services are mounted at these nodes so that users can carry out specific tasks on generally available data. This data is regularly updated, sometimes daily, and the system offers the user a well integrated facility (a node can be any computer taking and distributing information; here it means a centre that handles a number of external users). Many of these services are financed from central funds and so no individual costs are charged, although this differs according to country and service.
These nodes use the Internet for communication. The growth of the Internet has been spectacular, so large in fact that proper figures are impossible to obtain. It is perhaps sufficient to say that the Internet is now a collection of more than 13000, primarily academic, networks and that Europe has some 25% of all Internet connections, said to be in excess of 50 million. The Internet showed a growth of 139% in 1993 (during 1993, the Internet grew by more than 10% per month; The Internet Society 1993 Statistics) and this growth is continuing at the present time.
The commercial, biotechnology-based, R&D communities are in a quandary. It is clear that, while the cyberspace environment is inhabited by the academic, the bench scientist wherever he or she may be is the true user: thus, if you want to remain at the cutting edge of research, you have to be linked into the Internet world, but this is not seen as being "safe" in terms of access and security by many industrial R&D and computer managers. The dangers of hackers and piracy are not to be under-estimated but, as both the Glaxo and Pfizer representatives made clear, many companies have managed to establish fail-safe solutions to using the Internet which have allowed their scientists to participate just like any academic scientist. Overall however, better and more secure access to the Internet-based databases needed for biotechnology R&D is essential as industrial companies, and many advanced academic research teams, cannot afford the risk of anyone breaking into their systems. Technological advancement, education, and support in this area are therefore essential, especially the education and support: without education users stagnate and the full use of the market remains under-exploited.
The meeting accepted that better collaboration between the commercial and academic communities was essential. Groups such as the EMBnet Stichting are doing more for the commercial user and the BTSF feels that EMBnet might spearhead the provision of better services and support for industry.
In addition to the ftp material, industry requires refereed sources and added value software and services to get the best from the data on offer (see section 1; biotechnology information is not just free downloads of software and data). The role of the information professional is much clearer in industry, where the intermediary still has a clear function. The academic librarian is presently rarely involved with accessing factual information. This is a waste of a resource as, with the proper training, the librarian, even without a scientific training, could help the scientific worker gain the best from the factual information files. The EMBnet community might be the best placed to carry out this training of non-scientific staff.
Industry also requires direction: a "story" has to be told or followed from its start to its end; and the "reader" must be able to enter, or pick up, the story at any point. The present uncontrolled growth of un-refereed data services is not helping information access and quality control will be an increasing problem. The academic community tends to feel that it might establish self-regulating quality control - a university might offer all its publications on-line after internal review - but the general feeling was that this would not be a universal solution as the R&D based biotechnology industries require data and information to have been validated. Industries such as pharmaceuticals may have to submit dossiers to regulatory authorities, and these dossiers contain references to the literature and to scientific and clinical data that must all be valid. In the light of this and other related details of patent registration, the Nucleotide Sequence Databank has had to pay stricter attention to industry's needs (e.g. the exact time of deposition of a sequence in terms of patent rights) and will continue to do so.
The meeting accepted that urgent steps to organise the key data services currently on offer are needed, and that aids to navigating through, as well as evaluating the quality of, the data bases on the public sector services are essential if maximum efficiency is to be obtained. The present secondary literature services will increasingly have to ensure that they guide the user through the networks so that all the material relevant to a story can be located and retrieved. This will almost certainly involve distributed databases and services.
A further problem with the use of networks concerns intellectual property issues. At what stage does using the networks imply public disclosure? Would a pharmaceutical company be able to send a sequence along these networks for, say, homology checking, without affecting its patent status? These issues also need to be seen against traditional processes that will also be influenced by the use of network communications.
Theme 4)
D) What will be the role of the smaller database in the future? How will they be mounted, and can they be linked to other services/products to enhance the total value of the data/information? As data collections become larger and larger, the need to select the specific database before the detailed search will increase. What strategies can be effected to protect the smaller collection? How can one help organise the different players (from the academic and commercial, and the producer and the host worlds) into a coherent group for the common good?
I) Most database builders adhere to their own formats and standards, which leads to confusion and inefficiency. Should not producers adhere to basic guide-lines (rather than intricate details of technology) such as "unique identifiers", and establish links to other database builders and nomenclature groups? Who might arrange this?
J) How can, and/or should, the current services be organised so that joint databases are produced which give a form of meta analysis? Should the database hosts do this (e.g. Datastar's ElseVirology) or should the producers join forces to ensure that they produce fully interlocking products for the dissemination services to use as best they can?
The meeting first clarified the term "Small" which is an artificial definition — the problem concerns specialised, or special interest databases.
These databases are increasingly necessary and useful because they offer "expertise for an appropriate area". They are aimed at specific user groups and are often very "state of the art". Despite this, due to their niche position, and possibly due to the fact that they are used at a specific part of the R&D cycle, (i.e. they can only be used by a few users), many will never obtain sufficient commercial use to cover their production and exploitation costs.
A major reason for this is that such databases require specialised staff adding value to basic records and data and are therefore expensive to produce. Seventy-five percent (75%) of these database costs are staff-related and, as such products are small in volume (hence the earlier misnomer), they often lack the size required to generate sufficient income from on-line searching. They also might lack the market size to allow extensive marketing and so require other techniques to maximise their use.
The EU had a policy of supporting these databases under previous programmes but recent funding decisions indicate that this is no longer the case. Furthermore, the European Commission, while having played an important role in stimulating the start of a number of databases, has not been, and is not, in a position to provide long-term support for research infrastructures. This might be short-sighted, given the need for Europe-wide activities, and there is certainly the need for a supporting structure for special databases which, while not able to support their own costs, are essential in promoting R&D.
The LiMB list published by the NCBI is a good indication of what biotechnology-relevant databases are available. Europe should ensure that all its projects are advertised; and might also try to produce a similar list if European products are found to lack coverage in this American list.
Specialised databases in particular will have to be user-driven, and so need to reach their maximum market. The producers need easily accessible facilities to mount their databases for interactive searching - mounting for ftp is not a problem. Europe has few database hosts willing to carry these products and also lacks a mechanism to allow database producers to test their materials (even ECHO is now charging to mount databases for testing and few R&D users use ECHO anyway).
These databases would receive more attention if mounted together: for better marketing/innovation and so that a user can switch from one data collection to another. This may mean organising them in clusters but in any case they must be compatible with "like databases". The producers should therefore look for flexibility, with good scientific validation and technological structure.
All these developments mean that the specialist database would benefit greatly, in terms of visibility and use, if it could be placed in a "tank" of data: perhaps in a federated or dispersed data base structure. However, even here, the producers require strict guide-lines and standards. The EBI and EMBnet might help as could the commercial hosts - especially if they mounted such files for cross-file searching (although this reduces the time users spend in such a database and so reduces their revenue) by offering guide-lines for data structures and in producing interfaces and gateways to allow the user to reach the different data products; and also by becoming a focal point to which the producers can turn.
Marketing will also have to be improved. The commercial database producers might help by cross referencing their products to special products and the EBI and EMBnet could advertise the availability of the products throughout the community and help with training courses etc.
Even if all these activities succeed, recovering production costs will be difficult. The bioinformatics nodes which offer one possible solution to the mounting of these products do not at present have the opportunity to charge on a per hit basis for their data (although some EMBnet nodes do charge for specialist data and services). Ordering and payment systems are essential if the Internet is to be used by those information producers who add value (and need to recover their investment). The present moves by many catalogue companies to use the WWW might open new avenues for information producers (this is still a major distinction between the users of the Internet and other services; see page 5).
Present infrastructures and funding/changing environments offer little incentive for Europeans to open their data to others. Nevertheless, as electronic log books become more popular, people will maintain their data in database form, which will lead to better collections and more potential for use.
An increasing number of databases will be required as bioinformatics progresses, and many of these will be worthy of central support - after all, if the CEC can maintain an on-going register of current biotechnology R&D, why should other "infrastructure databases", perhaps even more essential to that R&D itself, not be funded? The rules and guide-lines for funding in this area are not clear, and more clarity and opportunity for debate with the funders are required.
[The formal position of the EC appears to be that it is not willing to fund databases for ever; but perhaps it should be debated whether certain infrastructure databases are essential, even if they are not cost covering. The Commission does appear to have recognised this point, as stated in its SEC (91) 629 reference (see Theme 2, page 5), and the research community should make it clear that some research databases cannot be market driven, but that a number of these will support R&D further down the research chain and as such are infrastructural. There seems therefore to be a strong case for public funds to be made available to support these central resources, so a funding structure should be found that supports essential data services (if it can be found for nucleotides it can be found for other areas).]
Federated databases are also worth looking at. If users were taken to a group of related databases and banks, they would find more relevant information. The producers of the specialist databases would therefore receive more usage and more income. This would fit neatly into the "tank" idea stated earlier. It would also highlight the need for databases to adhere to some sort of software standardisation. Many users find it a huge job to maintain different forms of complementary databases, and while new programmes such as SRS (Sequence Retrieval System) are helping, more is required. This is especially the case where the user is not a scientist but an information specialist.
Despite the need for some degree of database standardisation, few believe it is possible. The use of SRS and other programmes should mean that different databases can be better linked - the producer must therefore restrict himself to certain rules but allow the IT engineer to integrate.
Ultimately, the key to the small database might be the client-server model, where the database producer becomes the host. Much planning is required before this option will be open to all. The first step will be front ends that can handle multiple databases.
A completely different role for special databases involves the future of the journal, whatever that may be. The industrial user requires well validated material. Most accept the journal as being such a product, although on many occasions databases are far better: data can be validated, whereas journal articles are not, since refereeing is still very much an arbitrary process. There will be the need for alternative sets of validated materials, especially if these involve patent and regulatory material. These products will have to be electronic and will be developed along the same lines as the specialised databases are today. Few participants disagree that the data will be stored in databases and that the scientific insight will be found in journals - however these are delivered. But relevant forms of the same kind of subject matter might increasingly be found in specialist databases.
There is a strong possibility that the next generation of user will have to search and use databases as scientists now use primary journals: to obtain the full range of factual and refereed data they require for the research story. There is therefore every need for the present secondary databases to expand their coverage to include specialist databases which might, on occasions, replace the primary journal as the repository for certain types of information. In that case Europe might design a "root" or "core" database which ties the specialist databases together.
A central pointer database, where the records point to other databases identifying where records relevant to the full R&D story are located, would solve many of the problems facing both the specialised data collections and the larger databases. Such a database should be built so that a searcher could start an R&D story at any point in the record chain. The Common Core Database in Biotechnology Group (CCDB) is looking at whether such a product can be built and will certainly make contact with the specialised database producers so that as many sources as possible are inter-linked.
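The pointer idea can be illustrated with a minimal sketch. It is not a description of the CCDB design, which the workshop did not specify: the class names, database names and record identifiers below are all hypothetical. The sketch assumes only that each pointer names a database and a record within it, and that links are stored in both directions so that a search can begin at any record in the chain.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Pointer(NamedTuple):
    """Reference to one record in one database (both names are hypothetical)."""
    database: str
    record_id: str


class PointerIndex:
    """Holds only links between records; the records stay in their own databases."""

    def __init__(self) -> None:
        self._links = defaultdict(set)

    def link(self, a: Pointer, b: Pointer) -> None:
        # Store the link in both directions so a search can start at either end.
        self._links[a].add(b)
        self._links[b].add(a)

    def related(self, start: Pointer) -> List[Pointer]:
        """Return every record reachable from `start`, excluding `start` itself."""
        seen = {start}
        pending = [start]
        while pending:
            current = pending.pop()
            for neighbour in self._links.get(current, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    pending.append(neighbour)
        seen.discard(start)
        return sorted(seen)


# Hypothetical chain: a sequence entry, the article describing it, a related patent.
index = PointerIndex()
sequence = Pointer("sequence_db", "X00001")
article = Pointer("literature_db", "A-17")
patent = Pointer("patent_db", "P-9")
index.link(sequence, article)
index.link(article, patent)

# Starting from the patent still reaches the article and the sequence.
print(index.related(patent))
```

Because the index stores pointers rather than copies of the data, each specialised database keeps control of its own records; the cost is that every producer must supply stable record identifiers.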
Theme 5)
E) What steps should European providers and database hosts take to ensure that they remain equal players in the provision of biotechnology information? Such discussions should take note of the fact that the secondary and related services are of utmost importance to the continued success of the primary products - the control of the abstracting/alerting service can determine the usage of the primary research article, especially in the coming environment of document delivery.
H) Biotechnology information is international. What possibilities exist for the main international databases to be integrated to a degree that the world's literature and related data is linked into one global product (or series of integrated products)? What difficulties exist and are these media (e.g. CD ROM) related?
Scientific information is international and European data is essential to any international effort — in terms of both quality and quantity. Despite some indications that America would like to "go it alone", and produce subsidised databases that become the "services of first choice", users, academic and industrial, everywhere in the world, require top quality and complete coverage and so there is and will be a constant need for European information. This fact also means that international projects should be possible.
There is a need for better structured cooperation between countries. At present most cooperative efforts take place through such organisations as the EMBL Data Library or the Genbank Advisory Group, but the formal need for cooperation is not always recognised. CODATA has made several efforts to establish itself as a central focal point but these have failed in the biotechnology area and have even led to more confrontation (in, for instance, the Hybridoma Databank project).
Another problem is that Europe does not have any centrally funded major databases such as MEDLINE or AGRIS, having to rely upon privately funded core services which naturally cost more and are less available for others to use as "building blocks". This "unavailability" compounds the problems other database builders have in using a "core" product to link their materials to. However, the fact that a number of commercial "competitors" have banded together to form the BTSF is an indication of their realisation of the need for cooperation in this sector; especially where commercial and academic interests have to be measured and combined. The BTSF should continue its efforts to represent the industry and the user and in this regard further support the CEC in, for instance, its EC-US Task Force on Biotechnology in terms of outlining what the EU requires in this area.
Europe needs a better funding framework for long-term projects. At present the Commission of the EU is virtually the only European organisation willing to support European bioinformatics projects. As the Commission is more willing to support Europe-wide projects that utilise the efforts of individual countries (e.g. EMBnet) it might be useful for national activities in this area to work towards common goals; with support if possible from the Commission. Furthermore, the Commission should accept that it plays an essential role in supporting central services that cannot be produced according to the subsidiarity principle and should therefore seek ways of maintaining these, not always by the same contractor, as a central infrastructure.
Politicians, users and producers of information in Europe cannot claim that Europe needs better information infrastructures unless they are willing to support them. There is however an obvious need for some "rules of the game" for international information infrastructures in biotechnology. These should certainly be established between the EU and USA but might actually be better handled by a wider-reaching organisation such as the OECD. Such parameters will be needed in other scientific areas, such as the Human Genome. It would certainly help if the European Commission made a start in proposing these guide-lines, or initiated the actions needed to start the framing of such cooperative agreements.
Theme 6)
F) Image management technology, of great potential in the production of secondary databases, offers the secondary database producers the option of also sending, via telecommunications, the image of the article that is being captured for secondary processing. This allows a closer link between primary and secondary product. But, what repercussions might this have for copyright and use of primary products in secondary systems? To what degree will users want to use this technology for their own (internal) needs?
Furthermore, most critics agree that these developments will increase the chance that users can select individual articles and products and pay-on-demand. Strategically, this will reduce the pre-paid subscription income of the primary producers, but it will also shift the buying responsibility from library to individual. How will the market adjust to this? Who will control these expenditures? How will the total industry react to such financial changes? How will granting authorities finance these changes?
G) (Much/most) Factual data is currently free and there are deep reservations among the scientific community against having to pay for data that has been entered into the databases without financial gain. Unfortunately, some form of "sales income" might be required in the future. Base lines defining "free" are required. What should these be, and should they not be linked to "freedom of access", i.e. such data can be accessed by everyone?
The premise "free in free out" is a worthy one. However, as the discussions elsewhere (e.g. on the networks and on specialist databases) show, there is no such thing as free. There is no doubt that the raw basic data has to remain free if the core databases such as nucleotides are to remain comprehensive: i.e. scientists will only place data into the databases if they can retrieve it for free.
The cost of adding value can be charged for, and copyright and fees are a problem for the publishers. These will have to be handled and the BTSF should establish expert panels to examine the legal and copyright side of databases and electronic publishing and make recommendations to organisations such as the Commission and, on a wider scale, the OECD on the establishment of common rules.
It is clear that the present "just in case" primary information scenario will change to a "just in time" one, and there is much evidence that publishers will soon be able to store and disseminate their material in electronic form. Many publishers are exploring the establishment of "electronic warehouses" and of document delivery. In addition, the agents that presently sell on journals, i.e. supply their customers with a selection of titles that they purchase from the publishers, are in an excellent position to monitor and tailor primary information to the user. All this means that new forces are operating in the scientific information environment and new alliances will be required to ensure that the user receives the required information at the right time and for the correct price. The BTSF is in a good position to examine the needs of the user and should continue to expand its links through the publishing industry, to include the librarians and agents as well as the producers and users.