Building and Owning Biotechnology Databases

A Workshop organised by The Biotechnology Information Strategic Forum, with support from DGXII of the Commission of the European Communities, and held at Purmerend, The Netherlands, 22-23 September 1998


The workshop "Building and Owning Biotechnology Databases" was held at the Golden Tulip Hotel, Purmerend, The Netherlands, on 22-23 September 1998.


The ETI model re-visited -- Wouter Los & Peter H. Schalk, ETI Biodiversity Centre, Amsterdam

Background information on ETI

The Expert centre for Taxonomic Identification (ETI) started in 1990 as a research project at the University of Amsterdam and has since expanded into a service for scientists and various user groups of taxonomic information. ETI is now organised as a separate legal body, hosted by the university. As a not-for-profit non-governmental organisation - for and by taxonomists - it operates under the auspices of UNESCO. Its mission is to promote, preserve and disseminate taxonomic and biodiversity knowledge, and to stimulate international scientific and educational cooperation in this field. Its main activities are the development of new information and communication technology tools for taxonomy and biodiversity data management, electronic publishing, international networking, and training.

ETI’s Linnaeus software is a package of modules that supports scientists in building databases, identification systems, geographic information systems, and other applications. By distributing this package free of charge to cooperating scientists, ETI promotes the development of comparable database and expert systems and, more importantly, facilitates the amalgamation of work from different scientists into integrated database and expert services. For more information, see the proceedings of the previous Workshop of the Biotechnology Information Strategic Forum, or ETI’s homepage.

The job of making the world’s information on taxonomic biodiversity electronically available is enormous. ETI, along with its partners around the world, is helping to speed this up. With about 1.4 million species described so far, in thousands of separate journals and books published over the past two centuries, there is a lot to catch up on. A special problem in taxonomic databases is that species are not real entities, but conceptual descriptions of what we assume are discrete units in nature. For this reason taxonomic databases are not name lists, but well-defined sets of interrelated species with a precise diagnosis of the identity of each species. They may include supplementary information on each species, such as its biology (reproduction or ecology), its distribution, the deposited specimens, or molecular information from specimens. On top of this, ETI offers the possibility to build identification systems, geographic information systems, and other applications.
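The difference between a name list and a set of interrelated, diagnosed species can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record layout, the field names, the check function and the placeholder taxon names are hypothetical and do not describe the actual Linnaeus data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaxonRecord:
    """Hypothetical taxon record; NOT the Linnaeus data format."""
    name: str
    diagnosis: str                    # what identifies this taxon
    parent: Optional[str] = None      # name of the higher taxon, if any
    # Optional supplementary information mentioned in the text
    biology: str = ""
    distribution: List[str] = field(default_factory=list)
    specimens: List[str] = field(default_factory=list)
    molecular_data: List[str] = field(default_factory=list)


def find_problems(records: List[TaxonRecord]) -> List[str]:
    """Report records that would reduce the set to a bare name list:
    a missing diagnosis, or a parent that is not part of the set."""
    known = {r.name for r in records}
    problems = []
    for r in records:
        if not r.diagnosis.strip():
            problems.append(f"{r.name}: missing diagnosis")
        if r.parent is not None and r.parent not in known:
            problems.append(f"{r.name}: unknown parent {r.parent}")
    return problems


# Placeholder names and texts, invented for this example only.
records = [
    TaxonRecord("Examplea", "placeholder genus diagnosis"),
    TaxonRecord("Examplea prima", "placeholder species diagnosis",
                parent="Examplea", distribution=["placeholder region"]),
    TaxonRecord("Examplea secunda", "", parent="Otherus"),
]

for problem in find_problems(records):
    print(problem)
```

In this sketch the third record is flagged twice: it carries a name but no diagnosis, and it refers to a higher taxon that is not in the set.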

The contents of the final databases, including additional expert systems, depend on the input of contributing scientists. However, the international cooperation promoted by ETI results in value-added products at shared, and thus lower, costs. ETI invests in software development and in supporting activities to help scientists build their knowledge systems and to distribute these at low cost around the globe. UNESCO has been very supportive in establishing other self-supporting ETI branches around the world. These branches assist local scientists and can better meet regional needs and priorities. Over the last five years, the combined efforts of individual scientists, networks, institutional cooperators and regional branches have resulted in an impressive growth of harmonized database building activities and of digital publications.

Developments

Building taxonomic databases does not pay! The scientists who build their information and expert systems with ETI do not regard their results as commercial products. ETI’s charter is based on a not-for-profit approach, which implies that costs are recovered only for validating, reproducing and distributing the digital information. As the market for very specialized taxonomic databases is small and its users are mostly not rich, it also makes no sense to aim for other approaches. Nevertheless, the production and distribution of databases and electronic tools impose responsibilities on the producer/distributor. ETI has become more aware of these responsibilities, has developed codes of practice, and has invested in various activities to put them into operation, especially with respect to copyrights and ownership, its obligations as an information and communication technology (ICT) developer, R&D, and the support and training of cooperating scientists.

Copyrights and ownership

ETI, as an organisation for and by taxonomists, has the task of assisting scientists in compiling their expertise and sharing it with colleagues and user groups. The supporting ICT tool is the Linnaeus II software package. Clearly, different legal rights (and obligations) play a role in any attempt to make an integrated digital information system available. The categories are: copyrights on the original primary data; copyright and licences on the software used; the intellectual property rights of the author(s) who compiled and assessed the information system; and the ownership of (and legal responsibility for) the final product. ETI puts much effort into regulating these issues.

The development of taxonomic information started two centuries ago, and this information has been written up and stored under different legal systems. Thus the ownership and copyright protection of the underlying data are not uniform and are not practised in the same way everywhere. Across the range from simple databases of taxonomic names to detailed biological data, images, sounds and video, one cannot expect all items to be freely available for distribution as part of a compilation. It is not uncommon for scientists who use data and information from various sources to assume that these are in the public domain and not protected. When receiving contributions from its partners, ETI then has to cope with the possibility that numerous items may not be distributed further without permission. Although ETI asks its partners to document sources and permits carefully, it is still necessary to check these. It is now standard practice to arrange written permits or contracts covering the legal rights of each third party. This may add up to more than 200 documents for a single database production. Sometimes it is difficult to trace the legal owner. As a safeguard, ETI may ask the data provider to state that he acts as the legal owner of the data.

Most of the data originate from one or a few scientists who want to use the electronic medium to disseminate their intellectual work. In such cases ETI’s policy is to share the copyright, so that ETI gets the right to disseminate the information in digital form for a specific purpose. This has the advantage that the scientist keeps full control of his work, while ETI can also exercise its responsibility for the electronic product, both in investments and in follow-up activities.

The intellectual investment of the scientists, the ‘authors’ who compile and build the final product, is acknowledged on each separate product. Each product, presently on CD-ROM, also has an ISSN, so that the intellectual work can be traced and cited. In this way, ETI treats electronic publications in the same way as printed publications, including procedures such as independent quality reviews. The ‘authors’ are ultimately responsible for the quality and keep the rights to their intellectual work.

All the database information and derived analysis tools (such as identification systems) are built using the Linnaeus II software package. The algorithms and software applications in this package are developed at ETI, and ETI feels obliged to keep the package up to date and to guarantee its functionality in changing hardware and operating system environments. For this reason, ETI always clearly states its rights with respect to the software it has developed.

As mentioned above, the whole set of copyright contracts and acknowledgements is sometimes extensive, but it is essential when making a database public. It is especially complex for older data owned by private bodies, or even by public organisations, that assume these data may have some commercial value. The approach is different for new research data generated by active scientists; in most cases these follow the habits established in the younger discipline of molecular biology, where it is general practice to deposit primary data in public databases. This is essential for the success of biodiversity databases on the Internet.

ETI also publishes third-party information systems, such as species check-lists and image libraries, as there is an increasing demand for a regular outlet for this kind of information. For these systems, legal matters are further complicated by licence agreements on software. In addition, this service for third parties requires technical support (a help desk), since users must have a contact address for their (technical) questions. ETI’s facilities for electronic publishing are becoming increasingly popular.

Research and development

There is great enthusiasm in the world of taxonomic science for the new opportunities emerging from database development and electronic tools. ETI is very happy with the feedback from the scientific community, including suggestions for new ways to manage and analyse digitised data. As a result ETI is more and more involved in cooperative projects to implement these suggestions in modified or new software modules. In fact, almost all new digital products contain bits and pieces of new developments, which are regularly included in new releases of the Linnaeus II package. However, these research activities require a lot of effort and money. As a not-for-profit organisation, ETI is very active in attracting research grants and subsidies, since the research costs cannot be recovered from distributing electronic products.

The same holds for the costs of updating the Linnaeus II software package and derived electronic systems, and of keeping these compatible with the latest hardware and operating systems. In addition, ETI has to find funds to guarantee technical support and training for the scientists and users who work with the ETI systems. Only for special training courses does ETI ask for a modest contribution, and then only towards the organisational costs.

ETI at the cross-roads

As a professional scientific organisation, responsible for the continuity of its public services for the scientific community, ETI has recently had to face some strategic choices in the light of the developments and cost realities described above. An independent scientific review committee concluded that ETI had to consider its position at an important cross-roads: either becoming an independent publisher, or finding a way to secure its position as a public scientific facility. In agreement with its founding fathers, especially UNESCO, ETI and its cooperating partners around the world firmly re-confirmed the latter approach. However, in view of the budget perspectives, this implied the following strategic choices.

Firstly, all cooperating scientists are still free to use ETI software (including support) in their own research projects. As is current practice, they will be encouraged to deposit their accumulated data (assembled in the Linnaeus II software) in ETI’s World Biodiversity Database (WBD). ETI will explore new ways to improve the public availability of the data in this database. By the end of 1999 the WBD will hold approximately 160,000 species, which is about 10% of the currently known species.

Secondly, ETI will support the production and distribution of integrated information and expert systems only if the marginal costs of building these can be recovered. Marginal costs include software licences, specific software adaptations, debugging, quality control and reviews, distribution and packaging. They do not include organisational, research, and general overhead costs.
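This second choice amounts to a simple decision rule, sketched below under stated assumptions. The category labels paraphrase the cost items listed above; the function names, the budget layout and all amounts are hypothetical and are not ETI figures or an ETI procedure.

```python
from typing import Dict

# Cost items counted as marginal, paraphrased from the text above.
MARGINAL_CATEGORIES = {
    "software_licences",
    "software_adaptations",
    "debugging",
    "quality_control_and_reviews",
    "distribution",
    "packaging",
}


def marginal_cost(costs: Dict[str, float]) -> float:
    """Sum only the items that count as marginal costs.
    Organisational, research and overhead items are ignored."""
    return sum(amount for category, amount in costs.items()
               if category in MARGINAL_CATEGORIES)


def can_be_supported(costs: Dict[str, float], expected_recovery: float) -> bool:
    """A product qualifies if the marginal costs can be recovered."""
    return expected_recovery >= marginal_cost(costs)


# Hypothetical budget for one product (arbitrary currency units).
budget = {
    "software_licences": 2000.0,
    "debugging": 1500.0,
    "quality_control_and_reviews": 1000.0,
    "distribution": 500.0,
    "research": 20000.0,          # excluded: not a marginal cost
    "general_overhead": 5000.0,   # excluded: not a marginal cost
}

print(marginal_cost(budget))
print(can_be_supported(budget, expected_recovery=6000.0))
```

The point of the rule is that research and overhead, which dominate this hypothetical budget, do not enter the test; they have to be funded separately.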

Thirdly, ETI and its cooperating scientists and institutions will actively seek new research grants, subsidies, and other financial support to maintain the high quality of its R&D activities and its support and training efforts, and to cover its overhead costs. This implies that ETI and all taxonomists must remain ready to compete for funds from research councils and other bodies. On the one hand, this is the best guarantee of quality; on the other, it forces governments and other funding bodies to consider their position and responsibility with respect to the public availability of basic biodiversity information.

