Building and Owning Biotechnology Databases

A Workshop organised by The Biotechnology Information Strategic Forum, with support from DGXII of the Commission of the European Communities, and held at Purmerend, The Netherlands, 22-23 September 1998


The workshop "Building and Owning Biotechnology Databases" was held at the Golden Tulip Hotel, Purmerend, The Netherlands, on 22-23 September 1998. It was organised by The Biotechnology Information Strategic Forum, with support from DGXII of the Commission of the European Communities.


Potential Impacts on Research from Proposed U.S. Database IPR Legislation -- Paul F. Uhlir, National Research Council, Washington, DC. USA.

The views expressed here are my own and do not necessarily reflect those of the National Academy of Sciences or the National Research Council.

The success of our basic research and education system is predicated on unfettered access to and use of factual information; on a robust public domain for data; and on easy re-use, recompilation, and value-adding applications of data. It is important to emphasize that practically all databases created in the pursuit of basic research and education are created for incentives other than economic ones: the creation of knowledge, the thrill of discovery, and the enhancement of professional status. The proposed legislation, however, places an overriding emphasis on protecting original investments and on enhancing purportedly necessary additional economic incentives to create new databases. At the same time, it undervalues the adverse effects on scientific and technical progress, as well as the more general economic and social costs inherent in restricting and discouraging the downstream applications and transformative uses of noncopyrightable databases. Specifically, H.R. 2281 currently includes the following problematic provisions, which mirror many of the problems we identified in the Bits of Power report concerning the EU Directive on Database Protection:

So what does this mean in practical terms for the research and education community? This is not an easy question to answer definitively, since we do not have any actual experience with this law. What is certain, however, is that what may appear to be small changes to the law portend very negative impacts on our nation’s research enterprise and on public interest uses of data generally.

First, a number of serious constitutional defects with the proposed law have been pointed out not only by independent legal scholars, but also by the Department of Justice. There are several provisions in the law that run afoul of the First Amendment and the Copyright Clause of the US Constitution, including the potentially unlimited duration of protection; the broad prohibition against the use of factual information by all data consumers, not just direct competitors; and a complete lack of public interest constraints on licensing, even on sole-source providers of data. Those of you who are interested in obtaining a better understanding of the important constitutional issues should read the Justice Department’s memorandum or the letter that the noted constitutional and intellectual property scholar, Prof. Marci Hamilton of the Cardozo School of Law, sent to the Senate Judiciary Committee in early September (www.marcihamilton.com).

Second, the lack of any restraints on licensing, especially on sole-source data providers, presents the very real possibility of the abuse of market power in the distribution of factual information. In particular, we are concerned about the ability of data providers to override by contract even the limited exceptions that the current version of the bill grants to public interest users, including science and education. The unfettered freedom of contract, especially by mini monopoly data providers, without a concomitant duty to deal fairly and equitably with public-interest users, can lead to high prices for data and harsh and oppressive terms of both access and subsequent uses of data. This will especially disadvantage academic researchers who contribute so much to technological innovation and growth. In addition, there likely will be increased transactional and administrative costs associated with enforcing the different legal restrictions on newly obtained data, instituting new administrative guidelines regulating institutional acquisitions and uses of such data, and associated legal fees. Because universities and government agencies are inherently conservative, risk-averse institutions, they will err on the side of caution and place additional limits on what researchers and educators can do in acquiring and using data in order to avoid the possibility of costly litigation.

Third, related to the above point, the proposed exception for scientific, educational, and research users is really not much of an exception at all. It would hold this class of users liable for infringement if they use a substantial part of the database to harm the data provider’s market. Under H.R. 2281, a substantial part of a database may be measured quantitatively or qualitatively, depending on the economic value of that part to the owner. Thus the user cannot ever know with any certainty how valuable the part being used is to the owner, and the owner will always seek to limit uncompensated uses as much as possible. Further, the actionable harm to the data provider’s market can be as little as one offset sale. This threshold for potential liability of noncommercial, privileged users is simply too low, and it is our position that there is a valid public policy need to immunize most nonprofit uses of legally accessed data from liability except when those users engage in unfair or market-destroying conduct.

The proponents of the legislation say that nothing prevents a user or competitor from independently creating an equivalent database. But many databases cannot simply be recreated from scratch. Data that are time-sensitive, unique, very old, or prohibitively expensive cannot be recompiled independently. In research, this includes virtually all observational datasets of transient natural phenomena, as well as data from very costly or labor-intensive experiments. Further, a basic underlying principle in research and education is that the creation of new knowledge should build on the base of existing data and information, and not be forced to duplicate previous factual compilations or discoveries in socially and economically inefficient ways. Protection of investments in factual databases is not the only interest that the law should seek to protect.

Fourth, the fact that the term of protection lasts 15 years, and potentially much longer in the case of dynamic on-line databases which are continuously updated, is especially troubling because of the very long lag time in the evolving public domain. Such long delays in unfettered access to data will undermine the value of many data sets for most fields of research, including for government policy development, and in other cases effectively remove them from comparative analysis with other, openly available, concurrent data sets. The 15-year period not only would grossly retard scientific and technical progress, but has no apparent justification in the rapidly moving commercial database industry either, where economic exploitation of most data products is typically measured in months and years, and even minutes and hours, rather than decades. The 15-year period appears to be completely arbitrary and has not been seriously compared with other, potentially shorter, periods of protection. The proposed legislation thus defeats a primary constitutionally mandated purpose of intellectual property laws, which is to establish a public domain that "promotes science and the useful arts," from which researchers, educators, and other downstream users can build on previous contributions to further knowledge.

Fifth, we are concerned about the incomplete exception from this law for government data and especially data created with government funding. This is one issue at least on which there appears to be unanimous agreement that government data and databases should be excluded from the protection of the proposed legislation. The federal basic research budget alone is estimated to be over $40 billion/year, of which a sizeable fraction is devoted to the creation, maintenance, dissemination, and analysis of scientific and technical data. Of course, the government at all levels produces data of other importance to our nation, including economic growth, public health and safety, regulatory requirements, cultural affairs, and many other functions. Therefore, not only all the organizations that have participated in the shaping of this legislation, but all citizens, have an interest in preserving full and open access to all government data that are not otherwise restricted by national security, privacy, or other legitimate limitations.

As currently written, however, the government exception can be circumvented in several ways. One is through contracts and grants in which the contractors or grantees are not expressly required either to provide their data back to the government for public dissemination, or to make the data publicly available themselves under appropriate terms and conditions. Absent such universal vigilance by the government, a lot of data produced as a direct result of public funding could end up under proprietary control of researchers or their institutions. Because a majority of noncopyrightable databases that are created with government funding in the United States are actually produced by non-government employees, whether in academia or industry, the failure of government agencies to enforce this exemption could have a far-reaching impact on full and open availability of publicly funded data. In fact, our government agencies may have no incentive to enforce and could view this as an opportunity for some cost recovery like their European counterparts. And as more university research is funded by private sources, the resulting data also will likely be removed from the public domain as income-producing products.

There also is the problem of other legislation that, combined with increased protection, can severely limit full and open access to our government data. For example, the proposed Commercial Space Act of 1997 encourages NASA to purchase space and earth science data collection and dissemination services from the private sector and to treat data as commercial commodities under federal procurement regulations. When coupled with strong protectionistic measures such as those contemplated by H.R. 2281, we could soon witness the passing of substantial amounts of data from the public domain of entire federal agencies.

It also remains unclear whether, if the government concludes an arrangement with a private sector party to disseminate public data or information, there will be adequate safeguards that either promote competition or require low-cost access for public-interest users.

Another major problem is raised in the instance of databases created from multiple public and private sources, or at data centers or large-scale data management activities involved in administering the dissemination of multiple-source data. The Administration believes that the transaction costs associated with such activities would increase. Moreover, as I have already mentioned, because many data providers are sole-source and the legislation greatly strengthens the legal and economic protections of these mini-monopolies, we also are concerned that the overall impact of the proposed legislation would be to raise the costs of data acquisitions to researchers and educators, not to mention other consumers. Those costs would then either be passed on to the government and the taxpayer through increased research contract and grant requests, or diminish the resources available to the researcher and educator. More generally, the costs and restrictions on all downstream or transformative data users--whether in the public or private sector--would increase, thereby creating disincentives for socially and economically beneficial exploitation of factual data that have up to now been in the public domain.

In light of the very large, indeed most significant, government interests in the production, maintenance, dissemination, and use of data, we have urged the Administration to provide both a detailed analysis and specific legislative language that fully address all these, and perhaps other, issues, and we have urged the Senate to take no action on this legislation until this has been done.

Sixth, the proposed database law would severely discourage the re-use, recompilation, and value adding uses of data. Anytime someone uses data in a "collection of information" protected by the proposed law, that user becomes exposed to claims that he or she will have harmed the database originator’s actual or potential markets. Moreover, the database originator has no obligation to license value-adding or transformative uses, and if the originator is a sole-source provider (as frequently occurs, especially in specialized S&T niche markets), there is no incentive to bargain. As a practical matter, this means that once public-domain data are collected and used for one purpose, such as to prepare a compilation of poisons and antidotes, there will be a strong disincentive to use the same data for other purposes lest those uses violate the "harm to other markets" principle. By the same token, database recompilers or value adders incur the risk of lawsuits for infringement every time their new database resembles some pre-existing database, whether those data were used or not. The exception that permits anyone to make use of "insubstantial parts" of a collection of information is vitiated by the language inflicting liability for harm to the investor’s "actual or potential market." Because the user cannot know such matters in advance, the "potential harm" test emasculates the "insubstantial parts" exception in practice.

Finally, we believe that over time, scientific cooperation will be hurt. As scientists and their employing institutions become more accustomed to a new legal regime that encourages the commercial exploitation of their own research data sets, the cooperative culture that has become the hallmark of so many fields of science will be undermined. Universities have already indicated that they will want to commercially exploit databases because they obtained an exemption for state universities from the government data exception in the proposed legislation. If scientific institutions in one segment of the research community begin to try to commercially exploit their colleagues in other institutions or countries, the reaction will be either to emulate such behavior, or cut off cooperation. Either way, science will suffer. Furthermore, even if scientific data exchanges in established cooperative research programmes are allowed to continue among a select group of principal investigators and an approved class of associated researchers, it will become increasingly likely that other researchers outside the officially sanctioned group will be refused full and open access to the programme data. This, of course, would discourage interdisciplinary research and applications, contrary to the interests of technological innovation and the advancement of knowledge.

And if simple exchanges of data and access to individual databases become legally threatening or prohibitively expensive, imagine the potential transactional burdens that this could impose on data compilers or users who want to integrate data from multiple, or even hundreds of, different sources. This brings us to what may well be the most profound - and insidious - impacts of this proposed legal regime: the lost opportunity costs that will be repeated thousands of times each day across the basic research and education communities. For if scientists and engineers are faced with the choice of spending a lot of administrative time and a larger percentage of their valuable research grants for acquiring data as opposed to doing some other kind of work, they increasingly will opt to do something else. Either way, it will impede the use, reuse, and transformation of factual data, diminishing the supply of new information and knowledge needed for developing downstream applications and innovations in the larger economy. Throttling data at the source will not benefit anyone, with the possible exception of a new class of data monopolists.

It is for these reasons that the Academies have been actively engaged in the current legislative debate. While we support new legislation that would be specifically targeted at prohibiting commercial data piracy, based on true unfair competition law, and that would adopt a measured approach that balances the interests of data producers and disseminators on the one hand, and all downstream data users on the other, we oppose unwarranted attempts to create an exclusive property right in the building blocks of knowledge.
 
 

While on a lighter note:
 


An Internet Tale

(set sometime in the not too distant future)


 


The sound of loud footsteps stopped as my doorbell rang. "Mr. Uhlir?" the man at the door demanded as I let him in. "I am with the new Intellectual Property Police, and I have a most serious matter to discuss with you."

"The Intellectual Property Police?" I said with some surprise.

"Our records show that you recently purchased the Standard Comprehensive Prestigious English Language Collegiate Dictionary, First Golden Deluxe Electronic Database Edition, and that you read and signed the standard 21-page licensing agreement. Isn’t that so, Mr. Uhlir?"

"Yes, no, I uh don’t remember," I said meekly.

"Well, we have your digital signature in our double-encrypted, incorruptible, and 100% accurate electronic files, so any denials on your part are not only futile, but can and will be used against you in a court of law. Do you understand?"

"I guess so," I croaked.

"Furthermore," he continued, "our completely reliable and unassailable digital records indicate that on or about 8:31 a.m. and 45.59.2 seconds on October 31, you sent your 96-year-old grandmother, Granny Uhlir, an electronic message that substantially infringed on the important rights of the owner of the Standard Comprehensive Prestigious English Language Collegiate Dictionary, First Golden Deluxe Electronic Database Edition and caused him grievous and irreparable harm, to wit, at least one potentially lost sale of his valuable database. And, I must point out, that doesn’t even take into consideration what further damages Granny Uhlir may have caused, which, you may be assured, we are still investigating."

"B-b-b-but," I stammered.

"Stop right there," the IPP man shouted, "before you get in more hot water than you already are! We consider the word ‘but’ an insubstantial part of the protected Dictionary Database, which you are generously allowed to use under the owner’s adhesion contract. You may recall that the Dictionary Database defines that as a contract that the owner can make stick. Add another ‘t,’ though, and you’ll be getting mighty close to more trouble."

"But," I blurted out in disbelief, "how can you do this?"

"Read the fine print, stupid," he continued. "You are prohibited from making any unauthorized extraction, use, reuse, transformation, republication, dissemination, or any other material distribution to any other person, company, organization, or legal entity including, but not limited to, your neighbor’s dog, of a qualitatively or quantitatively substantial part--as defined by the owner--of this valuable database in any manner that causes him any harm. The owner of the Dictionary Database is also allowed to monitor your use of this product in the digital environment in order to detect any such harmful infringements."

"But what did I do?" I asked indignantly.

"Your e-mail to Granny Uhlir used 403 words and 2 abbreviations, all found in the Standard Comprehensive Prestigious English Language Collegiate Dictionary, First Golden Deluxe Electronic Database Edition, of which at least five were complex, compound words that your Granny would no doubt have needed the Dictionary Database to understand. Moreover, in anticipation of this you actually explained what those words meant, using substantial portions of the definitions given in the Dictionary Database in direct violation of your license to use these words, which, I might add, were compiled with a lot of sweat. Some of these words are heavy, you know, and carry a lot of meaning.

"Although we did note one discrepancy in your definition of the word ‘constitutional,’ which does not appear in the Dictionary Database, what really nails this case down is that you used one of the intentionally misspelled words exactly as the owner misspelled it to catch felonious thieves like you."

I started to protest, but clammed up, afraid that I would only get into more trouble with the IPP. A sudden chill swept over me.

"That’s right," he said triumphantly, "it’s best to keep quiet. You can take it as a stern warning this time. Next time you cost the owner of the Dictionary Database another valuable sale to a good potential customer like Granny Uhlir, though, and we’ll have you thrown in the slammer. And don’t think we can’t do it! If you want to stay out of trouble, make up your own words and definitions."

"Oh, and just a bit of friendly advice," he added as he was leaving. "Tell your Granny to stop sending those recipies to her friends, if she knows what’s good for her. They look just like the recipies in the Standard Comprehensive All-Purpose Gourmet Cookbook, First Golden Deluxe Electronic Database Edition."

