Collecting and Storing Data
The reference map and sequence generated by genome research will be used as a primary
information source for human biology and medicine far into the future. The vast amount
of data produced will first need to be collected, stored, and distributed. If compiled
in books, the data would fill an estimated 200 volumes the size of a Manhattan telephone
book (at 1000 pages each), and reading it would require 26 years working around the clock
(Fig. 14: Magnitude of Genome Data).
Because handling this amount of data will require extensive use of computers, database development will be a major focus of the Human Genome Project. The present challenge is to improve database design, software for database access and manipulation, and data-entry procedures to compensate for the varied computer procedures and systems used in different laboratories. Databases need to be designed that will accurately represent map information (linkage, STSs, physical location, disease loci) and sequences (genomic, cDNAs, proteins) and link them to each other and to bibliographic text databases of the scientific and medical literature.
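
The kind of linkage described above can be illustrated with a minimal sketch. It is not the design of any actual genome database: the class names (MapFeature, SequenceRecord, Citation, GenomeStore), the field names, and every identifier in the demo data are hypothetical, chosen only to show map entries pointing to sequence entries and both pointing to literature records.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Citation:
    """A bibliographic record (hypothetical minimal form)."""
    citation_id: str
    title: str


@dataclass
class SequenceRecord:
    """A sequence entry; kind is 'genomic', 'cDNA', or 'protein'."""
    accession: str
    kind: str
    residues: str
    citation_ids: List[str] = field(default_factory=list)


@dataclass
class MapFeature:
    """A map entry; kind is 'linkage', 'STS', or 'disease_locus'."""
    name: str
    kind: str
    chromosome: str
    location: str  # free-text physical location (placeholder format)
    accessions: List[str] = field(default_factory=list)
    citation_ids: List[str] = field(default_factory=list)


class GenomeStore:
    """In-memory stand-in for three linked databases."""

    def __init__(self) -> None:
        self.features: Dict[str, MapFeature] = {}
        self.sequences: Dict[str, SequenceRecord] = {}
        self.citations: Dict[str, Citation] = {}

    def add(self, record) -> None:
        # Route each record to the table that matches its type.
        if isinstance(record, MapFeature):
            self.features[record.name] = record
        elif isinstance(record, SequenceRecord):
            self.sequences[record.accession] = record
        elif isinstance(record, Citation):
            self.citations[record.citation_id] = record
        else:
            raise TypeError(f"unsupported record type: {type(record).__name__}")

    def sequences_for(self, feature_name: str) -> List[SequenceRecord]:
        # Follow map -> sequence links, skipping accessions not yet loaded.
        feature = self.features[feature_name]
        return [self.sequences[a] for a in feature.accessions if a in self.sequences]

    def citations_for(self, feature_name: str) -> List[Citation]:
        # Collect literature linked to the feature itself and to its
        # sequences, keeping first-seen order and dropping duplicates.
        feature = self.features[feature_name]
        ids = list(feature.citation_ids)
        for seq in self.sequences_for(feature_name):
            ids.extend(seq.citation_ids)
        seen = set()
        result = []
        for cid in ids:
            if cid in self.citations and cid not in seen:
                seen.add(cid)
                result.append(self.citations[cid])
        return result


def build_demo_store() -> GenomeStore:
    # All identifiers and values below are made-up placeholders.
    store = GenomeStore()
    store.add(Citation("CIT-1", "Placeholder mapping paper"))
    store.add(Citation("CIT-2", "Placeholder sequencing paper"))
    store.add(SequenceRecord("SEQ-1", "genomic", "ACGT", ["CIT-2"]))
    store.add(SequenceRecord("SEQ-2", "cDNA", "ACG", ["CIT-2"]))
    store.add(MapFeature("STS-DEMO", "STS", "7", "placeholder",
                         ["SEQ-1", "SEQ-2"], ["CIT-1"]))
    return store
```

With the demo data, calling build_demo_store().citations_for("STS-DEMO") follows the links from one map feature to its two sequences and returns each linked citation once. A production system would more likely keep these tables in a relational database with foreign keys; the in-memory dictionaries here only make the cross-references easy to see.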