Sequencing the white shark genome is cool, but for bigger insights we need libraries of genetic data

The headlines are eye-catching: Scientists have sequenced the genome of white sharks. Or the bamboo lemur, or the golden eagle. But why spend so much time and money figuring out the DNA makeup of different species?

Rough screening of whole genomes is useful to help identify genetic markers (identifiable stretches of DNA that vary among individuals) to better understand population-level processes. But the real and enduring value of whole genome sequencing is only realized when many accurate, high-resolution genomes are amassed that can be compared with one another. This type of work is just getting started.

I am an evolutionary biologist at the Florida Program for Shark Research. Our research focuses on understanding how modern sharks and rays diversified over the course of their evolution to colonize the habitats they occupy today.

Blueprints without instructions

An organism’s genome – the complete catalog of its DNA – holds the blueprint for its design. Differences in the DNA sequences that make up genomes are responsible for the differences we see among individuals.

Identical twins are physically similar to one another because their genomes are identical. Siblings resemble each other because they inherit large stretches of their genomes from the same set of parents. And closely related species look more similar to each other than do those that are more distantly related, because their underlying genomes are more similar.

It follows that if we had a complete genome sequence for an organism, we would have all the information we’d need to understand how it works “from the ground up.” Indeed, this was the justification for the initial Human Genome Project.

But an organism’s genomic DNA sequence can contain billions of nucleotides, or genetic building blocks. Trying to piece together what that organism might look like from its genome sequence would be like trying to make sense of thousands of concurrently transmitted telephone conversations from the “packets” of information that arrive at the receiving end of a fiber-optic telephone cable, without knowing anything about how the information was organized. The data is “all there,” but it’s hard to know what it means without an explicit interpreter. And scientists do not yet know how all of the information in genomes is organized, or how its activity is choreographed.

Bases are the part of DNA that stores information and gives DNA the ability to encode phenotype – a person’s visible traits. There are four types of bases in DNA: adenine (A), cytosine (C), guanine (G) and thymine (T). (Image credit: National Human Genome Research Institute, CC BY-ND)

Learning by comparing

If it’s so hard to interpret information buried in genomes, why bother collecting the data? The answer is that if we compare genomes against one another, we can deduce which elements are responsible for particular traits.

For example, humans and chimpanzees have genomes that are approximately 98 percent similar. This means that the 2 percent difference between their respective genomes must somehow account for the differences in their appearance and associated traits. Comparing the genomes side by side allows us to identify the parts of the genome responsible for the observed differences.
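To make the idea of a "percent similar" comparison concrete, here is a minimal sketch in Python. It assumes two sequences that have already been lined up base for base, which is a big simplification: real genome comparisons must first align the sequences and cope with insertions, deletions and rearrangements. The function names and the 50-base fragments are invented for illustration and are not real genomic data.

```python
from __future__ import annotations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences carry the same base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def differing_positions(seq_a: str, seq_b: str) -> list[int]:
    """Zero-based positions at which the two aligned sequences differ."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Hypothetical 50-base fragment (made up, not from any real species).
species_a = "ACGTTGCAAGCTTAGGCTAACGTTAGCCGATACGTTAGCAAGTCCGATTA"
# The same fragment with its final base changed from A to G.
species_b = species_a[:-1] + "G"

print(percent_identity(species_a, species_b))     # 98.0
print(differing_positions(species_a, species_b))  # [49]
```

In this toy case a single changed base out of 50 gives 98 percent identity, and the list of differing positions points to where a researcher would start looking. At the scale of a real genome, the same 2 percent corresponds to tens of millions of positions, which is why choosing the right comparison matters so much.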

Obviously, it is important to choose carefully which comparisons to make. Comparing a human genome with a duck-billed platypus genome isn’t going to tell us much about what makes humans – or duck-billed platypuses, for that matter – so “special.” The lineages leading to the two species diverged more than 150 million years ago, and there are so many differences in their genomes and in the traits they exhibit that it would be impossible to know which genomic differences were responsible for which traits.