Biologists have sequenced the last few percent of the human genome, which until now could not be fully sequenced.

The international T2T consortium has announced that its work is complete. Almost twenty years after the human genome was first sequenced, scientists have read the remaining portions, about eight percent of it, that were the most difficult to sequence. This was reported in a press release from the U.S. National Institutes of Health (NIH); an article about the work was published in the journal Science.

The human genome was almost completely read back in 2003, as part of the international Human Genome Project. For its time the task was enormous, and hundreds of researchers from universities in many countries took part in it. They sequenced about 92 percent of the DNA: the genes and the stretches between them that make up euchromatin. Euchromatin is the active part of the genome, so in the cell it stays loosely “unwound” and coils into a more compact form only when the cell divides.

Heterochromatin, by contrast, stays compact all the time and does not encode proteins. Its functions are mainly supporting ones: it maintains the structure and integrity of chromosomes, mediates their interactions with proteins, and so on. Heterochromatin is found, for example, in centromeres, the regions where two sister chromatids are held together to form the familiar “X” shape, and in telomeres, the ends of chromosomes. This DNA is full of long repetitive sequences, which are extremely difficult to read and assemble correctly.

Recall that to sequence a strand of DNA, it has to be cut into many fragments; the nucleotide sequence of each fragment is then read, and finally the pieces are put back together in the correct original order, using the places where fragments overlap. But if a region consists of hundreds of short repeats that are indistinguishable from one another, it becomes virtually impossible to tell where each fragment belongs, or even how many copies of the repeat there are. That is why the Human Genome Project participants had to skip this small part of the genome; besides, it was believed to play only a minor role for science and medicine.
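Why repeats defeat this approach can be illustrated with a toy sketch in Python. It is not the consortium's pipeline or any real assembler: it models sequencing as an idealized, error-free set of all overlapping reads of a fixed length and compares only which distinct reads occur, ignoring how often each appears (a simplification, since real coverage is uneven). The helper `reads`, the two sequences, and the read lengths are all hypothetical.

```python
def reads(genome: str, read_len: int) -> set[str]:
    """Return every distinct substring of length read_len: an idealized,
    error-free set of overlapping reads covering the whole sequence."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


# Two hypothetical sequences with the same flanks that differ only in how
# many times the short unit "CAG" is repeated between them.
left, right = "GATTACA", "TTGCA"
genome_a = left + "CAG" * 5 + right   # 27 letters, repeat block of 15
genome_b = left + "CAG" * 8 + right   # 36 letters, repeat block of 24

for read_len in (4, 20):
    same = reads(genome_a, read_len) == reads(genome_b, read_len)
    print(f"read length {read_len:2d}: indistinguishable = {same}")

# Output:
# read length  4: indistinguishable = True
# read length 20: indistinguishable = False
```

With 4-letter reads, both sequences yield exactly the same set of reads, so nothing in the data reveals how many copies of the repeat there are. A 20-letter read can cover the whole shorter repeat together with both flanks, and that read exists only in `genome_a`, which is why reading longer fragments helps with repetitive DNA.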

However, a full understanding of the genome's structure requires, at the very least, its complete sequence, and sequencing technology has advanced considerably over time; in particular, it can now read much longer DNA fragments in a single pass. So a few years ago a new consortium, Telomere to Telomere (T2T), set out to decipher the heterochromatin regions. In 2021 its members presented a preliminary, “rough” result, and now they have presented the final version covering the missing eight percent of the genome.

To do this, the biologists resorted to a small trick: for sequencing they used DNA from a cell line derived from an abnormal pregnancy (a complete hydatidiform mole), whose cells carry two identical copies of each chromosome instead of one maternal and one paternal copy. This spared them from having to tell apart two slightly different versions of every chromosome. However, this cell line has no Y chromosome, so the T2T consortium has not yet completed its work: its members still have to sequence the heterochromatin of the unpaired Y chromosome.

The importance of this work should not be underestimated. In the past, many stretches of heterochromatin were considered “junk” DNA that had accumulated over billions of years of evolution and played no role in the life of the human body. Today scientists understand that these fragments have important functions, not only structural but also, for example, regulatory: they control the activity of euchromatin genes. Many severe diseases are associated with heterochromatin malfunction.