Salvage prospect for 'junk' DNA
By Paul Rincon
BBC News science reporter

A mathematical analysis of the human genome suggests that so-called "junk DNA" might not be so useless after all.

The term junk DNA refers to those portions of the genome which appear to have no specific purpose.

But a team from IBM has identified patterns, or "motifs", that occur both in the junk areas of the genome and in those that code for proteins.

The presence of the motifs in junk DNA suggests these portions of the genome may have an important functional role.

The findings are reported in the journal Proceedings of the National Academy of Sciences (PNAS).
But they will have to be verified by experimenters in the lab, the scientists behind the work point out.
Dr Andrew McCallion, who was not an author on the new paper, commented: "Up until not so long ago, we were under the impression that the vast majority of information in the genome, if not all of it, was encoded in those stretches of DNA that encoded proteins.
"We now understand there is much more complexity involved," Dr McCallion, from the McKusick-Nathans Institute of Genetic Medicine at the Johns Hopkins University School of Medicine in Baltimore, US, told the BBC News website.
Lead author Isidore Rigoutsos and colleagues from IBM's Thomas J Watson Research Center used a mathematical tool known as pattern discovery to tease out patterns in the genome.
This technique is often used to mine useful information from very large repositories of data in the worlds of business and science.
Scrapheap challenge
They sifted through roughly six billion letters in the non-coding regions of the human genome, looking for repeating sequence fragments, or motifs.
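To give a flavour of what looking for repeating fragments means, here is a minimal sketch in Python. It is not the IBM team's pattern discovery method, which the article does not describe in detail; it simply counts fixed-length fragments in a made-up sequence and keeps those that occur more than once. The function name, the fragment length and the toy sequence are all assumptions chosen for illustration.

```python
from collections import Counter


def repeated_fragments(sequence, length, min_count=2):
    """Return fragments of the given length that occur at least min_count times.

    Illustrative only: a simple sliding-window count, not the pattern
    discovery technique used in the study.
    """
    sequence = sequence.upper()
    counts = Counter(
        sequence[i:i + length] for i in range(len(sequence) - length + 1)
    )
    return {fragment: n for fragment, n in counts.items() if n >= min_count}


# Toy sequence invented for this example; it is not real genomic data.
toy_sequence = "ACGTTACGTTGGACGTT"
print(repeated_fragments(toy_sequence, length=5))  # {'ACGTT': 3}
```

A real analysis would also have to cope with fragments of varying length and with inexact matches, which this sketch ignores.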
"One of the things that arises from this paper is that junk DNA may not be junk. But this needs to be verified," Dr Rigoutsos told the BBC News website.
The double-stranded DNA molecule is held together by chemical components called bases.
Adenine (A) bonds with thymine (T); cytosine (C) bonds with guanine (G).
These "letters" form the "code of life". There are estimated to be about 2.9 billion base-pairs in the human genome wound into 24 distinct bundles, or chromosomes.
Written in the DNA are 20-25,000 genes, which human cells use as starting templates to make proteins. These sophisticated molecules build and maintain our bodies.
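The pairing rule described above (A with T, C with G) means that one strand of DNA determines the other. A short Python sketch makes this concrete. The function names and the example strand are hypothetical, chosen only for illustration.

```python
# Pairing rule from the text: adenine with thymine, cytosine with guanine.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand):
    """Return the partner bases, aligned position by position with the input."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand):
    """Return the partner strand read in the opposite direction."""
    return complementary_strand(strand)[::-1]


# Example strand invented for illustration.
print(complementary_strand("ATCG"))  # TAGC
print(reverse_complement("ATCG"))    # CGAT
```

The reversed form is included because the two strands of the double helix run in opposite directions, so the partner strand is conventionally read back to front.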
The researchers found millions of the motifs in non-coding DNA. But roughly 128,000 of these also occurred in the coding region of the genome. These were also over-represented in genes which are involved in specific biological processes.
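The idea of a motif turning up in both parts of the genome can be sketched as a set intersection. The Python below is an illustration under simplifying assumptions, not the method used in the paper: it treats every fixed-length fragment as a candidate motif and uses two short invented sequences in place of the real non-coding and coding regions.

```python
def fragments(sequence, length):
    """Return the set of all fragments of the given length in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + length] for i in range(len(sequence) - length + 1)}


def shared_motifs(noncoding, coding, length):
    """Return fragments that appear in both sequences (simplified stand-in
    for motifs found in both non-coding and coding DNA)."""
    return fragments(noncoding, length) & fragments(coding, length)


# Both sequences are invented for this example; they are not real genomic data.
toy_noncoding = "TTGACCATTGACC"
toy_coding = "ATGGACCATAA"
print(sorted(shared_motifs(toy_noncoding, toy_coding, length=5)))
# ['ACCAT', 'GACCA']
```

Whether such shared fragments are more common than chance would predict is a separate statistical question that this sketch does not address.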
These processes include the regulation of transcription - the beginning of the process that ultimately leads to the translation of the genetic code into a peptide or protein - and communication between cells.
Dr Rigoutsos said his team's work suggested "a connection between a vast area of the genome we didn't think was functional with the part of the genome we knew was functional.
"The average lab does not have the resources to prove or disprove this, so it will need a lot of effort by lots of people," he explained.
Gene silencing
The paper in PNAS suggests that the actual positioning of the motifs is associated with small RNA molecules that are involved with a process called post-transcriptional gene silencing (PTGS).
"A human embryo starts out as a single fertilised cell and rapidly divides into a widely complex series of cells that become a human being," explained Dr McCallion.
"Every cell in that human being contains the same complement of genes and what makes each cell different is the precise way that genes are turned on and turned off."
PTGS turns genes off after the process of transcription has taken place. One way in which this occurs is through "RNA interference", which involves the introduction of double-stranded RNA molecules.
These trigger the degradation of another type of RNA molecule known as messenger RNA (mRNA), "down-regulating" the gene. Messenger RNA is produced during transcription and carries information from genes to the sites of protein synthesis.
"These regions may indeed contain structure that we haven't seen before," said Dr Rigoutsos.
"If indeed one of them corresponds to an active element that is involved in some kind of process, then the extent of cell process regulation that actually takes place is way beyond anything we have seen in the last decade."