Making sense of genome sequencing

When reading this news I was reminded of the old saying: "be careful what you ask for, because you might get it".

Indeed, we have been looking for faster and cheaper ways to sequence the genome, and now we are getting really close to making genome sequencing affordable for a growing part of the population. Along with that we are flooded with data (the sequence of bases, A-C-G-T) and we have the problem of what to do with it.

Extracting meaning from the genome sequence is tough. Hence the shift of focus from sequencing to the analysis of the sequences. And this is where the news from Nationwide Children's Hospital comes in.

Investigators at NCH have developed a method (with related algorithms and software), called Churchill, that allows the extraction of the (few) meaningful pieces of information from a sequenced genome in a matter of hours, rather than weeks. It is a bit like searching for a needle in a haystack. Only, in this case, you don't know that you are searching for a needle: you are searching for something without knowing what it might be.

In case you have any doubt, this is not about medicine (although doctors are the clients) but about ICT. The problem is to find a way to process the data in parallel so that the "discovery" time can be slashed to a few hours, rather than the several weeks it takes today.
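To give a feel for what "parallel processing of data" means here, below is a toy sketch in Python. It is not Churchill's actual algorithm, which the news does not describe in detail. It only assumes the general idea of cutting a sequence into regions, analysing each region independently, and merging the results. The function names, the region size and the naive base-by-base comparison are all my own illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_regions(length, region_size):
    """Return half-open (start, end) intervals that cover [0, length)."""
    return [(start, min(start + region_size, length))
            for start in range(0, length, region_size)]


def find_mismatches(reference, sample, region):
    """List (position, reference_base, sample_base) for one region.

    A deliberately naive stand-in for real variant calling.
    """
    start, end = region
    return [(pos, reference[pos], sample[pos])
            for pos in range(start, end)
            if reference[pos] != sample[pos]]


def parallel_mismatches(reference, sample, region_size, workers=4):
    """Analyse every region independently, then merge in genome order.

    Assumes both sequences have the same length. Threads keep the
    sketch short; a real CPU-bound workload would use processes or
    a cluster of machines instead.
    """
    regions = split_into_regions(len(reference), region_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the regions.
        per_region = pool.map(
            lambda region: find_mismatches(reference, sample, region),
            regions)
        return [hit for hits in per_region for hit in hits]


if __name__ == "__main__":
    reference = "ACGTACGTACGT"
    sample = "ACGTTCGTACGA"
    print(parallel_mismatches(reference, sample, region_size=5))
```

The point of the sketch is that each region can be handled without knowing anything about the others, so adding more workers shortens the total time. The hard part in real life is splitting the work so that nothing is lost at the region boundaries.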

This is what the team at NCH has managed to do: come up with a parallel computation algorithm that can scale, reaching an amazing sensitivity of 99.7% and an accuracy of 99.99%. They have tested their method on the 1000 Genomes Project data, using Amazon's cloud services (AWS), and managed to analyse 1,088 genome samples in just seven days, identifying millions of significant variants. Impressive.
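For readers wondering what "sensitivity" and "accuracy" measure, here is a small Python sketch using the usual confusion-matrix definitions. This is an assumption on my side: the news does not say exactly how the two figures were computed, and the counts below are made up purely for illustration.

```python
def sensitivity(true_pos, false_neg):
    """Share of the real variants that the method actually found."""
    return true_pos / (true_pos + false_neg)


def accuracy(true_pos, true_neg, false_pos, false_neg):
    """Share of all calls (variant or not) that were correct."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total


if __name__ == "__main__":
    # Hypothetical counts, not taken from the Churchill results.
    tp, fn, fp, tn = 997, 3, 7, 99_993
    print(f"sensitivity: {sensitivity(tp, fn):.1%}")
    print(f"accuracy:    {accuracy(tp, tn, fp, fn):.2%}")
```

Sensitivity only looks at the variants that really exist, while accuracy also counts the (far more numerous) positions where there is no variant, which is why the two numbers can differ.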

And further proof (if ever it was needed) that ICT is pervasive, so pervasive indeed as to go inside the nuclei of our cells to unravel the code of life.

