Just a few days ago I met with a professor in Milan who told me about his work on creating a language for accessing genomes and studying their meaning through data mining. This is becoming possible because more and more genomes are being sequenced (the Broad Institute in Cambridge, Massachusetts, stated that in October 2014 it sequenced the equivalent of one genome every 32 minutes, which translated into 200 TB of data).
A single genome does not require that much storage space: about 1 GB once the information is compressed. It is only when you need to study it that it grows in size, reaching 100 GB. Of course, when you are interested in finding differences and similarities to extract meaning, you are talking about millions of genomes, and then the storage size (as well as the processing capacity) skyrockets.
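To make "skyrockets" a little more concrete, here is a back-of-the-envelope sketch that simply multiplies the two per-genome figures quoted above (about 1 GB compressed, about 100 GB when being studied) by the number of genomes. It assumes decimal units (1 TB = 1,000 GB) and ignores replication, indexes and any other overhead a real system would add, so treat it as an order-of-magnitude illustration only.

```python
# Back-of-the-envelope storage estimate using the per-genome figures quoted in the text.
# Assumption: decimal units (1 TB = 1,000 GB); no replication or indexing overhead.
COMPRESSED_GB = 1    # approximate size of one compressed genome
WORKING_GB = 100     # approximate size of one genome once expanded for study


def storage_tb(n_genomes: int, gb_per_genome: float) -> float:
    """Return the total storage, in terabytes, for n_genomes of the given size."""
    return n_genomes * gb_per_genome / 1000


if __name__ == "__main__":
    for n in (1, 1_000, 1_000_000):
        print(
            f"{n:>9,} genomes: "
            f"{storage_tb(n, COMPRESSED_GB):>10,.3f} TB compressed, "
            f"{storage_tb(n, WORKING_GB):>12,.1f} TB in working form"
        )
```

With a million genomes the compressed archive is already around a petabyte, and the working copies needed for analysis are a hundred times larger.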
Here comes Google Genomics, a cloud service that, for as little as $25 a year, would let me store my genome (unfortunately I do not have my genome sequenced yet, but in a few years, why not?) forever (that is, as long as I pay the yearly fee, although I guess this fee may come down and might eventually drop to 0).
Google provides researchers with immense storage and processing capacity. Finding a connection among genomes is a processing-intensive task that would take a very good computer about ten minutes. When the processing is done on Google's engines, 10,000 cores run the search in parallel, so a new connection is found in about two seconds (look at the video).
Google Genomics started just 18 months ago, in 2013, and provides researchers with APIs to access the data and run searches on it, much as Google searches the web every day.
The US National Cancer Institute is planning to move its 2.6 PB of data, containing the genomes of thousands of cancer patients, into the cloud, using those of Google and Amazon, at a cost of about $19M.
Moving the genomes into the Cloud gives start-ups the opportunity to create specialised browsers that help researchers search and analyse genomes. Companies like Tute Genomics, Seven Bridges and NextCode Health are taking Google's message to heart: you can build your company in our Cloud!
Now, I really think this is amazing: information begets information, and a lot of information creates business opportunities.
Everything is fine, then. Well, this is where I start to have some concerns... It will become so easy to access the code of "lives" (yes, in the plural), and so easy to manipulate it, that just as today we are confronted with millions of apps, tomorrow, maybe in two decades, we might find ourselves confronted with a variety of artificially created life forms (not to mention someone cloning a person who is already gone, in terms of atoms). The code of life is, in a way, like program code, but unlike program code, just by looking at it you cannot really foresee what is going to happen once it starts running, or what its phenotype will look like.
And this might lead to problems we have never seen (nor imagined) before.