February 23, 2016
It’s been less than 150 years since Camillo Golgi developed a staining technique that allowed scientists to view neurons under a light microscope, and the technology of neuroscience has advanced at a rapid pace ever since.
Innovative methodologies, technologies, and tools have given researchers in the field an extraordinary opportunity to collect both novel types and unprecedented amounts of data. However, the complexity and sheer size of today’s datasets are creating a bottleneck for neuroscience’s analytical and interpretive capabilities.
Consider this: In 2007, the basic model of the first-generation iPhone had 4 gigabytes of storage, and people in many cities waited for days outside the Apple store to buy such an innovative device. In 2015, a neuroscientist could generate 8 terabytes of data, enough to fill approximately 2,000 first-gen iPhones, in just one hour of measuring neural activity in a small fish brain.
If a single experiment involving a small brain requires that much storage in one hour, what does a similarly data-intensive experiment in the much larger human brain look like? What about six months of data? Or the amount of data collected in five years (the average length of a National Institutes of Health [NIH] research grant)? How will neuroscientists store all of that data? More importantly, how will neuroscientists analyze it comprehensively?
The experiment with the small fish brain illustrates an issue that many individual labs are, or will soon be, facing. The sheer amount of data is rapidly outpacing the capacity and capabilities of traditional computational tools available to most labs.
This neuroscience data problem extends to the institutional level as well. The NIH’s Human Connectome Project, which seeks to chart human brain circuitry, expects to eventually generate approximately one petabyte of data (approximately 250,000 first-gen iPhones), and the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative, a multi-institution project on brain function, will likely need yottabytes of storage.
According to The Atlantic, storing a yottabyte, the largest unit of data measurement currently in use, would require roughly a million data centers, enough to fill the states of Delaware and Rhode Island (or the equivalent of 250 trillion first-gen iPhones). The amount of data produced, and more crucially the computing power needed to analyze it, will only continue to grow.
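The iPhone comparisons above reduce to simple arithmetic. As a back-of-the-envelope check, the short Python snippet below (assuming 4 gigabytes of storage per first-gen iPhone and decimal units) reproduces the figures cited in this article.

```python
# Back-of-the-envelope conversion of the data volumes mentioned above
# into first-generation iPhone equivalents (assumed 4 GB of storage per phone).
GB = 10**9   # decimal (SI) units, as storage sizes are commonly quoted
TB = 10**12
PB = 10**15
YB = 10**24

IPHONE_1G = 4 * GB  # storage of the basic first-gen iPhone

volumes = {
    "1 hour of small-fish-brain imaging (8 TB)": 8 * TB,
    "Human Connectome Project (~1 PB)": 1 * PB,
    "1 yottabyte": 1 * YB,
}

for label, size in volumes.items():
    print(f"{label}: {size / IPHONE_1G:,.0f} first-gen iPhones")

# Expected output:
# 1 hour of small-fish-brain imaging (8 TB): 2,000 first-gen iPhones
# Human Connectome Project (~1 PB): 250,000 first-gen iPhones
# 1 yottabyte: 250,000,000,000,000 first-gen iPhones (250 trillion)
```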
In short, neuroscience is facing historic data challenges. Researchers will need cutting-edge data management and analytics approaches to fully capitalize on the vast collection of data available.
Booz Allen’s neuroscientists possess comprehensive experience across numerous neuroscience disciplines. In collaboration with our data scientists, we are working to integrate big data management with advanced analytics to uncover powerful, data-driven insights from these large and multifaceted datasets. For example, Booz Allen has recently applied this advanced, integrated approach to data analysis in genomics, a field facing challenges similar to those in neuroscience.
We developed a novel computational platform built on open source tools from the Hadoop ecosystem, a distributed storage and processing framework for cluster computing, and have used it to significantly decrease import, query, and retrieval times relative to traditional computational tools. With this system, we can upload approximately 2,500 genomes in a day, with an estimated query time of less than an hour, even with one million human genomes in the database.
Cluster computing solutions such as Hadoop and Spark, in which multiple computers work together on a single job, are the future of scientific research. When combined with a cloud service, these platforms will be able to scale from the state of data today to the state of data in the future, from terabytes to petabytes and beyond.
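As a minimal, hypothetical sketch of that idea (not the genomics platform described above), the snippet below shows how a Spark job might distribute a query over genomic variant records across a cluster; the file path, column names, and genomic region are illustrative assumptions.

```python
# Minimal PySpark sketch of a distributed query over genomic variant records.
# Illustrative only: the file path, schema, and filter values are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("genome-variant-query")
    .getOrCreate()
)

# Load variant records stored as Parquet on a distributed file system
# (e.g., HDFS or a cloud object store); Spark splits the scan across the cluster.
variants = spark.read.parquet("hdfs:///data/genomes/variants.parquet")

# Count distinct samples carrying a variant in a region of chromosome 17;
# the filter runs in parallel on each partition before results are combined.
hits = (
    variants
    .filter((F.col("chromosome") == "17") &
            (F.col("position").between(43_000_000, 43_200_000)))
    .select("sample_id")
    .distinct()
    .count()
)

print(f"Samples with a variant in the region: {hits}")
spark.stop()
```

Because the same code runs unchanged on a laptop, an on-premises cluster, or a cloud-managed Spark service, the query scales with the data simply by adding machines, which is precisely the property that makes these frameworks attractive for petabyte-scale research datasets.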
Understanding the human mind remains one of the biggest research challenges, and one of the greatest opportunities, facing the global biomedical community today. Our neuroscientists continue to forge ahead, applying a unique combination of advanced analytic capabilities and deep scientific expertise to develop flexible, custom solutions that address long-standing research needs.
Technology and data continue to make staggering leaps, and the neuroscience research community’s ability to integrate and adapt to these rapid advances will be fundamental to the field’s success.