January 23, 2018

Preclinical Drug Discovery: Too Much Data?

Posted By
Aureus Tech Systems

Until the last few years, preclinical drug discovery was a field in which questions were answered step by step: careful thought, experiment, replication, and publication. This process delivered new compounds, reactions, and potential novel therapeutics to clinical researchers. The work was careful and detailed, done by hand and replicated by peers for confirmation. As in many fields, expertise and mastery came with time, through continuous education and experience, until an intuitive understanding of tools, methods, materials, and processes graced the work of the masters in the field.

The Changing World of Preclinical Research 

That world of pipettes, reagents, and microscopes has changed dramatically. Tools, methods, materials, and processes have all been automated with high-throughput screening and banks of materials. Modern preclinical drug discovery now asks researchers to find the important data points in a flood of available data, the key piece of information in a sea of information. They need to understand and apply complex, advanced statistics to narrow their focus and determine the validity and reliability of data points. For many, the work means long hours in front of a computer. Those who truly miss their pipettes are probably learning to knit while the data rolls by on their screens; humans still need to use their hands.

The industry has transformed and is nearly unrecognizable. What are the human impacts of this massive digital transformation? We now have the potential to rapidly identify and screen novel therapeutics, moving the drug discovery and clinical research process from decades to years. GSK has been working on its malaria vaccine for thirty years, at a cost of $365 million; when that development began, these new technologies, methods, and materials were not available. The vaccine will be piloted in Africa in 2018.

We can now mesh the science of genomes and genetic research with biochemistry in a way that was unavailable even a few years ago, through open-source sharing of materials and results in large databases and through high-throughput screening technologies. University systems and the federal government, through the NIH, have begun to make these technologies and databases available to researchers without the unwieldy process of negotiating intellectual property rights at every step. This open sharing of robotics and laboratory tech, along with the increasingly complex databases of biochemical genomics, means the researcher's job has shifted from discovering data points in the lab through experimentation to finding the key pieces of data that have already been identified in the haystack of data. Haystacks, many of them. Haystacks that could reach to the moon.

High Throughput Screening

How is a protein, a cell, or a biomolecular pathway modulated or impacted, and by what? A chemical or biological agent, heat or cold, a genetic change such as the gene fusion that triggers tumor growth? Some combination of these, in a certain order? High-throughput screening has the potential to unlock these complex relationships and interactions through speed and automated imaging and data collection.

A microtiter plate is a plastic plate of wells, each well containing something to be tested; a single plate can hold up to 3,456 wells. The robotics add the reacting agent in quantities, and at a speed, that no human hand could match. Once the plate has been prepared with reagents, over three thousand individual experiments in parallel, it is imaged and read for activity or change. The active wells, the individual hits, are then moved to a new plate for additional experiments.
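The hit-picking step described above can be sketched in a few lines of code. This is a toy illustration only: the simulated signal values, the fixed activity threshold, and the idea of copying hits by position are assumptions for the sketch, not the behavior of any particular HTS platform.

```python
# Minimal sketch of automated hit-picking from a 3,456-well plate.
# Signal values and the activity threshold are invented; real assays
# derive cutoffs statistically from controls on the plate.
import random

WELLS = 3456
random.seed(0)

# Simulated plate read: one signal value per well.
plate = {well: random.gauss(100, 15) for well in range(WELLS)}

# Flag "active" wells whose signal exceeds the chosen threshold.
THRESHOLD = 140
hits = [well for well, signal in plate.items() if signal > THRESHOLD]

# The hits are "cherry-picked" onto a fresh plate for follow-up.
followup_plate = {new_pos: plate[well] for new_pos, well in enumerate(hits)}
print(f"{len(hits)} active wells moved to a follow-up plate")
```

The point of the sketch is the scale: one loop over thousands of wells replaces thousands of hand-run experiments.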

The automated analysis of the plates is programmed by the researcher and can include multiple types of imaging and data analysis. An HTS system can run thousands of these plates through preparation, reagent addition, incubation, screening, imaging, and analysis. The researcher's job, then, is to design the original experiment and then deal with the flood of data coming out of the screen.
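The sequence of stages above can be pictured as a simple pipeline. The function names and the dictionary passed between them are illustrative assumptions; real systems are driven by vendor-specific scheduling software, not code like this.

```python
# Schematic of the HTS workflow: each stage transforms the plate state
# and hands it to the next. All names and fields are hypothetical.

def prepare(plate):       # load wells with samples
    plate["prepared"] = True
    return plate

def add_reagents(plate):  # robotic reagent dispensing
    plate["reagents"] = True
    return plate

def incubate(plate):
    plate["incubated"] = True
    return plate

def image(plate):         # capture a readout for every well
    plate["images"] = [f"well_{i}.png" for i in range(plate["wells"])]
    return plate

def analyze(plate):       # reduce images to per-well results
    plate["results"] = len(plate["images"])
    return plate

# The researcher designs the experiment; the system runs the stages.
plate = {"wells": 3456}
for stage in (prepare, add_reagents, incubate, image, analyze):
    plate = stage(plate)
print(plate["results"])  # one result per well: 3456
```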

Dealing with Big Data

It is a researcher's nightmare: the solution has already been identified, but it sits somewhere in a huge pile of data, waiting to be recognized. How do we approach the reams of data produced by high-throughput screening for the genetic, biological, or chemical agents that modulate even a single protein or pathway? What the robotics and this astounding new tech can do is give us the data; then we have to ask specific questions. This is the new intuitive expertise that expert clinical researchers bring to the table.

Big data and machine learning systems, built on algorithms and neural networks, take large amounts of disparate data and find the connections and patterns within it. Based on those patterns they can, if asked, make predictions about future behavior. What they cannot do is think for us, or figure out what we really need to know. We have to ask the right questions, in the right order, to reach an understanding of the potential impact of a biological or biochemical research result.
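"Finding patterns and making predictions" can be made concrete with a deliberately tiny example. The descriptor names, the numbers, and the nearest-centroid rule are all invented for the sketch; production models in drug discovery use far richer features and far more sophisticated algorithms.

```python
# Toy pattern-based prediction: classify a new compound as active or
# inactive by which group of screened compounds it most resembles.
# Descriptors (molecular weight, logP) and values are made up.
import math

actives   = [(320, 2.1), (345, 2.4), (310, 1.9)]
inactives = [(480, 4.8), (510, 5.2), (495, 4.5)]

def centroid(points):
    """Mean of each descriptor across a set of compounds."""
    return tuple(sum(vals) / len(vals) for vals in zip(*points))

def predict(compound):
    """Nearest-centroid rule: which group does the compound resemble?"""
    to_active = math.dist(compound, centroid(actives))
    to_inactive = math.dist(compound, centroid(inactives))
    return "active" if to_active < to_inactive else "inactive"

print(predict((330, 2.2)))  # resembles the active group -> "active"
```

Even this toy makes the article's point: the machine only compares what we told it to compare. If the descriptors we chose encode a bias, so does every prediction.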

The challenge for scientists is our way of problem solving. We have traditionally asked questions that we have the potential to answer. We ask questions that we can break down into pieces; questions that are linear, and can lead from point A through to point Z in a step-by-step process. It is not our standard practice to ask questions that seem too big for us, or that are too complex to be broken down into parts. 

But the huge amounts of data this new technology and practice have put in front of us demand a new way of asking questions. Our questions need to become algorithms, given to neural networks to answer, and they need to be written specifically and in exacting detail. The machines answer only what we ask; if the questions contain bias or inconsistency, so will the answers.

Our questions need to be ones we cannot answer ourselves. This is an exciting, and frightening, new way of approaching science! For many it feels like closing our eyes and stepping into the unknown, hoping to find a road, a path, or even just a tightrope when we put our foot down.