Effective Analytics Model For Pharma R&D

Omar Gadir, Ph.D., Founder/CEO, Iteru Systems

Some pharma companies have started investing in data analytics in an attempt to overcome their R&D crisis. As in other industries, analytics starts and ends with data. Before deciding how to implement analytics, it is important to examine biomedical data to determine which analytics model provides the best results.

In this article, “Subject Matter Expert - Drug Discovery” is referred to as “researcher,” and “analytics” refers to an assortment of technologies including data mining, classification, statistical analysis, machine learning and AI.

Biomedical Data
Biomedical data is very complex. Extracting insights from it is a challenging task for the following reasons:

  1. Many processes in the life sciences are extremely entangled, with multiple variables operating at the same time to produce different, unintended or unknown results.
  2. Multiple data sources (publications, internal, genomic, patents, etc.)
  3. Multiple formats (PDF, images, custom formats, industry formats, etc.)
  4. Ambiguities regarding gene/protein names. For instance, the breast cancer protein BRCA2 has 12 aliases. Adding to the complexity, a single protein can have hundreds of different interactions with other proteins.
  5. The data contains a lot of noise and many errors.
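
The alias problem in item 4 can be made concrete with a minimal sketch. It assumes an alias table is available from some nomenclature source. The table below is a tiny hand-written sample for illustration (not the full set of BRCA2 aliases), and the function names are hypothetical.

```python
import re

# Illustrative sample only: a real alias table would be far larger and
# would come from a maintained nomenclature source (assumption).
ALIASES = {
    "BRCA2": ["FANCD1", "BRCC2"],
}

def build_lookup(aliases):
    """Map every known name (canonical or alias, upper-cased) to its canonical symbol."""
    lookup = {}
    for canonical, names in aliases.items():
        lookup[canonical.upper()] = canonical
        for name in names:
            lookup[name.upper()] = canonical
    return lookup

def normalize_mentions(text, lookup):
    """Return the canonical symbol for each recognized gene/protein name in the text."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [lookup[t.upper()] for t in tokens if t.upper() in lookup]

LOOKUP = build_lookup(ALIASES)
mentions = normalize_mentions("Mutations in FANCD1 and BRCA2 were reported.", LOOKUP)
print(mentions)
```

Both names in the example sentence resolve to the same canonical symbol, so documents that use different aliases can be counted together. Exact token matching like this does not handle multi-word names or symbols that collide with ordinary words, which is part of why the problem is hard at scale.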

Extracting Information From Data
The current approaches for extracting insights from biomedical data are inadequate. They cover a fraction of a percent of the total data. Even within that fraction, many documents are filtered out, not all data formats are covered and the multiplicity of protein/gene aliases remains intractable. To obtain accurate and meaningful insights, all documents should be analyzed.

Existing approaches rely on one or a combination of the following:

  1. Document selection based on scant information from abstracts and bibliographic data (PubMed/Medline, Thomson Reuters, etc.). This excludes thousands of relevant documents. Modern search tools index and parse the whole content and extract all metadata.
  2. Slow and erratic manual procedures. Researchers have to search for documents and read them. Out of hundreds or thousands of documents, they can read only a few. New technologies provide automated processing of up to millions of documents in minutes.
  3. Proprietary software for search and analytics.
  4. Human-curated databases.
  5. Bioinformatics.
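
To illustrate the limitation described in item 1, here is a minimal sketch that compares an index built from abstracts only with one built from the full text. The two documents, their field names and the helper functions are all hypothetical, made up for this example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(docs, fields):
    """Build an inverted index (term -> document ids) over the chosen fields."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field in fields:
            for term in tokenize(doc.get(field, "")):
                index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*[index.get(t, set()) for t in terms])

# Hypothetical documents: doc1 mentions BRCA2 only in its body.
DOCS = {
    "doc1": {
        "abstract": "A study of DNA repair pathways.",
        "body": "We observed that BRCA2 interacts with RAD51 during repair.",
    },
    "doc2": {
        "abstract": "BRCA2 mutations in breast cancer.",
        "body": "Carriers were followed for ten years.",
    },
}

abstract_index = build_index(DOCS, ["abstract"])
full_index = build_index(DOCS, ["abstract", "body"])

abstract_hits = search(abstract_index, "BRCA2")
full_hits = search(full_index, "BRCA2")
print(sorted(abstract_hits), sorted(full_hits))
```

The abstract-only index misses doc1 because the relevant term appears only in the body, which is the kind of exclusion the first approach suffers from.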

The first two approaches ignore many important documents that may contain vital information. Moreover, manually extracting the relationships between different data sets and understanding the entanglement in the data is prohibitively difficult. It is beyond the ability of the human brain. As for the third approach, there is no published information about proprietary software and how it works, and there is no evidence that it covers all data, deals with all formats or solves the problem of multiple entity names.

Human-curated databases play an important role in information extraction from biomedical data, but curation is error-prone and time-consuming. It takes months to assimilate new data into existing curated data. Sometimes modifying curated databases to include new concepts and discoveries is so tedious that it is simply not done.

Concerning the fifth approach, bioinformatics is an interdisciplinary field. It combines computer science, statistics, mathematics and engineering to analyze and interpret biological data. It has become an important part of many areas of biology and has been used extensively in DNA and protein sequence analysis. However, it is difficult to find researchers trained in all of these fields. Other problems include the heterogeneity in how data are analyzed and annotated and the lack of connectivity among the available data. Moreover, bioinformaticians often lack a good understanding of the data they are dealing with, which can prevent them from obtaining optimal results. Unfortunately, bioinformatics is not pervasive across an organization because it is not accessible to lab researchers, and in any research facility lab researchers far outnumber bioinformaticians.

No One Understands Biomedical Data Better Than Pharma Researchers
R&D related to drug discovery is an intricate process. To quote PhRMA, “after learning more about the underlying disease pathway and identifying potential targets, researchers then seek to narrow the field of compounds to one lead compound – a promising molecule that could influence the target and, potentially, become a medicine. They do this in a variety of ways, including creating a molecule from living or synthetic material, using high-throughput screening techniques to select a few promising possibilities from among thousands of potential candidates, identifying compounds found in nature, and using biotechnology to genetically engineer living systems to produce disease-fighting molecules.”

Pharma researchers understand published biomedical data. They also know the data they generate in the lab and how it is organized. During the process of drug discovery they have to find answers to many difficult questions. Because of the complexity of biological processes and the entanglement of data, with multiple variables operating simultaneously to produce different results, each researcher formulates his or her own approach. This provides many options for solving R&D problems within the same organization.

Pharma Researchers Are The Key To Analytics
As mentioned before, analytics starts and ends with data. In this respect, only researchers know the data to be mined, how it was produced, how to interpret it and what to look for. They know how to determine the accuracy and relevance of extracted insights. When answers are not readily available, they need to interrogate the data interactively to resolve ambiguity.

Computer analytics provides a good solution for extracting insights from biomedical data. One of the problems with existing analytics products is their complexity and the need to hire hard-to-find data scientists. The products put the burden of data mining and analytics solely on data scientists, who typically have no background in biomedical data and find it very difficult to understand. Pharma experts have to explain to them what data they want to mine. Bearing in mind the complexity of biomedical data and the diverse objectives of analysis, this is time-consuming and can take many iterations. Ultimately, the researcher may not get the best extracted data. It is not surprising that about 80% of the data scientist’s time is spent on data mining. Moreover, whenever new data is added, or the scope of data mining is increased, the process has to be repeated. In conclusion, putting the burden of data mining and analytics solely on data scientists is not productive.

Democratization Of Pharma Analytics
The most effective solution to pharma R&D analytics is to empower researchers by putting data and analytics directly into their hands and enabling them to perform data mining and analysis themselves. Empowerment makes R&D analytics pervasive across an organization. This is the very essence of democratizing data. While a researcher mines the data and performs analytics, a data scientist becomes a spot problem solver, getting involved when needed. Democratization of pharma analytics is attainable by providing the following:

  1. Self-service automated data mining tools. Once the researcher decides the type of data to be mined, the whole process is automated. The tools should handle all data formats from different sources.
  2. Self-service data analytics.

In pursuing drug discovery using data analytics, there are many options when executing the required steps. A researcher has to decide how to mine the data required for the analysis and formulate the procedures to be executed at different stages of drug development. Sometimes the researcher may have to pause and examine the data to gain a better understanding before proceeding to the next step, or add new parameters and repeat previous procedures to obtain better results. With a data scientist at the helm, the researcher has to explain what is needed to the data scientist at every stage. This slows the R&D process, introduces errors and lowers the quality of the extracted insights.

The number of researchers developing new drugs exceeds the number of data scientists and bioinformaticians by an order of magnitude. By democratizing data, pharma companies will have up to tens of thousands of researchers who can get accurate and meaningful insights from the data using different procedures. This increases the odds of finding solutions to difficult problems like cancer.

Iteru’s Data Democratization Is The Solution
Many pharma companies have started to consider investing in analytics. Some of the big companies have decided to partner with data analytics companies. Products of the analytics partners are very expensive. Data mining and analytics are performed by data scientists, and every change in the objective of analysis requires hefty consultancy fees. Iteru provides a much better and less expensive solution based on the democratization of pharma analytics. The product satisfies the security requirements of the pharma industry.
