AI Drug Discovery and Therapeutics

AI Requirements and Challenges

Artificial intelligence (AI) has the potential to revolutionize pharma R&D and pharma digitization. It is adept at analyzing the complex interactions among large numbers of variables in vast datasets and turning them into actionable insights. The effectiveness of an AI solution depends on the nature of the problem and the accuracy required. Achieving very high accuracy remains the biggest challenge in one area of R&D: drug discovery.

AI accuracy depends on solving formidable problems with biomedical data. The data is messy, difficult to integrate, and stored in inaccessible silos. Integration is important because complementary information about biology exists in different data types. Existing products dealing with biomedical data are not scalable and lack automated classification and labeling. Manual curation and classification, used extensively by pharma, are inconsistent, time-consuming, and prone to error, and they are not suitable for large and complex datasets.

Accuracy also depends on the AI algorithms used. Many AI products rely on open-source algorithms, which provide an accuracy of between 60% and 70%. That is adequate for applications such as sentiment analysis and marketing trends, but it is too low for drug discovery.

Hyped, futuristic solutions based on Generative AI have challenges of their own. Generative AI provides answers without supporting evidence and is non-deterministic: it can give different answers to the same question. It also has inherent limitations in accuracy; studies indicate that its accuracy is between 40% and 78%. For these reasons, Generative AI does not satisfy the stringent requirements of drug discovery.

Iteru built a platform to address all of the problems cited above. The platform is scalable and its functionality is expandable: more data types and more AI algorithms can be added, enabling it to provide solutions to many pharma digitization problems such as patient care, document management and security, and others.

Drug Discovery Needs Very High Accuracy

Accuracy is crucial for drug discovery and therapeutics. Unlike others, Iteru set a target of 90% accuracy for extracting biomedical entities from raw data. Here, biomedical entities are items associated with cancer, such as oncogenes, proteins, pathways, and receptors. With careful design, the 90% target is attainable. To attain it, Iteru took the following steps:

  • Design accurate proprietary AI algorithms.
  • Improve the quality of the data by designing carefully crafted text extraction methods and proprietary data cleansing algorithms.
  • Build a platform scalable to one petabyte of data. More data leads to better accuracy.
  • Integrate all biomedical data, unimodal and multimodal, in one place (a data lake).
  • Provide classification algorithms and automated data labeling to isolate the data related to the subject of analysis. Using AI to analyze PubMed’s 37 million documents, or data of similar size, is prohibitively expensive and time-consuming, so it is important to reduce the data before feeding it to an AI algorithm. Isolating data should follow a rigorous process to ensure that all needed data is included. There is no manual solution to this problem.
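
The data-reduction step in the last item can be illustrated with a minimal sketch. It is not Iteru's proprietary classification; it assumes a plain keyword filter as a stand-in, and the `Document` type, the `prefilter` function, the term list, and the three toy sentences are all hypothetical. The idea is only to show a corpus being narrowed to the subject of analysis before any expensive AI step runs.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for one record in a large corpus."""
    doc_id: str
    text: str


def build_matcher(terms):
    """Compile one case-insensitive, whole-word pattern for all terms."""
    # Longer terms first, so multi-word terms are tried before shorter ones.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def prefilter(documents, terms):
    """Keep only the documents that mention at least one subject term."""
    matcher = build_matcher(terms)
    return [doc for doc in documents if matcher.search(doc.text)]


# Toy corpus, invented for illustration only.
corpus = [
    Document("d1", "EGFR signaling drives proliferation in breast cancer cells."),
    Document("d2", "A survey of hospital staffing levels in rural clinics."),
    Document("d3", "The PI3K pathway is frequently altered in breast tumors."),
]

selected = prefilter(corpus, ["breast cancer", "PI3K", "EGFR"])
selected_ids = [doc.doc_id for doc in selected]
print(selected_ids)
```

A keyword filter like this can miss synonyms and abbreviations, which is one reason the text calls for a rigorous isolation process that ensures all needed data is included.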

Iteru Provides Verifiable Accuracy

Iteru executes a cascade of proprietary AI algorithms. It implements its own algorithms because open-source algorithms lack the required accuracy. At each step of data analysis, Iteru ensures that 90% accuracy is attained. Iteru may be the only company that provides verifiable accuracy for the extraction of biomedical entities.
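
One way to make an accuracy figure checkable is to compare extracted entities against a hand-labeled reference set. The sketch below is an assumption about how such a check could look, not a description of Iteru's method: the text does not define how accuracy is measured, so precision and recall are used here as a common choice, and the function names and toy entity lists are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compare extracted entities with a hand-labeled reference set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def meets_target(predicted, gold, target=0.90):
    """True only if both precision and recall reach the target."""
    precision, recall, _ = precision_recall_f1(predicted, gold)
    return precision >= target and recall >= target


# Toy reference labels and extraction output, invented for illustration.
gold = ["EGFR", "HER2", "TP53", "BRCA1", "KRAS",
        "PIK3CA", "MYC", "PTEN", "AKT1", "ESR1"]
predicted = ["EGFR", "HER2", "TP53", "BRCA1", "KRAS",
             "PIK3CA", "MYC", "PTEN", "AKT1", "CDK4"]  # one miss, one extra

precision, recall, f1 = precision_recall_f1(predicted, gold)
step_ok = meets_target(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} passes={step_ok}")
```

In a cascade, a check of this kind could be run on the output of every stage, so that a stage falling below the target is caught before its errors propagate to the next one.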

Getting Pharma Scientists Involved

It is a textbook notion that attaining very high AI accuracy requires an understanding of the data. Iteru automates data mining and AI analysis to provide an out-of-the-box platform usable by bio scientists, who understand the data but may have no experience in AI or data science. Biomedical data is full of entanglement and ambiguity. For instance, cGMP is involved in heart failure, blood pressure regulation, cardiovascular health, and the prevention and treatment of breast cancer. Because they understand the data, bio scientists are best suited to refine the objective of analysis that the AI uses to produce the desired results. In the objective of analysis, a bio scientist can specify whether the analysis pertains to heart failure, blood pressure, cardiovascular health, or cancer. They can then use the platform to interrogate the data, gain more understanding, and refine the objective further by including oncogenes, pathways, receptors, etc. Refinements remove statistical bias and increase accuracy. Software engineers and data scientists can use the initial objective of analysis provided by a bio scientist, but they cannot interrogate the data or refine the objective effectively.
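
The cGMP example can be made concrete with a small sketch of how a refined objective narrows an ambiguous term. This assumes a simple data structure and substring matching for illustration only; the `Objective` class, its fields, and the toy passages are hypothetical and do not represent the platform's actual interface.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Objective:
    """Hypothetical objective of analysis: an entity plus optional contexts."""
    entity: str
    contexts: tuple = ()  # if non-empty, at least one context must appear

    def matches(self, passage):
        text = passage.lower()
        if self.entity.lower() not in text:
            return False
        if not self.contexts:
            return True
        return any(context.lower() in text for context in self.contexts)


# Toy passages, invented for illustration only.
passages = {
    "p1": "cGMP signaling lowers blood pressure through vasodilation.",
    "p2": "Elevated cGMP induced apoptosis in breast cancer cell lines.",
    "p3": "cGMP modulation is being studied in heart failure.",
}


def select(objective):
    """Return the ids of the passages that satisfy the objective."""
    return [pid for pid, text in passages.items() if objective.matches(text)]


broad = Objective(entity="cGMP")                       # ambiguous: every context
refined = replace(broad, contexts=("breast cancer",))  # the scientist's refinement

broad_ids = select(broad)
refined_ids = select(refined)
print(broad_ids, refined_ids)
```

The broad objective returns every passage that mentions cGMP, while the refined one keeps only the cancer-related passage, which is the kind of narrowing a bio scientist's domain knowledge makes possible.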