According to a new FDA contract notice, the agency is interested in procuring a new artificial intelligence-based Optical Character Recognition system to extract data from PDFs obtained during remote regulatory assessments. While the notice contains relatively few details, it appears that the agency is currently looking into leveraging it to benefit a biologics-specific system.
A new contract notice seeks software support to leverage AI-based data extraction technology.
- Quick background: Optical Character Recognition (OCR) technology systems convert physical, printed, or scanned documents into machine-readable text, as defined by IBM. OCR software can also harness artificial intelligence (AI) to implement more advanced recognition methods.
- In a new contract notice, FDA seeks sources that could furnish “ThinkTrends Software Support for Optical Character Recognition (OCR) PDF Extraction.” On its webpage, ThinkTrends bills itself as a codeless generative AI automation platform. The company highlights several applications relevant to life sciences regulation, including pharmacovigilance and quality assurance and regulatory compliance management. In addition, some tools have been tailored specifically for federal agency infrastructure. Per the contract notice, the FDA is looking for: “an AI-based OCR (Optical Character Recognition) function that provides a PDF extraction solution. The solution shall be able to extract data tables from bioanalytical reports in a PDF format at greater than 90% accuracy level to allow for reviewers to access data in a format which is amenable to filtering specific data sets, performing calculations, and other analyses that are critical to their processes.”
- The use case appears to be in Remote Regulatory Assessments (RRAs). RRAs were widely used during the Covid-19 public health emergency (PHE), and are “remote assessments of an FDA-regulated establishment and/or its records [that] can help determine compliance with applicable FDA requirements, inform regulatory decisions and verify information submitted to the agency.” While they do not replace inspections, RRAs are considered an “additional regulatory tool.” The FDA issued a draft guidance on RRAs outside of the Covid-19 PHE context in July of 2022, which stated that all RRA-associated documents should be submitted in electronic format, and even paper documents should be scanned as searchable Portable Document Format (PDF) files for submission. A few months later, the agency updated its Compliance Program Guide (CPG) on preapproval inspections to feature information on RRAs [read AgencyIQ’s analysis here], but has not provided additional updates on RRAs since that time.
- Per the RRA guidance, the type of information that would be requested during such an assessment “will typically be similar to what FDA would request during an inspection” – which can include bioanalytical reports that provide information about the bioanalytical methods used in pharmacology, bioavailability (BA) and bioequivalence (BE) studies that require pharmacokinetic, toxicokinetic or biomarker concentration development (or nonclinical studies that require toxicokinetic or biomarker concentration data). From the contract notice, it appears that the FDA wants to look into using AI to extract data from these reports (see a template here) to support RRAs.
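To illustrate what a format “amenable to filtering specific data sets [and] performing calculations” could look like in practice, here is a minimal Python sketch. It is not drawn from the contract notice: the column names, sample values and the percent-bias calculation are all assumptions chosen for illustration, and the table stands in for whatever an OCR tool might produce from a bioanalytical report.

```python
import csv
import io
from statistics import mean

# Hypothetical extracted table: every column name and value here is invented
# for illustration and does not come from the FDA notice or any real report.
EXTRACTED_CSV = """sample_id,analyte,nominal_ng_ml,measured_ng_ml
QC-L-1,DrugA,3.0,2.9
QC-L-2,DrugA,3.0,3.2
QC-H-1,DrugA,80.0,78.4
QC-L-1,MetaboliteB,1.5,1.6
"""


def load_rows(text):
    """Parse CSV text into dict rows, converting concentrations to floats."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["nominal_ng_ml"] = float(row["nominal_ng_ml"])
        row["measured_ng_ml"] = float(row["measured_ng_ml"])
        rows.append(row)
    return rows


def percent_bias(row):
    """Deviation of the measured value from the nominal value, in percent."""
    nominal = row["nominal_ng_ml"]
    return 100.0 * (row["measured_ng_ml"] - nominal) / nominal


rows = load_rows(EXTRACTED_CSV)

# Filter to a single analyte, then run a calculation on the filtered rows.
drug_a = [r for r in rows if r["analyte"] == "DrugA"]
biases = [percent_bias(r) for r in drug_a]

print(f"DrugA rows: {len(drug_a)}")
print(f"Mean percent bias: {mean(biases):.2f}")
```

Once a table is in this structured form, filtering and calculations of this kind take a few lines, which is presumably the point of the requirement.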
What does the FDA want to do, anyway?
- AgencyIQ has a question about the project described. The contract notice specifically states that the AI-based PDF extraction solution will be incorporated “as part of the BEST platform.” While the notice itself does not spell out “BEST,” AgencyIQ would assume it means the Biologics Effectiveness and Safety (BEST) System, a branch of the FDA’s active surveillance initiative for drug products, Sentinel. The BEST Platform, as part of Sentinel, has recently been doing some interesting work in its Innovative Methods (IM) Initiative to look at ways in which AI and automation can enhance the FDA’s work in active surveillance, including using semi-automated tools for review of potential adverse event cases and better scanning and extraction of Electronic Health Record (EHR) data. However, as far as AgencyIQ knows, the BEST (Sentinel) Platform is not currently being considered as part of the inspections process. Nevertheless, the contract notice states that: “The BEST platform described above is an example of RRA.”
- In AgencyIQ’s thinking, the contract notice could also indicate that the FDA is working to build out its capabilities related to its new authority to conduct RRAs of facilities involved in clinical trials. Under the recent Food and Drug Omnibus Reform Act (FDORA) legislation, Section 3612 gave FDA additional authority related to Bioresearch Monitoring (BIMO) inspections. These inspections are used by FDA to assess sites involved in clinical research, and involve activities such as inspections, data audits and more. A core focus of BIMO is on clinical data integrity and uncovering mistakes or outright clinical trial fraud. FDORA gave FDA the authority to request access to “any electronic information system” used by a company to hold, process, analyze or transfer clinical records, as well as the authority to record or copy this information. It also extended FDA’s explicit authority to include not just clinical and non-clinical studies conducted in advance of approval, but also all postmarket safety activities, “other clinical investigation[s]” of a drug or device, or “other submissions … [for] which the Secretary determines an inspection under this paragraph is warranted in the interest of public health.”
- Based on that new authority, we think this system could be the way that FDA is working to make better use of the information it receives. For example, the notice calls for any system to integrate with the FDA’s “Study Data Platform loading module,” and to be able to extract information from “bioanalytical reports.” Of course, there’s always the possibility that the FDA is developing another Platform that they have, efficiently, decided to call “BEST.”
- However, it’s also worth noting a line from a July 2022 press release from FDA Commissioner Robert Califf on RRAs that could explain the FDA’s intent and the relation to BEST. “Over the last two years, we’ve performed more than 1,470 domestic and more than 600 foreign entity establishment RRAs. As a result of these RRAs, we’ve identified unreported adverse events, gathered information to add products that appear to be violative to import alerts, evaluated the status of companies correcting issues from a previous inspection and helped the agency make regulatory decisions for product premarket submissions.” (Emphasis added). It may well be the case that FDA is interested in assessing and analyzing adverse event reports received during an RRA without waiting for the company to submit them at a later date – an approach which could explain why the notice mentions BEST.
- The contractor looking to provide the OCR must be capable of meeting rigorous – though far from flawless – performance standards. The OCR system should extract data from text and tables with over 90% accuracy, among other metrics. The contractor should be able to process batches of documents with a 24-hour turnaround, sharing real-time status updates and error reports. The contract is also associated with several technical and project management deliverables. Beyond installing, configuring, and training the software and algorithm, the contractor will “develop a streamlined and efficient flow of extracted data.” Along the way, the contractor will develop status reports and comprehensive documentation of the work plan and code.
- What’s next? As the contract progresses through administrative processes, AgencyIQ anticipates publication of a Statement of Work (SOW) document with additional details on the specific use cases for the technology. AgencyIQ would also note that the FDA is still looking for Congress to expand its authority over RRAs, including a specific request for authority to allow the FDA to require firms to engage in RRAs, and specifically remote interactive evaluations. In general, it appears that the agency is looking to bolster its own technical capabilities to conduct RRAs – and make those assessments more efficient – even as it’s asking Congress to expand such programs.
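The notice’s greater-than-90% accuracy requirement can be made concrete with a small sketch. The notice as described here does not say how accuracy will be measured, so the version below assumes one plausible definition: the share of cells in a hand-verified reference table that the extracted table reproduces exactly. The function names, the sample tables and the metric itself are all hypothetical.

```python
ACCURACY_THRESHOLD = 0.90  # "greater than 90%" per the notice; the metric is assumed


def cell_accuracy(extracted, reference):
    """Fraction of reference cells reproduced exactly (ignoring outer whitespace).

    Cells missing from the extracted table count as errors. Extra cells in the
    extracted table are not penalized in this simplified sketch.
    """
    total = 0
    correct = 0
    for r, ref_row in enumerate(reference):
        for c, ref_cell in enumerate(ref_row):
            total += 1
            if r < len(extracted) and c < len(extracted[r]):
                if extracted[r][c].strip() == ref_cell.strip():
                    correct += 1
    return correct / total if total else 0.0


def meets_threshold(extracted, reference, threshold=ACCURACY_THRESHOLD):
    """True only if accuracy is strictly greater than the threshold."""
    return cell_accuracy(extracted, reference) > threshold


# Hypothetical 3x4 reference table and an OCR result with one misread cell.
reference = [
    ["sample_id", "analyte", "nominal", "measured"],
    ["QC-L-1", "DrugA", "3.0", "2.9"],
    ["QC-H-1", "DrugA", "80.0", "78.4"],
]
extracted = [
    ["sample_id", "analyte", "nominal", "measured"],
    ["QC-L-1", "DrugA", "3.0", "2.9"],
    ["QC-H-1", "DrugA", "80.0", "7B.4"],  # "8" misread as "B"
]

accuracy = cell_accuracy(extracted, reference)
print(f"Cell accuracy: {accuracy:.1%}")
print("Meets threshold:", meets_threshold(extracted, reference))
```

Under this assumed metric, one wrong cell in twelve still clears the bar while two would not. That illustrates why a 90% standard is “far from flawless” for tables that feed calculations.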
To contact the authors of this item, please email Amanda Conti ( [email protected]) and Laura DiAngelo ( [email protected])
To contact the editor of this item, please email Alexander Gaffney ( [email protected])