What life sciences companies need to know about NIST’s new AI guidance

The National Institute of Standards and Technology (NIST) last week released four new documents on artificial intelligence, mostly focusing on generative AI (GAI). The documents seek to both expand NIST’s existing frameworks to directly address the unique aspects of GAI and outline the novel questions still to be addressed in this field. NIST’s work is also likely to affect FDA’s future policymaking efforts in this space.

BY LAURA DIANGELO, MPH | MAY 8, 2024 8:17 PM CDT

FDA, NIST and their work on artificial intelligence (AI):

  • The National Institute of Standards and Technology (NIST) is part of the U.S. Department of Commerce. As its name implies, NIST is tasked with developing best practices and standards for measurement and evaluation of core economic and scientific topics. NIST develops these standards for a wide array of disciplines – “from the smart electric power grid and electronic health records to atomic clocks, advanced nanomaterials and computer chips, innumerable products and services rely in some way on technology, measurement and standards provided by the National Institute of Standards and Technology.”
  • Other agencies also rely on NIST’s standards to provide methods and guidance for granular and complex topics. NIST’s work is often referred to in other agencies’ policies, and NIST will work with agencies to develop standards in their specific topic areas. For example, NIST and FDA are currently working on a project to develop consensus standards for regenerative medicines, the first batch of which was released in December 2023. The agencies have also worked together on topics such as standardization in digital imaging and medical displays and cell and gene therapy manufacturing.
  • Digital health and NIST: The FDA often cites NIST’s work on digital health topics. For example, the concept of a Secure Product Development Framework (SPDF) is a core component of FDA’s cybersecurity guidance, with FDA referring to NIST’s Cybersecurity Framework as a baseline source of information with which most developers should already be familiar, and which is directly cited in the guidance. NIST also maintains a Secure Software Development Framework (SSDF), which is intended to help inform more granular considerations under the cybersecurity framework and reflects the types of information FDA requests in an SPDF. Similarly, while the agency’s guidance on the conduct of decentralized trials doesn’t “endorse any specific method” for remotely confirming patient/participant identity digitally, it does point sponsors to NIST’s Digital Identity Guidelines. Notably, industry has long urged FDA to leverage NIST’s work when possible, in order to reduce potential duplication and differing standards across industries (e.g., health and tech), as well as delays while the FDA works to establish its own systems. As an example, respondents to FDA’s recent discussion paper on AI/ML methods in drug manufacturing pointed to NIST’s work in this area, which is already underway.
  • NIST’s work on Artificial Intelligence (AI) and the recent AI Executive Order (EO): Developing measurement methods and standards for AI has recently been a key priority for NIST; this work received a boost from the White House in late 2023, under President Biden’s Executive Order on “Safe, Secure, and Trustworthy” AI development. Under that EO, the administration tasked NIST with building out a variety of “guidelines and best practices, with the aim of promoting consensus industry standards, for developing and deploying safe, secure, and trustworthy AI systems.” These activities would then support other agencies’ and Departments’ work under the EO. This is already well within NIST’s wheelhouse: the agency launched its Trustworthy and Responsible Artificial Intelligence Resource Center (AIRC) in March 2023. The AIRC coordinates work on the NIST AI Risk Management Framework – the first version of which was issued in January 2023 and accompanied by a “Playbook,” a companion resource that offers an implementation guide. The EO also included some life sciences-specific standards and strategy development, including directing HHS to develop a new strategy on regulating the use of AI or AI-enabled tools in drug development and to establish an HHS AI task force, which would focus on safety related to the various health-related uses of AI (e.g., development, deployment, real-world monitoring).

NIST has released four new initial draft documents on AI

  • Up first: Meet NIST’s work so far. As cited above, NIST maintains both its Risk Management Framework (RMF) for AI and the Secure Software Development Framework (SSDF). The RMF itself is also accompanied by a playbook – the first version of which was released in January 2023 – that is intended to help industry implement the suggestions and recommendations in the RMF. NIST’s SSDF is “a set of fundamental, sound, and secure software development practices based on established secure software development practice documents” from other established sources in the field, and is intended to support software development lifecycle (SDLC) models to “explicitly address software security in detail” – similar to FDA’s expectations for SPDFs, per the cybersecurity guidance. These documents, along with the Privacy Framework and NICE Workforce Framework, accompany NIST’s Cybersecurity Framework (CSF).
  • The first new documents from NIST, published on April 29, are companions to the AI RMF and SSDF: one focuses on mitigating the risks of generative AI (GAI) and accompanies the RMF, and the other focuses on reducing threats to the data used to train AI and accompanies the SSDF.
  • What follows is somewhat abbreviated, with our recap limited to matters focused on (or related to) the life sciences sector.

First, applying the NIST RMF to GAI (NIST AI 600-1)

  • The first: Mitigating the risks of generative AI (GAI). This document is intended to “be a companion resource” for entities using NIST’s AI RMF. While the RMF discusses risk considerations for AI more broadly, the new document is an AI Profile of risks specific to GAI, or technologies that “emulate the structure and characteristics of input data in order to generate synthetic content” – for example, a chatbot or image generator. The NIST team identified 12 specific risks “that are novel to or exacerbated by the use of GAI,” and offers developers a way to “frame and execute risk management plans” through the lens of these particular risks via a set of corresponding “actions” broken down by risk. Most of the 80-page guidance lists specific “actions” in tables to address these different risks, recommending granular action items related to governance, oversight, scientific integrity, mitigating bias, and ensuring “operator and practitioner proficiency.” Notably, NIST is requesting comment on the list itself.
  • Number one: Chemical, Biological, Radiological and Nuclear (CBRN) information. NIST flags concerns that GAI “may increasingly facilitate eased access to information related to CBRN hazards” or the development of “chemical and biological design tools” that can “predict and generate novel structures” for new biohazards or chemical weapons. In other words, NIST identified the risk that the same type of technology used to provide early-stage development information for therapeutics could be used to invent novel CBRN hazards.
  • Number two: Confabulation, or “a phenomenon in which GAI systems generate and confidently present erroneous or false content to meet the programmed objective of fulfilling a user’s prompt,” also sometimes known as “hallucinations.” The NIST document specifically cites health care-related examples, such as “a confabulated summary of patient information reports” that could harm patients; another example would be a GAI-enabled clinical decision support (CDS) tool that invents clinical guidelines (a naive grounding-check sketch follows this list).
  • Number three: Dangerous or violent recommendations. While the NIST document focuses on the manipulation of individuals or inciting users to violence, dangerous or violent recommendations from GAI in life sciences use cases could include Large Language Model (LLM) outputs that, without appropriate context or clinical expertise (and human oversight), lead users to medication errors.
  • Number four: Data privacy. GAI systems “implicate numerous risks to privacy,” including for biometric and health data, should they “leak, generate, or correctly infer” this information. NIST’s document cites training data for GAI as the main source of these privacy concerns, including circumstances in which GAI models could “correctly infer” information from new queries (e.g., location, political leaning, age) based on the training data even without it being disclosed by a user. Key issues here include “harmful bias and discrimination… based on predictive inferences.” As a key life sciences use case, MIT published research in 2022 finding that AI could infer race from medical images “that contain no indications of race detectable by human experts,” which has downstream impacts on the way an algorithm may make decisions about next steps (for example, an algorithm described in a 2019 study reported in Scientific American de-prioritized Black patients because it used “health costs as a proxy for health needs” – in effect, lower health spending on/for Black patients led the algorithm to assume they needed less care than comparable White patients). Further, the protection of patients and research participants is a key concern for the FDA, including safety related to data privacy; the impact that a technology able to “correctly infer” protected information could have on research participant safety has not yet been addressed within the FDA’s regulatory systems.
  • Number six: Human-AI configuration. The way that human users interact with AI “can contribute to risks for abuse, misuse, and unsafe repurposing by humans,” and there are “varying levels” of interaction between humans and AI technologies. NIST cites both risks associated with human aversion to GAI content and risks from automation bias – notably, automation bias (in which humans over-rely on technologically generated content) has long been a concern for the FDA in regulating AI-based technologies. In medical device regulation, this is called out as part of the “human factors” (or usability engineering) considerations for AI-incorporating devices.
  • Number seven: Information integrity. GAI can facilitate the production or spread of “false, inaccurate, or misleading content at scale,” including but not necessarily limited to confabulations. NIST’s document particularly calls out the risks associated with the broad potential for dissemination of incorrect information, such as “deepfakes” or bad-faith information that can erode public trust. In life sciences, the spread of misinformation and disinformation – predominantly around product safety or basic scientific concepts – has been a concern for FDA Commissioner Califf. From a life sciences perspective writ large, GAI-generated content about safety issues such as recalls or other corrective actions would likely be a concern in this area, as would its effect on public trust in, and understanding of, other communications from either the FDA or the product manufacturer, such as labeling.
  • Number eight: Information security. As NIST describes, there are two information security risks from GAI: it may be used offensively to “discover or enable” new cybersecurity risks, and it may expand the attack surface (i.e., vulnerability) of technologies. LLMs can support novel attack methods, uncover new vulnerabilities, and write code to exploit them. Further, GAI systems can be subject to “prompt injection,” in which attackers exploit input prompts to manipulate the GAI. Finally, “another novel cybersecurity risk to GAI is data poisoning,” in which a training set is deliberately manipulated to tamper with outputs.
  • Number nine: Intellectual property. “GAI systems may infringe on copyrighted or trademarked content, trade secrets, or other licensed content,” which are “often” part of the training data sets. Therefore, there’s a risk of the GAI output violating intellectual property laws.
  • Number eleven: Toxicity, bias, and homogenization. These risks include hate speech, “denigrating or stereotypical content,” and upholding of societal biases; similar to the example of AI algorithms de-prioritizing Black patients above, NIST cites GAI outputs that assume CEOs, doctors, or lawyers are men. This could result in output homogenization, which then could create “singular points of failure of discrimination or exclusion that replicate to many downstream applications,” or in FDA parlance, a lack of generalization.
  • Number twelve: Value chain and component integration. As NIST describes, “GAI system value chains often involve many third-party components such as procured datasets, pre-trained models, and software libraries,” which may diminish transparency and accountability downstream.
  • NIST is looking for comment: Specifically, NIST is looking for comment on what should go in the document’s glossary (including “novel keywords”), whether the risk list is comprehensive or should be further stratified into groups, and whether the actions identified for each of the 12 risks are appropriate.
  • As noted, a majority of the document is taken up with tables of specific action items to mitigate the identified risks, which range from the general (e.g., establish processes to remain aware of evolving risks) to the technical and specific (e.g., “Integrate digital watermarks, blockchain technology, cryptographic hash functions, metadata embedding, or other content provenance techniques within AI-generated content to track its source and manipulation history”); a minimal provenance-record sketch follows this list. These action items are largely focused on getting ahead of risks – documenting every step, conducting audits and oversight, planning ahead for how developers would identify and address a GAI system that has gone beyond its bounds, and ensuring that all of these processes and procedures are appropriately documented within the organization and across sectors.
  • NIST also announced a challenge on GAI: NIST GenAI is “a new program to evaluate and measure generative AI technologies,” which will “issue a series of challenge problems designed to evaluate and measure the capabilities and limitations of generative AI technologies.” NIST will use the challenge problems as a way to pressure test key questions around GAI and the expected human response in order to build guardrails and measurement methods. Registration will open this month.
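To make the confabulation risk concrete for a clinical decision support setting, the following is a minimal, purely illustrative sketch of the kind of guardrail a developer might layer around a GAI summarizer: it flags generated sentences whose content words mostly do not appear in the source record. This is not drawn from the NIST profile; the function name, threshold, and word-overlap heuristic are assumptions for illustration only, and a real groundedness check would be far more sophisticated.

```python
import re

def ungrounded_sentences(source_text: str, generated_summary: str, min_overlap: float = 0.5) -> list[str]:
    """Flag sentences in a generated summary whose content words are mostly
    absent from the source document - a crude proxy for confabulated content."""
    source_words = set(re.findall(r"[a-z]+", source_text.lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", generated_summary.strip()):
        # Only consider words longer than three characters as "content" words.
        words = [w for w in re.findall(r"[a-z]+", sentence.lower()) if len(w) > 3]
        if not words:
            continue
        overlap = sum(w in source_words for w in words) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged

if __name__ == "__main__":
    record = "Patient reports mild headache. Prescribed acetaminophen 500 mg."
    summary = ("Patient reports mild headache and was prescribed acetaminophen. "
               "Patient also has a documented penicillin allergy.")
    for s in ungrounded_sentences(record, summary):
        print("Possible confabulation:", s)
```

In this toy example, the fabricated allergy statement is flagged because almost none of its key terms appear in the source record.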
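As a rough illustration of the hash-plus-metadata provenance techniques referenced in the action tables, the sketch below pairs a SHA-256 digest of generated content with a small metadata record that can later be used to detect tampering. It is a minimal sketch under assumed field names (e.g., generated_by, prompt_id), not an implementation prescribed by NIST AI 600-1; production systems would typically rely on established provenance standards and signed manifests.

```python
import hashlib
import json
from datetime import datetime, timezone

def build_provenance_record(content: bytes, model_name: str, prompt_id: str) -> dict:
    """Pair a cryptographic hash of AI-generated content with metadata about
    how it was generated, so downstream consumers can trace its source."""
    return {
        "sha256": hashlib.sha256(content).hexdigest(),
        "generated_by": model_name,   # hypothetical identifier
        "prompt_id": prompt_id,       # hypothetical identifier
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }

def verify_content(content: bytes, record: dict) -> bool:
    """Return True if the content still matches the hash in its provenance record."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]

if __name__ == "__main__":
    output = b"Synthetic radiology report draft ..."
    record = build_provenance_record(output, model_name="example-gai-model", prompt_id="case-001")
    print(json.dumps(record, indent=2))
    print("Unmodified content verifies:", verify_content(output, record))
    print("Tampered content verifies:", verify_content(output + b" [edited]", record))
```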

The second document, an accompaniment to the Secure Software Development Framework, also focuses on GAI (NIST Special Publication (SP) 800-218A)

  • The second new document from NIST: Reducing threats to the data used to train AI systems. This is intended to serve as a supplement to the SSDF – or, as the document calls it, an “SSDF Community Profile” – which applies the SSDF practices to a particular use case, in this case GAI. “While the SSDF is broadly concerned with securing the software’s lines of code, this companion resource expands the SSDF to help address concerns around malicious training data adversely affecting generative AI systems.” The companion resource addresses “dealing with the training data and data collection process, including a matrix that identifies potential risk factors and strategies to address them.”
  • The document is targeted at organizations that produce (develop) GAI models, that build software that uses GAI, or that acquire a product or service leveraging GAI. This covers not just GAI technologies but also “dual use” foundation models, or models that will use both GAI and other types of models. The focus is on the way these models are developed, including sourcing data; designing, training, fine-tuning, and evaluating models; and the way they are incorporated into other software.
  • The recommendations are in section 3 of the document, which lays out the SSDF Community Profile for AI Model Development. The Profile itself is presented as a chart that leverages the SSDF structure to apply considerations specific to AI model development. As in the SSDF, these are broken into the four key groups of practices (Prepare the Organization; Protect the Software; Produce Well-Secured Software; Respond to Vulnerabilities) and tasks for implementing those practices. The Community Profile goes deeper, assigning priority levels (high, medium, low) and then “recommendations, considerations and notes” for each task under the practices outlined in the SSDF, specifically to address AI model development concerns.
  • The recommendations particularly focus on documentation and transparency, including appropriate documentation of policies and guidelines related to every aspect of AI development – such as how developers will define integrity and provenance, and how they will keep certain parameters (e.g., model weights) separate from training and testing data. Overall, they focus on ways to analyze any training data used to identify the risk, or potential, that it is in some way unfit – for example, poisoned or tampered with, affected by unmitigated bias, or overly homogeneous (a simple training-data screening sketch follows this list). As in the GAI document, the new SSDF Community Profile emphasizes the need for continuous monitoring for both performance and security.
  • Again, NIST is seeking comment: The document requests input on whether the additions to the SSDF are reasonable, the content of the recommendations, and whether there is additional content that would be useful for organizations. Further, NIST is requesting comment on what other security-related considerations should be added (e.g., cybersecurity or reproducibility), options for providing examples, expanding the Profile to other AI technologies, and what other guidance or templates industry would find useful.
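As a loose illustration of the training-data checks described above, the sketch below verifies dataset files against a trusted hash manifest (to catch tampering or substitution) and computes a crude duplicate rate as a homogeneity signal. The manifest format, function names, and thresholds are the author's assumptions for illustration, not content drawn from SP 800-218A.

```python
import hashlib
from collections import Counter
from pathlib import Path

def verify_against_manifest(data_dir: str, manifest: dict[str, str]) -> list[str]:
    """Compare each training file's SHA-256 against a trusted manifest.
    Returns files that are missing or whose hashes do not match
    (possible tampering with or poisoning of the training set)."""
    problems = []
    for name, expected in manifest.items():
        path = Path(data_dir) / name
        if not path.exists():
            problems.append(f"missing: {name}")
            continue
        if hashlib.sha256(path.read_bytes()).hexdigest() != expected:
            problems.append(f"hash mismatch: {name}")
    return problems

def duplicate_rate(records: list[str]) -> float:
    """Crude homogeneity signal: the share of training records that are
    exact duplicates of another record."""
    counts = Counter(records)
    dupes = sum(c - 1 for c in counts.values())
    return dupes / len(records) if records else 0.0

if __name__ == "__main__":
    sample = ["take with food", "take with food", "avoid alcohol", "take with food"]
    print(f"Duplicate rate: {duplicate_rate(sample):.0%}")  # 50%
```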

Third: Reducing risk from synthetic content (NIST AI 100-4)

  • Third: Reducing risks posed by synthetic content (promoting transparency in digital content). This is the longest of the new documents, offering 95 pages focused on synthetic content – the types of content created or altered by AI. Of those 95 pages, about 40 focus on the substance of the guidance (a cautionary note: the document includes a section specifically on sexual abuse of both adults and children), while a roughly 20-page series of appendices recaps the information in the document. These include NIST’s list of the current standards that exist on synthetic content (e.g., ISO/IEC risk management standards for AI), a repository of existing technical tools and synthetic detection datasets, examples of how they’ve been deployed, and examples of how test methods and evaluations can be deployed in this space. Finally, this document does include a draft glossary of key terms.
  • What’s in the document? “NIST AI 100-4 lays out methods for detecting, authenticating and labeling synthetic content, including digital watermarking and metadata recording, where information indicating the origin or history of content such as an image or sound recording is embedded in the content to assist in verifying its authenticity. The report does not focus only on the dangers of synthetic content; it is intended to reduce risks from synthetic content by understanding and applying technical approaches for improving the content’s transparency, based on use case and context,” per NIST. The document includes discussions of the technical approaches for transparency in synthetic content, but also emerging technologies and methods, and “selected opportunities for further development.”
  • Notably, the report doesn’t seem to reach any firm conclusions on best practices just yet: “This report is a resource to promote understanding and help to lay the groundwork for the development of additional, improved technical approaches to advancing synthetic content provenance, detection, labeling, and authentication,” it concludes.
  • NIST is requesting comment: “Comments are especially requested” on several key points in the report, including the way that NIST has defined the current methods and whether the report captures the digital content transparency technical landscape. NIST is also looking for information about testing and evaluation techniques and additional standards development approaches. Each section in the report has an “Additional Issues for Consideration” section, and NIST is actively seeking information about its presentation of those ideas and what next steps could be.
  • Key concepts: NIST identified both provenance data tracking and synthetic content detection as the current approaches to enhance digital content transparency – in effect, looking into the inputs (provenance) and the outputs (detection) as transparency mechanisms.
  • Provenance data tracking: These methods include, primarily, digital watermarking and metadata recording. NIST also walks through the efficacy of provenance data tracking for specific types of content – for example, NIST determined that provenance tracking is “further developed for images than for any other medium,” but adoption remains low, while “text is considered by far the most difficult modality when it comes to maintaining provenance.” Audio and video authenticity methods also remain under development – and add the complexity of time domains. At a high level, NIST concludes that there is significant research needed in this area to establish best practices.
  • Digital watermarking involves “embedding information into content (image, text, audio, video) while making it difficult to remove” and can be either covert (machine readable) or overt (perceived by humans). The document lists some preliminary best practices for digital watermarks, including the attributes that are “most effective” in a digital watermark, design choices for watermarking techniques, technical methods for designing covert watermarks, and additional issues to consider when using digital watermarks; these include concerns about technical tradeoffs and tampering, as well as trust in and the scale of these watermarks (a toy watermark-embedding sketch follows this list). Going forward, NIST cites ongoing work and new opportunities in understanding how embedded watermarks are perceived, as well as emerging techniques like statistical watermarks.
  • Metadata acts as a type of receipt for digital content, generated when the digital content “is created, uploaded, downloaded, or modified.” Recording metadata can help indicate where content comes from. The NIST document highlights different recording techniques, such as digital fingerprints, and breaks down the different types of metadata (e.g., descriptive, structural) and how it can be authenticated. As next steps in synthetic content identification, NIST notes that “metadata can be used to verify the origins of content and how the history for a piece of content may change over time,” and describes methods for content authentication using metadata. As far as additional issues that need to be considered with this method, NIST flags privacy concerns, trustworthiness and integrity (and public perception of these data), security, and the quality and management of metadata. Going forward, “further research is needed” to understand what impact metadata recording will have on public trust, as well as to evaluate the utility of such recordings and to address the risks that accompany metadata recording.
  • Next, detecting synthetic content. This can include using techniques based on provenance data, as well as automated content-based detection and human-assisted detection. Again, NIST walks through the various benefits and limitations of these methods across different mediums, and concludes that “Existing detectors primarily emphasize discriminating between synthetic content and human-produced content.” Although some methods – including DARPA SemaFor – have “made some progress… there is still room for improvement,” NIST concludes, particularly related to how noise (both literal noise and additional factors that could corrupt content, like reformatting) impacts these detection methods.
  • Testing and evaluating provenance data tracking and the detection techniques: Again, NIST concludes that more research is needed after walking through the different evaluation methods, stating that “More socio-technical research and evaluations to understand how people interact with digital content transparency approaches across various types of systems and in varied environments across the Internet will be helpful to design and implement techniques effectively.”
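To make the covert (machine-readable) watermark concept concrete, here is a toy least-significant-bit watermarking sketch that operates on a flat list of 8-bit pixel values. It is illustrative only – it is trivially removable and not robust to compression, cropping, or tampering, which are exactly the limitations the NIST report discusses – and the function names are the author’s own, not drawn from the report.

```python
def embed_watermark(pixels: list[int], message: str) -> list[int]:
    """Embed a short message into the least significant bits of 8-bit pixel
    values - a covert watermark in the sense used by the NIST report.
    Real schemes are far more robust to compression, cropping, and tampering."""
    bits = "".join(f"{byte:08b}" for byte in message.encode("utf-8"))
    if len(bits) > len(pixels):
        raise ValueError("image too small for message")
    stamped = pixels.copy()
    for i, bit in enumerate(bits):
        stamped[i] = (stamped[i] & ~1) | int(bit)  # overwrite the lowest bit
    return stamped

def extract_watermark(pixels: list[int], message_len: int) -> str:
    """Recover a message of known byte length from the pixels' lowest bits."""
    bits = "".join(str(p & 1) for p in pixels[: message_len * 8])
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

if __name__ == "__main__":
    image = list(range(256))                         # stand-in for grayscale pixel data
    marked = embed_watermark(image, "GAI:v1")
    print(extract_watermark(marked, len("GAI:v1")))  # -> GAI:v1
```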

Finally: Global AI Standards (NIST AI 100-5)

  • Fourth: A plan for developing global AI standards. This document, NIST AI 100-5, describes the plan NIST is developing to engage with other governments and standards development organizations on standards for AI. NIST further explains its plans for carrying out this work, including holding meetings internally and with other governments, and expanding on current relationships.
  • Top priorities: “Urgently needed and ready for standardization.” NIST cites several existing standards (e.g., ISO/IEC 22989:2022 “AI concepts and terminology”) but also flags outstanding topics not covered in those resources. Per NIST, the following topics are “urgently needed and ready for standardization”: terminology and taxonomy, measurement and mitigations for risks and safety issues, testing, evaluation, verification, and validation (TEVV), mechanisms for enhanced transparency (including synthetic content identification, see above), risk-based management of AI systems, security, and transparency.
  • Needed, but not ready for standards development just yet: These issues include energy consumption of AI models, incident and recovery plans, conformity assessments (and compliance procedures), datasets, and channels for upstream reporting.
  • Needed and nowhere near ready for standards development: NIST identified two key areas in this bucket: techniques for interpretability and explainability, and human-AI integration (i.e., the way humans and AI systems interact).

Analysis and what’s next

  • Quickly: “These publications are initial drafts, which NIST is publishing now to solicit public feedback before submitting final versions later this year.”
  • The documents are very much a jumping-off point for NIST, particularly those addressing GAI issues. They do provide a helpful landscape analysis of the methods and best practices that exist today, and of the kinds of evaluation methods that can currently be used on GAI content. The ranking of different evaluation methods for assessing different mediums of synthetic content (e.g., image, text) may be especially helpful, particularly the specific concerns that NIST has called out as limiting the utility of those methods.
  • The documents on applying the NIST RMF for GAI and SSDF considerations for GAI (and dual-method models) demonstrate a path for building out best practices. These documents are intended to help provide more granular recommendations on GAI topics in a more targeted and contextualized way, while building and expanding on the existing frameworks under the RMF and SSDF; in effect, logically applying new concepts and considerations to the standing best practices. This is likely to help developers navigate the new information within the baseline of the original (or, as applicable, updated) NIST documents. The 12 distinct risks identified in NIST AI 600-1 so far, for example, may help developers build out oversight systems early in development that account for future theoretical risks, or the risks that regulators may be concerned about in the future.
  • What can we expect from FDA? As NIST acknowledges in its standards document, best practices on conformity assessments and compliance with regulatory expectations fall under the bucket of “needed, but requiring significant foundational work.” That said, FDA guidance on GAI issues is likely a longer-term priority, even as NIST continues working to help establish that baseline. FDA is expected to release some new guidance on AI in 2024, including policies on pre-determined change control plans (PCCPs) and total product lifecycle (TPLC) approaches for AI-based products.


To contact the author of this item, please email Laura DiAngelo (ldiangelo@agencyiq.com).
To contact the editor of this item, please email Alexander Gaffney (agaffney@agencyiq.com).

Key Documents and Dates
