Join us in Atlanta on April 10th and explore the landscape of security workforce. We will explore the vision, benefits, and use cases of AI for security teams. Request an invite here.
As the demand for generative AI continues to grow, concerns about its safe and reliable deployment have become more prominent than ever. Enterprises want to ensure that the large language model (LLM) applications being developed for internal or external use deliver outputs of the highest quality without veering into unknown territories.
Recognizing these concerns, Microsoft today announced the launch of new Azure AI tools that allow developers to address not only the problem of automatic hallucinations (a very common problem associated with gen AI) but also security vulnerabilities such as prompt injection, where the model is tricked into generating personal or harmful content — like the Taylor Swift deepfakes generated from Microsoft’s own AI image creator.
The offerings are currently being previewed and are expected to become broadly available in the coming months. However, Microsoft has not shared a specific timeline yet.
With the rise of LLMs, prompt injection attacks have become more prominent. Essentially, an attacker can change the input prompt of the model in such a way as to bypass the model’s normal operations, including safety controls, and manipulate it to reveal personal or harmful content, compromising security or privacy. These attacks can be carried out in two ways: directly, where the attacker directly interacts with the LLM, or indirectly, which involves the use of a third-party data source like a malicious webpage.
To fix both these forms of prompt injection, Microsoft is adding Prompt Shields to Azure AI, a comprehensive capability that uses advanced machine learning (ML) algorithms and natural language processing to automatically analyze prompts and third-party data for malicious intent and block them from reaching the model.
It is set to integrate with three AI offerings from Microsoft: Azure OpenAI Service, Azure AI Content Safety and the Azure AI Studio.
But, there’s more.
Beyond working to block out safety and security-threatening prompt injection attacks, Microsoft has also introduced tooling to focus on the reliability of gen AI apps. This includes prebuilt templates for safety-centric system messages and a new feature called “Groundedness Detection”.
The former, as Microsoft explains, allows developers to build system messages that guide the model’s behavior toward safe, responsible and data-grounded outputs. The latter uses a fine-tuned, custom language model to detect hallucinations or inaccurate material in text outputs produced by the model. Both are coming to Azure AI Studio and the Azure OpenAI Service.
Notably, the metric to detect groundedness will also come accompanied by automated evaluations to stress test the gen AI app for risk and safety. These metrics will measure the possibility of the app being jailbroken and producing inappropriate content of any kind. The evaluations will also include natural language explanations to guide developers on how to build appropriate mitigations for the problems.
“Today, many organizations lack the resources to stress test their generative AI applications so they can confidently progress from prototype to production. First, it can be challenging to build a high-quality test dataset that reflects a range of new and emerging risks, such as jailbreak attacks. Even with quality data, evaluations can be a complex and manual process, and development teams may find it difficult to interpret the results to inform effective mitigations,” Sarah Bird, chief product officer of Responsible AI at Microsoft, noted in a blog post
Enhanced monitoring in production
Finally, when the app is in production, Microsoft will provide real-time monitoring to help developers keep a close eye on what inputs and outputs are triggering safety features like Prompt Shields. The feature, coming to Azure OpenAI Service and AI Studio, will produce detailed visualizations highlighting the volume and ratio of user inputs/model outputs that were blocked as well as a breakdown by severity/category.
Using this level of visibility, developers will be able to understand harmful request trends over time and adjust their content filter configurations, controls as well as the broader application design for enhanced safety.
Microsoft has been boosting its AI offerings for quite some time. The company started with OpenAI’s models but has recently expanded to include other offerings, including those from Mistral. More recently, it even hired Mustafa Suleyman and the team from Inflection AI in what has appeared like an approach to reduce dependency on the Sam Altman-led research lab.
Now, the addition of these new safety and reliability tools builds on the work the company has done, giving developers a better, more secure way to build gen AI applications on top of the models it has on offer. Not to mention, the focus on safety and reliability also highlights the company’s commitment to building trusted AI — something that’s critical to enterprises and will eventually help rope in more customers.