Understanding the Human Factor in AI Safety and Security Testing: Developing Guidelines
Short description: Generative AI needs to be tested for safety and security. With advances in automated testing, the question remains whether humans (including experts) can understand and act on the automated test results. We are interested in Swiss organizations, including NGOs and companies of any size, with real-world experience of human-in-the-loop review. We are also interested in academic partners with expertise in HCI or security frameworks, or with a general background in the proposed topic.
Call owner: Dr. Rebecca Balebako – Privacy Engineer – We envision starting the work as soon as possible in 2024. If you are interested, fill out this form.
Requested partners: Academia, Industry, Public
Proposed challenge: AI apps and models need to be tested for safety and security concerns. Current efforts are underway to understand whether safety and security tests can be generated and run automatically. Existing automated testing tools can use LLMs and generative AI. While this allows for more thorough testing, it can also generate a deluge of data. When security teams are overwhelmed by data, it is hard to sift through it to find the pressing and concerning issues, which hinders effective responses. While automated security tests can identify vulnerabilities in AI apps and models, little work has been done to consider the human element in the security review process.
This project aims to address this critical gap by:
- Building a community of experts: Bringing together academics, industry practitioners, and security professionals to share knowledge and experience of human-in-the-loop approaches to AI safety testing.
- Developing human-centric guidelines: Creating best practices for designing tools and interpreting test results that prioritize the human experience, enabling reviewers to respond to AI test results and security warnings.
- Establishing measurable outcomes: Defining metrics to assess the effectiveness of tools and human performance in handling AI test results.
Background:
The rapid advancement of artificial intelligence (AI) presents unprecedented opportunities for societal and economic progress. However, the potential risks cannot be ignored. This proposal focuses on the challenge of helping humans (developers, security experts, and testers) respond to automated security tests of AI and fix the issues they reveal. The list of potential harms from generative AI includes (but is not limited to) prompt injection, privacy leaks, toxic responses, bias, and misinformation. The rapid advancement of LLMs has outpaced the development of effective safety testing methodologies. AI red teaming has emerged as one potential option for testing against specific harms. Red teaming is a term borrowed from cybersecurity and, before that, the US military. Red teams run adversarial tests on the AI model or agent and evaluate whether the response was safe or appropriate.
Automated LLM red-teaming can use generative AI to create adversarial tests, and to assess the security and robustness of the response. For example, red team LLMs can generate many variations of adversarial queries using different formulations or languages. Using LLMs to generate and evaluate queries allows the tests to scale. Furthermore, models can also score the results of the tests, deciding whether a model output was toxic, or leaked PII.
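To make the scale of this approach concrete, the following is a minimal sketch of the loop such automated red-teaming tools run. It is illustrative only and not based on any particular product: the attacker, target, and judge models are passed in as plain functions, and the `flag_threshold` parameter is an assumed illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Finding:
    prompt: str
    response: str
    category: str   # e.g. "prompt_injection", "pii_leak", "toxicity"
    score: float    # 0.0 (looks safe) .. 1.0 (clearly unsafe), as judged by the scoring model

def red_team_run(
    generate_attacks: Callable[[str, int], List[str]],  # attacker LLM: (goal, n) -> adversarial prompts
    target: Callable[[str], str],                       # system under test: prompt -> response
    score: Callable[[str, str], float],                 # judge LLM: (prompt, response) -> risk score
    goals: List[str],
    variations_per_goal: int = 20,
    flag_threshold: float = 0.5,
) -> List[Finding]:
    """Generate adversarial prompts for each goal, query the target, and keep scored findings."""
    findings: List[Finding] = []
    for goal in goals:
        for prompt in generate_attacks(goal, variations_per_goal):
            response = target(prompt)
            risk = score(prompt, response)
            if risk >= flag_threshold:  # only flagged results are queued for human review
                findings.append(Finding(prompt, response, goal, risk))
    return findings
```

Even with a threshold that filters out clearly safe responses, a handful of harm categories multiplied by many prompt variations can yield hundreds of flagged findings per run, all of which land on a human reviewer's desk.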
Several tech companies are already building and testing the automated red-team approach. Additionally, several organizations provide LLMs or test suites for automated testing, including Microsoft PyRIT, Zurich-based Lakera.AI, and Paris-based Giskard.AI. All of these efforts emphasize the need for human reviewers; none claim that the automated tests alone are sufficient.
However, automated testing tools demand significant human effort to analyze and interpret, and this information overload can lead to missed critical issues. Humans need good information and well-designed tools to fix security problems. When security experts receive too many alerts, they cannot respond to all of them, a problem known as alert fatigue. The result may be extensive testing but missed opportunities to find and fix issues.
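One possible design direction, sketched below under the same assumptions (and reusing the Finding record from the earlier sketch), is to collapse near-duplicate findings and cap how many reach a reviewer per harm category, so the queue stays short and ranked by risk rather than growing with every test variation.

```python
from collections import defaultdict
from typing import Dict, List

def triage(findings: List[Finding], per_category_cap: int = 5) -> List[Finding]:
    """Collapse near-duplicate findings and keep only the highest-risk ones per category."""
    by_category: Dict[str, List[Finding]] = defaultdict(list)
    seen = set()
    # Walk findings from highest to lowest risk so the cap keeps the worst cases.
    for f in sorted(findings, key=lambda f: f.score, reverse=True):
        key = (f.category, f.response.strip().lower()[:200])  # crude near-duplicate key
        if key in seen:
            continue
        seen.add(key)
        if len(by_category[f.category]) < per_category_cap:
            by_category[f.category].append(f)
    return [f for group in by_category.values() for f in group]
```

Whether this kind of filtering actually helps reviewers, or instead hides critical issues, is exactly the sort of question the proposed guidelines would need to answer.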
There are no standards or best practices for the usability of AI safety testing. More work is needed on the human element of automated red-team tests. This proposal focuses on the role of human reviewers and security workers in understanding and analyzing the results from automated “red teaming” and automated AI audits.
Project Method and Goals: By focusing on the human-machine interface, this project will enhance the ability of organizations to effectively manage AI risks and ensure system security.
Method: Establish a community of experts to facilitate conversation. Assemble relevant players in the space, focused on those in Switzerland and Europe.
- Include academics and industry experts in AI security, usable security, and HCI.
- Get feedback from existing companies and consultants deploying or running AI safety tests.
- Use a virtual, asynchronous community to develop a shared vocabulary for this issue.
- Potentially host a workshop to discuss what solutions exist for humans who need to analyze and respond to automated red-team or responsible AI results.
Outcome: Write and share guidelines on best practices for including humans in the review of security and safety testing results.
- Create best practices for designing tools that help people respond to AI test results.
- Define metrics that can be used to measure the usability of the developer or security-responder experience, as well as metrics to understand human fatigue and the likelihood of missing results (a minimal sketch of candidate metrics follows this list).
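As a starting point for that discussion, the sketch below lists a few candidate metrics. The quantities (review hours, findings reviewed, issues missed and found later) are illustrative assumptions; defining the right set of metrics is itself an expected outcome of the project.

```python
def reviewer_metrics(total_findings: int, reviewed: int, confirmed_issues: int,
                     missed_issues_found_later: int, review_hours: float) -> dict:
    """Illustrative workload and miss-rate metrics for human reviewers of AI test results."""
    return {
        # Throughput: findings a reviewer is asked to handle per hour of review work.
        "findings_per_review_hour": total_findings / max(review_hours, 1e-9),
        # Coverage: share of automated findings a human actually looked at.
        "coverage": reviewed / max(total_findings, 1),
        # Flag precision: how often a flagged finding turned out to be a real issue.
        "flag_precision": confirmed_issues / max(reviewed, 1),
        # Miss rate: real issues that slipped past review and surfaced later.
        "miss_rate": missed_issues_found_later / max(confirmed_issues + missed_issues_found_later, 1),
    }
```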
If you are interested in participating, please reply by October 25th, 2024 at the latest and send us the attached feedback form, stating:
- What competencies could you bring to the project?
- Do you have specific experience that might be relevant in the project context?
- What contribution do you want to make toward the project goals?
The next steps will be:
- Based on your proposed contributions, the call owner will decide which partners will continue shaping the idea.
- Afterward: an additional workshop and/or definition of the concrete implementation plan.
The rules of the game: Decisions on whom to invite for the first meeting, and whom to select for the workshops and final innovation team will be made by the organization, based on the provided information. The goal is to set up an optimal innovation team for reaching the goals, not to create a team with as many partners as possible.