Intentional By Design: Are AI Tools Safe for Survivors?
June 3, 2026
© Our Wave 2026. All rights reserved.

This is the third article in our Intentional By Design series. Read the first article on the technology behind our community platform and the second article on the philosophy guiding every tech decision we make.
When someone experiences sexual harm, the path to support is not linear. Disclosure can feel impossible. Asking questions out loud, whether to a friend, a clinician, or a hotline, can take years. So when an AI chatbot answers privately, in seconds, with no follow-up questions about who you are, the appeal is clear.
Survivors of sexual harm are already turning to consumer AI tools for support. A nationwide survey of 645 participants, including 259 survivors of sexual assault, found that at least one in four were using AI chatbots to help process their experiences.
But almost no one is checking whether these AI tools are actually safe for an already vulnerable population. There is no trauma-informed standard built for the kinds of questions survivors actually ask.
The way people access information online is shifting fast. Last month, Google announced that its AI Mode has crossed one billion monthly users, just one year after its launch. A study found that when an AI summary appears in search results, users click through to source material only 8% of the time.
People used to scan result pages and make their own judgement call. Now, many read a single AI-generated paragraph and stop. ChatGPT, Claude, Gemini, and Grok have built products around that behavior.
For survivors, this means the answers they receive could be the only answers they see. That’s why evaluating these tools through a trauma-informed lens matters now in a way it didn’t even a year ago.
Frameworks like the NIST AI Risk Management Framework give us a way to evaluate AI safety in general. But what’s missing is a validated benchmark on AI safety for survivors of sexual violence.
Survivors are using AI for some of their most consequential decisions. They may be asking a chatbot whether to report, where to go for a SANE exam, whether their experience “counts” as assault, and how to make sense of what just happened to them. AI could answer any of these questions inaccurately, or answer accurately but in a tone that minimizes harm.
“We are not seeing a lot of information around the different variables that we need to think about when we’re integrating AI systems. How safe are they? How do they handle crisis response? How are we thinking about privacy? How do we think about their quality when it comes to responses to different identities?”
– Kyle Linton, Co-Founder and Executive Director
We’re in the early stages of designing a trauma-informed benchmark for evaluating how leading consumer AI chatbots respond to survivors of sexual violence. As part of this evaluation, we’re in conversation with academic researchers and survivor-serving organizations.
We are looking at evaluating the four most widely used consumer AI chatbots. These are the ones survivors are most likely to encounter, and therefore the ones whose responses carry the most weight.
ChatGPT (OpenAI)
Claude (Anthropic)
Gemini (Google)
Grok (xAI)
An AI benchmark project like this would evaluate how these four systems perform across various areas including:
Crisis response: How models handle mentions of self-harm, immediate danger, or safety needs
Consistency across identities and experiences: Whether responses hold the same level of care for survivors from different backgrounds and types of harm
Safety guardrails for unlabeled experiences: How models respond when a survivor hasn’t yet named what they’ve been through
Most AI evaluation relies on test inputs and automated scoring. That approach is efficient, but it misses how survivors actually phrase questions and interpret responses. A trauma-informed benchmark for AI tools would include direct human review by survivors.
This commitment to human review isn’t unique to a benchmarking project like this. It’s the same standard we apply across every survivor-facing tool we build.
“We have an ethical obligation to make sure that the delivery of any content that’s survivor-facing is read by a human with expertise and validated before it goes out. Not only for the accuracy, but also for the language and the voice we’re delivering it in. We don’t let things go out automatically until they’ve been appropriately tested, vetted, and piloted.”
– Dr. Laura Sinko, Director of Research and Survivor Support
Whether or not current AI tools are safe for survivors of sexual harm is not a question we can or should answer alone. Researchers across the global trauma recovery community are already coming to conclusions and discovering patterns in this work.
Dr. Sachiko Kita, Representative Director of the Institute of Trauma Recovery in Japan and a board member of the MiStory consortium, recently built and tested a custom AI tool designed to classify survivor recovery states from interview transcripts.
When she ran it on transcripts from 17 Japanese survivors, the agreement rate with trained human interviewers was very low. Looking at where the AI tool diverged, her team noticed a consistent pattern. The tool tended to read survivors who were struggling as hopeful.
“Even when survivors repeatedly described painful looping experiences and overwhelming emotions, if there was a single hopeful statement at the end, the AI tended to interpret this as ‘the survivor is overwhelmed but still hopeful,’ and classify them in a more recovered state.”
– Dr. Sachiko Kita, Institute of Trauma Recovery
She points to a structural limit in these AI tools. They read text but miss the nonverbal context, emotional tone, and interpersonal dynamics that survivors often communicate through.
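For readers who want to see what an "agreement rate" check involves, here is a minimal sketch in Python. It is not Dr. Kita's method or data. The state names, their ordering, and the function name `agreement_report` are hypothetical, chosen only to illustrate comparing an AI tool's labels with a human interviewer's labels and checking which direction the disagreements lean.

```python
# Hypothetical ordered recovery states, from least to most recovered.
# These labels are an assumption for illustration, not the study's categories.
STATES = ("struggling", "stabilizing", "recovering")
RANK = {state: i for i, state in enumerate(STATES)}


def agreement_report(human_labels, ai_labels):
    """Compare paired labels and report agreement and direction of disagreement."""
    if not human_labels or len(human_labels) != len(ai_labels):
        raise ValueError("label lists must be non-empty and the same length")

    pairs = list(zip(human_labels, ai_labels))
    n = len(pairs)

    # Share of transcripts where the AI and the human chose the same state.
    agree = sum(h == a for h, a in pairs)
    # Share where the AI placed the survivor in a MORE recovered state.
    ai_higher = sum(RANK[a] > RANK[h] for h, a in pairs)
    # Share where the AI placed the survivor in a LESS recovered state.
    ai_lower = sum(RANK[a] < RANK[h] for h, a in pairs)

    return {
        "agreement_rate": agree / n,
        "ai_more_recovered": ai_higher / n,
        "ai_less_recovered": ai_lower / n,
    }
```

If most disagreements fall under `ai_more_recovered`, that would point to the kind of one-directional pattern described above, where the tool reads struggling survivors as further along than a trained human does.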
Rebecca Wong, MSW, an Our Wave research volunteer and Project Coordinator at NYU’s Center on Violence and Recovery, sees data privacy as one of the main risks survivors face from current consumer AI tools. Because AI responses are written to be affirming, they can create what researchers have called a false “social bond,” an illusion of safety that encourages survivors to share even more sensitive information.
The fix doesn’t start with how we test AI tools. Instead, it starts with how AI tools are designed in the first place.
“For consumer-facing AI tools to meaningfully support survivors’ healing, the design process around AI needs to be inverted. Rather than seeing what is technologically possible and then building guardrails around these products, a trauma-informed focus needs to be incorporated at every step of the design process. Shaping tools that can contribute to healing must begin by asking survivors what their priorities, values, and concerns are, and then building tools to achieve that vision.”
– Rebecca Wong, MSW
The findings and recommendations that come from a project like this help create change for our own systems, the broader field, and AI providers.
Every AI-assisted feature on our platform, from our Q&A agent to the research synthesis tools supporting Harbor, improves when we have a clearer picture of where these models fall short.
Findings would give other organizations in the GBV and sexual violence field an evidence base they can use to help those they serve. The outputs we’re envisioning include a public benchmarking report, an open data and code repository with privacy protections, and a peer-reviewed academic manuscript.
OpenAI, Anthropic, Google, and xAI are all expanding their AI Search features. Work like this could:
Bring findings directly to their trust and safety teams
Establish a replicable methodology that can be re-run as the models update
Survivors deserve to know whether the tools they’re already turning to are safe. Current research raises concerns, but a complete answer needs trauma-informed evaluation. What we do know is that survivor voices must be built into how these AI tools are designed from the start.
If your organization is interested in this work, we’d love to talk. For a closer look at how we govern AI internally, read our AI Policy. To explore the community survivors have built, visit our platform.
The next article in this series will go deeper into our digital field work, including what survivors are searching for online today and what’s still missing.
Whether AI tools are safe for survivors depends on the model, the question, and the moment. No validated, trauma-informed benchmark currently exists to evaluate how leading AI tools, like ChatGPT, respond to survivors of sexual violence. Our Wave is exploring what one could look like.
A trauma-informed AI benchmark evaluates AI model responses against criteria built from survivor-centered clinical principles and direct survivor input. The rubric we’re proposing would score models on accuracy, safety, trauma-informed language, resource provision, harm avoidance, trustworthiness, and accessibility.
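As a rough illustration of how a rubric like this could be recorded, here is a minimal sketch in Python. The seven criteria come from the proposal above. Everything else is an assumption made for the example: the 0 to 4 rating scale, the unweighted average, and the names `ReviewedResponse` and `summarize`. A real benchmark would likely weight criteria differently and would treat some of them, such as safety, as pass/fail.

```python
from dataclasses import dataclass
from statistics import mean

# The seven criteria named in the proposed rubric.
CRITERIA = (
    "accuracy",
    "safety",
    "trauma_informed_language",
    "resource_provision",
    "harm_avoidance",
    "trustworthiness",
    "accessibility",
)

MAX_SCORE = 4  # Assumption: a human reviewer rates each criterion from 0 to 4.


@dataclass
class ReviewedResponse:
    """One chatbot response, scored by a human reviewer."""
    model: str
    prompt_id: str
    scores: dict  # criterion name -> rating

    def validate(self):
        # Every criterion must be rated, and every rating must be in range.
        missing = set(CRITERIA) - set(self.scores)
        if missing:
            raise ValueError(f"missing ratings: {sorted(missing)}")
        for name in CRITERIA:
            if not 0 <= self.scores[name] <= MAX_SCORE:
                raise ValueError(f"{name} rating out of range")

    def overall(self):
        # Assumption: an unweighted average across the seven criteria.
        return mean(self.scores[name] for name in CRITERIA)


def summarize(reviews):
    """Average each model's overall score across its reviewed responses."""
    by_model = {}
    for review in reviews:
        review.validate()
        by_model.setdefault(review.model, []).append(review.overall())
    return {model: mean(values) for model, values in by_model.items()}
```

Keeping every rating tied to a human reviewer and a specific prompt, rather than an automated score, reflects the human-review commitment described earlier in this article.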
Beyond inaccurate answers, survivors face risks around data privacy. Most AI chatbots are not HIPAA compliant unless they explicitly say so, and conversations with them can be discoverable in legal cases. AI responses are also designed to be affirming, which can create a false sense of safety that encourages users to share more than they otherwise would.
This is the third article in the Intentional By Design series, exploring the technology, philosophy, and research being built at Our Wave.
Our Wave depends on your generous contributions for our continued success. Give today and support us as we work to support survivors of sexual harm and domestic violence.
Our Wave is a 501(c)(3) nonprofit organization and an anonymous service. For additional resources, visit the Our Wave Resources Hub. If this is an emergency, please contact your local emergency service.