Name
46908 - A Hybrid Machine Learning Approach to Safe and Consistent Crisis Detection in Digital Platforms
Date
Thursday, November 20, 2025
Time
11:50 AM - 12:00 PM (EST)
Description

Digital platforms are increasingly used for communication, self-expression, and the chronicling of lived experiences. From classical social media platforms like Facebook, to forums such as Reddit, to digital journaling applications such as our own Mirror Journal from the Child Mind Institute, users regularly express themselves online. While the more social of these environments offer enough visibility that friends or family members may notice when a loved one is experiencing distress and intervene, such visibility is far from ubiquitous. Platforms such as Reddit or journaling apps provide no obvious avenue for crisis detection or intervention by community members or care professionals. Unfortunately, this lack of transparency and connection overlaps significantly with the spaces where individuals experiencing distress go to express themselves; the SuicideWatch Reddit community, for example, has over half a million active users. Natural language processing (NLP) and AI tools such as Large Language Models (LLMs) may provide an opportunity to process posts or entries in this space and present helpful resources to users experiencing or conveying severe distress. However, the lack of transparency in the performance of LLMs, their use of data, and their regular refusal to perform certain tasks severely limits our ability to use these tools in a way that is consistently safe for users.

In this presentation, we will demonstrate and walk through the multi-level risk and crisis detection tool that we developed and have actively deployed in the Mirror Journal digital journaling application. This approach leverages open datasets and NLP modelling to build a stable, deterministic foundation for assessing risk related to self-harm or suicidal ideation. After fine-tuning a model for binary classification of self-harm, we worked with clinicians to annotate 1,000 journal entries as no risk, concerning, or high risk, and used these annotations as the basis for further fine-tuning. Finally, local-LLM agents act as a secondary safeguard, extending the evaluation to domains where less public data is available (e.g., sexual violence). Once entries have passed through this evaluation flow, the user’s experience is tailored accordingly. For no-risk entries, users receive relatively unstructured, local-LLM-generated summaries of their journal entries for reflection. For concerning entries, these summaries are heavily constrained by strict ‘grammars’ that limit how the LLM can respond, including restricting responses to clinically approved templates and adding explicit pointers to care and support. In high-risk cases, users receive no summary and are directed urgently to their care and support page, where they are prompted to reach out to their trusted contacts or care services. In the two months since the tool was integrated into our application (1,000 daily active users), approximately 10% of entries have been flagged as concerning and 2% as high risk, and a total of 300 individuals have reached out to 9-1-1, 9-8-8, or a crisis helpline.
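To make the evaluation flow concrete, the minimal sketch below (in Python, using hypothetical helper names and placeholder stubs rather than the deployed implementation) shows how a two-stage assessment, a deterministic fine-tuned classifier plus a local-LLM safeguard, could feed the three response tiers described above.

```python
from enum import IntEnum
from typing import Optional

class RiskLevel(IntEnum):
    NO_RISK = 0
    CONCERNING = 1
    HIGH_RISK = 2

# Hypothetical placeholder for a grammar restricting LLM output to
# clinically approved response templates.
CLINICAL_TEMPLATE_GRAMMAR = "<clinically-approved templates>"

def classifier_risk(entry_text: str) -> RiskLevel:
    """First pass: a fine-tuned three-class classifier (placeholder stub)."""
    return RiskLevel.NO_RISK

def llm_safeguard_risk(entry_text: str) -> RiskLevel:
    """Secondary safeguard: a local-LLM agent covering domains with less
    public training data, e.g. sexual violence (placeholder stub)."""
    return RiskLevel.NO_RISK

def summarize(entry_text: str, grammar: Optional[str] = None) -> str:
    """Local-LLM summary; a supplied grammar constrains how it may respond
    (placeholder stub)."""
    return "summary placeholder"

def handle_entry(entry_text: str) -> dict:
    # Escalate to the most severe signal produced by either stage.
    level = max(classifier_risk(entry_text), llm_safeguard_risk(entry_text))

    if level is RiskLevel.NO_RISK:
        # Relatively unconstrained reflection summary.
        return {"summary": summarize(entry_text)}
    if level is RiskLevel.CONCERNING:
        # Grammar-constrained summary plus explicit pointers to support.
        return {
            "summary": summarize(entry_text, grammar=CLINICAL_TEMPLATE_GRAMMAR),
            "resources": "care_and_support",
        }
    # HIGH_RISK: no summary; route the user urgently to care and support.
    return {"summary": None, "redirect": "care_and_support", "urgent": True}
```

In this sketch the LLM agent can only raise, never lower, the level assigned by the deterministic classifier; this is one way to keep the deterministic foundation authoritative while still gaining the broader coverage of the secondary safeguard.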

Gregory Kiar
Location Name
Regatta Room