Monika Jotautaitė
AI Safety Researcher
About me
I'm an AI control researcher, working on monitor red-teaming at Apollo Research.
Previously, I was an independent AI researcher leading a two-person team focused on MonitoringBench. My work was supported by Co-Efficient Giving (formerly Open Philanthropy) and the Berkeley Existential Risk Initiative.
Since the January preview, MonitoringBench has been used for monitor evaluations in Anthropic's Claude Mythos Preview Risk Report and OpenAI's Auto-Review for Codex.
Research
My work
UK AISI Bounty Program: As an evaluations scientist, I designed and implemented multiple evaluation proposals that were accepted for the UK AISI bounty program. The evaluations I worked on cover the following capabilities: LLM elicitation, online gambling, collusion in AI debate, and decreasing test-time token usage. My work in this area also includes the SmartBackdoor paper. I was also a technical program manager for the ASET Benchmarks program at Arcadia Impact, mentoring a team of engineers implementing cybersecurity evals in Inspect.
I organize Women in AI Safety London, a series of networking events. To receive updates on events and opportunities, join our mailing list. If you're interested in organizing a local event, you can apply here.
I occasionally teach at ML4Good bootcamps as a head teacher or a TA. I created new materials on LLM evaluations and RL. Find upcoming programs at ml4good.org/upcoming.
I created AI Safety materials for GirlsWhoML (slides). If you'd like to run this lecture series at your university, reach out here.