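Talk Title: Can We Trust AI With Power? From Chatbots to Coworkers: The New Science of AI Safety
Time: May 25, 2026, 10:00-11:00 a.m.
Venue: Conference Room 2, 5th Floor, Comprehensive Building, Development Zone Campus (also available online via Tencent Meeting)
Speaker: Xiaolei Ren (任小蕾)
Abstract: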
AI systems are evolving from passive chatbots into autonomous agents that browse the web, write code, and complete multi-step tasks. METR reports that the longest task a frontier AI agent can complete has grown from 4 minutes in early 2024 to over 16 hours today — doubling roughly every 105 days. This shift changes AI safety's central question: no longer only whether AI says the right thing, but whether AI can be safely trusted with real-world power.
This talk explores that question through four concrete stories — a model that answers confidently without accountability, an AI code generator that introduces vulnerabilities at scale, an agent hijacked by hidden webpage instructions, and a deepfake that attacks human trust rather than software — and uses them to motivate a five-layer AI Safety Stack spanning model alignment, agent control, runtime monitoring, evaluation science, and governance. The talk closes with open research directions for students, and the question that may define the field: if a capable AI agent were trying to appear safe, would we be able to tell?
Speaker Bio:
Dr. Xiaolei Ren is an Assistant Professor and doctoral supervisor at the Macau University of Science and Technology (MUST), working at the intersection of program analysis, compiler optimization, and AI system security. His research focuses on the safety and verifiability of AI-generated code, LLM and agent security, and security analysis and verification of software systems.
A first-author recipient of the ACM SIGPLAN PLDI Distinguished Paper Award, Dr. Ren has published at top-tier venues including PLDI and FSE, holds two FDCT grants as Principal Investigator, and serves as founding Vice Chair of the IEEE Computer Society Macau Chapter. His work is driven by a question that grows more urgent as AI systems begin to act: can we build verification infrastructure around AI — making its outputs checkable, its tool use auditable, and its failures detectable before they cause harm?