Speakers
Azalia Mirhoseini
Keynote
Self-improving AI and the Future of Computing Systems
Azalia Mirhoseini is an Assistant Professor of Computer Science and founder of Scaling Intelligence Lab at Stanford University. Her lab develops scalable and self-improving AI systems and methodologies towards the goal of advancing artificial general intelligence. She has spent several years in industry AI labs, including Google Brain, Anthropic, and Google DeepMind. Her past work includes Mixture-of-Experts (MoE) neural architectures, now commonly used in leading generative AI models; AlphaChip, a pioneering work on deep reinforcement learning for layout optimization used in the design of advanced chips like AI accelerators (TPUs) and data center CPUs; and research on inference-time scaling laws. Her research has been recognized through the MIT Technology Review 35 Under 35 Award, Okawa Foundation Research Award, Best ECE Thesis Award at Rice University, publications in flagship venues such as Nature, and coverage by various media outlets, including MIT Technology Review, IEEE Spectrum, The Verge, The Times, ZDNet, VentureBeat, and WIRED.
Ion Stoica
Keynote
How AI is Disrupting Systems Research
Ion Stoica is a Professor in the EECS Department and holds the Xu Bao Chancellor Chair at the University of California, Berkeley. He is the Director of the Sky Computing Lab and the Executive Chairman of Databricks and Anyscale. His current research focuses on AI systems and cloud computing, and his work includes numerous open-source projects such as vLLM, SGLang, Chatbot Arena, SkyPilot, Ray, and Apache Spark. He is a Member of the National Academy of Engineering, an Honorary Member of the Romanian Academy, and an ACM Fellow. He has also co-founded several companies, including LMArena (2025), Anyscale (2019), Databricks (2013), and Conviva (2006).
Hanson Wang
Invited Talk
Coding Agents at Scale with OpenAI Codex
Hanson Wang is a research engineer at OpenAI, where he focuses on the Codex models integrated into ChatGPT. With Codex, users can delegate coding tasks to parallel agents working autonomously in the cloud to analyze the codebase and generate pull requests. Hanson worked on training the first codex-1 model launched in May and has been continuously iterating on the model since then. Prior to joining OpenAI, he co-founded a startup building AI analyst agents, and worked on ML infrastructure at Meta. Hanson holds a degree in Computer Science from the University of Waterloo.
Vinod Grover
Invited Talk
The Essence of CUDA and AI for GPUs
Vinod Grover is a Sr. Distinguished Engineer at NVIDIA, where he has worked since 2007. He led the team that created the CUDA C++ language and compiler, helping make GPU computing faster and easier across many fields. Since 2017, he has applied language and compiler ideas to accelerate deep-learning models, leading a small group focused on performance and developer productivity. He also continues to advance GPU architectures and the CUDA programming model. Previously, he held engineering, research, and management roles at Sun Microsystems and Microsoft. He holds a bachelor’s in physics from IIT Delhi and a master’s in computer science from Syracuse University.
Neeraja Yadwadkar
Invited Talk
TBD
Neeraja J. Yadwadkar is an Assistant Professor in the Department of ECE at UT Austin. She is a cloud computing systems researcher with a strong background in machine learning (ML), and her work straddles the boundary between systems and ML. Advances in systems, machine learning, and hardware architectures are about to launch a new era in which the entire cloud can be used as a computer. At the same time, new ML techniques are being developed to solve complex resource management problems in systems, and systems research is increasingly influenced by the properties of emerging ML algorithms and evolving hardware architectures. Bridging these complementary fields, her research focuses on using and developing ML techniques for systems, and building systems for ML.
Rahul Arya
Invited Talk
Advances in LLM Serving Efficiency at Scale
Rahul Arya is a research engineer at Google DeepMind contributing to the training and inference performance of Gemini models. He previously worked on the XLA:TPU compiler.