Workshop on ML for Systems at NeurIPS 2025, December 6, San Diego Convention Center, Upper Level Room 5AB

Speakers

Azalia Mirhoseini

Keynote Speaker

Self-improving AI and the Future of Computing Systems

Azalia Mirhoseini is an Assistant Professor of Computer Science and founder of Scaling Intelligence Lab at Stanford University. Her lab develops scalable and self-improving AI systems and methodologies towards the goal of advancing artificial general intelligence. She has spent several years in industry AI labs, including Google Brain, Anthropic, and Google DeepMind. Her past work includes Mixture-of-Experts (MoE) neural architectures, now commonly used in leading generative AI models; AlphaChip, a pioneering work on deep reinforcement learning for layout optimization used in the design of advanced chips like AI accelerators (TPUs) and data center CPUs; and research on inference-time scaling laws. Her research has been recognized through the MIT Technology Review 35 Under 35 Award, Okawa Foundation Research Award, Best ECE Thesis Award at Rice University, publications in flagship venues such as Nature, and coverage by various media outlets, including MIT Technology Review, IEEE Spectrum, The Verge, The Times, ZDNet, VentureBeat, and WIRED.

Ion Stoica

Keynote

How AI is Disrupting Systems Research

Ion Stoica is a Professor in the EECS Department and holds the Xu Bao Chancellor Chair at the University of California, Berkeley. He is the Director of the Sky Computing Lab and the Executive Chairman of Databricks and Anyscale. His current research focuses on AI systems and cloud computing, and his work includes numerous open-source projects such as vLLM, SGLang, Chatbot Arena, SkyPilot, Ray, and Apache Spark. He is a Member of the National Academy of Engineering, an Honorary Member of the Romanian Academy, and an ACM Fellow. He has also co-founded several companies, including LMArena (2025), Anyscale (2019), Databricks (2013), and Conviva (2006).

Hanson Wang

Invited Talk

Coding Agents at Scale with OpenAI Codex

Hanson Wang is a research engineer at OpenAI, where he focuses on the Codex models integrated into ChatGPT. With Codex, users can delegate coding tasks to parallel agents working autonomously in the cloud to analyze the codebase and generate pull requests. Hanson worked on training the first codex-1 model launched in May and has been continuously iterating on the model since then. Prior to joining OpenAI, he co-founded a startup building AI analyst agents, and worked on ML infrastructure at Meta. Hanson holds a degree in Computer Science from the University of Waterloo.

Vinod Grover

Invited Talk

The Essence of CUDA and AI for GPUs

Vinod Grover is a Sr. Distinguished Engineer at NVIDIA, where he has worked since 2007. He led the team that created the CUDA C++ language and compiler, helping make GPU computing faster and easier across many fields. Since 2017, he has applied language and compiler ideas to accelerate deep-learning models, leading a small group focused on performance and developer productivity. He also continues to advance GPU architectures and the CUDA programming model. Previously, he held engineering, research, and management roles at Sun Microsystems and Microsoft. He holds a bachelor’s in physics from IIT Delhi and a master’s in computer science from Syracuse University.

Neeraja Yadwadkar

Invited Talk

TBD

Neeraja J. Yadwadkar is an assistant professor in the Department of ECE at UT Austin. She is a cloud computing systems researcher with a strong background in machine learning (ML). Her work straddles the boundary between systems and ML. Advances in systems, machine learning, and hardware architectures are about to launch a new era in which the entire cloud can be used as a computer. At the same time, new ML techniques are being developed to solve complex resource management problems in systems, and systems research is being influenced by the properties of emerging ML algorithms and evolving hardware architectures. Bridging these complementary fields, her research focuses on using and developing ML techniques for systems, and on building systems for ML.

Rahul Arya

Invited Talk

Advances in LLM Serving Efficiency at Scale

Rahul Arya is a research engineer at Google DeepMind contributing to the training and inference performance of Gemini models. He previously worked on the XLA:TPU compiler.

What To Expect

The ML for Systems workshop presents cutting-edge work on ML in computer systems and aims to develop a unified methodology for the field.

Machine Learning (ML) for Systems describes the application of machine learning techniques to problems related to computer systems. By leveraging supervised learning and reinforcement learning (RL) approaches, machine learning can replace longstanding heuristics that currently drive many of these systems. The topics span multi-objective tasks such as designing new data structures [1], integrated circuits [2, 3], or design verification [20, 21], as well as implementing control algorithms for applications such as compilers [12, 13, 19], databases [8], memory management [9, 10], or ML frameworks [11]. While the systems community increasingly recognizes the importance of ML in solving a variety of systems problems [23], ML for Systems remains an emerging area without widely established best practices, methods, and strategies for applying state-of-the-art machine learning techniques [22]. The goal of this workshop is to provide an interdisciplinary venue for ML and Systems experts to push this boundary and start new directions within the ML for Systems area.

Workshop Direction

In the previous six editions, we showcased specific approaches and frameworks for solving problems, bringing together researchers and practitioners at NeurIPS from both the ML and systems communities. While breaking new ground, we encouraged collaboration and development across a broad range of ML for Systems work, much of it later published in top-tier conferences [11, 13, 14, 15, 16, 17, 18]. This year, we plan to continue on this path while encouraging work in key emerging areas such as Large Language Model (LLM) training and serving, and unifying benchmarks on key problems such as scheduling and compiling through a competition.

Recently, the rise of Large Language Models (LLMs) has presented new opportunities and challenges within the domain of computer systems. Our community is well-positioned to produce science and stimulate discussion for adapting to the new paradigm, especially on how LLMs can be used to solve systems problems and how ML can address systems issues that emerge from LLM training and serving. Additionally, as the field matures, we emphasize keeping the research open and the science reproducible. To that end, we are supplementing our main program with a competition track to crystallize the field’s progress.

Workshop Goals

NeurIPS provides a unique opportunity to bring together systems researchers and researchers from other sub-areas of ML who had not previously considered applying their techniques in a computer systems context. We see the goal of our workshop as pursuing the following two objectives:

  • Opening up connections between research areas that were not previously considered, connecting the ML and Systems communities, growing the scope of ML for Systems work and unlocking new research opportunities.
  • Developing best practices, methodologies and benchmarks for the ML for Systems field.

To build common ground on the topic of LLMs interacting with computational systems, we specifically include seminal talks on emerging trends in training and serving LLMs, given by seasoned researchers and practitioners, as part of our invited program. Our call for papers also includes topics at the intersection of Systems and LLMs.

Our program will include a variety of speakers and poster sessions from selected papers. We invite researchers to submit relevant papers through our call for papers.

Organizing Committee

Contact Us

Contact us at mlforsystems@googlegroups.com.