I am an Assistant Professor in the Department of Computer Science at ETH Zurich. I am a member of the ETH Systems Group, where I lead the Efficient Architectures and Systems Lab (EASL).

I work on computer systems for large-scale applications such as cloud computing services, data analytics, and machine learning. The goal of my research is to improve the performance and resource efficiency of cloud computing while making it easier for users to deploy and manage their applications. My research interests span operating systems, computer architecture, and their intersection with machine learning.

Before joining ETH, I spent a year as a Research Scientist at Google Brain. I completed my Ph.D. in Electrical Engineering at Stanford University, advised by Professor Christos Kozyrakis. My dissertation was on the design and implementation of fast, elastic storage for cloud computing. My Ph.D. was generously supported by the Microsoft Research PhD Fellowship and Stanford Graduate Fellowship. I earned my M.S. in Electrical Engineering at Stanford University in 2015. I graduated from the Engineering Science program at the University of Toronto in 2013, where I earned my Bachelor of Applied Science and Engineering.

If you are interested in joining the EASL research group, please email me (aklimovic@ethz.ch) with your CV. See below for research focus areas.

Research Focus Areas

Cloud computing is undergoing a fundamental shift, stimulated by exponential growth in data and users and by an increasing demand for cloud services that can automatically allocate and scale computing resources for jobs. An emerging wave of cloud computing, called serverless computing, enables users to focus on writing code for their applications while cloud providers manage resources based on application demands. On serverless computing platforms, users can simultaneously launch thousands of tiny, short-lived tasks and pay only for the resources their tasks actually consume per ~10ms time interval, as opposed to paying for pre-allocated virtual machines that have fixed ratios of compute, memory, and storage.

Research topics: What should an operating system for serverless computing look like? Scheduling millions of short-lived tasks to satisfy performance requirements and achieve high resource utilization poses interesting challenges. Serverless computing encourages a high degree of resource sharing across tenants, which raises performance and security isolation concerns. In addition, it is not yet clear what the right abstraction is for users to specify application performance requirements.

Machine learning (ML) jobs are an increasingly important class of applications in the cloud. Across domains such as image understanding and text translation, scaling machine learning models to a large number of parameters has been shown to dramatically improve accuracy when sufficiently large datasets are used. While significant work has focused on optimizing hardware and software for ML computations, data management is a common bottleneck. As organizations collect massive amounts of data, storing and ingesting data at this scale poses several challenges.

Research topics: How should we design distributed storage systems for machine learning to optimize end-to-end model training and inference? How can we avoid moving large amounts of data across the network? Should we instead move computation closer to the data (near-storage computing)? How can multiple tenants safely share datasets and models with good performance guarantees?

Many of today’s computer systems use heuristics and hints to make decisions (e.g., to decide which resources to allocate for a task or which data to keep in a cache). As software applications and hardware platforms become more and more heterogeneous, designing heuristics is increasingly difficult. Yet due to growing heterogeneity, automating resource and data management is increasingly important. One promising approach is to learn resource management strategies by training machine learning models using system data collected while profiling or running applications.

Research topics: How can we leverage machine learning models to make systems-level decisions when such decisions often need to be made at microsecond timescales? How should we design APIs to make replacing or supplementing heuristics with machine learning model inference practical in computer systems?

Research Group

  • Dmitrii Ustiugov, now Assistant Professor at Nanyang Technological University, Singapore


[VLDB] SHiFT: An Efficient, Flexible Search Engine for Transfer Learning
Cedric Renggli, Xiaozhe Yao, Luka Kolar, Luka Rimanic, Ana Klimovic, Ce Zhang.
Proceedings of the International Conference on Very Large Databases (VLDB), 2023.

[ATC] Cachew: Machine Learning Input Data Processing as a Service
Dan Graur, Damien Aymon, Dan Kluser, Tanguy Albrici, Chandramohan A. Thekkath, Ana Klimovic.
Proceedings of the USENIX Annual Technical Conference (ATC), 2022.

[MLSys] Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines
Michael Kuchnik, Ana Klimovic, Jiri Simsa, George Amvrosiadis, Virginia Smith.
Proceedings of the Conference on Machine Learning and Systems (MLSys), 2022.

[VLDB] tf.data: A Machine Learning Data Processing Framework
Derek G. Murray, Jiri Simsa, Ana Klimovic, Ihor Indyk.
Proceedings of the International Conference on Very Large Databases (VLDB), August 2021. Presentation Video.

[VLDB] Modularis: Modular Relational Analytics over Heterogeneous Distributed Platforms
Dimitris Koutsoukos, Ingo Müller, Renato Marroquín, Ana Klimovic, Gustavo Alonso.
Proceedings of the International Conference on Very Large Databases (VLDB), 2021.

[ATC] SONIC: Application-aware Data Passing for Chained Serverless Applications
Ashraf Mahgoub, Karthick Shankar, Subrata Mitra, Ana Klimovic, Somali Chaterji, Saurabh Bagchi.
Proceedings of the USENIX Annual Technical Conference (ATC), July 2021.

[SIGMOD] Towards Demystifying Serverless Machine Learning Training
Jiawei Jiang*, Shaoduo Gan*, Yue Liu, Gustavo Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, Ce Zhang.
Proceedings of ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD), June 2021.

[TOS] RAIL: Predictable, Low Tail Latency for NVMe Flash
Heiner Litz, Javier Gonzalez, Ana Klimovic, Christos Kozyrakis.
ACM Transactions on Storage (TOS), Volume 17, Issue 1, January 2021.

[ATC] OPTIMUSCLOUD: Heterogeneous Configuration Optimization for Distributed Databases in the Cloud
Ashraf Mahgoub, Alexander Michaelson Medoff, Rakesh Kumar, Subrata Mitra, Ana Klimovic, Somali Chaterji, Saurabh Bagchi.
Proceedings of the USENIX Annual Technical Conference (ATC), July 2020.

[SPMA] Serverless Clusters: The Missing Piece for Interactive Batch Applications?
Ingo Müller, Rodrigo Bruno, Ana Klimovic, John Wilkes, Eric Sedlar, Gustavo Alonso.
Workshop on Systems for Post-Moore Architectures (SPMA), April 2020.

[ATC] Unification of Temporary Storage in the NodeKernel Architecture
Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, Ana Klimovic, Adrian Schuepbach, Bernard Metzler.
Proceedings of the USENIX Annual Technical Conference (ATC), Renton, WA, July 2019.

[Thesis] Fast, Elastic Storage for the Cloud
Ana Klimovic. Doctoral Dissertation, Stanford University, June 2019.
I presented my thesis work at several seminars, including the WICARCH seminar (presentation video).

[OSDI] Pocket: Elastic Ephemeral Storage for Serverless Analytics
Ana Klimovic, Yawen Wang, Christos Kozyrakis, Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle
Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), Carlsbad, CA, October 2018.

[ATC] Understanding Ephemeral Storage for Serverless Analytics
Ana Klimovic, Yawen Wang, Christos Kozyrakis, Patrick Stuedi, Jonas Pfefferle, Animesh Trivedi
Proceedings of the USENIX Annual Technical Conference (ATC), Boston, MA, July 2018.

[ATC] Selecta: Heterogeneous Cloud Storage Configuration for Data Analytics
Ana Klimovic, Heiner Litz, Christos Kozyrakis
Proceedings of the USENIX Annual Technical Conference (ATC), Boston, MA, July 2018.

[MLSys] Learning Heterogeneous Cloud Storage Configuration for Data Analytics
Ana Klimovic, Heiner Litz, Christos Kozyrakis
Non-archival proceedings of the inaugural Systems and Machine Learning conference (MLSys), Stanford, CA, February 2018.

[HotStorage] Understanding Rack-Scale Disaggregated Storage
Sergey Legtchenko, Hugh Williams, Kaveh Razavi, Austin Donnelly, Richard Black, Andrew Douglas, Nathanael Cheriere, Daniel Fryer, Kai Mast, Angela Demke Brown, Ana Klimovic, Andy Slowey, Antony Rowstron
Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage), Santa Clara, CA, July 2017.

[ASPLOS] ReFlex: Remote Flash ≈ Local Flash
Ana Klimovic, Heiner Litz, Christos Kozyrakis
Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Xi'an, China, April 2017. Memorable Paper Award (awarded at NVMW'18).

[TOCS] The IX Operating System: Combining Low Latency, High Throughput, and Efficiency in a Protected Dataplane
Adam Belay, George Prekas, Mia Primorac, Ana Klimovic, Samuel Grossman, Christos Kozyrakis, Edouard Bugnion
ACM Transactions on Computer Systems, Volume 34, Issue 4, January 2017.

[EuroSys] Flash Storage Disaggregation
Ana Klimovic, Christos Kozyrakis, Eno Thereska, Binu John, Sanjeev Kumar
Proceedings of the 11th European Conference on Computer Systems (EuroSys), London, UK, April 2016.

[OSDI] IX: A Protected Dataplane Operating System for High Throughput and Low Latency
Adam Belay, George Prekas, Ana Klimovic, Samuel Grossman, Christos Kozyrakis, Edouard Bugnion
Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Broomfield, CO, October 2014.
Best Paper Award.

[FPT] Bitwidth-optimized Hardware Accelerators with Software Fallback
Ana Klimovic and Jason H. Anderson
IEEE International Conference on Field-Programmable Technology (FPT), pp. 136-143, Kyoto, Japan, December 2013.


Open Source Software

  • Cachew: a system that enables efficient distributed input data processing for ML training jobs (builds on tf.data)
  • Pocket: a distributed, elastic data store for ephemeral data, designed for serverless computing applications
  • ReFlex: a software system that enables fast, predictable access to remote Flash storage






Program Committees:
  • HotOS'23
  • EuroSys'23
  • OSDI'22
  • NSDI'22
  • SOSP'21
  • ASPLOS'21
  • HotOS'21
  • EuroDW'21
  • OSDI'20
  • VEE'20
Conference Organizing Committees:
  • ASPLOS'22 Publication Co-Chair
  • EuroSys'22 Needham Dissertation Award Committee
  • SOSP'21 Poster Co-Chair
  • SoCC'20 Sponsorship Chair
  • SOSP'19 Student Research Competition Judge

About Me

Outside of research, I enjoy:

  • Sports: tennis, volleyball, skiing, swimming, squash, ...
  • Travel: exploring new places and cultures
  • Music: piano, guitar, singing, violin
  • Art: painting, sketching