Karim Abdel Sadek

PhD Student in Computer Science at UC Berkeley

About Me

Hi, I'm Karim! I am a first-year CS PhD student at UC Berkeley. I am lucky to be advised by Stuart Russell and to be part of CHAI and BAIR. My work is partially supported by the Cooperative AI PhD Fellowship.

Before starting my PhD, I spent time at the Center for Human-Compatible AI and at the Krueger AI Safety Lab at the University of Cambridge, where I was fortunate to be supervised by Michael Dennis, Micah Carroll, and David Krueger.

I completed my MSc in AI at the University of Amsterdam in 2025. Before that, I graduated with a BSc in Mathematics and Computer Science from Bocconi University, where I was fortunate to be advised by Marek Eliáš, working on online learning. During my BSc, I spent the Spring '23 semester at Georgia Tech, supported by a full-ride scholarship.

Please reach out if you are interested in my research, want to have a quick chat, or anything else comes to mind! You can e-mail me at karimabdel at berkeley dot edu.

Research

I am broadly interested in Reinforcement Learning, Cooperative AI, and AI Safety. My research style mixes theory and practice, with the goal of designing methods that are both well-founded and have the potential to scale empirically. Recently, I have been most excited about designing better ways to do inverse RL and preference learning, designing protocols with the right incentives in human-AI collaboration, and more foundational topics in RL theory and game theory.

News

Selected Publications

Mitigating Goal Misgeneralization via Minimax Regret

Karim Abdel Sadek*, Matthew Farrugia-Roberts*, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger, Michael Dennis

RLC 2025

TL;DR: We use minimax regret to train RL agents that are robust to goal misgeneralization.

Safe generalization in reinforcement learning requires not only that a learned policy acts capably in new situations, but also that it uses its capabilities towards the pursuit of the designer's intended goal. The latter requirement may fail when a proxy goal incentivizes similar behavior to the intended goal within the training environment, but not in novel deployment environments. This creates the risk that policies will behave as if in pursuit of the proxy goal, rather than the intended goal, in deployment—a phenomenon known as goal misgeneralization. In this paper, we formalize this problem setting in order to theoretically study the possibility of goal misgeneralization under different training objectives. We show that goal misgeneralization is possible under approximate optimization of the maximum expected value (MEV) objective, but not the minimax expected regret (MMER) objective. We then empirically show that the standard MEV-based training method of domain randomization exhibits goal misgeneralization in procedurally-generated grid-world environments, whereas current regret-based unsupervised environment design (UED) methods are more robust to goal misgeneralization (though they don't find MMER policies in all cases). Our findings suggest that minimax expected regret is a promising approach to mitigating goal misgeneralization.
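
As a rough sketch of the two objectives (the notation here is informal, not taken verbatim from the paper): writing $V_\theta(\pi)$ for the expected return of policy $\pi$ in environment $\theta$, and $P$ for the training distribution over environments,

$$\pi_{\mathrm{MEV}} \in \arg\max_{\pi} \; \mathbb{E}_{\theta \sim P}\big[V_\theta(\pi)\big], \qquad \pi_{\mathrm{MMER}} \in \arg\min_{\pi} \; \max_{\theta} \Big[\max_{\pi'} V_\theta(\pi') - V_\theta(\pi)\Big].$$

Intuitively, maximizing expected value can leave a proxy goal unpunished as long as it agrees with the intended goal on the training distribution, while minimizing worst-case regret penalizes the policy precisely on those environments where the two goals come apart.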

Algorithms for Caching and MTS with reduced number of predictions

Karim Abdel Sadek, Marek Eliáš

ICLR 2024

TL;DR: We design online algorithms that achieve optimal consistency-robustness tradeoffs using only a reduced number of predictions.

ML-augmented algorithms utilize predictions to achieve performance beyond their worst-case bounds. Producing these predictions might be a costly operation – this motivated Im et al. (2022) to introduce the study of algorithms which use predictions parsimoniously. We design parsimonious algorithms for caching and MTS with action predictions, proposed by Antoniadis et al. (2023), focusing on the parameters of consistency (performance with perfect predictions) and smoothness (dependence of their performance on the prediction error). Our algorithm for caching is 1-consistent, robust, and its smoothness deteriorates with the decreasing number of available predictions. We propose an algorithm for general MTS whose consistency and smoothness both scale linearly with the decreasing number of predictions. Without the restriction on the number of available predictions, both algorithms match the earlier guarantees achieved by Antoniadis et al. (2023).
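
As a rough sketch of the guarantees involved (informal notation, not verbatim from the paper): an algorithm ALG with total prediction error $\eta$ on an input $\sigma$ is $\alpha$-consistent, $\beta$-robust, and smooth with rate $f$ if

$$\mathrm{ALG}(\sigma) \le \min\big\{\, f(\eta) \cdot \mathrm{OPT}(\sigma),\; \beta \cdot \mathrm{OPT}(\sigma) \,\big\}, \quad \text{with } f(0) = \alpha.$$

That is, consistency is the performance with perfect predictions ($\eta = 0$), robustness bounds the worst case under arbitrary predictions, and smoothness controls how the guarantee degrades as $\eta$ grows; in our caching result, $\alpha = 1$ even when only a reduced number of predictions is available.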