Hey there! I'm currently pursuing my PhD at the University of Warsaw, and I'm absolutely in love with Mathematics and Computer Science - especially Reinforcement Learning. If you're curious about what I've been working on, or just want to learn a little more about my research, experience, and interests, you've come to the right place!
My primary research interests revolve around planning algorithms, variations of DQN, and environments with discrete observations. I am particularly drawn to simple yet effective strategies that offer significant advantages, such as Hindsight Experience Replay or Scaling by Resetting. In contrast, I'm not overly fond of bootstrapping, even though it is often a reasonable approach.
In my view, truly challenging problems require agents to rely on planning to select actions, and I find that this process is much more efficient when agents use subgoals instead of low-level actions.
I have also developed a keen interest in Natural Language Processing, which I have been studying since before it became a mainstream topic. I have previously worked as an NLP researcher at several companies. I remain fascinated by the prospect of learning a text-based Q-function with prompt-based Bellman updates.
Below, you will find a list of my recent publications.
Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, Sergey Levine
CoRL 2024
We train a vision-language-action policy to autoregressively generate textual reasoning (the "embodied chain of thought," or ECoT) in response to commands and observations before it chooses a robot action. Such reasoning contains high-level reasoning steps (task, plan, and subtask) to encourage the model to "think carefully" and low-level features (movement, gripper position, and labeled object bounding boxes) to get the model to "look carefully".
Michał Zawalski, Michał Tyrolski, Konrad Czechowski, Damian Stachura, Piotr Piękos, Tomasz Odrzygóźdź, Yuhuai Wu, Łukasz Kuciński, Piotr Miłoś
ICLR 2023, notable-top-5%
We propose a novel subgoal-based planning algorithm, Adaptive Subgoal Search, which adaptively chooses from subgoals with different horizons. Our method benefits both from the efficiency of planning with longer-term subgoals and from the reliability of shorter-term ones. We show that this approach not only works well on hard problems, but also generalizes to out-of-distribution instances.
Konrad Czechowski, Tomasz Odrzygóźdź, Marek Zbysiński, Michał Zawalski, Krzysztof Olejnik, Yuhuai Wu, Łukasz Kuciński, Piotr Miłoś
NeurIPS 2021
We propose an algorithm for efficient planning in complex tasks. Instead of searching the space by taking atomic actions, we propose to use high-level subgoals for a faster and deeper search. Our method shows strong results in complex reasoning environments: Sokoban, the Rubik's Cube, and INT (proving inequalities).
Michał Zawalski, Błażej Osiński, Henryk Michalewski, Piotr Miłoś
AAMAS 2022
We propose a simple yet effective algorithm for multi-agent reinforcement learning, based on
Adaptive Subgoal Search, ICLR 2023
MA-Trace, AAMAS 2022
NeurIPS 2023
G-Research, Research Internship
I verified the performance of speech recognition models on financial audio data. I built a pipeline for evaluating and fine-tuning transformer-based models for speech recognition and audio classification.
Google Zurich, Software Developer Internship
During this internship, I worked in the Shopping Ads Data Quality team. I was responsible for estimating the impact of new enforcement that filters images violating the standards.
Samsung, Research Internship
I worked in the Advanced Natural Language Processing team. The main area of my research was efficient handwriting recognition.
PhD in Artificial Intelligence
University of Warsaw
MS in Computer Science
University of Warsaw
BS in Computer Science, BS in Mathematics
University of Warsaw
High School Degree
XIV LO im. S. Staszica
Algorithms and data structures classes,
University of Warsaw
Algorithms and data structures classes,
University of Warsaw
Individual programming project classes,
University of Warsaw
Preparing algorithmic contests and workshops on introduction to algorithms,
XIV LO im. S. Staszica
Extracurricular algorithmic classes,
XIV LO im. S. Staszica
Preparing algorithmic contests,
XIV LO im. S. Staszica
Extracurricular algorithmic classes,
XIV LO im. S. Staszica
Finalist, 17th place (team of 4)
Finalist, 13th place (team of 3)
Finalist, 2nd place (team of 3)
Finalist, 13th place
8th place (team of 3)
Finalist, 22nd place (team of 4)
10th place (team of 3)
Finalist, 15th place
12th place (team of 3)
20th place (team of 3)
Laureate, 6th place
Laureate, 12th place
Laureate, 19th place
Laureate, 25th place
I enjoy taking part in algorithmic contests, optimisation challenges and bot programming tournaments.
My favourites include: Coordinate precision, RTL, Faust 2.0, Unpickable, Sky color, Meta-analysis, Location reviews, and others.
I use Cinema 4D for modeling.
Puzzles are fun!