Hello, I am

Michał Zawalski

RL Researcher
Read my CV

Who am I?

Hey there! I'm currently pursuing my PhD at the University of Warsaw, and I'm absolutely in love with Mathematics and Computer Science - especially Reinforcement Learning. If you're curious about what I've been working on, or just want to learn a little more about my research, experience, and interests, you've come to the right place!

My Research

My primary research interests revolve around planning algorithms, variants of DQN, and environments with discrete observations. I am particularly drawn to simple yet effective strategies that offer significant advantages, such as Hindsight Experience Replay or Scaling by Resetting. In contrast, I'm not overly fond of bootstrapping, even though it is often a reasonable approach.

In my view, truly challenging problems require agents to rely on planning to select actions, and I find that this process is much more efficient when agents use subgoals instead of low-level actions.

I have also developed a keen interest in Natural Language Processing, which I have been studying since before it became a mainstream topic. I have previously worked as an NLP researcher at several companies. I remain fascinated by the prospect of learning a text-based Q-function with prompt-based Bellman updates.

Below, you will find a list of my recent publications.


Robotic Control via Embodied Chain-of-Thought Reasoning

Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, Sergey Levine
CoRL 2024

We train a vision-language-action policy to autoregressively generate textual reasoning (the "embodied chain of thought," or ECoT) in response to commands and observations before it chooses a robot action. Such reasoning contains high-level reasoning steps (task, plan, and subtask) to encourage the model to "think carefully" and low-level features (movement, gripper position, and labeled object bounding boxes) to get the model to "look carefully".

Read Paper Project website
Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search

Michał Zawalski, Michał Tyrolski, Konrad Czechowski, Damian Stachura, Piotr Piękos, Tomasz Odrzygóźdź, Yuhuai Wu, Łukasz Kuciński, Piotr Miłoś
ICLR 2023, notable-top-5%

We propose a novel subgoal-based planning algorithm, Adaptive Subgoal Search. It adaptively chooses from subgoals with different horizons. Our method benefits both from the efficiency of planning with longer-term subgoals and from the reliability of shorter-term ones. We show that this approach not only works well on hard problems, but also generalizes to out-of-distribution instances.

Read Paper Project website
Subgoal Search For Complex Reasoning Tasks

Konrad Czechowski, Tomasz Odrzygóźdź, Marek Zbysiński, Michał Zawalski, Krzysztof Olejnik, Yuhuai Wu, Łukasz Kuciński, Piotr Miłoś
NeurIPS 2021

We propose an algorithm for efficient planning in complex tasks. Instead of searching the space by taking atomic actions, we propose to use high-level subgoals for a faster and deeper search. Our method shows strong results in complex reasoning environments: Sokoban, the Rubik's Cube, and INT (proving inequalities).

Read Paper
Off-Policy Correction for Multi-Agent Reinforcement Learning

Michał Zawalski, Błażej Osiński, Henryk Michalewski, Piotr Miłoś
AAMAS 2022

We propose a simple yet effective algorithm for multi-agent reinforcement learning, based on V-Trace. Despite its on-policy nature, the computations can be distributed to many workers with nearly perfect speedup and a negligible impact on the quality of training.

Read Paper Extended abstract Project website

Conference Awards

Notable-Top-5% Paper

Adaptive Subgoal Search, ICLR 2023

Best Poster Award

MA-Trace, AAMAS 2022

Top Reviewer

NeurIPS 2023

My Experience

Work

2022, June - September

G-Research, Research Internship

I verified the performance of speech recognition models on financial audio data. I built a pipeline for evaluating and fine-tuning transformer-based models for speech recognition and audio classification.


2019, July - September

Google Zurich, Software Developer Internship

During this internship, I worked in the Shopping Ads Data Quality team. I was responsible for estimating the impact of new enforcement that filters images violating the standards.


2017, July - October

Samsung, Research Internship

I worked in the Advanced Natural Language Processing team. The main area of my research was efficient handwriting recognition.

Education

2020 - Present

PhD in Artificial Intelligence
University of Warsaw


2018 - 2020

MS in Computer Science
University of Warsaw


2015 - 2018

BS in Computer Science, BS in Mathematics
University of Warsaw


2012 - 2015

High School Degree
XIV LO im. S. Staszica

Teaching

Oct 2022 - Feb 2023

Algorithms and data structures classes,
University of Warsaw


Oct 2021 - Feb 2022

Algorithms and data structures classes,
University of Warsaw


Mar 2021 - Jun 2021

Individual programming project classes,
University of Warsaw


August 2016

Preparing algorithmic contests and workshops on introduction to algorithms,
XIV LO im. S. Staszica


Sep 2015 - Jun 2016

Extracurricular algorithmic classes,
XIV LO im. S. Staszica


August 2015

Preparing algorithmic contests,
XIV LO im. S. Staszica


Sep 2014 - Jun 2015

Extracurricular algorithmic classes,
XIV LO im. S. Staszica

Read my complete CV

Download

Competition Awards

Google Hashcode, 2020

Finalist, 17th place (team of 4)

XII Microsoft Bubble Cup, 2019

Finalist, 13th place (team of 3)

XI Microsoft Bubble Cup, 2018

Finalist, 2nd place (team of 3)

Potyczki Algorytmiczne, 2018

Finalist, 13th place

Central Europe Regional Contest, 2017

8th place (team of 3)

Google Hashcode, 2017

Finalist, 22nd place (team of 4)

Polish Academic Championships in Team Programming, 2017

10th place (team of 3)

Potyczki Algorytmiczne, 2017

Finalist, 15th place

Polish Academic Championships in Team Programming, 2016

12th place (team of 3)

Polish Academic Championships in Team Programming, 2015

20th place (team of 3)

XXII Polish Olympiad in Informatics, 2015

Laureate, 6th place

LXVI Polish Mathematical Olympiad, 2015

Laureate, 12th place

XXI Polish Olympiad in Informatics, 2014

Laureate, 19th place

LXV Polish Mathematical Olympiad, 2014

Laureate, 25th place

My Interests

Sport

In particular: Football, Snooker, and American football

Capture The Flag (CTF)

Exploit and enjoy!

Drawing

It takes some time until I'm satisfied with the result...

Hiking

My favourite outdoor activity.

The Lord of the Rings

Best movie ever. I know it by heart. Literally, test me!

Competitive programming

I enjoy taking part in algorithmic contests, optimisation challenges and bot programming tournaments.

XKCD

My favourites include: Coordinate precision, RTL, Faust 2.0, Unpickable, Sky color, Meta-analysis, Location reviews, and others.

Board games

Perfect entertainment. I often play with my friends.

3D modeling

I use Cinema 4D for modeling.

Recreational mathematics

Puzzles are fun!

Card tricks

Nice brain teasers. Pay attention to what you believe!