My research ambition is to build autonomous agents that can solve a wide variety of complex tasks and continuously learn new ones. I believe this calls for decision-making systems that can continually learn state and action abstractions and use them to quickly generalize to new tasks. Toward this goal, I work on various aspects of reinforcement learning. I am fortunate to be advised by Professor Sergey Levine at UC Berkeley. During my undergrad, I was advised by Professors George Konidaris and Michael Littman at Brown. Please check out my selected work below.
CV | Google Scholar | GitHub
Publications
Conferences
Autonomous Improvement of Instruction Following Skills via Foundation Models
robotics
autonomous improvement
language-conditioned skills
VLM
Zhiyuan Zhou*,
Pranav Atreya*,
Abraham Lee,
Homer Walke,
Oier Mees,
Sergey Levine
CoRL, 2024. [website] [arXiv] [code] [dataset]
Can robots self-improve by collecting data autonomously 🤖? We introduce SOAR, a system for large-scale autonomous data collection 🚀 and autonomous improvement 📈 of a multi-task language-conditioned policy in diverse scenes without human intervention.
Tiered Reward: Designing Rewards for Specification and Fast Learning of Desired Behavior
behavior specification
reward design
Pareto optimality
fast learning
Zhiyuan Zhou,
Shreyas Sundara Raman,
Henry Sowerby,
Michael Littman
Reinforcement Learning Conference (RLC), 2024. [paper] [website] [code] [thread]
Do you need a reward function for your goal-reaching task? Use Tiered Reward! We prove that Tiered Reward is guaranteed to lead to an optimal policy, and show that it can lead to fast learning in a variety of deep and tabular environments.
Characterizing the Action-Generalization Gap in Deep Q-Learning
action generalization
DQN
Zhiyuan Zhou,
Cameron Allen,
Kavosh Asadi,
George Konidaris
Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2022. [arXiv] [poster] [code]
We introduce a way to evaluate action generalization in Deep Q-Learning using an oracle (expert knowledge of action similarity), and show that DQN's ability to generalize over actions depends on the size of the action space.
Designing Rewards for Fast Learning
reward design
Interactive RL
Henry Sowerby,
Zhiyuan Zhou,
Michael Littman
Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2022. (Oral) [arXiv] [poster] [oral at RLDM at 1:20:00]
What kinds of reward functions make RL fast? We advocate for rewards with large action gaps and small "subjective discounts," and present an algorithm for designing such rewards.
School Journal
Policy Transfer in Lifelong Reinforcement Learning through Learning Generalizing Features
lifelong RL
transfer learning
attention
Zhiyuan Zhou (Advisor: George Konidaris)
Undergraduate Honors Thesis, Brown CS, 2023. [pdf] [code]
Introduces an approach to learning state features that generalize across tasks drawn from the same distribution. We use an attention mechanism to learn an ensemble of minimally overlapping state features, which yields an ensemble of policies. We then use a bandit algorithm to identify the generalizing feature in the ensemble and leverage it to learn a transferable policy.
Improving Post-Processing on Video Object Recognition Using Inertial Measurement Unit
object recognition
Hidden Markov Models
Kalman Filter
Inertial Measurement Unit
Zhiyuan Zhou,
Spencer Boyum,
Michael Paradiso
Brown Undergraduate Research Journal, Spring 2022. [paper on page 29] [code]
How can we improve the accuracy of object recognition in videos when given per-frame inertial measurements of the camera? We propose two ways to do so.