Allen Zhu

allenzhu@berkeley.edu

I am an applied scientist at Aurora, where I work on graph net models for motion simulation. Previously, I completed a Master’s in Machine Learning at CMU and worked as a research assistant.

Notes

Projects

Interpretable RL

Advisor: Abhinav Gupta

We demonstrate a reinforcement learning method that allows robots to learn human concepts without direct supervision, which enables new ways for people to train, debug, and interact with robots. Our key idea is to use a vision encoder that produces semantic segmentations, predicting a discrete category for each input pixel. From the reward signal alone, the encoder learns categories of objects that are independent of visual appearance. In the foodbot environment shown, the agent learns to collect good objects (foods) and avoid bad objects (animals).

Because the intermediate segmentation is human-readable, our method makes it obvious whether mistakes are caused by errors in perception or errors in behavior. Further, it’s possible to teach robots at a conceptual level, e.g. “avoid the spikes”. Similarly, our approach makes it easier for robots to learn new objects, transfer to new environments, or learn new tasks.
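The discrete per-pixel bottleneck described above can be sketched as follows. This is only an illustration of the interface, not the project's implementation: the random linear map stands in for a learned encoder, and the names encode, one_hot, and N_CATEGORIES are hypothetical. Training the encoder from reward is not shown.

```python
import numpy as np

N_CATEGORIES = 4  # hypothetical number of discrete categories

rng = np.random.default_rng(0)


def encode(image, weights):
    """Map an (H, W, 3) image to an (H, W) map of discrete category ids."""
    logits = image @ weights       # per-pixel scores, shape (H, W, N_CATEGORIES)
    return logits.argmax(axis=-1)  # one category id per pixel


def one_hot(segmentation, n_categories):
    """Turn category ids into the one-hot map a policy could consume."""
    return np.eye(n_categories, dtype=np.float32)[segmentation]


image = rng.random((8, 8, 3))                 # stand-in for a camera frame
weights = rng.normal(size=(3, N_CATEGORIES))  # stand-in for a learned encoder
segmentation = encode(image, weights)
policy_input = one_hot(segmentation, N_CATEGORIES)
```

Since the policy would only ever see policy_input, printing or plotting segmentation shows exactly what the agent perceived at each pixel.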

More details: presentation.

Planning-based Estimators for Distance Learning

Advisor: Ruslan Salakhutdinov

Human behavior is organized hierarchically, breaking hard problems into subtasks. We don’t think about the thousands of movement primitives required to get a glass of water; we think about walking over and then reaching for the glass. Empirically, we find that hierarchical approaches to RL also scale better.

In this project, we studied distance learning, which learns distances d(s, s’), and behavior cloning, which learns goal-conditioned policies π(s; g). We modeled 50 points along a line and analyzed a dataset of random walks along that line. A naive function approximator takes a pair of states and predicts the number of steps between them. We showed that the naive estimator only works for short distances, because the number of samples of a pair (s, s’) falls off exponentially with the distance between them. With a hierarchical approach, we can instead plan waypoints and sum the distances along shorter segments, which reduces variance and improves accuracy.
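A minimal sketch of this comparison, under several assumptions that are not taken from the project: the naive estimator is tabular and stores the smallest step gap observed between two states, the planner is a Floyd-Warshall shortest-path pass over short segments, and the constants N_STEPS, MAX_GAP, and HORIZON are arbitrary.

```python
import numpy as np

N_STATES = 50      # points along the line, as in the text
N_STEPS = 200_000  # length of the simulated walk (assumed)
MAX_GAP = 60       # longest step gap stored as a training pair (assumed)
HORIZON = 3        # only distances this short are trusted for planning (assumed)

rng = np.random.default_rng(0)

# Simulate one long random walk; moves off either end are clipped.
walk = np.empty(N_STEPS + 1, dtype=int)
walk[0] = N_STATES // 2
for t, step in enumerate(rng.choice([-1, 1], size=N_STEPS)):
    walk[t + 1] = min(max(walk[t] + step, 0), N_STATES - 1)

# Naive estimator: smallest step gap ever observed between s and s'.
# Gaps are visited in increasing order, so the first one seen is the minimum.
naive = np.full((N_STATES, N_STATES), np.inf)
np.fill_diagonal(naive, 0)
for gap in range(1, MAX_GAP + 1):
    src, dst = walk[:-gap], walk[gap:]
    unseen = np.isinf(naive[src, dst])
    naive[src[unseen], dst[unseen]] = gap

# Planning-based estimator: keep only short, well-sampled segments and
# chain them through waypoints with Floyd-Warshall shortest paths.
planned = np.where(naive <= HORIZON, naive, np.inf)
for k in range(N_STATES):
    planned = np.minimum(planned, planned[:, k:k + 1] + planned[k:k + 1, :])

# Ground truth on a line is the absolute difference of positions.
idx = np.arange(N_STATES)
true = np.abs(idx[:, None] - idx[None, :])

# Fraction of pairs at each true distance that each estimator overestimates.
for d in (1, 5, 10, 20, 40):
    pairs = true == d
    print(d, np.mean(naive[pairs] > d), np.mean(planned[pairs] > d))
```

In this toy setting a pair at distance d only gets an exact sample when the walk takes d consecutive steps in the same direction, which happens with probability 2^-d, so the naive table degrades quickly with distance while the planned estimate relies only on one-step to three-step segments.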

Environmental Justice

Advisor: Khalid Kadir

We analyzed California’s cap-and-trade program, a market-based mechanism for reducing air pollution. We found that low-income communities and communities of color are disproportionately exposed to hazardous air pollutants, which can increase the risk of lung cancer, stroke, and other conditions. These findings led to more equitable environmental regulation. More details are in the full report and paper.