Email  /  CV  /  GitHub  /  LinkedIn  /  Twitter

Edward Beeching

I am a Machine Learning Research Scientist at Hugging Face. I work on problems in Embodied Learning, developing novel Deep Reinforcement Learning approaches and custom simulation environments. My PhD, which I pursued at INSA Lyon as part of the INRIA CHROMA team, is in Deep Reinforcement Learning approaches to planning and navigation in robotics.

My work involves state-of-the-art convolutional, recurrent and transformer-based network architectures, trained with on-policy (A2C, PPO, APPO) and off-policy (SAC, Q-learning, R2D2) RL algorithms. I also often use auxiliary optimization objectives, such as classification, regression and semantic segmentation, to improve agent performance.

Publications

Godot Reinforcement Learning Agents
Edward Beeching, Jilles Dibangoye, Olivier Simonin, Christian Wolf. (2021)
AAAI-22 Workshop on Reinforcement Learning in Games
PDF | Webpage | Code

Graph augmented Deep Reinforcement Learning in the GameRLand3D environment
Edward Beeching, Maxim Peter, Philippe Marcotte, Jilles Dibangoye, Olivier Simonin, Joshua Romoff, Christian Wolf. (2021)
AAAI-22 Workshop on Reinforcement Learning in Games
PDF | Webpage | Code

Learning to plan in uncertain topological maps
Edward Beeching, Jilles Dibangoye, Olivier Simonin, Christian Wolf. (2020)
ECCV 2020 (spotlight)
PDF | Webpage | Code

EgoMap: Projective mapping and structured egocentric memory for Deep RL
Edward Beeching, Jilles Dibangoye, Olivier Simonin, Christian Wolf. (2020)
ECML-PKDD 2020
PDF | Webpage | Code

Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer
Edward Beeching, Jilles Dibangoye, Olivier Simonin, Christian Wolf. (2020)
ICPR 2020
PDF | Webpage | Code

Agent Examples

Agent behaviors learned with the Godot RL Agents Library

Image-goal based navigation in a 3D scan of our laboratory. The map is not provided to the agent.

Example of a Deep Reinforcement Learning agent trained with the Advantage Actor-Critic (A2C) algorithm to collect a sequence of objects. The map is not provided to the agent.