EgoMap: Projective mapping and structured egocentric memory for Deep RL

Edward Beeching¹
Jilles Dibangoye¹
Olivier Simonin¹
Christian Wolf²

¹ CITI, INRIA CHROMA, INSA Lyon
² LIRIS, CNRS, INSA Lyon

Published at ECML PKDD, 2020

[Paper]
[Code]



Abstract

Tasks involving localization, memorization and planning in partially observable 3D environments are an ongoing challenge in deep reinforcement learning. We present EgoMap, a spatially structured neural memory architecture that improves a deep reinforcement learning agent's performance in 3D environments on challenging tasks with multi-step objectives. EgoMap incorporates several inductive biases, including a differentiable inverse projection of CNN feature vectors onto a top-down, spatially structured map, which is updated with ego-motion measurements through a differentiable affine transform. We show that this architecture outperforms both standard recurrent agents and state-of-the-art agents with structured memory. We demonstrate that incorporating these inductive biases into an agent's architecture allows for stable training with reward alone, circumventing the expense of acquiring and labelling expert trajectories. A detailed ablation study demonstrates the impact of key aspects of the architecture, and through extensive qualitative analysis we show how the agent exploits its structured internal memory to achieve higher performance.
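
To make the two operations described above concrete, the following is a minimal PyTorch-style sketch of a differentiable affine warp that re-registers the top-down memory after an ego-motion step. The function name, tensor layout, sign conventions and the assumption that the agent sits at the map centre are illustrative choices, not the exact implementation from the paper.

import torch
import torch.nn.functional as F

def egomotion_update(memory, dx, dy, dtheta, cell_size=1.0):
    # memory:  (B, C, H, W) top-down egocentric feature map
    # dx, dy:  (B,) translation since the last step, in world units
    # dtheta:  (B,) rotation since the last step, in radians
    # Hypothetical sketch: the agent is assumed to sit at the map centre and
    # the sign conventions below are assumptions, not the paper's exact ones.
    B, C, H, W = memory.shape
    cos, sin = torch.cos(dtheta), torch.sin(dtheta)
    # grid_sample works in normalised [-1, 1] coordinates, so rescale the
    # metric translation by half the map extent.
    tx = dx / (cell_size * W / 2)
    ty = dy / (cell_size * H / 2)
    # One 2x3 affine matrix per batch element (output -> input mapping,
    # as expected by affine_grid).
    theta = torch.stack([
        torch.stack([cos, -sin, tx], dim=-1),
        torch.stack([sin,  cos, ty], dim=-1),
    ], dim=-2)                                   # (B, 2, 3)
    grid = F.affine_grid(theta, list(memory.shape), align_corners=False)
    return F.grid_sample(memory, grid, align_corners=False)

The inverse projection that writes CNN features into the grid can be sketched in the same spirit: per-pixel depth and a pinhole camera model give each feature vector a position in the agent's frame, and the feature is scattered into the corresponding map cell. For brevity this sketch uses hard nearest-cell binning (gradients still flow to the features through scatter_add_); the field-of-view, grid size and cell size are hypothetical parameters, and the paper's projection may use a softer, fully differentiable assignment.

def inverse_project(features, depth, fov, grid_size, cell_size):
    # features:  (B, C, Hf, Wf) CNN feature map
    # depth:     (B, Hf, Wf) forward (z-buffer) depth in world units
    # Returns a (B, C, grid_size, grid_size) top-down feature map with the
    # agent at the bottom-centre, looking "up" the grid.
    B, C, Hf, Wf = features.shape
    device = features.device
    # Viewing angle of each image column for a pinhole camera with a
    # horizontal field of view `fov` (radians).
    angles = torch.linspace(-fov / 2, fov / 2, Wf, device=device)
    x = depth * torch.tan(angles).view(1, 1, Wf)       # lateral offset
    y = depth                                           # forward distance
    # Discretise into grid cells (hard binning for brevity).
    gx = (x / cell_size + grid_size / 2).long().clamp(0, grid_size - 1)
    gy = (y / cell_size).long().clamp(0, grid_size - 1)
    idx = (gy * grid_size + gx).reshape(B, 1, -1).expand(B, C, -1)
    top_down = torch.zeros(B, C, grid_size * grid_size, device=device)
    top_down.scatter_add_(2, idx, features.reshape(B, C, -1))
    # Average the features that land in the same cell.
    counts = torch.zeros(B, 1, grid_size * grid_size, device=device)
    counts.scatter_add_(2, idx[:, :1], torch.ones(B, 1, Hf * Wf, device=device))
    return (top_down / counts.clamp(min=1)).reshape(B, C, grid_size, grid_size)

In the agent, these two operations are applied at every time step: the affine warp keeps the memory registered to the agent's current egocentric frame, and the projection writes newly observed features into it.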


Paper and Bibtex

[Paper]

Citation
 
Beeching, E., Dibangoye, J., Simonin, O., and Wolf, C., 2020. EgoMap: Projective mapping and structured egocentric memory for Deep RL. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.

[Bibtex]
@inproceedings{beeching2020egomap,
  title={EgoMap: Projective mapping and structured egocentric memory for Deep RL},
  author={Beeching, Edward and Dibangoye, Jilles and
          Simonin, Olivier and Wolf, Christian},
  booktitle={ECML PKDD},
  year={2020}}
                


Acknowledgements

This work was funded by grant Deepvision (ANR-15-CE23-0029, STPGP479356-15), a joint French/Canadian call by ANR & NSERC. We gratefully acknowledge support from the CNRS/IN2P3 Computing Center (Lyon, France) for providing the computing and data-processing resources needed for this work.