Space-Time Memory Network for Sounding Object Localization in Videos

Abstract

Leveraging temporal synchronization and association within sight and sound is an essential step towards robust localization of sounding objects. To this end, we propose a space-time memory network for sounding object localization in videos.

Publication
BMVC, 2021

Citation

@article{DBLP:journals/corr/abs-2111-05526,
  author    = {Sizhe Li and
               Yapeng Tian and
               Chenliang Xu},
  title     = {Space-Time Memory Network for Sounding Object Localization in Videos},
  journal   = {CoRR},
  volume    = {abs/2111.05526},
  year      = {2021},
  url       = {https://arxiv.org/abs/2111.05526},
  eprinttype = {arXiv},
  eprint    = {2111.05526},
  timestamp = {Tue, 16 Nov 2021 12:12:31 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2111-05526.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Sizhe Lester Li
李思哲

I am interested in building inverse models that learn to capture the rich and structured representation of our world from unstructured observation, through physical interactions of embodied agents. To this end, my research draws ideas from vision, graphics, robotics, and computational cognitive science.