SegVPR

This is the official PyTorch implementation of our work: "Learning Semantics for Visual Place Recognition through Multi-Scale Attention", accepted at ICIAP 2022.

In this paper we address the task of visual place recognition (VPR), where the goal is to retrieve the correct GPS coordinates of a given query image by matching it against a large geotagged gallery. While recent works have shown that building descriptors that incorporate both semantic and appearance information is beneficial, current state-of-the-art methods opt for a top-down definition of the significant semantic content. Here we present the first VPR algorithm that learns robust global embeddings from both the visual appearance and the semantic content of the data, with the segmentation process being dynamically guided by the recognition of places through a multi-scale attention module. Experiments on various scenarios validate this new approach and demonstrate its performance against state-of-the-art methods. Finally, we propose the first synthetic-world dataset suited for both place recognition and segmentation tasks.

Read the paper here: [arXiv]

Figures: overview of the architecture and teaser; the multi-scale attention module and the multi-scale pooling module.
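
To make the idea concrete, here is a minimal PyTorch sketch of attention-weighted pooling over features from two encoder stages, concatenated into a single global descriptor. This is not the repository code: the class names, channel sizes, and the GeM-style pooling choice are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPool(nn.Module):
    """Weight a feature map with a learned single-channel attention map,
    then GeM-pool it into one descriptor vector per image."""
    def __init__(self, channels: int, p: float = 3.0):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)  # attention logits
        self.p = nn.Parameter(torch.tensor(p))             # learnable GeM exponent

    def forward(self, x):
        a = torch.sigmoid(self.attn(x))                    # (B, 1, H, W) weights
        x = x * a                                          # apply attention
        x = x.clamp(min=1e-6).pow(self.p)                  # GeM: mean of p-th powers...
        x = F.adaptive_avg_pool2d(x, 1).pow(1.0 / self.p)  # ...then the p-th root
        return x.flatten(1)                                # (B, C)

class MultiScaleDescriptor(nn.Module):
    """Pool features from two encoder stages and concatenate them into
    one L2-normalized global embedding."""
    def __init__(self, c4: int = 1024, c5: int = 2048):    # ResNet50 layer3/layer4 widths
        super().__init__()
        self.pool4 = AttentionPool(c4)
        self.pool5 = AttentionPool(c5)

    def forward(self, f4, f5):
        d = torch.cat([self.pool4(f4), self.pool5(f5)], dim=1)
        return F.normalize(d, p=2, dim=1)                  # unit-norm descriptor
```

With a ResNet50 backbone, `f4` and `f5` would correspond to the outputs of the 4th and 5th conv blocks (1024 and 2048 channels, respectively).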

Setup:

Datasets: (please refer to details)

* IDDAv2, the synthetic dataset used for training and validation. Example image/mask pairs and reference UTM coordinates for two of its towns:

  |      | Town 3     | Town 10    |
  |------|------------|------------|
  | UTMx | 277349.751 | 277414.576 |
  | UTMy | 110471.756 | 110665.787 |

* Oxford RobotCar, available on the official website. We use the Overcast scenario as the gallery, while the queries are divided into four scenarios: Rain, Snow, Sun, and Night, with one image sampled every 5 meters and filenames formatted as `@UTMx@UTMy@.jpg` (see the parsing sketch below).
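
For reference, UTM coordinates can be recovered from filenames in this format with a few lines of Python; the helper below is an illustrative sketch, not part of this repository:

```python
from pathlib import Path

def parse_utm(filename: str) -> tuple[float, float]:
    """Parse a '@UTMx@UTMy@.jpg' filename into (utm_x, utm_y).
    E.g. '@277349.751@110471.756@.jpg' -> (277349.751, 110471.756)."""
    parts = Path(filename).stem.split("@")  # ['', '<UTMx>', '<UTMy>', '']
    return float(parts[1]), float(parts[2])
```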

Usage:

  • Train: with default parameters, the script uses the final architecture configuration: ResNet50 encoder, DeepLab semantic segmentation module, multi-scale pooling from the 4th and 5th conv blocks, and the domain adaptation module. It follows the exact training protocol and implementation details described in the main paper and supplementary material: all layers of the encoder are trained, and the multi-scale attention is computed on the features extracted from the 4th conv block.
    python3 main.py --exp_name=<name output log folder> --dataset_root=<root path of IDDAv2 train dataset> --dataset_root_val=<root path of IDDAv2 val dataset> --dataset_root_test=<root path of RobotCar dataset> --DA_datasets=<path to the RobotCar folder where all scenarios are merged>
    To resume training, specify --resume=<path of checkpoint .pth>
    To change the encoder, specify --arch=resnet101
    To change the semantic segmentation module, specify --semnet=pspnet
  • Evaluate (a conceptual sketch of the retrieval metric follows this list):
    python3 eval.py --resume=<path of checkpoint .pth> --dataset_root_val=<root path of IDDAv2 val dataset> --dataset_root_test=<root path of RobotCar dataset>
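
Conceptually, evaluation follows the standard VPR protocol described above: each query descriptor is matched against the gallery, and a retrieval counts as correct if one of the top-N results lies close enough to the query's UTM position. The sketch below is a generic illustration, not the repository's eval.py, and the 25 m threshold is an assumption borrowed from common VPR practice:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def recall_at_n(query_desc, gallery_desc, query_utm, gallery_utm,
                n: int = 1, threshold_m: float = 25.0) -> float:
    """Generic VPR Recall@N: a query is correct if any of its n nearest
    gallery descriptors lies within `threshold_m` meters of the query."""
    knn = NearestNeighbors(n_neighbors=n).fit(gallery_desc)  # index the gallery
    _, idx = knn.kneighbors(query_desc)                      # (Q, n) gallery indices
    geo = np.linalg.norm(gallery_utm[idx] - query_utm[:, None, :], axis=2)  # meters
    return float(((geo <= threshold_m).any(axis=1)).mean())
```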

Pretrained models:

Cite us

If you use this repository, please consider citing us:

@InProceedings{Paolicelli_2022_ICIAP,
  author    = {Paolicelli, Valerio and Tavera, Antonio and Masone, Carlo and Berton, Gabriele Moreno and Caputo, Barbara},
  title     = {Learning Semantics for Visual Place Recognition through Multi-Scale Attention},
  booktitle = {Image Analysis and Processing – ICIAP 2022},
  year      = {2022}
}
