# Cross-View Geolocalization Data

A library for managing datasets for cross-view geolocalization (CVGL).
Main features:

- 📁 Define a common data format for different datasets.
- 📁 ⬅️ Provide code for (downloading and) converting datasets to this format.
- 📁 ➡️ Provide code for loading data samples from this format.
- 📏 Provide improved ground-truth for all datasets.
## Installation

```shell
pip install cvgl_data
```
## Content
- Usage
- Pseudo-labelled ground-truth
- Dataset format
- Build from source
- Example: Integration with PyTorch
- Notes
## Paper

If you find this library useful for your research, please consider citing:

```bibtex
@InProceedings{Fervers_2023_CVPR,
    author    = {Fervers, Florian and Bullinger, Sebastian and Bodensteiner, Christoph and Arens, Michael and Stiefelhagen, Rainer},
    title     = {Uncertainty-Aware Vision-Based Metric Cross-View Geolocalization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {21621-21631}
}
```
## Usage

### 📁 ⬅️ Downloading and converting data

The following Python script converts the Argoverse V1 dataset in the specified folder to the cvgl-data format:

```shell
python3 scripts/prepare/argoverse_v1 --path ~/argoverse-v1 --min-pixels 640 # Downsample
```

See ./scripts/prepare for all supported datasets. Some datasets have to be downloaded manually; the script will prompt you when this is the case.
### 📁 ➡️ Loading data

#### 1. Loading meta-data

The following loads an index of a dataset from disk with meta-data such as camera parameters and paths to image files. This is fast, since image and point-cloud files are excluded and later loaded on demand.

```python
>>> import cvgl_data
>>> scenes = cvgl_data.load("~/argoverse-v1")
```
Each dataset consists of a list of scenes. To inspect the sensors used in a scene:

```python
>>> scene = scenes[0]
>>> scene
{
    camera: {
        ring_front_center: {
            intr: np.ndarray,
            name: "ring_front_center",
            resolution: (640, 1024),
            timestamps: np.ndarray(shape=(901)),
        },
        # ...
    },
    dataset: "argoverse-v1",
    ego_to_world: {
        timestamps: np.ndarray(shape=(6877)),
    },
    geopose: {
        timestamps: np.ndarray(shape=(6877)),
    },
    lidar: {
        all: {
            name: "all",
            timestamps: np.ndarray(shape=(300)),
        },
    },
    location: "MIA",
    scene_name: "00c561b9-2057-358d-82c6-5b06d76cebcf",
}
```
A scene can contain sensors with either

- timestamped sequences of measurements (e.g. a video consisting of multiple frames, as in Argoverse V1), or
- unordered sets of measurements (e.g. ground panoramas and matching aerial images, as in CVUSA). In this case, the `timestamp` field acts as an ID for matching sensor data.
Each sensor in a scene is represented by a loader object responsible for retrieving the corresponding data from disk.
#### 2. Loading data at a timestamp

Once the meta-data is loaded, we can choose a scene and timestamp and load the corresponding data from disk. Since sensor setups often capture data asynchronously, the sparse sensor measurements have to be interpolated between subsequent measurements. Each loader object comes with a method `loader.load(timestamp: int)` that loads the necessary data from disk and performs interpolation (where possible) for the requested timestamp. E.g., to load data from all sensors for a given timestamp:

```python
>>> timestamp = scene.camera["ring_front_center"].timestamps[0] # Choose the first camera timestamp
>>> frame = scene.load(timestamp)
>>> frame
{
    camera: {
        ring_front_center: {
            cam_to_ego: cosy.Rigid,
            image: np.ndarray(shape=(640, 1024, 3)),
            intr: np.ndarray,
            name: "ring_front_center",
            timestamp: 315969629022515,
        },
        # ...
        timestamp: 315969629022515,
    },
    dataset: "argoverse-v1",
    ego_to_world: {
        timestamp: 315969629022515,
        transform: cosy.Rigid,
    },
    geopose: {
        bearing: 177.389,
        latlon: (25.798512, -80.194962),
        timestamp: 315969629022515,
    },
    lidar: {
        all: {
            name: "all",
            points: np.ndarray(shape=(81522, 3)),
            timestamp: 315969629022515,
        },
        points: np.ndarray(shape=(81522, 3)),
        timestamp: 315969629022515,
    },
    location: "MIA",
    name: "argoverse-v1-00c561b9-2057-358d-82c6-5b06d76cebcf-315969629022515",
    scene_name: "00c561b9-2057-358d-82c6-5b06d76cebcf",
    timestamp: 315969629022515,
}
```
The meta-data object `scene` and the timestamped data object `frame` have the same tree structure.
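The timestamp interpolation mentioned above can be sketched as follows. This is a simplified, hypothetical illustration with NumPy (the library's actual interpolation also handles rigid transformations and is not shown here); it linearly interpolates a position between the two nearest measurements:

```python
import numpy as np

# Hypothetical sketch: interpolate a sparse measurement between the two
# nearest timestamps. Assumes t lies strictly inside the timestamp range.
def interpolate_position(timestamps, positions, t):
    i = np.searchsorted(timestamps, t)      # index of first timestamp >= t
    t0, t1 = timestamps[i - 1], timestamps[i]
    alpha = (t - t0) / (t1 - t0)            # fractional position between t0 and t1
    return (1 - alpha) * positions[i - 1] + alpha * positions[i]

timestamps = np.array([100, 200, 300])
positions = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0]])
print(interpolate_position(timestamps, positions, 150))  # midway: [5. 0.]
```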
### 🌍 ➡️ Loading data from tiled web maps

In addition to loading pre-defined samples from an existing dataset, we can also load virtual data samples from a tiled web map using the library tiledwebmaps. We first choose a tileloader from which aerial images will be fetched (see examples):

```python
>>> import tiledwebmaps as twm
>>> tileloader = ...
```

We can then define a virtual scene that provides map data and corresponding geo-poses, and load a data sample from it:

```python
>>> twm_scene = cvgl_data.load_tiledwebmaps(tileloader, name="name-of-tileloader", zoom=20)
>>> twm_frame = twm_scene.load(
...     latlon=frame.geopose.latlon, # Load map data centered on the geo-pose of the previously loaded vehicle frame
...     bearing=frame.geopose.bearing,
...     meters_per_pixel=0.1,
...     shape=(512, 512),
... )
>>> twm_frame
{
    dataset: "name-of-tileloader",
    geopose: {
        bearing: 177.389,
        latlon: (25.7985, -80.195),
        timestamp: 0,
    },
    location: "unknown-location",
    map: {
        image: np.ndarray(shape=(512, 512, 3)),
        meters_per_pixel: 0.100000,
        name: "name-of-tileloader",
        timestamp: 0,
    },
    name: "name-of-tileloader-z20-lat25.7985-lon-80.195-b177.389",
    scene_name: "name-of-tileloader",
    timestamp: 0,
}
```
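As a quick sanity check on the parameters above: the ground coverage of the returned aerial patch is simply `shape * meters_per_pixel` per axis.

```python
# Ground coverage of the aerial patch requested above:
# 512 pixels at 0.1 meters per pixel on each axis.
meters_per_pixel = 0.1
shape = (512, 512)
coverage_m = tuple(s * meters_per_pixel for s in shape)
print(coverage_m)  # (51.2, 51.2) -> a 51.2 m x 51.2 m patch
```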
### Visualization

#### 1. Individual frames

The script `draw_frames` can be used to draw individual data frames by projecting the lidar points into the ground and aerial images. This requires a file `config.yaml` that contains the paths to all datasets and the definitions of tileloaders (see data/config.yaml for an example).

Example:

```shell
python3 draw_frames --output ~/frames --num 70 --dataset nuscenes --config config.yaml --tileloader massgis21 --location boston-seaport --aerial-point-radius 2.5
```

Output (map data: MassGIS, vehicle data: nuScenes):
#### 2. Trajectories

The script `draw_trajectories` can be used to draw vehicle trajectories on aerial images.

Example:

```shell
python3 draw_trajectories --output ~/trajectories --dataset nuscenes --config config.yaml --tileloader massgis21 --downsample 2 --radius 2 --stride 1.0 --per-scene --location boston-seaport
```

Output (map data: MassGIS, vehicle data: nuScenes):
## 📏 Ground-truth

The original ground-truth geo-poses of datasets are often inaccurate. In our paper, we use a pseudo-label approach to produce more accurate ground-truth. The pseudo-labelling also yields outlier scores that indicate invalid data samples (e.g. when the vehicle is travelling through a tunnel, when the aerial and ground data are out-of-date, or when the pseudo-labelling failed).

Download the pseudo-labels from the following link (currently without Kitti and Kitti-360):

https://drive.google.com/file/d/1DOHekyqi0FtLh97YYR6_EngciGpKr1Pc

The pseudo-labels retain the license of the original datasets.

The pseudo-label files also follow the cvgl-data format and can be integrated without replacing any files in the original dataset directories. After downloading and extracting the pseudo-labels, include them when loading a dataset as follows:

```python
>>> scenes = cvgl_data.load("~/argoverse-v1", updates=["/path/to/extracted/pseudolabels/"])
```

When `updates` is passed, the function first checks all provided paths when looking for a file (e.g. `geopose.npz` for pseudo-labels), and only falls back to the original file if no update is found. The provided path should contain one folder per dataset and follow the cvgl-data format per dataset.
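The lookup order implied by `updates` can be sketched as follows. This is an illustrative, hypothetical helper (the function name `resolve_file` and its signature are assumptions, not the library's API); it checks each update path first and falls back to the original dataset file:

```python
from pathlib import Path
import tempfile

# Hypothetical sketch of the lookup order implied by the updates parameter:
# check each update path first, then fall back to the original dataset file.
def resolve_file(relative_path, dataset_root, updates=()):
    for update_root in updates:
        candidate = Path(update_root) / relative_path
        if candidate.exists():
            return candidate  # an updated (e.g. pseudo-label) file overrides the original
    return Path(dataset_root) / relative_path  # no update found: use the original file

# Demo: an update directory containing geopose.npz overrides the dataset copy
dataset = Path(tempfile.mkdtemp())
update = Path(tempfile.mkdtemp())
(update / "geopose.npz").touch()
print(resolve_file("geopose.npz", dataset, updates=[update]))     # path inside the update dir
print(resolve_file("timestamps.npz", dataset, updates=[update]))  # falls back to the dataset dir
```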
## 📁 Format

cvgl-data defines the following format for datasets:

```
|-- {dataset_name}                      # the root folder's name specifies the dataset name
    |-- LICENSE                         # ID and link to the license of the dataset
    |-- {scene_name}...                 # every scene is stored in a separate folder
        |-- camera
            |-- {camera_name}...        # every camera is stored in a separate folder
                |-- images
                    |-- {timestamp}.{jpg|png|...}... # one camera image per captured timestamp
                |-- cam_to_ego.npz      # a list of transformations from camera to ego coordinates per timestamp
                |-- timestamps.npz      # a list of all timestamps for which a camera image is available
                |-- config.yaml         # constant configuration parameters over all timestamps: e.g. intr, cam_to_ego, resolution
        |-- lidar
            |-- {lidar_name}...         # every lidar sensor is stored in a separate folder
                |-- points
                    |-- {timestamp}.npz... # one point cloud per captured timestamp (in ego coordinates)
                |-- timestamps.npz      # a list of all timestamps for which a point cloud is available
                |-- config.yaml         # constant configuration parameters over all timestamps (currently none)
        |-- map
            |-- {map_name}...           # every orthographic map is stored in a separate folder
                |-- images
                    |-- {id}.{jpg|png|...}... # one image per timestamp/id
                |-- meters_per_pixel.npz # a list of meters_per_pixel values per timestamp/id
                |-- config.yaml         # constant configuration parameters over all timestamps/ids: resolution, meters_per_pixel
        |-- odometry
            |-- angular_velocity.npz    # a list of angular velocities and corresponding timestamps (in ego coordinates)
            |-- linear_acceleration.npz # a list of linear accelerations and corresponding timestamps (in ego coordinates)
            |-- linear_velocity.npz     # a list of linear velocities and corresponding timestamps (in ego coordinates)
        |-- ego_to_world.npz            # a list of transformations from ego to world coordinates and corresponding timestamps
        |-- geopose.npz                 # a list of geo-poses (latlon, bearing) and corresponding timestamps
        |-- outlier_scores.npz          # a list of outlier scores (indicating whether a given data sample is invalid) and corresponding timestamps
    |-- config.yaml                     # dataset configuration parameters: dataset (name), location (e.g. New York)
```
The format uses `.npz` files for numerical data and `.yaml` files for configuration data, and allows common image file types.
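A file such as `timestamps.npz` can be written and inspected with plain NumPy. The array key `timestamps` used below is an illustrative assumption, not a guarantee about the library's exact field names:

```python
import numpy as np
import tempfile, os

# Sketch: writing and reading back a timestamps file in the npz-based format
# described above (the array key "timestamps" is an illustrative assumption).
path = os.path.join(tempfile.mkdtemp(), "timestamps.npz")
timestamps = np.array([315969629022515, 315969629122599], dtype=np.int64)
np.savez(path, timestamps=timestamps)

with np.load(path) as data:
    print(list(data.keys()))           # ['timestamps']
    print(int(data["timestamps"][0]))  # 315969629022515
```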
## Build from source

We include a docker script that pulls all necessary dependencies and builds manylinux wheels for this library.

1. Install docker.
2. Build wheels: run `sh build_wheel/build_all.sh ./wheels` to build wheels for different Python versions and store them in `./wheels`.
3. Install via pip:

```shell
pip install --find-links ./wheels cvgl_data
```
## Example: Integration with PyTorch

```python
import torch, cvgl_data
import tiledwebmaps as twm
from torch.utils.data import Dataset, DataLoader
import numpy as np

# Load a vehicle dataset and define a tileloader
scenes = cvgl_data.load("/path/to/dataset")
tileloader = ...

# Define a PyTorch dataset that loads paired ground and aerial data
class MyDataset(Dataset):
    # scene_timestamps_pairs: list of (scene, timestamps) defining which timestamps should be sampled per scene
    def __init__(self, tileloader, scene_timestamps_pairs):
        # Flatten scene_timestamps_pairs into a list of (scene, timestamp)
        self.scene_timestamp_pairs = []
        for scene, timestamps in scene_timestamps_pairs:
            self.scene_timestamp_pairs.extend([(scene, timestamp) for timestamp in timestamps])
        self.twm_scene = cvgl_data.load_tiledwebmaps(tileloader, name="name-of-tileloader", zoom=20)

    def __len__(self):
        return len(self.scene_timestamp_pairs)

    def __getitem__(self, idx):
        # Load vehicle data
        scene, timestamp = self.scene_timestamp_pairs[idx]
        frame = scene.load(timestamp)

        # Load aerial data
        twm_frame = self.twm_scene.load(
            latlon=frame.geopose.latlon,
            bearing=frame.geopose.bearing,
            meters_per_pixel=0.2,
            shape=(512, 512),
        )

        # Retrieve the data from frame + twm_frame that the model takes as input. E.g.:
        ground_images = [np.copy(camera.image) for camera in frame.camera.values()]
        aerial_image = np.copy(twm_frame.map.image)

        return ground_images, aerial_image

scene_timestamps_pairs = []
for scene in scenes:
    # Retrieve the timestamps of the first camera
    timestamps = list(scene.camera.values())[0].timestamps
    # Keep only those timestamps where data for all sensors is available (i.e. between the first and last measurement per sensor)
    timestamps = cvgl_data.intersect_timestamps(timestamps, cvgl_data.get_all_timestamps(scene))
    scene_timestamps_pairs.append((scene, timestamps))

dataset = MyDataset(tileloader, scene_timestamps_pairs)
dataloader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=4)
```
## Notes

- The GIL is released for all operations, so that multiple calls can be made concurrently from multiple threads.
- We use the following coordinate system conventions:
    - Ego: x, y, z = forward, left, up
    - Camera: x, y, z = right, down, forward
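The axis mapping implied by these two conventions can be written as a fixed rotation matrix. The matrix below is an illustration derived from the conventions above, not a value taken from the library (actual `cam_to_ego` transforms additionally include each camera's mounting orientation and position):

```python
import numpy as np

# Axis mapping implied by the conventions above:
# ego (x=forward, y=left, z=up) -> camera (x=right, y=down, z=forward).
EGO_TO_CAM = np.array([
    [0, -1, 0],   # camera x (right)   = -ego y (left)
    [0, 0, -1],   # camera y (down)    = -ego z (up)
    [1, 0, 0],    # camera z (forward) =  ego x (forward)
])

forward_in_ego = np.array([1, 0, 0])
print(EGO_TO_CAM @ forward_in_ego)  # [0 0 1]: ego forward maps to camera z
```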