Game4Loc: A UAV Geo-Localization Benchmark from Game Data

Yuxiang Ji*, Boyong He*, Zhuoyue Tan, Liaoni Wu
Xiamen University
*Indicates Equal Contribution

Localization along a flight trajectory after pre-training on the GTA-UAV dataset.

Abstract

Vision-based geo-localization for UAVs, serving as a secondary source of GPS information alongside global navigation satellite systems (GNSS), can continue to operate independently when communication with the external environment is cut off. Recent deep learning based methods formulate this as an image matching and retrieval task: by retrieving drone-view images from a satellite image database tagged with GPS information, approximate localization can be obtained. However, due to high costs and privacy concerns, it is usually difficult to obtain large quantities of drone-view images covering a continuous area. Existing drone-view datasets are mostly composed of small-scale aerial photography with a strong assumption that a perfectly one-to-one aligned reference image exists for any query, leaving a significant gap from practical localization scenarios. In this work, we construct a large-range contiguous-area UAV geo-localization dataset named GTA-UAV, featuring multiple flight altitudes, attitudes, scenes, and targets, using modern computer games. Based on this dataset, we introduce a more practical UAV geo-localization task that includes partial matches of cross-view paired data, and expand image-level retrieval to actual localization measured in distance (meters). For constructing drone-view and satellite-view pairs, we adopt a weight-based contrastive learning approach, which allows for effective learning while avoiding additional post-processing matching steps. Experiments demonstrate the effectiveness of our data and training method for UAV geo-localization, as well as its generalization capability to real-world scenarios.

GTA-UAV Dataset

The paired data construction process of GTA-UAV, where positive and semi-positive satellite views are paired with the drone view by IoU.


Unlike the strong one-to-one aligned retrieval assumption of existing datasets, we do not center-align the drone-satellite pairs. Instead, we use a collect-then-match approach, pairing views by computing the overlap of the ground area covered by the two views.
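The collect-then-match pairing can be sketched as follows. This is a minimal illustration, assuming axis-aligned ground footprints in meters; the threshold values and the helper names (`iou`, `label_pairs`) are hypothetical, not the dataset's exact settings.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned ground footprints (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_pairs(drone_box, satellite_tiles, pos_thr=0.5, semi_thr=0.1):
    """Label each satellite tile against a drone-view footprint by IoU:
    'positive' above pos_thr, 'semi-positive' above semi_thr, else unmatched.
    Thresholds here are illustrative placeholders."""
    labels = []
    for tile in satellite_tiles:
        score = iou(drone_box, tile)
        if score >= pos_thr:
            labels.append(("positive", score))
        elif score >= semi_thr:
            labels.append(("semi-positive", score))
        else:
            labels.append((None, score))
    return labels
```

Because matching is done by area overlap rather than center alignment, a single drone view can yield one positive tile plus several semi-positive neighbors, which is what the weighted contrastive objective later exploits.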

Methodology

Overview of the proposed training and inference pipeline. (left) We use a ViT as the feature encoder and a weighted InfoNCE loss to train on positive and semi-positive batched samples drawn by mutually exclusive sampling. (right) Retrieval is then performed on the discriminative features to achieve localization.
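The weight-based contrastive objective can be sketched as an InfoNCE loss whose per-pair term is scaled by the pair's overlap weight (e.g. its IoU). This is a hedged sketch, not the paper's exact formulation: the function name, the temperature value, and the use of IoU directly as the weight are assumptions.

```python
import torch
import torch.nn.functional as F

def weighted_infonce(drone_feats, sat_feats, pair_weights, tau=0.05):
    """Weight-based contrastive (InfoNCE-style) loss sketch.

    drone_feats, sat_feats: (B, D) L2-normalized embeddings, where row i of
        sat_feats is the satellite view paired with drone view i.
    pair_weights: (B,) values in (0, 1], e.g. the pair's IoU, so perfect
        positives contribute fully and semi-positives contribute partially.
    tau: softmax temperature (illustrative value).
    """
    # Pairwise cosine similarities scaled by temperature.
    logits = drone_feats @ sat_feats.t() / tau  # (B, B)
    # The matched satellite view for each drone view lies on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    # Per-pair InfoNCE terms, then down-weight semi-positive pairs.
    per_pair = F.cross_entropy(logits, targets, reduction="none")
    return (pair_weights * per_pair).mean()
```

At inference, no weighting is involved: drone-view features are simply compared against the satellite database features, and the GPS tag of the best-matching tile gives the location estimate.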