pytracking

An open-source framework for visual object tracking and video object segmentation, based on PyTorch

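PyTracking is an open-source framework for visual object tracking and video object segmentation, based on PyTorch. It implements several state-of-the-art tracking algorithms, such as TaMOs, RTS and ToMP, and provides complete training code and pretrained models. The framework includes libraries for implementing and evaluating visual trackers, covering common datasets, performance analysis scripts and general building blocks. Its LTR training framework supports training a variety of tracking networks and provides a rich set of datasets and functions.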

PyTracking

A general python framework for visual object tracking and video object segmentation, based on PyTorch.

:fire: One tracking paper accepted at WACV 2024! 👇

:fire: One tracking paper accepted at WACV 2023! 👇

:fire: One tracking paper accepted at ECCV 2022! 👇

Highlights

TaMOs, RTS, ToMP, KeepTrack, LWL, KYS, PrDiMP, DiMP and ATOM Trackers

Official implementation of the TaMOs (WACV 2024), RTS (ECCV 2022), ToMP (CVPR 2022), KeepTrack (ICCV 2021), LWL (ECCV 2020), KYS (ECCV 2020), PrDiMP (CVPR 2020), DiMP (ICCV 2019), and ATOM (CVPR 2019) trackers, including complete training code and trained models.

Tracking Libraries

Libraries for implementing and evaluating visual trackers. They include:

  • All common tracking and video object segmentation datasets.
  • Scripts to analyse tracker performance and obtain standard performance scores.
  • General building blocks, including deep networks, optimization, feature extraction and utilities for correlation filter tracking.

Training Framework: LTR

LTR (Learning Tracking Representations) is a general framework for training your visual tracking networks. It is equipped with

  • All common training datasets for visual object tracking and segmentation.
  • Functions for data sampling, processing etc.
  • Network modules for visual tracking.
  • And much more...

Model Zoo

The tracker models trained using PyTracking, along with their results on standard tracking benchmarks, are provided in the model zoo.

Trackers

The toolkit contains the implementation of the following trackers.

TaMOs (WACV 2024)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of TaMOs. TaMOs is the first generic object tracker to tackle the problem of tracking multiple generic objects at once. It uses a shared model predictor consisting of a Transformer to produce multiple target models (one for each specified target). It achieves sub-linear run-time when tracking multiple objects and outperforms existing single object trackers when running one instance for each target separately. TaMOs serves as the baseline tracker for the new large-scale generic object tracking benchmark LaGOT (see here), which contains multiple annotated target objects per sequence.

TaMOs_teaser_figure

RTS (ECCV 2022)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of RTS. RTS is a robust, end-to-end trainable, segmentation-centric pipeline that internally works with segmentation masks instead of bounding boxes. Thus, it can learn a better target representation that clearly differentiates the target from the background. To achieve the necessary robustness for challenging tracking scenarios, a separate instance localization component is used to condition the segmentation decoder when producing the output mask.

RTS_teaser_figure

ToMP (CVPR 2022)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of ToMP. ToMP employs a Transformer-based model prediction module in order to localize the target. The model predictor is further extended to estimate a second set of weights that are applied for accurate bounding box regression. The resulting tracker ToMP relies on training and on test frame information in order to predict all weights transductively.

ToMP_teaser_figure

KeepTrack (ICCV 2021)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of KeepTrack. KeepTrack actively handles distractor objects to continue tracking the target. It employs a learned target candidate association network, which propagates the identities of all target candidates from frame to frame. To tackle the lack of ground-truth correspondences between distractor objects in visual tracking, it uses a training strategy that combines partial annotations with self-supervision.

KeepTrack_teaser_figure

LWL (ECCV 2020)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of the LWL tracker. LWL is an end-to-end trainable video object segmentation architecture which captures the current target object information in a compact parametric model. It integrates a differentiable few-shot learner module, which predicts the target model parameters using the first frame annotation. The learner is designed to explicitly optimize an error between target model prediction and a ground truth label. LWL further learns the ground-truth labels used by the few-shot learner to train the target model. All modules in the architecture are trained end-to-end by maximizing segmentation accuracy on annotated VOS videos.

LWL overview figure

KYS (ECCV 2020)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of the KYS tracker. Unlike conventional frame-by-frame detection based tracking, KYS propagates valuable scene information through the sequence. This information is used to achieve an improved scene-aware target prediction in each frame. The scene information is represented using a dense set of localized state vectors. These state vectors are propagated through the sequence and combined with the appearance model output to localize the target. The network is trained to effectively utilize the scene information by directly maximizing tracking performance on video segments.

KYS overview figure

PrDiMP (CVPR 2020)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of the PrDiMP tracker. This work proposes a general formulation for probabilistic regression, which is then applied to visual tracking in the DiMP framework. The network predicts the conditional probability density of the target state given an input image. The probability density is flexibly parametrized by the neural network itself. The regression network is trained by directly minimizing the Kullback-Leibler divergence.
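
To make the training objective described above more concrete, here is a minimal sketch in plain Python. It is not the PrDiMP code. It assumes a one-dimensional grid of candidate target states, turns hypothetical network scores into a discrete density with a softmax, and measures the Kullback-Leibler divergence to a Gaussian label distribution centred on the annotation. The grid, the scores, sigma and all function names are illustrative assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into a discrete probability distribution."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_label(grid, center, sigma):
    """Label distribution: a normalized Gaussian around the annotated state."""
    weights = [math.exp(-0.5 * ((g - center) / sigma) ** 2) for g in grid]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same grid."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


# Hypothetical 1-D grid of candidate states, annotation at 2.0.
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
label = gaussian_label(grid, center=2.0, sigma=0.5)

# Two hypothetical score vectors: one peaked at the annotation, one far from it.
good = softmax([0.0, 2.0, 4.0, 2.0, 0.0])
bad = softmax([4.0, 2.0, 0.0, 0.0, 0.0])

loss_good = kl_divergence(label, good)
loss_bad = kl_divergence(label, bad)
print(loss_good < loss_bad)
```

In this toy setting the prediction that concentrates its mass near the annotation receives the smaller loss, which is the behaviour a KL-based regression objective is meant to reward.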

DiMP (ICCV 2019)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of the DiMP tracker. DiMP is an end-to-end tracking architecture, capable of fully exploiting both target and background appearance information for target model prediction. It is based on a target model prediction network, which is derived from a discriminative learning loss by applying an iterative optimization procedure. The model prediction network employs a steepest descent based methodology that computes an optimal step length in each iteration to provide fast convergence. The model predictor also includes an initializer network that efficiently provides an initial estimate of the model weights.

DiMP overview figure
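
The idea of steepest descent with an optimal step length can be illustrated on a toy problem. The sketch below is an assumption-laden example, not DiMP's model predictor: it minimizes a plain linear least-squares loss 0.5 * ||A w - b||^2, chosen only because the optimal step along the negative gradient then has the closed form ||g||^2 / ||A g||^2. The matrix A, the vector b and all helper names are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def loss(A, b, w):
    """Least-squares loss 0.5 * ||A w - b||^2."""
    r = [p - q for p, q in zip(matvec(A, w), b)]
    return 0.5 * dot(r, r)


def steepest_descent(A, b, w, num_iter=5):
    """Steepest descent with the exact (optimal) step length per iteration."""
    At = transpose(A)
    for _ in range(num_iter):
        residual = [p - q for p, q in zip(matvec(A, w), b)]
        grad = matvec(At, residual)      # g = A^T (A w - b)
        Ag = matvec(A, grad)
        denom = dot(Ag, Ag)              # g^T A^T A g
        if denom == 0.0:                 # zero gradient: already at a minimum
            break
        alpha = dot(grad, grad) / denom  # optimal step along -g
        w = [wi - alpha * gi for wi, gi in zip(w, grad)]
    return w


# Toy problem whose exact solution is w = [1, 1].
A = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [2.0, 1.0, 2.0]
w0 = [0.0, 0.0]

w5 = steepest_descent(A, b, w0, num_iter=5)
w100 = steepest_descent(A, b, w0, num_iter=100)
print(loss(A, b, w0), loss(A, b, w5))
```

Because the step length is computed rather than hand-tuned, a handful of iterations already reduces the loss substantially, which is the motivation for using such a scheme when only a few optimizer iterations are affordable.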

ATOM (CVPR 2019)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of the ATOM tracker. ATOM is based on (i) a target estimation module that is trained offline, and (ii) a target classification module that is trained online. The target estimation module is trained to predict the intersection-over-union (IoU) overlap between the target and a bounding box estimate. The target classification module is learned online using dedicated optimization techniques to discriminate between the target object and background.

ATOM overview figure
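
For readers unfamiliar with the quantity that the target estimation module is trained to predict, the following is a small sketch of the IoU overlap between two axis-aligned boxes. The (x, y, w, h) box format and the function name box_iou are assumptions made for this example; it computes the overlap directly from the coordinates and is not the network used by ATOM.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x, y, w, h), top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the intersection rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h

    # Union area = sum of both areas minus the doubly counted intersection.
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes shifted by one unit overlap in a 1x1 square: IoU = 1 / 7.
print(box_iou((0, 0, 2, 2), (1, 1, 2, 2)))
```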

ECO/UPDT (CVPR 2017/ECCV 2018)

[Paper] [Models] [Tracker Code]

An unofficial implementation of the ECO tracker. It is implemented based on an extensive and general library for complex operations and Fourier tools. The implementation differs from the version used in the original paper in a few important aspects.

  1. This implementation uses features from vgg-m layer 1 and resnet18 residual block 3.
  2. As in our later UPDT tracker, separate filters are trained for shallow and deep features, and extensive data augmentation is employed in the first frame.
  3. The GMM memory module is not implemented; instead, the raw projected samples are stored.

Please refer to the official implementation of ECO if you are looking to reproduce the results in the ECO paper or download the raw results.

Associated trackers

We list associated trackers that can be found in external repositories.

E.T.Track (WACV 2023)

[Paper] [Code]

Official implementation of E.T.Track. E.T.Track uses our proposed Exemplar Transformer, a transformer module with a single instance-level attention layer, for real-time visual object tracking. E.T.Track is up to 8x faster than other transformer-based models, and consistently outperforms competing lightweight trackers that can operate in real time on standard CPUs.

ETTrack_teaser_figure

Installation

Clone the Git repository.

git clone https://github.com/visionml/pytracking.git

Clone the submodules.

In the repository directory, run the command:

git submodule update --init

Install dependencies

Run the installation script to install all the dependencies. You need to provide the conda install path (e.g. ~/anaconda3) and the name for the created conda environment (here pytracking).

bash install.sh conda_install_path pytracking

This script will also download the default networks and set up the environment.

Note: The install script has been tested on an Ubuntu 18.04 system. In case of issues, check the detailed installation instructions.

Windows: (NOT Recommended!) Check these installation instructions.

Let's test it!

Activate the conda environment and run the script pytracking/run_webcam.py to run a tracker on the webcam input. The command below selects the dimp tracker with the dimp50 parameter setting.

conda activate pytracking
cd pytracking
python run_webcam.py dimp dimp50

What's next?

pytracking - for implementing your tracker

ltr - for training your tracker

Contributors

Main Contributors

Guest Contributors

Acknowledgments
