
<div align="center">
<h1 align="center">top CVPR 2024 papers</h1>
<a href="https://github.com/SkalskiP/top-cvpr-2023-papers">2023</a>
</div>
<br>
<div align="center">
<img width="600" src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/347853f9-9e93-4ca0-858b-a7c3f6bba073" alt="vancouver">
</div>
## 👋 hello
Computer Vision and Pattern Recognition (CVPR) is a massive conference. In 2024 alone,
11,532 papers were submitted, and 2,719 were accepted. I created this repository
to help you search for the crème de la crème of CVPR publications. If the paper you
are looking for is not on my short list, take a peek at the full list of accepted
papers.
## 🗞️ papers and posters
🔥 - highlighted papers
<!--- AUTOGENERATED_PAPERS_LIST -->
<!---
WARNING: DO NOT EDIT THIS LIST MANUALLY. IT IS AUTOMATICALLY GENERATED.
HEAD OVER TO https://github.com/SkalskiP/top-cvpr-2024-papers/blob/master/CONTRIBUTING.md FOR MORE DETAILS ON HOW TO MAKE CHANGES PROPERLY.
-->
### 3d from multi-view and sensors
<p align="left">
<a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31668.png?t=1717417393.7589533" title="SpatialTracker: Tracking Any 2D Pixels in 3D Space">
<img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/56498f78-2ca0-46ee-9231-6aa1806b6ebc" alt="SpatialTracker: Tracking Any 2D Pixels in 3D Space" width="400px" align="left" />
</a>
<a href="https://arxiv.org/abs/2404.04319" title="SpatialTracker: Tracking Any 2D Pixels in 3D Space">
<strong>🔥 SpatialTracker: Tracking Any 2D Pixels in 3D Space</strong>
</a>
<br/>
Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou
<br/>
[<a href="https://arxiv.org/abs/2404.04319">paper</a>] [<a href="https://github.com/henry123-boy/SpaTracker">code</a>]
<br/>
<strong>Topic:</strong> 3D from multi-view and sensors
<br/>
<strong>Session:</strong> Fri 21 Jun 1:30 p.m. EDT — 3 p.m. EDT #84
</p>
<br/>
<br/>
<p align="left">
<a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31616.png?t=1716470830.0209699" title="ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models">
<img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/0453bf88-9d54-4ecf-8a45-01af0f604faf" alt="ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models" width="400px" align="left" />
</a>
<a href="https://arxiv.org/abs/2403.01807" title="ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models">
<strong>ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models</strong>
</a>
<br/>
Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner
<br/>
[<a href="https://arxiv.org/abs/2403.01807">paper</a>] [<a href="https://github.com/facebookresearch/ViewDiff">code</a>] [<a href="https://youtu.be/SdjoCqHzMMk">video</a>]
<br/>
<strong>Topic:</strong> 3D from multi-view and sensors
<br/>
<strong>Session:</strong> Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #20
</p>
<br/>
<br/>
<p align="left">
<a href="https://arxiv.org/abs/2405.12979" title="OmniGlue: Generalizable Feature Matching with Foundation Model Guidance">
<strong>OmniGlue: Generalizable Feature Matching with Foundation Model Guidance</strong>
</a>
<br/>
Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo
<br/>
[<a href="https://arxiv.org/abs/2405.12979">paper</a>] [<a href="https://github.com/google-research/omniglue">code</a>] [<a href="https://huggingface.co/spaces/qubvel-hf/omniglue">demo</a>]
<br/>
<strong>Topic:</strong> 3D from multi-view and sensors
<br/>
<strong>Session:</strong> Fri 21 Jun 1:30 p.m. EDT — 3 p.m. EDT #32
</p>
<br/>
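If you want to try OmniGlue on your own image pairs, below is a minimal matching sketch based on the usage snippet in the google-research/omniglue README. The model export paths are placeholders for weights you download separately per that repo's setup instructions, and the 0.02 confidence threshold is the value suggested there.

```python
# Minimal OmniGlue matching sketch (adapted from the google-research/omniglue
# README); the three export paths are placeholders from that repo's setup steps.
import numpy as np
from PIL import Image

import omniglue  # installed from github.com/google-research/omniglue

og = omniglue.OmniGlue(
    og_export="./models/og_export",                     # OmniGlue matcher weights
    sp_export="./models/sp_v6",                         # SuperPoint keypoint extractor
    dino_export="./models/dinov2_vitb14_pretrain.pth",  # DINOv2 foundation model
)

# Any two RGB images of the same scene, loaded as HxWx3 uint8 arrays.
image0 = np.array(Image.open("image0.jpg").convert("RGB"))
image1 = np.array(Image.open("image1.jpg").convert("RGB"))

# Returns matched keypoints in each image plus a per-match confidence score.
match_kp0, match_kp1, match_confidences = og.FindMatches(image0, image1)
keep = match_confidences > 0.02  # filter low-confidence matches
print(f"{keep.sum()} matches kept out of {len(match_confidences)}")
```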
### deep learning architectures and techniques
<p align="left">
<a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30529.png?t=1717455193.7819567" title="Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks">
<img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/4aaf3f87-cc62-4fa3-af99-c8c1c83c0069" alt="Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks" width="400px" align="left" />
</a>
<a href="https://arxiv.org/pdf/2311.06242" title="Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks">
<strong>🔥 Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks</strong>
</a>
<br/>
Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan
<br/>
[<a href="https://arxiv.org/pdf/2311.06242">paper</a>] [<a href="https://youtu.be/cOlyA00K1ec">video</a>] [<a href="https://huggingface.co/spaces/gokaygokay/Florence-2">demo</a>]
<br/>
<strong>Topic:</strong> Deep learning architectures and techniques
<br/>
<strong>Session:</strong> Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #102
</p>
<br/>
<br/>
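Florence-2 checkpoints are public on Hugging Face, so the demo above is easy to reproduce locally. Below is a minimal inference sketch following the microsoft/Florence-2-large model card's transformers remote-code path; the task prompt selects what the single unified model does, and `"<OD>"` (object detection) is just one example.

```python
# Minimal Florence-2 inference sketch, following the microsoft/Florence-2-large
# model card on Hugging Face; the "<OD>" task prompt runs object detection.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # any local RGB image

task = "<OD>"  # other prompts: "<CAPTION>", "<DETAILED_CAPTION>", "<OCR>", ...
inputs = processor(text=task, images=image, return_tensors="pt")

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

# Each task prompt has its own output grammar; this parses it into plain dicts.
parsed = processor.post_process_generation(
    generated_text, task=task, image_size=(image.width, image.height)
)
print(parsed)  # e.g. {"<OD>": {"bboxes": [...], "labels": [...]}}
```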
### document analysis and understanding
<p align="left">
<a href="https://arxiv.org/abs/2405.04408" title="DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks">
<strong>DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks</strong>
</a>
<br/>
Jiaxin Zhang, Dezhi Peng, Chongyu Liu, Peirong Zhang, Lianwen Jin
<br/>
[<a href="https://arxiv.org/abs/2405.04408">paper</a>] [<a href="https://github.com/ZZZHANG-jx/DocRes">code</a>] [<a href="https://huggingface.co/spaces/qubvel-hf/documents-restoration">demo</a>]
<br/>
<strong>Topic:</strong> Document analysis and understanding
<br/>
<strong>Session:</strong> Thu 20 Jun