
<div align="center">
<h1 align="center">top CVPR 2024 papers</h1>
<a href="https://github.com/SkalskiP/top-cvpr-2023-papers">2023</a>
</div>
<br>
<div align="center">
<img width="600" src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/347853f9-9e93-4ca0-858b-a7c3f6bba073" alt="vancouver">
</div>
## 👋 hello
Computer Vision and Pattern Recognition (CVPR) is a massive conference. In 2024 alone,
11,532 papers were submitted, and 2,719 were accepted. I created this repository
to help you search for the crème de la crème of CVPR publications. If the paper you
are looking for is not on my short list, take a peek at the full list of accepted
papers.
## 🗞️ papers and posters
🔥 - highlighted papers
<!--- AUTOGENERATED_PAPERS_LIST -->
<!---
WARNING: DO NOT EDIT THIS LIST MANUALLY. IT IS AUTOMATICALLY GENERATED.
HEAD OVER TO https://github.com/SkalskiP/top-cvpr-2024-papers/blob/master/CONTRIBUTING.md FOR MORE DETAILS ON HOW TO MAKE CHANGES PROPERLY.
-->
### 3d from multi-view and sensors
<p align="left">
<a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31668.png?t=1717417393.7589533" title="SpatialTracker: Tracking Any 2D Pixels in 3D Space">
<img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/56498f78-2ca0-46ee-9231-6aa1806b6ebc" alt="SpatialTracker: Tracking Any 2D Pixels in 3D Space" width="400px" align="left" />
</a>
<a href="https://arxiv.org/abs/2404.04319" title="SpatialTracker: Tracking Any 2D Pixels in 3D Space">
<strong>🔥 SpatialTracker: Tracking Any 2D Pixels in 3D Space</strong>
</a>
<br/>
Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou
<br/>
[<a href="https://arxiv.org/abs/2404.04319">paper</a>] [<a href="https://github.com/henry123-boy/SpaTracker">code</a>]
<br/>
<strong>Topic:</strong> 3D from multi-view and sensors
<br/>
<strong>Session:</strong> Fri 21 Jun 1:30 p.m. EDT — 3 p.m. EDT #84
</p>
<br/>
<br/>
<p align="left">
<a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31616.png?t=1716470830.0209699" title="ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models">
<img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/0453bf88-9d54-4ecf-8a45-01af0f604faf" alt="ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models" width="400px" align="left" />
</a>
<a href="https://arxiv.org/abs/2403.01807" title="ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models">
<strong>ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models</strong>
</a>
<br/>
Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner
<br/>
[<a href="https://arxiv.org/abs/2403.01807">paper</a>] [<a href="https://github.com/facebookresearch/ViewDiff">code</a>] [<a href="https://youtu.be/SdjoCqHzMMk">video</a>]
<br/>
<strong>Topic:</strong> 3D from multi-view and sensors
<br/>
<strong>Session:</strong> Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #20
</p>
<br/>
<br/>
<p align="left">
<a href="https://arxiv.org/abs/2405.12979" title="OmniGlue: Generalizable Feature Matching with Foundation Model Guidance">
<strong>OmniGlue: Generalizable Feature Matching with Foundation Model Guidance</strong>
</a>
<br/>
Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo
<br/>
[<a href="https://arxiv.org/abs/2405.12979">paper</a>] [<a href="https://github.com/google-research/omniglue">code</a>] [<a href="https://huggingface.co/spaces/qubvel-hf/omniglue">demo</a>]
<br/>
<strong>Topic:</strong> 3D from multi-view and sensors
<br/>
<strong>Session:</strong> Fri 21 Jun 1:30 p.m. EDT — 3 p.m. EDT #32
</p>
<br/>
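If you want to try OmniGlue on your own image pairs, below is a minimal matching sketch based on the usage snippet in the google-research/omniglue README. The model export paths are placeholders for weights you download separately per that repo's setup instructions, and the 0.02 confidence threshold is the value suggested there.

```python
# Minimal OmniGlue matching sketch (adapted from the google-research/omniglue
# README); the three export paths are placeholders from that repo's setup steps.
import numpy as np
from PIL import Image

import omniglue  # installed from github.com/google-research/omniglue

og = omniglue.OmniGlue(
    og_export="./models/og_export",                     # OmniGlue matcher weights
    sp_export="./models/sp_v6",                         # SuperPoint keypoint extractor
    dino_export="./models/dinov2_vitb14_pretrain.pth",  # DINOv2 foundation model
)

# Any two RGB images of the same scene, loaded as HxWx3 uint8 arrays.
image0 = np.array(Image.open("image0.jpg").convert("RGB"))
image1 = np.array(Image.open("image1.jpg").convert("RGB"))

# Returns matched keypoints in each image plus a per-match confidence score.
match_kp0, match_kp1, match_confidences = og.FindMatches(image0, image1)
keep = match_confidences > 0.02  # filter low-confidence matches
print(f"{keep.sum()} matches kept out of {len(match_confidences)}")
```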
### deep learning architectures and techniques
<p align="left">
<a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30529.png?t=1717455193.7819567" title="Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks">
<img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/4aaf3f87-cc62-4fa3-af99-c8c1c83c0069" alt="Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks" width="400px" align="left" />
</a>
<a href="https://arxiv.org/pdf/2311.06242" title="Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks">
<strong>🔥 Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks</strong>
</a>
<br/>
Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan
<br/>
[<a href="https://arxiv.org/pdf/2311.06242">paper</a>] [<a href="https://youtu.be/cOlyA00K1ec">video</a>] [<a href="https://huggingface.co/spaces/gokaygokay/Florence-2">demo</a>]
<br/>
<strong>Topic:</strong> Deep learning architectures and techniques
<br/>
<strong>Session:</strong> Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #102
</p>
<br/>
<br/>
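Florence-2 checkpoints are public on Hugging Face, so the demo above is easy to reproduce locally. Below is a minimal inference sketch following the microsoft/Florence-2-large model card's transformers remote-code path; the task prompt selects what the single unified model does, and `"<OD>"` (object detection) is just one example.

```python
# Minimal Florence-2 inference sketch, following the microsoft/Florence-2-large
# model card on Hugging Face; the "<OD>" task prompt runs object detection.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # any local RGB image

task = "<OD>"  # other prompts: "<CAPTION>", "<DETAILED_CAPTION>", "<OCR>", ...
inputs = processor(text=task, images=image, return_tensors="pt")

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

# Each task prompt has its own output grammar; this parses it into plain dicts.
parsed = processor.post_process_generation(
    generated_text, task=task, image_size=(image.width, image.height)
)
print(parsed)  # e.g. {"<OD>": {"bboxes": [...], "labels": [...]}}
```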
### document analysis and understanding
<p align="left">
<a href="https://arxiv.org/abs/2405.04408" title="DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks">
<strong>DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks</strong>
</a>
<br/>
Jiaxin Zhang, Dezhi Peng, Chongyu Liu, Peirong Zhang, Lianwen Jin
<br/>
[<a href="https://arxiv.org/abs/2405.04408">paper</a>] [<a href="https://github.com/ZZZHANG-jx/DocRes">code</a>] [<a href="https://huggingface.co/spaces/qubvel-hf/documents-restoration">demo</a>]
<br/>
<strong>Topic:</strong> Document analysis and understanding
<br/>
<strong>Session:</strong> Thu 20 Jun