Computer vision
3D labeling and detection
HAL3D: Hierarchical active learning for fine-grained 3D part labeling
Fenggen Yu, Yiming Qian, Francisca Gil Ureta, Brian Jackson, Eric Bennett, Richard Zhang
ImGeoNet: Image-induced geometry-aware voxel representation for multi-view 3D object detection
Tao Tu, Shun-Po Chuang, Yu-Lun Liu, Cheng Sun, Ke Zhang, Donna Roy, Cheng-Hao Kuo, Min Sun
Action recognition
SkeleTR: Towards skeleton-based action recognition in the wild
Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joe Tighe, Alessandro Bergamo
Data representation
Linear spaces of meanings: Compositional structures in vision-language models
Matthew Trager, Pramuditha Perera, Luca Zancato, Alessandro Achille, Parminder Bhatia, Stefano Soatto
Motion-guided masking for spatiotemporal representation learning
David Fan, Jue Wang, Leo Liao, Yi Zhu, Vimal Bhat, Hector Santos, Rohith Mysore Vijaya Kumar, Xinyu (Arthur) Li
Video generation
SIDGAN: High-resolution dubbed video generation via shift-invariant learning
Urwa Muaz, Wondong Jang, Rohun Tripathi, Santhosh Mani, Wenbin Ouyang, Ravi Teja Gadde, Baris Gecer, Sergio Elizondo, Reza Madad, Naveen Nair
Geospatial foundation models
Towards geospatial foundation models via continual pretraining
Matias Mendieta, Boran Han, Xingjian Shi, Yi Zhu, Chen Chen
Graph neural networks
Learning adaptive neighborhoods for graph neural networks
Avi Saha, Oscar Mendez, Chris Russell, Richard Bowden
Image retrieval
FashionNTM: Multi-turn fashion image retrieval via cascaded memory
Anwesan Pal, Sahil Wadhwa, Ayush Jaiswal, Xu Zhang, Yue Wu, Rakesh Chada, Pradeep Natarajan, Henrik I. Christensen
Image segmentation
Coarse-to-fine amodal segmentation with shape prior
Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu
LD-ZNet: A latent diffusion approach for text-based image segmentation
Koutilya PNVR, Bharat Singh, Pallabi Ghosh, Behjat Siddiquie, David Jacobs
Rethinking amodal video segmentation from learning supervised signals with object-centric representation
Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu
Information extraction
DocTr: Document transformer for structured information extraction in documents
Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan
Machine unlearning
SAFE: Machine unlearning with shard graphs
Yonatan Dukler, Ben Bowman, Alessandro Achille, Aditya Golatkar, Ashwin Swaminathan, Stefano Soatto
Object detection
Bidirectional alignment for domain adaptive detection with transformers
Liqiang He, Wei Wang, Albert Chen, Min Sun, Cheng-Hao Kuo, Sinisa Todorovic
Unsupervised open-vocabulary object localization in videos
Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He
Object tracking
Object-centric multiple object tracking
Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao
Scene text recognition
CLIPTER: Looking at the bigger picture in scene text recognition
Aviad Aberdam, David Haim Bensaid, Alona Golts, Roy Ganz, Oren Nuriel, Royee Tichauer, Shai Mazor, Ron Litman
Towards models that can see and read
Roy Ganz, Oren Nuriel, Aviad Aberdam, Yair Kittenplon, Shai Mazor, Ron Litman
Transfer learning
PADCLIP: Pseudo-labeling with adaptive debiasing in CLIP for unsupervised domain adaptation
Zhengfeng Lai, Sol Vesdapunt, Ning Zhou, Jun Wu, Cong Phuoc Huynh, Xuelu Li, Kah Kuen Fu, Chen-Nee Chuah
Video retrieval
Audio-enhanced text-to-video retrieval using text-conditioned feature alignment
Sarah Ibrahimi, Xiaohang Sun, Pichao Wang, Amanmeet Garg, Ashutosh Sanan, Mohamed Omar
Video segmentation
MEGA: Multimodal alignment aggregation and distillation for cinematic video segmentation
Najmeh Sadoughi, Xinyu (Arthur) Li, Avijit Vajpayee, David Fan, Bing Shuai, Hector Santos, Vimal Bhat, Rohith Mysore Vijaya Kumar