
Latest articles from IEEE Transactions on Pattern Analysis and Machine Intelligence

Training-Free Ultra Small Model for Universal Sparse Reconstruction in Compressed Sensing
IF 23.6 Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-04-02 DOI: 10.1109/tpami.2026.3680162
Chaoqing Tang, Huanze Zhuang, Guiyun Tian, Zhenli Zeng, Yi Ding, Wenzhong Liu, Lin Lin, Xiang Bai
Citations: 0
Learning Continuous Spatiotemporal Implicit Neural Fields for Unsupervised Video Denoising
IF 23.6 Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-04-02 DOI: 10.1109/tpami.2026.3680159
Xiaowan Hu, Henan Liu, Ce Zheng, Xinyang Li, Mai Xu
Citations: 0
SparseBEV: A Fully Sparse Framework for Multi-View 3D Object Detection.
IF 23.6 Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-04-01 DOI: 10.1109/tpami.2026.3679808
Yang Chen, Haisong Liu, Limin Wang
Camera-based 3D object detection in BEV (Bird's Eye View) space has drawn great attention over the past few years. Dense detectors typically follow a two-stage pipeline, first constructing a dense BEV feature and then performing object detection in BEV space, which suffers from complex view transformations and high computation costs. On the other hand, sparse detectors follow a query-based paradigm without explicit dense BEV feature construction but generally underperform their dense counterparts. In this paper, we find that the key to mitigating this performance gap is the adaptability of the detector in both BEV and image space. To this end, we propose a fully sparse 3D object detector that outperforms its dense counterparts and enjoys a higher running speed. Our sparse detector contains three key designs: (1) scale-adaptive self-attention to aggregate features with an adaptive receptive field in BEV space, (2) scale-adaptive cross-attention to capture the unique temporal dynamics associated with different objects, and (3) adaptive sampling and mixing to perform interactions between queries and image features under the guidance of queries. These key components enhance the adaptability of the detector in both BEV and image space. Furthermore, we explore two distinct temporal modeling approaches, sampling-point-based multi-frame stacking (dubbed SparseBEV) and query-based recurrent temporal fusion (dubbed SparseBEV++), to leverage temporal features effectively. Experiments are conducted on the nuScenes and Waymo datasets. On the val split of nuScenes, both SparseBEV and SparseBEV++ surpass all previous methods. Our SparseBEV achieves 55.8 NDS at 23.5 FPS, and SparseBEV++ further achieves a remarkable 57.1 NDS while maintaining a real-time inference speed of 24.6 FPS. On the Waymo dataset, our best-performing model, SparseBEV++, outperforms previous methods with 58.9 mAP and 55.2 mAPH.
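To make the scale-adaptive attention idea concrete, here is a minimal NumPy sketch (our illustration under simplifying assumptions, not the authors' implementation): each BEV query attends with a Gaussian receptive field whose per-query radius `scales[i]` controls how far in BEV space it aggregates features.

```python
import numpy as np

def scale_adaptive_self_attention(feats, coords, scales):
    """Toy sketch of self-attention with a per-query receptive field.

    feats:  (N, C) per-query BEV features
    coords: (N, 2) BEV positions of the queries
    scales: (N,)   per-query Gaussian receptive-field radii
    """
    # Standard dot-product similarity between queries.
    logits = feats @ feats.T / np.sqrt(feats.shape[1])          # (N, N)
    # Squared BEV distance between every pair of queries.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    # Penalize far-away keys; a small radius shrinks the receptive field.
    logits = logits - d2 / (2.0 * scales[:, None] ** 2)
    # Row-wise softmax, then aggregate.
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ feats                                            # (N, C)
```

Shrinking a query's radius collapses its attention onto itself, while a large radius approaches ordinary global self-attention.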
Citations: 0
Dictionary Multi-Modal Temporal Graph Learning
IF 23.6 Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-31 DOI: 10.1109/tpami.2026.3679419
Meng Liu, Ke Liang, Miaomiao Li, Xueling Zhu, Xinwang Liu
Citations: 0
Chat-Scene++: Exploiting Context-Rich Object Identification for 3D LLM.
IF 23.6 Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-31 DOI: 10.1109/tpami.2026.3679561
Haifeng Huang, Yilun Chen, Zehan Wang, Jiangmiao Pang, Zhou Zhao
Recent advancements in multi-modal large language models (MLLMs) have shown strong potential for 3D scene understanding. However, existing methods struggle with fine-grained object grounding and contextual reasoning, limiting their ability to interpret and interact with complex 3D environments. In this paper, we present Chat-Scene++, an MLLM framework that represents 3D scenes as context-rich object sequences. By structuring scenes as sequences of objects with contextual semantics, Chat-Scene++ enables object-centric representation and interaction. It decomposes a 3D scene into object representations paired with identifier tokens, allowing LLMs to follow instructions across diverse 3D vision-language tasks. To capture inter-object relationships and global semantics, Chat-Scene++ extracts context-rich object features using large-scale pre-trained 3D scene-level and 2D image-level encoders, unlike the isolated per-object features in Chat-Scene. Its flexible object-centric design also supports grounded chain-of-thought (G-CoT) reasoning, enabling the model to distinguish objects at both category and spatial levels during multi-step inference. Without the need for additional task-specific heads or fine-tuning, Chat-Scene++ achieves state-of-the-art performance on five major 3D vision-language benchmarks: ScanRefer, Multi3DRefer, Scan2Cap, ScanQA, and SQA3D. These results highlight its effectiveness in scene comprehension, object grounding, and spatial reasoning. Additionally, without reconstructing 3D worlds through computationally expensive processes, we demonstrate its applicability to real-world scenarios using only 2D inputs. Code will be made available at https://github.com/ZzZZCHS/Chat-Scene.
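As a toy illustration of the identifier-token idea (our sketch with invented names such as `build_object_sequence` and `<OBJ i>`; the paper's actual tokenization may differ), each detected object's embedding is paired with a unique identifier token so the language model can refer back to objects by ID:

```python
def build_object_sequence(object_feats, prefix="<OBJ"):
    """Pair each object embedding with a unique identifier token.

    object_feats: iterable of per-object feature vectors.
    Returns the interleaved (token, embedding) sequence and a lookup
    table mapping identifier tokens back to object indices.
    """
    sequence, id_to_index = [], {}
    for i, feat in enumerate(object_feats):
        token = f"{prefix}{i}>"      # unique identifier token per object
        id_to_index[token] = i
        sequence.append((token, feat))
    return sequence, id_to_index
```

A grounding answer such as "the chair is <OBJ3>" can then be resolved back to object 3 via the lookup table, which is what makes the object-centric outputs verifiable against the scene.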
Citations: 0
Harnessing Meta-Learning for Controllable Full-Frame Video Stabilization
IF 23.6 Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-31 DOI: 10.1109/tpami.2026.3679401
Muhammad Kashif Ali, Eun Woo Im, Dongjin Kim, Tae Hyun Kim, Vivek Gupta, Haonan Luo, Tianrui Li
Citations: 0
MAN++: Scaling Momentum Auxiliary Network for Supervised Local Learning in Vision Tasks
IF 23.6 Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-31 DOI: 10.1109/tpami.2026.3679406
Junhao Su, Feiyu Zhu, Hengyu Shi, Tianyang Han, Yurui Qiu, Junfeng Luo, Xiaoming Wei, Jialin Gao
Citations: 0
Match Stereo Videos Via Bidirectional Alignment.
IF 23.6 Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-31 DOI: 10.1109/tpami.2026.3679033
Junpeng Jing, Ye Mao, Anlan Qiu, Krystian Mikolajczyk
Video stereo matching is the task of estimating consistent disparity maps from rectified stereo videos. There is considerable scope for improvement in both datasets and methods within this area. Recent learning-based methods often focus on optimizing performance for independent stereo pairs, leading to temporal inconsistencies in videos. Existing video methods typically employ a sliding-window operation over the time dimension, which can result in low-frequency oscillations corresponding to the window size. To address these challenges, we propose a bidirectional alignment mechanism for adjacent frames as a fundamental operation. Building on this, we introduce a novel video processing framework, BiDAStereo, and a plugin stabilizer network, BiDAStabilizer, compatible with general image-based methods. Regarding datasets, current synthetic object-based and indoor datasets are commonly used for training and benchmarking, while outdoor natural scenes are lacking. To bridge this gap, we present a realistic synthetic dataset and benchmark focused on natural scenes, along with a real-world dataset captured by a stereo camera in diverse urban scenes for qualitative evaluation. Extensive experiments on in-domain, out-of-domain, and robustness evaluation demonstrate the contribution of our methods and datasets, showcasing improvements in prediction quality and achieving state-of-the-art results on various commonly used benchmarks. The project page, demos, code, and datasets are available at: https://tomtomtommi.github.io/BiDAVideo/.
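The bidirectional alignment operation can be sketched in one dimension (an illustrative toy with integer flows, not the paper's learned warping): features from the previous and next frames are warped toward the current frame and fused.

```python
import numpy as np

def bidirectional_align(prev, cur, nxt, flow_from_prev, flow_from_next):
    """Warp previous/next frame features toward the current frame with
    per-pixel integer flows, then fuse by averaging.

    All inputs are 1-D arrays of equal length; flows are integer offsets.
    """
    idx = np.arange(cur.size)
    # Gather each neighbor frame at flow-displaced positions (clamped at edges).
    warped_prev = prev[np.clip(idx + flow_from_prev, 0, cur.size - 1)]
    warped_next = nxt[np.clip(idx + flow_from_next, 0, cur.size - 1)]
    # Simple fusion of both aligned neighbors with the current frame.
    return (warped_prev + cur + warped_next) / 3.0
```

Because every frame is aligned to both of its neighbors rather than to a fixed temporal window, consistency can propagate frame to frame instead of oscillating at window boundaries.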
Citations: 0
Dual Geometry Margin Optimization for Coupled-Noisy Robust Ensemble Learning.
IF 23.6 Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-31 DOI: 10.1109/tpami.2026.3679394
Zheng Wang, Guanxiong He, Jie Wang, Runxin Zhang, Liaoyuan Tang, Rong Wang, Feiping Nie
Ensemble learning methods, such as Bagging and Boosting, are well-regarded for their ability to enhance model performance by combining diverse base learners. These approaches leverage the strengths of individual models to achieve more accurate and robust predictions. However, real-world datasets often contain noise, which can significantly impair model effectiveness. This paper focuses on two prevalent and challenging types: feature noise, which can lead to fitting instability and poor generalization, and label noise, which can lead to erroneous supervision and model overfitting. Recognizing the inherent properties of ensemble learning, particularly its focus on optimizing the decision margin to improve classification accuracy, we see an opportunity to bolster ensemble model robustness. To address both feature and label noise, we propose a novel approach called Dual Geometry Margin Boosting (DGMB). This method employs two key strategies: the Decision Plane Margin (DPM), which enhances class separation, and the Hyper-Sphere Margin (HSM), which effectively filters out potentially noisy samples during the learning process. Our experiments demonstrate the impressive ability of DGMB to resist both feature and label noise. Through rigorous testing on various noise-contaminated datasets, we show that DGMB maintains strong performance and outperforms other robust ensemble methods.
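A hyper-sphere-style noisy-sample filter can be sketched as follows (an assumed form for illustration only; the paper's HSM is defined on the learned ensemble geometry, not on raw features): samples lying far outside their class's sphere are flagged as potentially noisy and excluded.

```python
import numpy as np

def hyper_sphere_filter(X, y, radius_factor=1.5):
    """Flag samples far from their class centroid as potentially noisy.

    X: (N, D) feature matrix, y: (N,) integer labels.
    Returns a boolean keep-mask: True for samples inside the sphere of
    radius radius_factor * mean distance for their class.
    """
    keep = np.ones(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        center = X[idx].mean(axis=0)                 # class centroid
        d = np.linalg.norm(X[idx] - center, axis=1)  # distance to centroid
        keep[idx] = d <= radius_factor * d.mean()    # inside the sphere?
    return keep
```

In a boosting loop, such a mask would keep suspected label- or feature-noisy samples from dominating the reweighting step.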
Citations: 0
Multi-View Clustering Via Bilaterally Constrained Anchor Graph.
IF 23.6 Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-31 DOI: 10.1109/tpami.2026.3678628
Qianyao Qiang, Bin Zhang, Yunjia Hua, Feiping Nie
The anchor similarity matrix, widely used for efficient clustering, exhibits an imbalance between its rows and columns: only the rows are typically constrained by probabilistic properties, unlike the regular similarity matrix, where both dimensions are regulated. This paper addresses the critical question of how to impose meaningful constraints on the columns to better capture the data structure. We propose a novel method, termed Multi-view Clustering via Bilaterally constrained anchor Graph (MCBG), which learns a fused anchor similarity matrix with bilateral constraints. To ensure consistency across views, we quantitatively assess their contributions and integrate them into a unified model. By applying distinct constraints to rows and columns, MCBG promotes a balanced and expressive anchor similarity distribution, avoiding degenerate cases. Furthermore, a rank constraint on the Laplacian matrix of an anchor-pairwise graph is incorporated, ensuring a one-step, post-processing-free multi-view clustering framework. An efficient alternating iterative optimization algorithm is developed, adapted to the natural properties of the target problem. Extensive experiments validate the superiority of the proposed method.
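The bilateral-constraint idea can be illustrated with a Sinkhorn-style normalization (our assumed formulation for illustration, not the paper's optimization): rows of the n-by-m anchor similarity matrix are kept on the probability simplex while columns are pushed toward equal total mass n/m.

```python
import numpy as np

def bilateral_anchor_graph(dist, iters=50):
    """Build an anchor similarity matrix with bilateral constraints.

    dist: (n, m) sample-to-anchor distances.
    Alternately renormalizes so every row is a probability distribution
    over anchors and every column carries roughly equal mass n/m.
    """
    n, m = dist.shape
    S = np.exp(-dist)  # similarities from distances
    for _ in range(iters):
        S = S / S.sum(axis=1, keepdims=True)               # row constraint
        S = S * ((n / m) / S.sum(axis=0, keepdims=True))   # column constraint
    return S / S.sum(axis=1, keepdims=True)                # exact row simplex
```

Without the column step, a few popular anchors could absorb most of the mass; balancing the columns spreads samples across anchors and avoids such degenerate graphs.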
Citations: 0