
Latest Literature in Information Engineering

Nonlinear optimal control for the five-axle and three-steering coupled-vehicle system
Pub Date : 2025-04-23 DOI: 10.1007/s43684-025-00097-x
G. Rigatos, M. Abbaszadeh, K. Busawon, P. Siano, M. Al Numay, G. Cuccurullo, F. Zouari

Transportation of heavy loads is often performed by multi-axle, multi-steered heavy-duty vehicles. In this article, a novel nonlinear optimal control method is applied to the kinematic model of the five-axle and three-steering coupled-vehicle system. First, it is proven that the dynamic model of this articulated multi-vehicle system is differentially flat. Next, the state-space model of the five-axle and three-steering vehicle system undergoes approximate linearization around a temporary operating point that is recomputed at each time-step of the control method. The linearization is based on Taylor series expansion and the associated Jacobian matrices. For the linearized state-space model of the five-axle and three-steering vehicle system, a stabilizing optimal (H-infinity) feedback controller is designed. This controller constitutes the solution of the nonlinear optimal control problem under model uncertainty and external perturbations. To compute the controller's feedback gains, an algebraic Riccati equation is repeatedly solved at each iteration of the control algorithm. The stability properties of the control method are proven through Lyapunov analysis. The proposed nonlinear optimal control approach achieves fast and accurate tracking of setpoints under moderate variations of the control inputs and minimal dispersion of energy by the propulsion and steering system of the five-axle and three-steering vehicle system.
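To make the per-step structure concrete, below is a minimal Python sketch of the relinearize-then-solve-Riccati loop the abstract describes, using a generic dynamics function `f` and finite-difference Jacobians. This is an LQR-style stand-in under stated assumptions, not the authors' implementation: the paper's H-infinity design adds disturbance-attenuation terms to the Riccati equation, and its Jacobians come from the analytic vehicle model.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def numerical_jacobians(f, x, u, eps=1e-6):
    """Finite-difference Jacobians A = df/dx, B = df/du at (x, u)."""
    n, m = x.size, u.size
    A, B = np.zeros((n, n)), np.zeros((n, m))
    fx = f(x, u)
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (f(x + dx, u) - fx) / eps
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B[:, j] = (f(x, u + du) - fx) / eps
    return A, B

def control_step(f, x, x_ref, u_prev, Q, R):
    """One control iteration: relinearize around the current operating
    point, solve the algebraic Riccati equation, apply state feedback."""
    A, B = numerical_jacobians(f, x, u_prev)
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)   # K = R^{-1} B^T P
    return -K @ (x - x_ref)          # drives x toward the setpoint x_ref
```

Because the gain is recomputed at every time step, the linearization error stays local to the current operating point between successive relinearizations.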

{"title":"Nonlinear optimal control for the five-axle and three-steering coupled-vehicle system","authors":"G. Rigatos,&nbsp;M. Abbaszadeh,&nbsp;K. Busawon,&nbsp;P. Siano,&nbsp;M. Al Numay,&nbsp;G. Cuccurullo,&nbsp;F. Zouari","doi":"10.1007/s43684-025-00097-x","DOIUrl":"10.1007/s43684-025-00097-x","url":null,"abstract":"<div><p>Transportation of heavy loads is often performed by multi-axle multi-steered heavy duty vehicles In this article a novel nonlinear optimal control method is applied to the kinematic model of the five-axle and three-steering coupled vehicle system. First, it is proven that the dynamic model of this articulated multi-vehicle system is differentially flat. Next. the state-space model of the five-axle and three-steering vehicle system undergoes approximate linearization around a temporary operating point that is recomputed at each time-step of the control method. The linearization is based on Taylor series expansion and on the associated Jacobian matrices. For the linearized state-space model of the five-axle and three-steering vehicle system a stabilizing optimal (H-infinity) feedback controller is designed. This controller stands for the solution of the nonlinear optimal control problem under model uncertainty and external perturbations. To compute the controller’s feedback gains an algebraic Riccati equation is repetitively solved at each iteration of the control algorithm. The stability properties of the control method are proven through Lyapunov analysis. The proposed nonlinear optimal control approach achieves fast and accurate tracking of setpoints under moderate variations of the control inputs and minimal dispersion of energy by the propulsion and steering system of the five-axle and three-steering vehicle system.</p></div>","PeriodicalId":71187,"journal":{"name":"自主智能系统(英文)","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s43684-025-00097-x.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143861351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multidimensional image morphing - fast image-based rendering of open 3D and VR environments
Q1 Computer Science Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2023.06.007
Simon Seibt , Bastian Kuth , Bartosz von Rymon Lipinski , Thomas Chang , Marc Erich Latoschik

Background

In recent years, the demand for interactive photorealistic three-dimensional (3D) environments has increased in various fields, including architecture, engineering, and entertainment. However, achieving a balance between the quality and efficiency of high-performance 3D applications and virtual reality (VR) remains challenging.

Methods

This study addresses this issue by revisiting and extending view interpolation for image-based rendering (IBR), which enables the exploration of spacious open environments in 3D and VR. To this end, we introduce multimorphing, a novel rendering method based on a spatial data structure of 2D image patches, called the image graph. Using this approach, novel views can be rendered with up to six degrees of freedom using only a sparse set of views. The rendering process does not require 3D reconstruction of the geometry or per-pixel depth information, and all relevant data for the output are extracted from the local morphing cells of the image graph. The detection of parallax image regions during preprocessing reduces rendering artifacts by extrapolating image patches from adjacent cells in real time. In addition, a GPU-based solution is presented to resolve exposure inconsistencies within a dataset, enabling seamless transitions of brightness when moving between areas with varying light intensities.
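As an illustration of correspondence-based view morphing, the basic operation underlying the morphing cells, a minimal Python sketch might look like the following. The patch layout, flow format, and barycentric weights are assumptions for the example; the authors' image-graph pipeline with parallax handling is considerably richer.

```python
import numpy as np

def morph_novel_view(patches, flows, weights):
    """Blend the anchor-view patches of one morphing cell into a novel
    view: warp each patch by its weighted correspondence flow, then take
    the weight-normalized sum (nearest-neighbor sampling for brevity).
    patches: list of (H, W, 3) images; flows: list of (H, W, 2) offsets;
    weights: barycentric weights of the novel viewpoint in the cell."""
    h, w, _ = patches[0].shape
    ys, xs = np.mgrid[0:h, 0:w]
    out = np.zeros((h, w, 3))
    for img, flow, a in zip(patches, flows, weights):
        sy = np.clip(ys + a * flow[..., 1], 0, h - 1).astype(int)
        sx = np.clip(xs + a * flow[..., 0], 0, w - 1).astype(int)
        out += a * img[sy, sx]
    return out / sum(weights)
```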

Results

Experiments on multiple real-world and synthetic scenes demonstrate that the presented method achieves high "VR-compatible" frame rates, even on mid-range and legacy hardware. While maintaining adequate visual quality even for sparse datasets, it outperforms other IBR and current neural rendering approaches.

Conclusions

Using the correspondence-based decomposition of input images into morphing cells of 2D image patches, multidimensional image morphing provides high-performance novel view generation, supporting open 3D and VR environments. Nevertheless, the handling of morphing artifacts in the parallax image regions remains a topic for future research.
{"title":"Multidimensional image morphing-fast image-based rendering of open 3D and VR environments","authors":"Simon Seibt ,&nbsp;Bastian Kuth ,&nbsp;Bartosz von Rymon Lipinski ,&nbsp;Thomas Chang ,&nbsp;Marc Erich Latoschik","doi":"10.1016/j.vrih.2023.06.007","DOIUrl":"10.1016/j.vrih.2023.06.007","url":null,"abstract":"<div><h3>Background</h3><div>In recent years, the demand for interactive photorealistic three-dimensional (3D) environments has increased in various fields, including architecture, engineering, and entertainment. However, achieving a balance between the quality and efficiency of high-performance 3D applications and virtual reality (VR) remains challenging.</div></div><div><h3>Methods</h3><div>This study addresses this issue by revisiting and extending view interpolation for image-based rendering (IBR), which enables the exploration of spacious open environments in 3D and VR. Therefore, we introduce multimorphing, a novel rendering method based on the spatial data structure of 2D image patches, called the image graph. Using this approach, novel views can be rendered with up to six degrees of freedom using only a sparse set of views. The rendering process does not require 3D reconstruction of the geometry or per-pixel depth information, and all relevant data for the output are extracted from the local morphing cells of the image graph. The detection of parallax image regions during preprocessing reduces rendering artifacts by extrapolating image patches from adjacent cells in real-time. In addition, a GPU-based solution was presented to resolve exposure inconsistencies within a dataset, enabling seamless transitions of brightness when moving between areas with varying light intensities.</div></div><div><h3>Results</h3><div>Experiments on multiple real-world and synthetic scenes demonstrate that the presented method achieves high \"VR-compatible\" frame rates, even on mid-range and legacy hardware, respectively. While achieving adequate visual quality even for sparse datasets, it outperforms other IBR and current neural rendering approaches.</div></div><div><h3>Conclusions</h3><div>Using the correspondence-based decomposition of input images into morphing cells of 2D image patches, multidimensional image morphing provides high-performance novel view generation, supporting open 3D and VR environments. Nevertheless, the handling of morphing artifacts in the parallax image regions remains a topic for future research.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 155-172"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
STDNet: Improved lip reading via short-term temporal dependency modeling
Q1 Computer Science Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2024.07.003
Xiaoer Wu , Zhenhua Tan , Ziwei Cheng , Yuran Ru

Background

Lip reading uses lip images for visual speech recognition. Deep-learning-based lip reading has greatly improved performance in current datasets; however, most existing research ignores the significance of short-term temporal dependencies of lip-shape variations between adjacent frames, which leaves space for further improvement in feature extraction.

Methods

This article presents a spatiotemporal feature fusion network (STDNet) that compensates for the deficiencies of current lip-reading approaches in short-term temporal dependency modeling. Specifically, to distinguish more similar and intricate content, STDNet adds a temporal feature extraction branch based on a 3D-CNN, which enhances the learning of dynamic lip movements in adjacent frames without affecting spatial feature extraction. In particular, we designed a local-temporal block, which aggregates interframe differences, strengthening the relationship between various local lip regions through multiscale convolution. We incorporated the squeeze-and-excitation mechanism into the global-temporal block, which processes a single frame as an independent unit to learn temporal variations across the entire lip region more effectively. Furthermore, attention pooling was introduced to highlight meaningful frames containing key semantic information for the target word.
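Two of the named building blocks translate directly into short PyTorch modules. The sketch below shows a generic squeeze-and-excitation block and a generic attention-pooling layer under assumed tensor shapes; the dimensions and the surrounding 3D-CNN branch are not specified in the abstract, so this is illustrative rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels of a per-frame feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                # x: (B, C, H, W), one frame
        w = self.fc(x.mean(dim=(2, 3)))  # squeeze: global average pooling
        return x * w[:, :, None, None]   # excite: channel-wise reweighting

class AttentionPooling(nn.Module):
    """Pool a frame sequence, emphasizing semantically meaningful frames."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                          # x: (B, T, D)
        a = torch.softmax(self.score(x), dim=1)    # per-frame weights
        return (a * x).sum(dim=1)                  # weighted sum -> (B, D)
```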

Results

Experimental results demonstrated STDNet's superior performance on the LRW and LRW-1000 datasets, achieving word-level recognition accuracies of 90.2% and 53.56%, respectively. Extensive ablation experiments verified the rationality and effectiveness of its modules.

Conclusions

The proposed model effectively addresses short-term temporal dependency limitations in lip reading, and improves the temporal robustness of the model against variable-length sequences. These advancements validate the importance of explicit short-term dynamics modeling for practical lip-reading systems.
{"title":"STDNet: Improved lip reading via short-term temporal dependency modeling","authors":"Xiaoer Wu ,&nbsp;Zhenhua Tan ,&nbsp;Ziwei Cheng ,&nbsp;Yuran Ru","doi":"10.1016/j.vrih.2024.07.003","DOIUrl":"10.1016/j.vrih.2024.07.003","url":null,"abstract":"<div><h3>Background</h3><div>Lip reading uses lip images for visual speech recognition. Deep-learning-based lip reading has greatly improved performance in current datasets; however, most existing research ignores the significance of short-term temporal dependencies of lip-shape variations between adjacent frames, which leaves space for further improvement in feature extraction.</div></div><div><h3>Methods</h3><div>This article presents a spatiotemporal feature fusion network (STDNet) that compensates for the deficiencies of current lip-reading approaches in short-term temporal dependency modeling. Specifically, to distinguish more similar and intricate content, STDNet adds a temporal feature extraction branch based on a 3D-CNN, which enhances the learning of dynamic lip movements in adjacent frames while not affecting spatial feature extraction. In particular, we designed a local–temporal block, which aggregates interframe differences, strengthening the relationship between various local lip regions through multiscale convolution. We incorporated the squeeze-and-excitation mechanism into the Global-Temporal Block, which processes a single frame as an independent unitto learn temporal variations across the entire lip region more effectively. Furthermore, attention pooling was introduced to highlight meaningful frames containing key semantic information for the target word.</div></div><div><h3>Results</h3><div>Experimental results demonstrated STDNet's superior performance on the LRW and LRW-1000, achieving word-level recognition accuracies of 90.2% and 53.56%, respectively. Extensive ablation experiments verified the rationality and effectiveness of its modules.</div></div><div><h3>Conclusions</h3><div>The proposed model effectively addresses short-term temporal dependency limitations in lip reading, and improves the temporal robustness of the model against variable-length sequences. These advancements validate the importance of explicit short-term dynamics modeling for practical lip-reading systems.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 173-187"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Segmentation of CAD models using hybrid representation
Q1 Computer Science Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2025.01.001
Claude Uwimana , Shengdi Zhou , Limei Yang , Zhuqing Li , Norbelt Mutagisha , Edouard Niyongabo , Bin Zhou
In this paper, we introduce an innovative method for computer-aided design (CAD) segmentation that combines meshes and CAD models. Many previous CAD segmentation methods have achieved impressive performance using single representations, such as meshes, CAD models, and point clouds. However, existing methods cannot effectively combine different three-dimensional model types for the direct conversion, alignment, and integrity maintenance of geometric and topological information. Hence, we propose an integration approach that combines the geometric accuracy of CAD data with the flexibility of mesh representations, and introduce a unique hybrid representation that joins CAD and mesh models to enhance segmentation accuracy. To combine these two model types, our hybrid system utilizes advanced neural-network techniques to convert CAD models into mesh models. Segmentation of complex CAD models is crucial for model retrieval and reuse; in partial retrieval, the aim is to segment a complex CAD model into several simple components. The first component of our hybrid system involves advanced mesh-labeling algorithms that carry the CAD models' properties over to the mesh models. The second component integrates labeled face features for CAD segmentation by leveraging the abundant multisemantic information embedded in CAD models. This combination of mesh and CAD not only refines the accuracy of boundary delineation but also provides a comprehensive understanding of the underlying object semantics. This study uses the Fusion 360 Gallery dataset. Experimental results indicate that our hybrid method can segment these models with higher accuracy than other methods that use single representations.
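The core coupling step, carrying labels between CAD faces and the tessellated mesh, can be illustrated with a deliberately simple nearest-neighbor transfer. The paper's system uses learned mesh-labeling networks, so the snippet below is only a sketch of the idea, with all names and shapes assumed.

```python
import numpy as np

def transfer_cad_labels(tri_centroids, cad_face_samples, cad_face_labels):
    """Assign each mesh triangle the label of the nearest point sampled
    from the CAD model's faces (brute-force distances for clarity; use a
    spatial index for large models).
    tri_centroids: (M, 3); cad_face_samples: (K, 3); labels: (K,)."""
    d = np.linalg.norm(
        tri_centroids[:, None, :] - cad_face_samples[None, :, :], axis=-1)
    return cad_face_labels[d.argmin(axis=1)]   # (M,) per-triangle labels
```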
{"title":"Segmentation of CAD models using hybrid representation","authors":"Claude Uwimana ,&nbsp;Shengdi Zhou ,&nbsp;Limei Yang ,&nbsp;Zhuqing Li ,&nbsp;Norbelt Mutagisha ,&nbsp;Edouard Niyongabo ,&nbsp;Bin Zhou","doi":"10.1016/j.vrih.2025.01.001","DOIUrl":"10.1016/j.vrih.2025.01.001","url":null,"abstract":"<div><div>In this paper, we introduce an innovative method for computer-aided design (CAD) segmentation by concatenating meshes and CAD models. Many previous CAD segmentation methods have achieved impressive performance using single representations, such as meshes, CAD, and point clouds. However, existing methods cannot effectively combine different three-dimensional model types for the direct conversion, alignment, and integrity maintenance of geometric and topological information. Hence, we propose an integration approach that combines the geometric accuracy of CAD data with the flexibility of mesh representations, as well as introduce a unique hybrid representation that combines CAD and mesh models to enhance segmentation accuracy. To combine these two model types, our hybrid system utilizes advanced-neural-network techniques to convert CAD models into mesh models. For complex CAD models, model segmentation is crucial for model retrieval and reuse. In partial retrieval, it aims to segment a complex CAD model into several simple components. The first component of our hybrid system involves advanced mesh-labeling algorithms that harness the digitization of CAD properties to mesh models. The second component integrates labelled face features for CAD segmentation by leveraging the abundant multisemantic information embedded in CAD models. This combination of mesh and CAD not only refines the accuracy of boundary delineation but also provides a comprehensive understanding of the underlying object semantics. This study uses the Fusion 360 Gallery dataset. Experimental results indicate that our hybrid method can segment these models with higher accuracy than other methods that use single representations.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 188-202"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient and lightweight 3D building reconstruction from drone imagery using sparse line and point clouds
Q1 Computer Science Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2025.02.001
Xiongjie Yin , Jinquan He , Zhanglin Cheng
Efficient three-dimensional (3D) building reconstruction from drone imagery often faces data acquisition, storage, and computational challenges because of its reliance on dense point clouds. In this study, we introduce a novel method for efficient and lightweight 3D building reconstruction from drone imagery using line clouds and sparse point clouds. Our approach eliminates the need to generate dense point clouds and thus significantly reduces the computational burden by reconstructing 3D models directly from sparse data. We address the limitations of line clouds for plane detection and reconstruction with a new algorithm that projects 3D line clouds onto a 2D plane, clusters the projections to identify potential planes, and refines them using sparse point clouds to ensure accurate and efficient model reconstruction. Extensive qualitative and quantitative experiments demonstrate the effectiveness of our method and its superiority over existing techniques in terms of simplicity and efficiency.
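The projection-and-clustering step lends itself to a compact sketch. Assuming vertical facades (so the z coordinate can be dropped when projecting onto the ground plane), segments that are close in position and orientation suggest one candidate plane; scikit-learn's DBSCAN stands in here for whatever clustering the authors actually use.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def candidate_planes(lines3d, eps=0.5):
    """Group ground-plane projections of 3D line segments into candidate
    vertical building planes. lines3d: (N, 2, 3) segment endpoints.
    Returns per-segment cluster labels (-1 marks noise)."""
    mid = lines3d.mean(axis=1)                    # 3D segment midpoints
    d = lines3d[:, 1] - lines3d[:, 0]
    theta = np.arctan2(d[:, 1], d[:, 0]) % np.pi  # undirected 2D heading
    # cluster on 2D position plus a continuous orientation encoding;
    # in practice the two feature groups should be scaled to one eps
    feats = np.column_stack([mid[:, 0], mid[:, 1],
                             np.cos(2 * theta), np.sin(2 * theta)])
    return DBSCAN(eps=eps, min_samples=3).fit_predict(feats)
```

Each resulting cluster would then be refined against the sparse point cloud to fix the plane's exact offset and vertical extent.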
{"title":"Efficient and lightweight 3D building reconstruction from drone imagery using sparse line and point clouds","authors":"Xiongjie Yin ,&nbsp;Jinquan He ,&nbsp;Zhanglin Cheng","doi":"10.1016/j.vrih.2025.02.001","DOIUrl":"10.1016/j.vrih.2025.02.001","url":null,"abstract":"<div><div>Efficient three-dimensional (3D) building reconstruction from drone imagery often faces data acquisition, storage, and computational challenges because of its reliance on dense point clouds. In this study, we introduced a novel method for efficient and lightweight 3D building reconstruction from drone imagery using line clouds and sparse point clouds. Our approach eliminates the need to generate dense point clouds, and thus significantly reduces the computational burden by reconstructing 3D models directly from sparse data. We addressed the limitations of line clouds for plane detection and reconstruction by using a new algorithm. This algorithm projects 3D line clouds onto a 2D plane, clusters the projections to identify potential planes, and refines them using sparse point clouds to ensure an accurate and efficient model reconstruction. Extensive qualitative and quantitative experiments demonstrated the effectiveness of our method, demonstrating its superiority over existing techniques in terms of simplicity and efficiency.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 111-126"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DeepSafe: Two-level deep learning approach for disaster victims detection
Q1 Computer Science Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2024.08.005
Amir Azizi , Panayiotis Charalambous , Yiorgos Chrysanthou

Background

Efficient disaster victim detection (DVD) in urban areas after natural disasters is crucial for minimizing losses. However, conventional search and rescue (SAR) methods often experience delays, which can hinder the timely detection of victims. SAR teams face various challenges, including limited access to debris and collapsed structures, safety risks due to unstable conditions, and disrupted communication networks.

Methods

In this paper, we present DeepSafe, a novel two-level deep learning approach for multilevel classification and object detection using a simulated disaster victim dataset. DeepSafe first employs YOLOv8 to classify images into victim and non-victim categories. Subsequently, Detectron2 is used to precisely locate and outline the victims.
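A skeleton of the two-level pipeline might look as follows, assuming a YOLOv8 classifier and a Detectron2 predictor fine-tuned on the simulated victim dataset. The weight files and the class name "victim" are hypothetical; only standard ultralytics/Detectron2 calls are used.

```python
import cv2
from ultralytics import YOLO
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Level 1: image-level victim/non-victim classifier (hypothetical weights).
classifier = YOLO("victim_classifier_yolov8.pt")

# Level 2: instance detector for precise localization (hypothetical weights).
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "victim_detector.pth"
predictor = DefaultPredictor(cfg)

def detect_victims(image_path):
    result = classifier(image_path)[0]
    if result.names[result.probs.top1] != "victim":
        return None                     # level 1 filters non-victim images
    outputs = predictor(cv2.imread(image_path))  # BGR, as Detectron2 expects
    return outputs["instances"]         # boxes/masks outlining victims
```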

Results

Experimental results demonstrate the promising performance of DeepSafe in both victim classification and detection. The model effectively identified and located victims under the challenging conditions presented in the dataset.

Conclusion

DeepSafe offers a practical tool for real-time disaster management and SAR operations, significantly improving conventional methods by reducing delays and enhancing victim detection accuracy in disaster-stricken urban areas.
{"title":"DeepSafe:Two-level deep learning approach for disaster victims detection","authors":"Amir Azizi ,&nbsp;Panayiotis Charalambous ,&nbsp;Yiorgos Chrysanthou","doi":"10.1016/j.vrih.2024.08.005","DOIUrl":"10.1016/j.vrih.2024.08.005","url":null,"abstract":"<div><h3>Background</h3><div>Efficient disaster victim detection (DVD) in urban areas after natural disasters is crucial for minimizing losses. However, conventional search and rescue (SAR) methods often experience delays, which can hinder the timely detection of victims. SAR teams face various challenges, including limited access to debris and collapsed structures, safety risks due to unstable conditions, and disrupted communication networks.</div></div><div><h3>Methods</h3><div>In this paper, we present DeepSafe, a novel two-level deep learning approach for multilevel classification and object detection using a simulated disaster victim dataset. DeepSafe first employs YOLOv8 to classify images into victim and non-victim categories. Subsequently, Detectron2 is used to precisely locate and outline the victims.</div></div><div><h3>Results</h3><div>Experimental results demonstrate the promising performance of DeepSafe in both victim classification and detection. The model effectively identified and located victims under the challenging conditions presented in the dataset.</div></div><div><h3>Conclusion</h3><div>DeepSafe offers a practical tool for real-time disaster management and SAR operations, significantly improving conventional methods by reducing delays and enhancing victim detection accuracy in disaster-stricken urban areas.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 139-154"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deconfounded fashion image captioning with transformer and multimodal retrieval
Q1 Computer Science Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2024.08.002
Tao Peng, Weiqiao Yin, Junping Liu, Li Li, Xinrong Hu

Background

The annotation of fashion images is an important task in the fashion industry, as well as in social media and e-commerce. However, owing to the complexity and diversity of fashion images, this task entails multiple challenges, including the lack of fine-grained captions and confounders caused by dataset bias. Specifically, confounders often cause models to learn spurious correlations, thereby reducing their generalization capabilities.

Method

In this work, we propose the Deconfounded Fashion Image Captioning (DFIC) framework, which first uses multimodal retrieval to enrich the predicted captions of clothing, and then constructs a detailed causal graph using causal inference in the decoder to perform deconfounding. Multimodal retrieval is used to obtain semantic words related to image features, which are input into the decoder as prompt words to enrich sentence descriptions. In the decoder, causal inference is applied to disentangle visual and semantic features while concurrently eliminating visual and language confounding.
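The retrieval half of the method reduces to a nearest-neighbor lookup in a joint embedding space. The sketch below assumes precomputed image and word embeddings from some vision-language encoder, which the abstract does not specify; the retrieved words would be fed to the decoder as prompt tokens.

```python
import numpy as np

def retrieve_prompt_words(img_emb, word_embs, vocab, k=5):
    """Return the k vocabulary words whose embeddings have the highest
    cosine similarity to the image embedding.
    img_emb: (D,); word_embs: (V, D); vocab: list of V words."""
    img = img_emb / np.linalg.norm(img_emb)
    W = word_embs / np.linalg.norm(word_embs, axis=1, keepdims=True)
    top = np.argsort(-(W @ img))[:k]     # most similar first
    return [vocab[i] for i in top]
```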

Results

Overall, our method not only effectively enriches the captions of target images but also greatly reduces confounders caused by the dataset. The effectiveness of the proposed framework was experimentally verified on the FACAD dataset.
{"title":"Deconfounded fashion image captioning with transformer and multimodal retrieval","authors":"Tao Peng,&nbsp;Weiqiao Yin,&nbsp;Junping Liu,&nbsp;Li Li,&nbsp;Xinrong Hu","doi":"10.1016/j.vrih.2024.08.002","DOIUrl":"10.1016/j.vrih.2024.08.002","url":null,"abstract":"<div><h3>Background</h3><div>The annotation of fashion images is a significantly important task in the fashion industry as well as social media and e-commerce. However, owing to the complexity and diversity of fashion images, this task entails multiple challenges, including the lack of fine-grained captions and confounders caused by dataset bias. Specifically, confounders often cause models to learn spurious correlations, thereby reducing their generalization capabilities.</div></div><div><h3>Method</h3><div>In this work, we propose the Deconfounded Fashion Image Captioning (DFIC) framework, which first uses multimodal retrieval to enrich the predicted captions of clothing, and then constructs a detailed causal graph using causal inference in the decoder to perform deconfounding. Multimodal retrieval is used to obtain semantic words related to image features, which are input into the decoder as prompt words to enrich sentence descriptions. In the decoder, causal inference is applied to disentangle visual and semantic features while concurrently eliminating visual and language confounding.</div></div><div><h3>Results</h3><div>Overall, our method can not only effectively enrich the captions of target images, but also greatly reduce confounders caused by the dataset. To verify the effectiveness of the proposed framework, the model was experimentally verified using the FACAD dataset.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 127-138"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Intelligent hierarchical federated learning system based on semi-asynchronous and scheduled synchronous control strategies in satellite network
Pub Date : 2025-03-20 DOI: 10.1007/s43684-025-00095-z
Qiang Mei, Rui Huang, Duo Li, Jingyi Li, Nan Shi, Mei Du, Yingkang Zhong, Chunqi Tian

Federated learning (FL), a hot topic in distributed intelligent systems, is a technology that allows multiple devices to collaboratively train a global model without sharing their original data. Combined with satellite networks, FL can overcome geographical limitations and achieve broader applications. However, it also faces issues such as the straggler effect, unreliable network environments, and non-independent and identically distributed (Non-IID) samples. To address these problems, we propose an intelligent hierarchical FL system based on semi-asynchronous and scheduled synchronous control strategies in a cloud-edge-client structure for satellite networks. Our intelligent system effectively handles multiple client requests by distributing the communication load of the central cloud to various edge clouds. Additionally, the cloud server selection algorithm and the edge-client semi-asynchronous control strategy minimize clients' waiting time, improving the overall efficiency of the FL process. Furthermore, the center-edge scheduled synchronous control strategy ensures the timeliness of partial models. Experimental results show that our proposed intelligent hierarchical FL system has a distinct advantage in global accuracy over traditional FedAvg, achieving 2% higher global accuracy within the same time frame and reducing the training time needed to reach the target accuracy by 52%.
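The two control strategies can be caricatured in a few lines of Python. The sketch below treats model weights as flat NumPy arrays and makes up a simple staleness-decay rule; the paper's actual strategies (cloud server selection, timeliness guarantees) are more elaborate.

```python
import numpy as np

def edge_semi_async_aggregate(global_w, updates, quorum=3, decay=0.5):
    """Edge-client semi-asynchronous step: aggregate as soon as `quorum`
    client updates arrive, down-weighting stale ones rather than waiting
    for stragglers. updates: list of (weights, staleness) pairs."""
    if len(updates) < quorum:
        return global_w                           # not enough updates yet
    c = np.array([decay ** s for _, s in updates])
    c /= c.sum()
    return sum(ci * w for ci, (w, _) in zip(c, updates))

def center_scheduled_sync(edge_models, data_sizes):
    """Center-edge scheduled synchronous step: FedAvg over edge models,
    weighted by the data volume each edge cloud represents."""
    n = np.asarray(data_sizes, dtype=float)
    return sum((ni / n.sum()) * w for ni, w in zip(n, edge_models))
```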

{"title":"Intelligent hierarchical federated learning system based on semi-asynchronous and scheduled synchronous control strategies in satellite network","authors":"Qiang Mei,&nbsp;Rui Huang,&nbsp;Duo Li,&nbsp;Jingyi Li,&nbsp;Nan Shi,&nbsp;Mei Du,&nbsp;Yingkang Zhong,&nbsp;Chunqi Tian","doi":"10.1007/s43684-025-00095-z","DOIUrl":"10.1007/s43684-025-00095-z","url":null,"abstract":"<div><p>Federated learning (FL) is a technology that allows multiple devices to collaboratively train a global model without sharing original data, which is a hot topic in distributed intelligent systems. Combined with satellite network, FL can overcome the geographical limitation and achieve broader applications. However, it also faces the issues such as straggler effect, unreliable network environments and non-independent and identically distributed (Non-IID) samples. To address these problems, we propose an intelligent hierarchical FL system based on semi-asynchronous and scheduled synchronous control strategies in cloud-edge-client structure for satellite network. Our intelligent system effectively handles multiple client requests by distributing the communication load of the central cloud to various edge clouds. Additionally, the cloud server selection algorithm and the edge-client semi-asynchronous control strategy minimize clients’ waiting time, improving the overall efficiency of the FL process. Furthermore, the center-edge scheduled synchronous control strategy ensures the timeliness of partial models. Based on the experiment results, our proposed intelligent hierarchical FL system demonstrates a distinct advantage in global accuracy over traditional FedAvg, achieving 2% higher global accuracy within the same time frame and reducing 52% training time to achieve the target accuracy.</p></div>","PeriodicalId":71187,"journal":{"name":"自主智能系统(英文)","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s43684-025-00095-z.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143655326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A glance over the past decade: road scene parsing towards safe and comfortable autonomous driving
Pub Date : 2025-03-13 DOI: 10.1007/s43684-025-00096-y
Rui Fan, Jiahang Li, Jiaqi Li, Jiale Wang, Ziwei Long, Ning Jia, Yanan Liu, Wenshuo Wang, Mohammud J. Bocus, Sergey Vityazev, Xieyuanli Chen, Junhao Xiao, Stepan Andreev, Huimin Lu, Alexander Dvorkovich

Road scene parsing is a crucial capability for self-driving vehicles and intelligent road inspection systems. Recent research has increasingly focused on enhancing driving safety and comfort by improving the detection of both drivable areas and road defects. This article reviews state-of-the-art networks developed over the past decade for both general-purpose semantic segmentation and specialized road scene parsing tasks. It also includes extensive experimental comparisons of these networks across five public datasets. Additionally, we explore the key challenges and emerging trends in the field, aiming to guide researchers toward developing next-generation models for more effective and reliable road scene parsing.

{"title":"A glance over the past decade: road scene parsing towards safe and comfortable autonomous driving","authors":"Rui Fan,&nbsp;Jiahang Li,&nbsp;Jiaqi Li,&nbsp;Jiale Wang,&nbsp;Ziwei Long,&nbsp;Ning Jia,&nbsp;Yanan Liu,&nbsp;Wenshuo Wang,&nbsp;Mohammud J. Bocus,&nbsp;Sergey Vityazev,&nbsp;Xieyuanli Chen,&nbsp;Junhao Xiao,&nbsp;Stepan Andreev,&nbsp;Huimin Lu,&nbsp;Alexander Dvorkovich","doi":"10.1007/s43684-025-00096-y","DOIUrl":"10.1007/s43684-025-00096-y","url":null,"abstract":"<div><p>Road scene parsing is a crucial capability for self-driving vehicles and intelligent road inspection systems. Recent research has increasingly focused on enhancing driving safety and comfort by improving the detection of both drivable areas and road defects. This article reviews state-of-the-art networks developed over the past decade for both general-purpose semantic segmentation and specialized road scene parsing tasks. It also includes extensive experimental comparisons of these networks across five public datasets. Additionally, we explore the key challenges and emerging trends in the field, aiming to guide researchers toward developing next-generation models for more effective and reliable road scene parsing.</p></div>","PeriodicalId":71187,"journal":{"name":"自主智能系统(英文)","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s43684-025-00096-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
WGO: a similarly encoded whale-goshawk optimization algorithm for uncertain cloud manufacturing service composition
Pub Date : 2025-03-05 DOI: 10.1007/s43684-025-00089-x
Kezhou Chen, Tao Wang, Huimin Zhuo, Lianglun Cheng

Service Composition and Optimization Selection (SCOS) is crucial in Cloud Manufacturing (CMfg), but the uncertainties in service states and working environments pose challenges for existing QoS-based methods. Recently, digital twins have gained prominence in CMfg due to their predictive capabilities, enhancing the reliability of service composition. Heuristic algorithms are widely used in this field for their flexibility and compatibility with uncertain environments. This paper proposes the Whale-Goshawk Optimization Algorithm (WGO), which combines the Whale Optimization Algorithm (WOA) and Northern Goshawk Optimization Algorithm (NGO). A novel similar integer coding method, incorporating spatial feature information, addresses the limitations of traditional integer coding, while a whale-optimized prey generation strategy improves NGO’s global optimization efficiency. Additionally, a local search method based on similar integer coding enhances WGO’s local search ability. Experimental results demonstrate the effectiveness of the proposed approach.
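One plausible reading of the hybridization, a WOA-style move supplying the prey for NGO's exploration phase, is sketched below in continuous variables. The paper actually operates on its similar integer coding, so treat this purely as an illustration of the mechanism, not the authors' algorithm.

```python
import numpy as np

def whale_optimized_prey(pop, fitness, t, T):
    """Generate NGO's prey with WOA's shrinking-encircling update: move a
    randomly chosen individual toward the current best solution.
    pop: (N, D) population; fitness: (N,); t/T: iteration progress."""
    best = pop[np.argmin(fitness)]               # minimization assumed
    x = pop[np.random.randint(len(pop))]
    a = 2.0 * (1.0 - t / T)                      # WOA parameter: 2 -> 0
    A = a * (2.0 * np.random.rand(x.size) - 1.0)
    C = 2.0 * np.random.rand(x.size)
    return best - A * np.abs(C * best - x)       # encircling-prey move
```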

{"title":"WGO: a similarly encoded whale-goshawk optimization algorithm for uncertain cloud manufacturing service composition","authors":"Kezhou Chen,&nbsp;Tao Wang,&nbsp;Huimin Zhuo,&nbsp;Lianglun Cheng","doi":"10.1007/s43684-025-00089-x","DOIUrl":"10.1007/s43684-025-00089-x","url":null,"abstract":"<div><p>Service Composition and Optimization Selection (SCOS) is crucial in Cloud Manufacturing (CMfg), but the uncertainties in service states and working environments pose challenges for existing QoS-based methods. Recently, digital twins have gained prominence in CMfg due to their predictive capabilities, enhancing the reliability of service composition. Heuristic algorithms are widely used in this field for their flexibility and compatibility with uncertain environments. This paper proposes the Whale-Goshawk Optimization Algorithm (WGO), which combines the Whale Optimization Algorithm (WOA) and Northern Goshawk Optimization Algorithm (NGO). A novel similar integer coding method, incorporating spatial feature information, addresses the limitations of traditional integer coding, while a whale-optimized prey generation strategy improves NGO’s global optimization efficiency. Additionally, a local search method based on similar integer coding enhances WGO’s local search ability. Experimental results demonstrate the effectiveness of the proposed approach.</p></div>","PeriodicalId":71187,"journal":{"name":"自主智能系统(英文)","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s43684-025-00089-x.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0