Pub Date : 2025-04-23 DOI: 10.1007/s43684-025-00097-x
G. Rigatos, M. Abbaszadeh, K. Busawon, P. Siano, M. Al Numay, G. Cuccurullo, F. Zouari
Transportation of heavy loads is often performed by multi-axle, multi-steered heavy-duty vehicles. In this article, a novel nonlinear optimal control method is applied to the kinematic model of the five-axle and three-steering coupled vehicle system. First, it is proven that the dynamic model of this articulated multi-vehicle system is differentially flat. Next, the state-space model of the five-axle and three-steering vehicle system undergoes approximate linearization around a temporary operating point that is recomputed at each time step of the control method. The linearization is based on Taylor series expansion and on the associated Jacobian matrices. For the linearized state-space model, a stabilizing optimal (H-infinity) feedback controller is designed. This controller represents the solution of the nonlinear optimal control problem under model uncertainty and external perturbations. To compute the controller’s feedback gains, an algebraic Riccati equation is solved repeatedly at each iteration of the control algorithm. The stability properties of the control method are proven through Lyapunov analysis. The proposed nonlinear optimal control approach achieves fast and accurate tracking of setpoints under moderate variations of the control inputs and minimal dispersion of energy by the propulsion and steering system of the vehicle.
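The loop described in this abstract (re-linearize around the current operating point, solve a Riccati equation, apply the resulting feedback gain) can be illustrated on a toy problem. The sketch below is not the paper's vehicle model: it uses a hypothetical scalar plant `plant(x, u) = -x**3 + u`, and it assumes a Riccati equation of the form commonly used for H-infinity state feedback, which in the scalar case reads 2·a·p + q − (2b²/r − l²/ρ²)·p² = 0 with gain k = b·p/r. The abstract does not state the equation, so this form and the weights `q`, `r`, `rho`, `l` are assumptions chosen for illustration.

```python
import math


def linearize(f, x0, u0, eps=1e-6):
    """Central-difference Jacobians a = df/dx and b = df/du at (x0, u0)."""
    a = (f(x0 + eps, u0) - f(x0 - eps, u0)) / (2 * eps)
    b = (f(x0, u0 + eps) - f(x0, u0 - eps)) / (2 * eps)
    return a, b


def hinf_gain(a, b, q=1.0, r=1.0, rho=2.0, l=1.0):
    """Solve the scalar Riccati equation 2*a*p + q - c*p**2 = 0,
    with c = 2*b**2/r - l**2/rho**2, and return (p, k) where k = b*p/r."""
    c = 2 * b * b / r - l * l / (rho * rho)
    if c <= 0:
        raise ValueError("attenuation level rho is too small for this b, r, l")
    p = (a + math.sqrt(a * a + c * q)) / c  # positive root of the quadratic
    return p, b * p / r


def plant(x, u):
    """Hypothetical scalar nonlinear plant (not the vehicle model)."""
    return -x ** 3 + u


def simulate(x_init=2.0, x_ref=1.0, dt=0.01, steps=2000):
    """Track x_ref with a gain that is recomputed at every time step."""
    x = x_init
    u_ref = x_ref ** 3  # feedforward input that holds x_ref at equilibrium
    u = u_ref
    for _ in range(steps):
        a, b = linearize(plant, x, u)  # temporary operating point
        _, k = hinf_gain(a, b)
        u = u_ref - k * (x - x_ref)
        x += dt * plant(x, u)  # explicit Euler integration step
    return x
```

For a multivariable model the scalar quadratic would be replaced by a matrix Riccati solver; the structure of the loop stays the same.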
{"title":"Nonlinear optimal control for the five-axle and three-steering coupled-vehicle system","authors":"G. Rigatos, M. Abbaszadeh, K. Busawon, P. Siano, M. Al Numay, G. Cuccurullo, F. Zouari","doi":"10.1007/s43684-025-00097-x","PeriodicalId":71187,"journal":{"name":"Autonomous Intelligent Systems (English)","volume":"5 1","pages":""},"publicationDate":"2025-04-23","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s43684-025-00097-x.pdf","platform":"Semanticscholar","paperid":"143861351","PMCID":"OA"}
Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2023.06.007
Simon Seibt , Bastian Kuth , Bartosz von Rymon Lipinski , Thomas Chang , Marc Erich Latoschik
Background
In recent years, the demand for interactive photorealistic three-dimensional (3D) environments has increased in various fields, including architecture, engineering, and entertainment. However, achieving a balance between the quality and efficiency of high-performance 3D applications and virtual reality (VR) remains challenging.
Methods
This study addresses this issue by revisiting and extending view interpolation for image-based rendering (IBR), which enables the exploration of spacious open environments in 3D and VR. To this end, we introduce multimorphing, a novel rendering method based on a spatial data structure of 2D image patches, called the image graph. With this approach, novel views can be rendered with up to six degrees of freedom from only a sparse set of views. The rendering process does not require 3D reconstruction of the geometry or per-pixel depth information, and all relevant data for the output are extracted from the local morphing cells of the image graph. The detection of parallax image regions during preprocessing reduces rendering artifacts by extrapolating image patches from adjacent cells in real time. In addition, a GPU-based solution is presented to resolve exposure inconsistencies within a dataset, enabling seamless transitions of brightness when moving between areas with varying light intensities.
Results
Experiments on multiple real-world and synthetic scenes demonstrate that the presented method achieves high "VR-compatible" frame rates, even on mid-range and legacy hardware. While achieving adequate visual quality even for sparse datasets, it outperforms other IBR and current neural rendering approaches.
Conclusions
Using the correspondence-based decomposition of input images into morphing cells of 2D image patches, multidimensional image morphing provides high-performance novel view generation, supporting open 3D and VR environments. Nevertheless, the handling of morphing artifacts in the parallax image regions remains a topic for future research.
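The core idea of correspondence-based morphing between several input views can be sketched in a few lines. This is a generic illustration, not the multimorphing renderer itself: it assumes that each view supplies the 2D positions of the same patch vertices in the same order, and that the novel view is described by convex blending weights (one per input view). The function names are hypothetical.

```python
def morph_points(views, weights):
    """Blend corresponding 2D points from several views.

    views   -- list of point lists; views[v][i] is point i as seen in view v
    weights -- one non-negative weight per view, summing to 1
    """
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    blended = []
    for i in range(len(views[0])):
        x = sum(w * v[i][0] for v, w in zip(views, weights))
        y = sum(w * v[i][1] for v, w in zip(views, weights))
        blended.append((x, y))
    return blended


def blend_colors(colors, weights):
    """Cross-dissolve one RGB sample per view using the same weights."""
    return tuple(sum(w * c[ch] for c, w in zip(colors, weights))
                 for ch in range(3))
```

With two views and weights `[0.5, 0.5]` this reduces to classic two-image morphing; additional views add further blending dimensions.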
{"title":"Multidimensional image morphing-fast image-based rendering of open 3D and VR environments","authors":"Simon Seibt , Bastian Kuth , Bartosz von Rymon Lipinski , Thomas Chang , Marc Erich Latoschik","doi":"10.1016/j.vrih.2023.06.007","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 155-172"},"publicationDate":"2025-04-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143864788"}
Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2024.07.003
Xiaoer Wu , Zhenhua Tan , Ziwei Cheng , Yuran Ru
Background
Lip reading uses lip images for visual speech recognition. Deep-learning-based lip reading has greatly improved performance in current datasets; however, most existing research ignores the significance of short-term temporal dependencies of lip-shape variations between adjacent frames, which leaves space for further improvement in feature extraction.
Methods
This article presents a spatiotemporal feature fusion network (STDNet) that compensates for the deficiencies of current lip-reading approaches in short-term temporal dependency modeling. Specifically, to distinguish more similar and intricate content, STDNet adds a temporal feature extraction branch based on a 3D-CNN, which enhances the learning of dynamic lip movements in adjacent frames without affecting spatial feature extraction. In particular, we designed a local-temporal block, which aggregates interframe differences and strengthens the relationship between various local lip regions through multiscale convolution. We incorporated the squeeze-and-excitation mechanism into the global-temporal block, which processes a single frame as an independent unit to learn temporal variations across the entire lip region more effectively. Furthermore, attention pooling was introduced to highlight meaningful frames containing key semantic information for the target word.
Results
Experimental results demonstrated STDNet's superior performance on the LRW and LRW-1000 datasets, achieving word-level recognition accuracies of 90.2% and 53.56%, respectively. Extensive ablation experiments verified the rationality and effectiveness of its modules.
Conclusions
The proposed model effectively addresses short-term temporal dependency limitations in lip reading, and improves the temporal robustness of the model against variable-length sequences. These advancements validate the importance of explicit short-term dynamics modeling for practical lip-reading systems.
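Two of the operations named in the Methods paragraph, interframe differencing and attention pooling, are simple enough to sketch directly. The code below is a plain-Python illustration under stated assumptions: each frame is already reduced to a small feature vector, and the per-frame attention scores are given as inputs, whereas in a trained network they would be produced by learned layers. It is not the STDNet implementation.

```python
import math


def frame_differences(frames):
    """Element-wise differences between adjacent frame feature vectors."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]


def attention_pool(frames, scores):
    """Softmax the per-frame scores, then return the weighted sum of frames.

    Returns (pooled_vector, weights) so the emphasis per frame is inspectable.
    """
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights
```

Frames with higher scores contribute more to the pooled vector, which is how pooling can highlight the frames that carry the target word.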
{"title":"STDNet: Improved lip reading via short-term temporal dependency modeling","authors":"Xiaoer Wu , Zhenhua Tan , Ziwei Cheng , Yuran Ru","doi":"10.1016/j.vrih.2024.07.003","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 173-187"},"publicationDate":"2025-04-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143864853"}
Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2025.01.001
Claude Uwimana , Shengdi Zhou , Limei Yang , Zhuqing Li , Norbelt Mutagisha , Edouard Niyongabo , Bin Zhou
In this paper, we introduce an innovative method for computer-aided design (CAD) segmentation by concatenating meshes and CAD models. Many previous CAD segmentation methods have achieved impressive performance using single representations, such as meshes, CAD, and point clouds. However, existing methods cannot effectively combine different three-dimensional model types for the direct conversion, alignment, and integrity maintenance of geometric and topological information. Hence, we propose an integration approach that combines the geometric accuracy of CAD data with the flexibility of mesh representations, and we introduce a unique hybrid representation that combines CAD and mesh models to enhance segmentation accuracy. To combine these two model types, our hybrid system utilizes advanced neural network techniques to convert CAD models into mesh models. For complex CAD models, model segmentation is crucial for model retrieval and reuse. In partial retrieval, the aim is to segment a complex CAD model into several simple components. The first component of our hybrid system involves advanced mesh-labeling algorithms that harness the digitization of CAD properties to mesh models. The second component integrates labelled face features for CAD segmentation by leveraging the abundant multisemantic information embedded in CAD models. This combination of mesh and CAD not only refines the accuracy of boundary delineation but also provides a comprehensive understanding of the underlying object semantics. This study uses the Fusion 360 Gallery dataset. Experimental results indicate that our hybrid method can segment these models with higher accuracy than other methods that use single representations.
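One simple way to connect a labelled mesh back to the CAD faces it was tessellated from is a per-face majority vote. The abstract does not describe how the two representations are linked, so the sketch below is only an assumption-laden illustration: it presumes that the CAD-to-mesh conversion records, for every triangle, the identifier of the CAD face it came from (`triangle_to_face`, a hypothetical mapping), and that a mesh-labeling step has already produced one label per triangle.

```python
from collections import Counter


def labels_to_cad_faces(triangle_labels, triangle_to_face):
    """Assign each CAD face the most frequent label among its triangles.

    triangle_labels  -- dict: triangle id -> predicted segment label
    triangle_to_face -- dict: triangle id -> originating CAD face id
    """
    votes = {}
    for tri, label in triangle_labels.items():
        face = triangle_to_face[tri]
        votes.setdefault(face, Counter())[label] += 1
    # most_common(1) returns the label with the highest vote count
    return {face: counts.most_common(1)[0][0]
            for face, counts in votes.items()}
```

Voting per face keeps segment boundaries on CAD face boundaries, even when individual triangles near an edge are mislabelled.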
{"title":"Segmentation of CAD models using hybrid representation","authors":"Claude Uwimana , Shengdi Zhou , Limei Yang , Zhuqing Li , Norbelt Mutagisha , Edouard Niyongabo , Bin Zhou","doi":"10.1016/j.vrih.2025.01.001","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 188-202"},"publicationDate":"2025-04-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143864854"}
Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2025.02.001
Xiongjie Yin , Jinquan He , Zhanglin Cheng
Efficient three-dimensional (3D) building reconstruction from drone imagery often faces data acquisition, storage, and computational challenges because of its reliance on dense point clouds. In this study, we introduced a novel method for efficient and lightweight 3D building reconstruction from drone imagery using line clouds and sparse point clouds. Our approach eliminates the need to generate dense point clouds and thus significantly reduces the computational burden by reconstructing 3D models directly from sparse data. We addressed the limitations of line clouds for plane detection and reconstruction by using a new algorithm. This algorithm projects 3D line clouds onto a 2D plane, clusters the projections to identify potential planes, and refines them using sparse point clouds to ensure an accurate and efficient model reconstruction. Extensive qualitative and quantitative experiments demonstrated the effectiveness of our method and its superiority over existing techniques in terms of simplicity and efficiency.
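The "project, then cluster" step can be illustrated with a small sketch. The abstract does not specify the projection plane or the clustering rule, so the code below makes its own assumptions: walls are vertical, the projection is onto the ground plane (the z coordinate is dropped), and projected segments are grouped greedily when their 2D supporting lines agree in normal angle and offset within the hypothetical tolerances `angle_tol` and `dist_tol`. The refinement with sparse points is not shown.

```python
import math


def normal_form(p, q):
    """Return (theta, rho) of the 2D line through p and q, where
    x*cos(theta) + y*sin(theta) = rho and theta lies in [0, pi)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    theta = math.atan2(dx, -dy) % math.pi  # direction of the normal (-dy, dx)
    rho = p[0] * math.cos(theta) + p[1] * math.sin(theta)
    return theta, rho


def cluster_lines(segments, angle_tol=math.radians(5), dist_tol=0.2,
                  min_len=1e-6):
    """Group 3D segments whose ground-plane projections share a 2D line.

    segments -- list of ((x1, y1, z1), (x2, y2, z2))
    Returns a list of clusters, each a dict with theta, rho and member indices.
    """
    clusters = []
    for idx, ((x1, y1, _), (x2, y2, _)) in enumerate(segments):
        p, q = (x1, y1), (x2, y2)
        if math.dist(p, q) < min_len:
            continue  # vertical edge: projects to a point, has no direction
        theta, rho = normal_form(p, q)
        for c in clusters:
            d, r = abs(theta - c["theta"]), rho
            if d > math.pi / 2:  # wrap-around: same line, flipped normal
                d, r = math.pi - d, -rho
            if d <= angle_tol and abs(r - c["rho"]) <= dist_tol:
                c["members"].append(idx)
                break
        else:
            clusters.append({"theta": theta, "rho": rho, "members": [idx]})
    return clusters
```

Each resulting cluster is a candidate vertical plane; segments at different heights on the same facade fall into the same cluster because height is discarded by the projection.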
{"title":"Efficient and lightweight 3D building reconstruction from drone imagery using sparse line and point clouds","authors":"Xiongjie Yin , Jinquan He , Zhanglin Cheng","doi":"10.1016/j.vrih.2025.02.001","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 111-126"},"publicationDate":"2025-04-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143864167"}
Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2024.08.005
Amir Azizi , Panayiotis Charalambous , Yiorgos Chrysanthou
Background
Efficient disaster victim detection (DVD) in urban areas after natural disasters is crucial for minimizing losses. However, conventional search and rescue (SAR) methods often experience delays, which can hinder the timely detection of victims. SAR teams face various challenges, including limited access to debris and collapsed structures, safety risks due to unstable conditions, and disrupted communication networks.
Methods
In this paper, we present DeepSafe, a novel two-level deep learning approach for multilevel classification and object detection using a simulated disaster victim dataset. DeepSafe first employs YOLOv8 to classify images into victim and non-victim categories. Subsequently, Detectron2 is used to precisely locate and outline the victims.
Results
Experimental results demonstrate the promising performance of DeepSafe in both victim classification and detection. The model effectively identified and located victims under the challenging conditions presented in the dataset.
Conclusion
DeepSafe offers a practical tool for real-time disaster management and SAR operations, significantly improving conventional methods by reducing delays and enhancing victim detection accuracy in disaster-stricken urban areas.
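The two-level structure described in the Methods paragraph (classify first, localize only when a victim is likely) can be sketched independently of any particular model library. In the sketch below, `classify` and `locate` are hypothetical stand-ins for the YOLOv8 classifier and the Detectron2 detector; no real API of either library is used, and the `threshold` value is an assumption.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def two_level_detect(image,
                     classify: Callable[[object], float],
                     locate: Callable[[object], List[Box]],
                     threshold: float = 0.5) -> List[Box]:
    """Run the detector only on images the classifier flags as 'victim'.

    classify -- returns the probability that the image contains a victim
    locate   -- returns bounding boxes for victims in the image
    """
    if classify(image) < threshold:
        return []  # level 1 rejects the image; level 2 is skipped
    return locate(image)
```

Gating the detector behind a cheaper classifier means that most non-victim frames never reach the more expensive localization stage.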
{"title":"DeepSafe: Two-level deep learning approach for disaster victims detection","authors":"Amir Azizi , Panayiotis Charalambous , Yiorgos Chrysanthou","doi":"10.1016/j.vrih.2024.08.005","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 139-154"},"publicationDate":"2025-04-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143864786"}
Pub Date : 2025-04-01 DOI: 10.1016/j.vrih.2024.08.002
Tao Peng, Weiqiao Yin, Junping Liu, Li Li, Xinrong Hu
Background
The annotation of fashion images is an important task in the fashion industry as well as in social media and e-commerce. However, owing to the complexity and diversity of fashion images, this task entails multiple challenges, including the lack of fine-grained captions and confounders caused by dataset bias. Specifically, confounders often cause models to learn spurious correlations, thereby reducing their generalization capabilities.
Method
In this work, we propose the Deconfounded Fashion Image Captioning (DFIC) framework, which first uses multimodal retrieval to enrich the predicted captions of clothing, and then constructs a detailed causal graph using causal inference in the decoder to perform deconfounding. Multimodal retrieval is used to obtain semantic words related to image features, which are input into the decoder as prompt words to enrich sentence descriptions. In the decoder, causal inference is applied to disentangle visual and semantic features while concurrently eliminating visual and language confounding.
Results
Overall, our method not only effectively enriches the captions of target images but also greatly reduces the confounding caused by dataset bias. The effectiveness of the proposed framework was verified experimentally on the FACAD dataset.
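Deconfounding by causal inference is often implemented with the backdoor adjustment, P(y | do(x)) = Σ_z P(y | x, z) · P(z). The abstract does not say which adjustment DFIC uses, so the toy example below is an assumption: it only shows how averaging over the confounder's marginal distribution differs from the ordinary conditional estimate. The confounder values and probabilities are made up for illustration.

```python
def backdoor_adjust(p_y_given_x_z, p_z):
    """P(y | do(x)) = sum over z of P(y | x, z) * P(z)."""
    return sum(p_y_given_x_z[z] * p_z[z] for z in p_z)


def observational(p_y_given_x_z, p_z_given_x):
    """P(y | x) = sum over z of P(y | x, z) * P(z | x), for comparison."""
    return sum(p_y_given_x_z[z] * p_z_given_x[z] for z in p_z_given_x)


# Hypothetical numbers: z is the photo setting, x an image feature,
# y the event that the caption contains a particular word.
P_Y_GIVEN_X_Z = {"studio": 0.9, "street": 0.3}
P_Z = {"studio": 0.5, "street": 0.5}          # marginal over the dataset
P_Z_GIVEN_X = {"studio": 0.9, "street": 0.1}  # biased co-occurrence with x
```

With these numbers the observational estimate is inflated by the biased co-occurrence of the feature with studio shots, while the adjusted estimate weights both settings by how often they occur overall.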
{"title":"Deconfounded fashion image captioning with transformer and multimodal retrieval","authors":"Tao Peng, Weiqiao Yin, Junping Liu, Li Li, Xinrong Hu","doi":"10.1016/j.vrih.2024.08.002","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 127-138"},"publicationDate":"2025-04-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143864168"}
Pub Date : 2025-03-20 DOI: 10.1007/s43684-025-00095-z
Qiang Mei, Rui Huang, Duo Li, Jingyi Li, Nan Shi, Mei Du, Yingkang Zhong, Chunqi Tian
Federated learning (FL) is a technology that allows multiple devices to collaboratively train a global model without sharing original data, and it is a hot topic in distributed intelligent systems. Combined with satellite networks, FL can overcome geographical limitations and achieve broader applications. However, it also faces issues such as the straggler effect, unreliable network environments, and non-independent and identically distributed (Non-IID) samples. To address these problems, we propose an intelligent hierarchical FL system based on semi-asynchronous and scheduled synchronous control strategies in a cloud-edge-client structure for satellite networks. Our intelligent system effectively handles multiple client requests by distributing the communication load of the central cloud to various edge clouds. Additionally, the cloud server selection algorithm and the edge-client semi-asynchronous control strategy minimize clients’ waiting time, improving the overall efficiency of the FL process. Furthermore, the center-edge scheduled synchronous control strategy ensures the timeliness of partial models. Based on the experimental results, our proposed intelligent hierarchical FL system demonstrates a distinct advantage over traditional FedAvg, achieving 2% higher global accuracy within the same time frame and reducing the training time needed to reach the target accuracy by 52%.
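The difference between plain FedAvg and a semi-asynchronous edge round can be sketched as follows. `fedavg` is the standard sample-weighted average of client models. `semi_async_round` is an illustrative assumption, not the paper's control strategy: it aggregates as soon as the first `k` client updates have arrived, so stragglers do not block the round, and it discounts updates that were trained from an older model version by a hypothetical factor `decay` per round of staleness.

```python
def fedavg(updates):
    """Sample-weighted average of client models.

    updates -- list of (weights, num_samples); weights is a list of floats
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def semi_async_round(arrivals, k, current_round, decay=0.5):
    """Aggregate the first k arrivals, down-weighting stale updates.

    arrivals -- list of (arrival_time, weights, num_samples, base_round),
                where base_round is the model version the client started from
    """
    first_k = sorted(arrivals, key=lambda a: a[0])[:k]
    discounted = []
    for _, weights, n, base_round in first_k:
        staleness = current_round - base_round
        discounted.append((weights, n * decay ** staleness))
    return fedavg(discounted)
```

In a cloud-edge-client hierarchy, each edge server could run rounds like this with its own clients, while the central cloud combines the edge models on a fixed schedule.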
{"title":"Intelligent hierarchical federated learning system based on semi-asynchronous and scheduled synchronous control strategies in satellite network","authors":"Qiang Mei, Rui Huang, Duo Li, Jingyi Li, Nan Shi, Mei Du, Yingkang Zhong, Chunqi Tian","doi":"10.1007/s43684-025-00095-z","DOIUrl":"10.1007/s43684-025-00095-z","url":null,"abstract":"<div><p>Federated learning (FL) is a technology that allows multiple devices to collaboratively train a global model without sharing original data, which is a hot topic in distributed intelligent systems. Combined with satellite network, FL can overcome the geographical limitation and achieve broader applications. However, it also faces the issues such as straggler effect, unreliable network environments and non-independent and identically distributed (Non-IID) samples. To address these problems, we propose an intelligent hierarchical FL system based on semi-asynchronous and scheduled synchronous control strategies in cloud-edge-client structure for satellite network. Our intelligent system effectively handles multiple client requests by distributing the communication load of the central cloud to various edge clouds. Additionally, the cloud server selection algorithm and the edge-client semi-asynchronous control strategy minimize clients’ waiting time, improving the overall efficiency of the FL process. Furthermore, the center-edge scheduled synchronous control strategy ensures the timeliness of partial models. 
Based on the experiment results, our proposed intelligent hierarchical FL system demonstrates a distinct advantage in global accuracy over traditional FedAvg, achieving 2% higher global accuracy within the same time frame and reducing 52% training time to achieve the target accuracy.</p></div>","PeriodicalId":71187,"journal":{"name":"自主智能系统(英文)","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s43684-025-00095-z.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143655326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
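As a rough illustration of the cloud-edge-client structure described in the abstract above, the following minimal sketch aggregates client models at each edge and then aggregates the edge models at the central cloud, using FedAvg-style weighting by sample count. It is not the authors' system: the semi-asynchronous and scheduled synchronous control strategies and the cloud server selection algorithm are not modeled, and the names `weighted_average`, `hierarchical_aggregate`, and the `Model` type, as well as the toy data, are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a model is a flat list of parameters.
Model = List[float]


def weighted_average(updates: List[Tuple[Model, int]]) -> Tuple[Model, int]:
    """FedAvg-style aggregation: average models weighted by sample count.

    Returns the averaged model together with the total sample count, so the
    result can itself be used as one weighted update at the next level.
    """
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no samples to aggregate")
    dim = len(updates[0][0])
    avg = [sum(m[i] * n for m, n in updates) / total for i in range(dim)]
    return avg, total


def hierarchical_aggregate(edges: Dict[str, List[Tuple[Model, int]]]) -> Model:
    """Aggregate clients at each edge, then edge models at the cloud."""
    edge_models = [weighted_average(clients) for clients in edges.values()]
    global_model, _ = weighted_average(edge_models)
    return global_model


if __name__ == "__main__":
    # Toy data (assumed): two edge clouds, three clients in total.
    edges = {
        "edge_a": [([1.0, 2.0], 10), ([3.0, 4.0], 30)],
        "edge_b": [([5.0, 6.0], 60)],
    }
    print(hierarchical_aggregate(edges))
```

When every level weights by sample count, as in this sketch, the two-level result matches a single flat weighted average over all clients. The edge layer therefore changes where the communication load falls, not the aggregate itself.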
Road scene parsing is a crucial capability for self-driving vehicles and intelligent road inspection systems. Recent research has increasingly focused on enhancing driving safety and comfort by improving the detection of both drivable areas and road defects. This article reviews state-of-the-art networks developed over the past decade for both general-purpose semantic segmentation and specialized road scene parsing tasks. It also includes extensive experimental comparisons of these networks across five public datasets. Additionally, we explore the key challenges and emerging trends in the field, aiming to guide researchers toward developing next-generation models for more effective and reliable road scene parsing.
{"title":"A glance over the past decade: road scene parsing towards safe and comfortable autonomous driving","authors":"Rui Fan, Jiahang Li, Jiaqi Li, Jiale Wang, Ziwei Long, Ning Jia, Yanan Liu, Wenshuo Wang, Mohammud J. Bocus, Sergey Vityazev, Xieyuanli Chen, Junhao Xiao, Stepan Andreev, Huimin Lu, Alexander Dvorkovich","doi":"10.1007/s43684-025-00096-y","DOIUrl":"10.1007/s43684-025-00096-y","url":null,"abstract":"<div><p>Road scene parsing is a crucial capability for self-driving vehicles and intelligent road inspection systems. Recent research has increasingly focused on enhancing driving safety and comfort by improving the detection of both drivable areas and road defects. This article reviews state-of-the-art networks developed over the past decade for both general-purpose semantic segmentation and specialized road scene parsing tasks. It also includes extensive experimental comparisons of these networks across five public datasets. Additionally, we explore the key challenges and emerging trends in the field, aiming to guide researchers toward developing next-generation models for more effective and reliable road scene parsing.</p></div>","PeriodicalId":71187,"journal":{"name":"自主智能系统(英文)","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s43684-025-00096-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-05 DOI: 10.1007/s43684-025-00089-x
Kezhou Chen, Tao Wang, Huimin Zhuo, Lianglun Cheng
Service Composition and Optimization Selection (SCOS) is crucial in Cloud Manufacturing (CMfg), but the uncertainties in service states and working environments pose challenges for existing QoS-based methods. Recently, digital twins have gained prominence in CMfg due to their predictive capabilities, enhancing the reliability of service composition. Heuristic algorithms are widely used in this field for their flexibility and compatibility with uncertain environments. This paper proposes the Whale-Goshawk Optimization Algorithm (WGO), which combines the Whale Optimization Algorithm (WOA) and Northern Goshawk Optimization Algorithm (NGO). A novel similar integer coding method, incorporating spatial feature information, addresses the limitations of traditional integer coding, while a whale-optimized prey generation strategy improves NGO’s global optimization efficiency. Additionally, a local search method based on similar integer coding enhances WGO’s local search ability. Experimental results demonstrate the effectiveness of the proposed approach.
{"title":"WGO: a similarly encoded whale-goshawk optimization algorithm for uncertain cloud manufacturing service composition","authors":"Kezhou Chen, Tao Wang, Huimin Zhuo, Lianglun Cheng","doi":"10.1007/s43684-025-00089-x","DOIUrl":"10.1007/s43684-025-00089-x","url":null,"abstract":"<div><p>Service Composition and Optimization Selection (SCOS) is crucial in Cloud Manufacturing (CMfg), but the uncertainties in service states and working environments pose challenges for existing QoS-based methods. Recently, digital twins have gained prominence in CMfg due to their predictive capabilities, enhancing the reliability of service composition. Heuristic algorithms are widely used in this field for their flexibility and compatibility with uncertain environments. This paper proposes the Whale-Goshawk Optimization Algorithm (WGO), which combines the Whale Optimization Algorithm (WOA) and Northern Goshawk Optimization Algorithm (NGO). A novel similar integer coding method, incorporating spatial feature information, addresses the limitations of traditional integer coding, while a whale-optimized prey generation strategy improves NGO’s global optimization efficiency. Additionally, a local search method based on similar integer coding enhances WGO’s local search ability. Experimental results demonstrate the effectiveness of the proposed approach.</p></div>","PeriodicalId":71187,"journal":{"name":"自主智能系统(英文)","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s43684-025-00089-x.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}