Multi-level feature fusion networks for smoke recognition in remote sensing imagery
Neural Networks, vol. 184, Article 107112. Pub Date: 2025-04-01. Epub Date: 2025-01-04. DOI: 10.1016/j.neunet.2024.107112
Yupeng Wang, Yongli Wang, Zaki Ahmad Khan, Anqi Huang, Jianghui Sang
Smoke is a critical indicator of forest fires, often detectable before flames ignite. Accurate smoke identification in remote sensing images is vital for effective forest fire monitoring within Internet of Things (IoT) systems. However, existing detection methods frequently falter in complex real-world scenarios, where variable smoke shapes and sizes, intricate backgrounds, and smoke-like phenomena (e.g., clouds and haze) lead to missed detections and false alarms. To address these challenges, we propose the Multi-level Feature Fusion Network (MFFNet), a novel framework grounded in contrastive learning. MFFNet begins by extracting multi-scale features from remote sensing images using a pre-trained ConvNeXt model, capturing information across different levels of granularity to accommodate variations in smoke appearance. The Attention Feature Enhancement Module further refines these multi-scale features, enhancing fine-grained, discriminative attributes relevant to smoke detection. Subsequently, the Bilinear Feature Fusion Module combines these enriched features, effectively reducing background interference and improving the model's ability to distinguish smoke from visually similar phenomena. Finally, contrastive feature learning is employed to improve robustness against intra-class variations by focusing on unique regions within the smoke patterns. Evaluated on the benchmark dataset USTC_SmokeRS, MFFNet achieves an accuracy of 98.87%. Additionally, our model demonstrates a detection rate of 94.54% on the extended E_SmokeRS dataset, with a low false alarm rate of 3.30%. These results highlight the effectiveness of MFFNet in recognizing smoke in remote sensing images, surpassing existing methodologies. The code is accessible at https://github.com/WangYuPeng1/MFFNet.
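As a rough illustration of the bilinear fusion step described above, the sketch below pools two feature maps into a bilinear matrix and projects it to a fused vector. The module name, channel sizes, normalisation, and projection head are illustrative assumptions, not the paper's exact Bilinear Feature Fusion Module (the reference implementation is in the linked repository).

```python
# Minimal sketch of bilinear fusion of two feature maps (assumed same spatial size).
import torch
import torch.nn as nn

class BilinearFusion(nn.Module):
    def __init__(self, c1: int, c2: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(c1 * c2, out_dim)  # project the pooled bilinear matrix

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        # x1: (B, C1, H, W), x2: (B, C2, H, W) -- two feature levels at the same resolution
        b, c1, h, w = x1.shape
        f1 = x1.flatten(2)                                       # (B, C1, H*W)
        f2 = x2.flatten(2)                                       # (B, C2, H*W)
        bilinear = torch.bmm(f1, f2.transpose(1, 2)) / (h * w)   # (B, C1, C2)
        bilinear = bilinear.flatten(1)                           # (B, C1*C2)
        # signed square root and L2 normalisation, as is common for bilinear pooling
        bilinear = torch.sign(bilinear) * torch.sqrt(torch.abs(bilinear) + 1e-8)
        bilinear = nn.functional.normalize(bilinear, dim=1)
        return self.proj(bilinear)

# Usage: fuse a mid-level and a high-level feature map (shapes are assumptions).
feat_mid, feat_high = torch.randn(2, 192, 28, 28), torch.randn(2, 384, 28, 28)
fused = BilinearFusion(192, 384, 256)(feat_mid, feat_high)       # (2, 256)
```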
{"title":"Multi-level feature fusion networks for smoke recognition in remote sensing imagery.","authors":"Yupeng Wang, Yongli Wang, Zaki Ahmad Khan, Anqi Huang, Jianghui Sang","doi":"10.1016/j.neunet.2024.107112","DOIUrl":"10.1016/j.neunet.2024.107112","url":null,"abstract":"<p><p>Smoke is a critical indicator of forest fires, often detectable before flames ignite. Accurate smoke identification in remote sensing images is vital for effective forest fire monitoring within Internet of Things (IoT) systems. However, existing detection methods frequently falter in complex real-world scenarios, where variable smoke shapes and sizes, intricate backgrounds, and smoke-like phenomena (e.g., clouds and haze) lead to missed detections and false alarms. To address these challenges, we propose the Multi-level Feature Fusion Network (MFFNet), a novel framework grounded in contrastive learning. MFFNet begins by extracting multi-scale features from remote sensing images using a pre-trained ConvNeXt model, capturing information across different levels of granularity to accommodate variations in smoke appearance. The Attention Feature Enhancement Module further refines these multi-scale features, enhancing fine-grained, discriminative attributes relevant to smoke detection. Subsequently, the Bilinear Feature Fusion Module combines these enriched features, effectively reducing background interference and improving the model's ability to distinguish smoke from visually similar phenomena. Finally, contrastive feature learning is employed to improve robustness against intra-class variations by focusing on unique regions within the smoke patterns. Evaluated on the benchmark dataset USTC_SmokeRS, MFFNet achieves an accuracy of 98.87%. Additionally, our model demonstrates a detection rate of 94.54% on the extended E_SmokeRS dataset, with a low false alarm rate of 3.30%. These results highlight the effectiveness of MFFNet in recognizing smoke in remote sensing images, surpassing existing methodologies. The code is accessible at https://github.com/WangYuPeng1/MFFNet.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107112"},"PeriodicalIF":6.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142967303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ICH-PRNet: a cross-modal intracerebral haemorrhage prognostic prediction method using joint-attention interaction mechanism
Neural Networks, vol. 184, Article 107096. Pub Date: 2025-04-01. Epub Date: 2025-01-06. DOI: 10.1016/j.neunet.2024.107096
Xinlei Yu, Ahmed Elazab, Ruiquan Ge, Jichao Zhu, Lingyan Zhang, Gangyong Jia, Qing Wu, Xiang Wan, Lihua Li, Changmiao Wang
Accurately predicting intracerebral hemorrhage (ICH) prognosis is a critical and indispensable step in the clinical management of patients post-ICH. Recently, integrating artificial intelligence, particularly deep learning, has significantly enhanced prediction accuracy and relieved neurosurgeons of the burden of manual prognosis assessment. However, uni-modal methods have shown suboptimal performance due to the intricate pathophysiology of ICH. On the other hand, existing cross-modal approaches that incorporate tabular data have often failed to effectively extract complementary information and cross-modal features between modalities, thereby limiting their prognostic capabilities. This study introduces a novel cross-modal network, ICH-PRNet, designed to predict ICH prognosis outcomes. Specifically, we propose a joint-attention interaction encoder that effectively integrates computed tomography images and clinical texts within a unified representational space. Additionally, we define a multi-loss function comprising three components to comprehensively optimize cross-modal fusion capabilities. To balance the training process, we employ a self-adaptive dynamic prioritization algorithm that adjusts the weights of each component accordingly. Our model, through these innovative designs, establishes robust semantic connections between modalities and uncovers rich, complementary cross-modal information, thereby achieving superior prediction results. Extensive experimental results and comparisons with state-of-the-art methods on both in-house and publicly available datasets unequivocally demonstrate the superiority and efficacy of the proposed method. Our code is at https://github.com/YU-deep/ICH-PRNet.git.
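The paper's self-adaptive dynamic prioritization rule is defined in the article itself; as a loose, hypothetical sketch of the general idea of re-weighting a three-part loss by how quickly each component is improving (in the spirit of dynamic weight averaging), one could write:

```python
# Hypothetical sketch: give larger weight to loss components that are decreasing more slowly.
# This is NOT the paper's algorithm, only an illustration of adaptive multi-loss weighting.
import torch

def adaptive_loss_weights(prev_losses, curr_losses, temperature: float = 2.0):
    """Return one weight per loss component, normalised to sum to the number of components."""
    ratios = torch.tensor([c / max(p, 1e-8) for c, p in zip(curr_losses, prev_losses)])
    weights = torch.softmax(ratios / temperature, dim=0)   # slower-improving terms get priority
    return len(curr_losses) * weights                      # keep the overall loss scale comparable

# Usage inside a training step (loss values are placeholders):
prev = [0.90, 0.45, 0.30]                                  # component losses at the previous epoch
curr = [0.85, 0.20, 0.29]                                  # component losses at the current epoch
w = adaptive_loss_weights(prev, curr)
total_loss = sum(wi * li for wi, li in zip(w, torch.tensor(curr)))
```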
{"title":"ICH-PRNet: a cross-modal intracerebral haemorrhage prognostic prediction method using joint-attention interaction mechanism.","authors":"Xinlei Yu, Ahmed Elazab, Ruiquan Ge, Jichao Zhu, Lingyan Zhang, Gangyong Jia, Qing Wu, Xiang Wan, Lihua Li, Changmiao Wang","doi":"10.1016/j.neunet.2024.107096","DOIUrl":"10.1016/j.neunet.2024.107096","url":null,"abstract":"<p><p>Accurately predicting intracerebral hemorrhage (ICH) prognosis is a critical and indispensable step in the clinical management of patients post-ICH. Recently, integrating artificial intelligence, particularly deep learning, has significantly enhanced prediction accuracy and alleviated neurosurgeons from the burden of manual prognosis assessment. However, uni-modal methods have shown suboptimal performance due to the intricate pathophysiology of the ICH. On the other hand, existing cross-modal approaches that incorporate tabular data have often failed to effectively extract complementary information and cross-modal features between modalities, thereby limiting their prognostic capabilities. This study introduces a novel cross-modal network, ICH-PRNet, designed to predict ICH prognosis outcomes. Specifically, we propose a joint-attention interaction encoder that effectively integrates computed tomography images and clinical texts within a unified representational space. Additionally, we define a multi-loss function comprising three components to comprehensively optimize cross-modal fusion capabilities. To balance the training process, we employ a self-adaptive dynamic prioritization algorithm that adjusts the weights of each component, accordingly. Our model, through these innovative designs, establishes robust semantic connections between modalities and uncovers rich, complementary cross-modal information, thereby achieving superior prediction results. Extensive experimental results and comparisons with state-of-the-art methods on both in-house and publicly available datasets unequivocally demonstrate the superiority and efficacy of the proposed method. Our code is at https://github.com/YU-deep/ICH-PRNet.git.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107096"},"PeriodicalIF":6.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142972996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identity Model Transformation for boosting performance and efficiency in object detection network
Neural Networks, vol. 184, Article 107098. Pub Date: 2025-04-01. Epub Date: 2024-12-31. DOI: 10.1016/j.neunet.2024.107098
Zhongyuan Lu, Jin Liu, Miaozhong Xu
Modifying the structure of an existing network is a common way to further improve its performance. However, modifying some layers of a network often results in pre-trained weight mismatch, and the fine-tuning process is time-consuming and resource-inefficient. To address this issue, we propose a novel technique called Identity Model Transformation (IMT), which keeps the outputs before and after the transformation equal through rigorous algebraic transformations. This approach ensures that the original model's performance is preserved when layers are modified. Additionally, IMT significantly reduces the total training time required to achieve optimal results while further enhancing network performance. IMT establishes a bridge for rapid transformation between model architectures, enabling a model to quickly perform analytic continuation and derive a family of tree-like models with better performance. This model family possesses greater potential for optimization improvements than a single model. Extensive experiments across various object detection tasks validated the effectiveness and efficiency of the proposed IMT solution, which saved 94.76% of the time needed to fine-tune the base model YOLOv4-Rot on the DOTA 1.5 dataset; using IMT, we observed stable performance improvements of 9.89%, 6.94%, 2.36%, and 4.86% on the four datasets AI-TOD, DOTA 1.5, COCO 2017, and MRSAText, respectively.
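The exact algebraic construction of IMT is given in the paper; the sketch below only illustrates the underlying idea of an output-preserving structural modification, here by appending an identity-initialised 1x1 convolution so that the modified block reproduces the original layer's output before any fine-tuning. All layer shapes are illustrative assumptions.

```python
# Minimal sketch of an output-preserving ("identity") structural change.
import torch
import torch.nn as nn

def expand_with_identity(conv: nn.Conv2d) -> nn.Sequential:
    c = conv.out_channels
    extra = nn.Conv2d(c, c, kernel_size=1, bias=True)
    with torch.no_grad():
        extra.weight.zero_()
        for i in range(c):
            extra.weight[i, i, 0, 0] = 1.0   # channel-wise identity mapping
        extra.bias.zero_()
    return nn.Sequential(conv, extra)

# The transformed block produces the same output as the original layer.
conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
x = torch.randn(1, 16, 64, 64)
block = expand_with_identity(conv)
assert torch.allclose(conv(x), block(x), atol=1e-6)
```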
{"title":"Identity Model Transformation for boosting performance and efficiency in object detection network.","authors":"Zhongyuan Lu, Jin Liu, Miaozhong Xu","doi":"10.1016/j.neunet.2024.107098","DOIUrl":"10.1016/j.neunet.2024.107098","url":null,"abstract":"<p><p>Modifying the structure of an existing network is a common method to further improve the performance of the network. However, modifying some layers in network often results in pre-trained weight mismatch, and fine-tune process is time-consuming and resource-inefficient. To address this issue, we propose a novel technique called Identity Model Transformation (IMT), which keep the output before and after transformation in an equal form by rigorous algebraic transformations. This approach ensures the preservation of the original model's performance when modifying layers. Additionally, IMT significantly reduces the total training time required to achieve optimal results while further enhancing network performance. IMT has established a bridge for rapid transformation between model architectures, enabling a model to quickly perform analytic continuation and derive a family of tree-like models with better performance. This model family possesses a greater potential for optimization improvements compared to a single model. Extensive experiments across various object detection tasks validated the effectiveness and efficiency of our proposed IMT solution, which saved 94.76% time in fine-tuning the basic model YOLOv4-Rot on DOTA 1.5 dataset, and by using the IMT method, we saw stable performance improvements of 9.89%, 6.94%, 2.36%, and 4.86% on the four datasets: AI-TOD, DOTA1.5, coco2017, and MRSAText, respectively.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107098"},"PeriodicalIF":6.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142957832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synergistic learning with multi-task DeepONet for efficient PDE problem solving
Neural Networks, vol. 184, Article 107113. Pub Date: 2025-04-01. Epub Date: 2025-01-03. DOI: 10.1016/j.neunet.2024.107113
Varun Kumar, Somdatta Goswami, Katiana Kontolati, Michael D Shields, George Em Karniadakis
Multi-task learning (MTL) is an inductive transfer mechanism designed to leverage useful information from multiple tasks to improve generalization performance compared to single-task learning. It has been extensively explored in traditional machine learning to address issues such as data sparsity and overfitting in neural networks. In this work, we apply MTL to problems in science and engineering governed by partial differential equations (PDEs). However, implementing MTL in this context is complex, as it requires task-specific modifications to accommodate various scenarios representing different physical processes. To this end, we present a multi-task deep operator network (MT-DeepONet) to learn solutions across various functional forms of source terms in a PDE and multiple geometries in a single concurrent training session. We introduce modifications in the branch network of the vanilla DeepONet to account for various functional forms of a parameterized coefficient in a PDE. Additionally, we handle parameterized geometries by introducing a binary mask in the branch network and incorporating it into the loss term to improve convergence and generalization to new geometry tasks. Our approach is demonstrated on three benchmark problems: (1) learning different functional forms of the source term in the Fisher equation; (2) learning multiple geometries in a 2D Darcy Flow problem and showcasing better transfer learning capabilities to new geometries; and (3) learning 3D parameterized geometries for a heat transfer problem and demonstrating the ability to predict on new but similar geometries. Our MT-DeepONet framework offers a novel approach to solving PDE problems in engineering and science under a unified umbrella based on synergistic learning that reduces the overall training cost for neural operators.
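For readers unfamiliar with the DeepONet structure being extended, the sketch below shows a minimal branch-trunk operator network in which the branch input is the sampled source term concatenated with a binary geometry mask, as described above. Layer widths, activations, and the way the mask is injected are assumptions for illustration, not the MT-DeepONet architecture itself.

```python
# Minimal DeepONet-style sketch: solution(y) ~ <branch(f, mask), trunk(y)>.
import torch
import torch.nn as nn

class TinyDeepONet(nn.Module):
    def __init__(self, n_sensors: int, coord_dim: int = 2, p: int = 64):
        super().__init__()
        # branch input: source term sampled at n_sensors points + a binary mask of the same size
        self.branch = nn.Sequential(nn.Linear(2 * n_sensors, 128), nn.GELU(), nn.Linear(128, p))
        self.trunk = nn.Sequential(nn.Linear(coord_dim, 128), nn.GELU(), nn.Linear(128, p))

    def forward(self, f_samples, mask, coords):
        # f_samples, mask: (B, n_sensors); coords: (B, n_points, coord_dim)
        b = self.branch(torch.cat([f_samples, mask], dim=-1))   # (B, p)
        t = self.trunk(coords)                                  # (B, n_points, p)
        return torch.einsum("bp,bnp->bn", b, t)                 # predicted field at the query points

model = TinyDeepONet(n_sensors=100)
u = model(torch.randn(4, 100), torch.randint(0, 2, (4, 100)).float(), torch.rand(4, 500, 2))
print(u.shape)   # torch.Size([4, 500])
```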
{"title":"Synergistic learning with multi-task DeepONet for efficient PDE problem solving.","authors":"Varun Kumar, Somdatta Goswami, Katiana Kontolati, Michael D Shields, George Em Karniadakis","doi":"10.1016/j.neunet.2024.107113","DOIUrl":"10.1016/j.neunet.2024.107113","url":null,"abstract":"<p><p>Multi-task learning (MTL) is an inductive transfer mechanism designed to leverage useful information from multiple tasks to improve generalization performance compared to single-task learning. It has been extensively explored in traditional machine learning to address issues such as data sparsity and overfitting in neural networks. In this work, we apply MTL to problems in science and engineering governed by partial differential equations (PDEs). However, implementing MTL in this context is complex, as it requires task-specific modifications to accommodate various scenarios representing different physical processes. To this end, we present a multi-task deep operator network (MT-DeepONet) to learn solutions across various functional forms of source terms in a PDE and multiple geometries in a single concurrent training session. We introduce modifications in the branch network of the vanilla DeepONet to account for various functional forms of a parameterized coefficient in a PDE. Additionally, we handle parameterized geometries by introducing a binary mask in the branch network and incorporating it into the loss term to improve convergence and generalization to new geometry tasks. Our approach is demonstrated on three benchmark problems: (1) learning different functional forms of the source term in the Fisher equation; (2) learning multiple geometries in a 2D Darcy Flow problem and showcasing better transfer learning capabilities to new geometries; and (3) learning 3D parameterized geometries for a heat transfer problem and demonstrate the ability to predict on new but similar geometries. Our MT-DeepONet framework offers a novel approach to solving PDE problems in engineering and science under a unified umbrella based on synergistic learning that reduces the overall training cost for neural operators.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107113"},"PeriodicalIF":6.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142967318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing Recommender Systems through Imputation and Social-Aware Graph Convolutional Neural Network
Neural Networks, vol. 184, Article 107071. Pub Date: 2025-04-01. Epub Date: 2024-12-31. DOI: 10.1016/j.neunet.2024.107071
Azadeh Faroughi, Parham Moradi, Mahdi Jalili
Recommendation systems are vital tools for helping users discover content that suits their interests. Collaborative filtering methods are one of the techniques employed for analyzing interactions between users and items, which are typically stored in a sparse matrix. This inherent sparsity poses a challenge because it necessitates accurately and effectively filling in these gaps to provide users with meaningful and personalized recommendations. Our solution addresses sparsity in recommendations by incorporating diverse data sources, including trust statements and an imputation graph. The trust graph captures user relationships and trust levels, working in conjunction with an imputation graph, which is constructed by estimating the missing ratings of each user from the user-item matrix using the average ratings of the most similar users. Combined with the user-item rating graph, an attention mechanism fine-tunes the influence of these graphs, resulting in more personalized and effective recommendations. Our method consistently outperforms state-of-the-art recommenders in real-world dataset evaluations, underscoring its potential to strengthen recommendation systems and mitigate sparsity challenges.
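A minimal sketch of the imputation idea described above follows; cosine similarity on the raw user-item matrix, a fixed number k of neighbours, and zeros marking missing ratings are all illustrative assumptions rather than the paper's exact construction.

```python
# Fill missing ratings of each user with the average rating of the k most similar users.
import numpy as np

def impute_ratings(R: np.ndarray, k: int = 2) -> np.ndarray:
    """R: user-item matrix with 0 marking a missing rating."""
    norms = np.linalg.norm(R, axis=1, keepdims=True) + 1e-8
    sim = (R @ R.T) / (norms * norms.T)            # cosine similarity between users
    np.fill_diagonal(sim, -np.inf)                 # exclude the user itself
    imputed = R.astype(float).copy()
    for u in range(R.shape[0]):
        neighbors = np.argsort(sim[u])[-k:]        # indices of the k most similar users
        for i in np.where(R[u] == 0)[0]:
            rated = R[neighbors, i][R[neighbors, i] > 0]
            if rated.size:                         # average rating of similar users who rated item i
                imputed[u, i] = rated.mean()
    return imputed

R = np.array([[5, 0, 3, 0], [4, 2, 0, 1], [5, 1, 3, 0]])
print(impute_ratings(R))
```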
{"title":"Enhancing Recommender Systems through Imputation and Social-Aware Graph Convolutional Neural Network.","authors":"Azadeh Faroughi, Parham Moradi, Mahdi Jalili","doi":"10.1016/j.neunet.2024.107071","DOIUrl":"10.1016/j.neunet.2024.107071","url":null,"abstract":"<p><p>Recommendation systems are vital tools for helping users discover content that suits their interests. Collaborative filtering methods are one of the techniques employed for analyzing interactions between users and items, which are typically stored in a sparse matrix. This inherent sparsity poses a challenge because it necessitates accurately and effectively filling in these gaps to provide users with meaningful and personalized recommendations. Our solution addresses sparsity in recommendations by incorporating diverse data sources, including trust statements and an imputation graph. The trust graph captures user relationships and trust levels, working in conjunction with an imputation graph, which is constructed by estimating the missing rates of each user based on the user-item matrix using the average rates of the most similar users. Combined with the user-item rating graph, an attention mechanism fine tunes the influence of these graphs, resulting in more personalized and effective recommendations. Our method consistently outperforms state-of-the-art recommenders in real-world dataset evaluations, underscoring its potential to strengthen recommendation systems and mitigate sparsity challenges.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107071"},"PeriodicalIF":6.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142967247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the piecewise complexity of words
Acta Informatica, vol. 62, no. 1. Pub Date: 2025-02-18. DOI: 10.1007/s00236-025-00480-4
Philippe Schnoebelen, Isa Vialard
The piecewise complexity h(u) of a word u is the minimal length of subwords needed to exactly characterise u. Its piecewise minimality index ρ(u) is the smallest length k such that u is minimal among its order-k class [u]_k in Simon's congruence. We initiate a study of these two descriptive complexity measures. Among other results, we provide efficient algorithms for computing h(u) and ρ(u) for a given word u.
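To make Simon's congruence concrete, the brute-force sketch below tests whether two words have the same set of scattered subwords of length at most k; it is purely illustrative and unrelated to the efficient algorithms developed in the paper.

```python
# Brute-force check of Simon's congruence: u ~_k v iff u and v have the same
# (scattered) subwords of length <= k. Exponential; only for tiny examples.
from itertools import combinations

def subwords_up_to(u: str, k: int) -> frozenset:
    """All subsequences of u of length <= k (including the empty word)."""
    return frozenset("".join(u[i] for i in idx)
                     for l in range(k + 1)
                     for idx in combinations(range(len(u)), l))

def simon_congruent(u: str, v: str, k: int) -> bool:
    return subwords_up_to(u, k) == subwords_up_to(v, k)

print(simon_congruent("abab", "ababab", 2))   # True: identical subwords up to length 2
print(simon_congruent("abab", "ababab", 4))   # False: 'aabb' embeds in ababab but not in abab
```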
{"title":"On the piecewise complexity of words","authors":"Philippe Schnoebelen, Isa Vialard","doi":"10.1007/s00236-025-00480-4","DOIUrl":"10.1007/s00236-025-00480-4","url":null,"abstract":"<div><p>The piecewise complexity <i>h</i>(<i>u</i>) of a word is the minimal length of subwords needed to exactly characterise <i>u</i>. Its piecewise minimality index <span>(rho (u))</span> is the smallest length <i>k</i> such that <i>u</i> is minimal among its order-<i>k</i> class <span>([u]_k)</span> in Simon’s congruence. We initiate a study of these two descriptive complexity measures. Among other results, we provide efficient algorithms for computing <i>h</i>(<i>u</i>) and <span>(rho (u))</span> for a given word <i>u</i>.\u0000</p></div>","PeriodicalId":7189,"journal":{"name":"Acta Informatica","volume":"62 1","pages":""},"PeriodicalIF":0.4,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143431058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An effective deep learning approach enabling miners’ protective equipment detection and tracking using improved YOLOv7 architecture
Computers & Electrical Engineering, vol. 123, Article 110173. Pub Date: 2025-02-18. DOI: 10.1016/j.compeleceng.2025.110173
Zheng Wang, Yu Zhu, Yingjie Zhang, Siying Liu
In the complex underground mining environment, ensuring the correct wearing of personal protective equipment (PPE) is crucial for coal mine safety production. To overcome the limitations of existing PPE detection and tracking technologies, which often suffer from low precision, slow performance, and complex feature extraction processes, this paper introduces an enhanced, lightweight, and high-precision object detection network model based on YOLOv7. The proposed model incorporates a streamlined backbone feature extraction architecture that combines the Mobile Inverted Bottleneck Convolution module with the GhostBottleneck Lightweight module. This integration significantly improves the detection accuracy of miners’ PPE while simultaneously reducing the number of network parameters. Furthermore, the model adopts adaptive spatial feature fusion to enhance its capability in effectively integrating cross-scale features, thereby further boosting its detection performance. To enable continuous and stable tracking of miners’ PPE usage, this paper integrates the DeepSort tracking algorithm, which is based on OSNet, with the improved YOLOv7 detection model. This combination constructs an efficient video-based multi-object tracking algorithm, providing essential support for enhancing the tracking performance of coal miners’ PPE. Experimental results demonstrate that, compared to other state-of-the-art methods, the proposed model achieves a 2.25% increase in mean Average Precision (mAP), a 2.91% improvement in F1 score, a 0.41% enhancement in precision, and a 5.34% increase in recall for PPE detection. Additionally, it exhibits significant improvements in multi-object tracking metrics, with a 5.9% increase in Multi-Object Tracking Accuracy (MOTA), a 3.5% increase in Multi-Object Tracking Precision (MOTP), and a 6.2% increase in IDF1 score. These results fully validate the model’s efficient detection and tracking capabilities for miners’ PPE in complex underground mining environments.
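As background on the lightweight blocks mentioned above, the sketch below shows a generic Ghost module, the building block of a GhostBottleneck: a small primary convolution produces part of the output channels and a cheap depthwise convolution generates the remaining "ghost" feature maps. Kernel sizes and the channel ratio are assumptions; the paper combines such blocks with Mobile Inverted Bottleneck Convolutions inside its YOLOv7 backbone.

```python
# Generic Ghost module sketch (GhostNet-style), not the paper's exact configuration.
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, ratio: int = 2, dw_kernel: int = 3):
        super().__init__()
        init_ch = out_ch // ratio                      # channels from the primary conv
        ghost_ch = out_ch - init_ch                    # channels from the cheap operation
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(                    # depthwise conv: one filter per channel
            nn.Conv2d(init_ch, ghost_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(ghost_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)    # (B, out_ch, H, W)

print(GhostModule(64, 128)(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 128, 56, 56])
```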
{"title":"An effective deep learning approach enabling miners’ protective equipment detection and tracking using improved YOLOv7 architecture","authors":"Zheng Wang , Yu Zhu , Yingjie Zhang , Siying Liu","doi":"10.1016/j.compeleceng.2025.110173","DOIUrl":"10.1016/j.compeleceng.2025.110173","url":null,"abstract":"<div><div>In the complex underground mining environment, ensuring the correct wearing of personal protective equipment (PPE) is crucial for coal mine safety production. To overcome the limitations of existing PPE detection and tracking technologies, which often suffer from low precision, slow performance, and complex feature extraction processes, this paper introduces an enhanced, lightweight, and high-precision object detection network model based on YOLOv7. The proposed model incorporates a streamlined backbone feature extraction architecture that combines the Mobile Inverted Bottleneck Convolution module with the GhostBottleneck Lightweight module. This integration significantly improves the detection accuracy of miners’ PPE while simultaneously reducing the number of network parameters. Furthermore, the model adopts adaptive spatial feature fusion to enhance its capability in effectively integrating cross-scale features, thereby further boosting its detection performance. To enable continuous and stable tracking of miners’ PPE usage, this paper integrates the DeepSort tracking algorithm, which is based on OSNet, with the improved YOLOv7 detection model. This combination constructs an efficient video-based multi-object tracking algorithm, providing essential support for enhancing the tracking performance of coal miners’ PPE. Experimental results demonstrate that, compared to other state-of-the-art methods, the proposed model achieves a 2.25% increase in mean Average Precision (mAP), a 2.91% improvement in F1 score, a 0.41% enhancement in precision, and a 5.34% increase in recall for PPE detection. Additionally, it exhibits significant improvements in multi-object tracking metrics, with a 5.9% increase in Multi-Object Tracking Accuracy (MOTA), a 3.5% increase in Multi-Object Tracking Precision (MOTP), and a 6.2% increase in IDF1 score. These results fully validate the model’s efficient detection and tracking capabilities for miners’ PPE in complex underground mining environments.</div></div>","PeriodicalId":50630,"journal":{"name":"Computers & Electrical Engineering","volume":"123 ","pages":"Article 110173"},"PeriodicalIF":4.0,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143429578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MIRROR: Multi-scale iterative refinement for robust Chinese text recognition
Engineering Applications of Artificial Intelligence, vol. 146, Article 110270. Pub Date: 2025-02-18. DOI: 10.1016/j.engappai.2025.110270
Hengnian Qi, Qiuyi Xin, Jiabin Ye, Hao Yang, Kai Zhang, Chu Zhang, Qing Lang
Text recognition has become a key area of research due to its wide applications in various fields. As an important branch of computer vision, Chinese text recognition has gained increasing research and practical value. However, the existing Chinese text recognition methods are still limited. This paper proposes an innovative Chinese text recognition method, Multi-Scale Iterative Refinement for Robust Chinese Text Recognition (MIRROR). The model significantly improves the recognition accuracy of Chinese text through advanced algorithms and structural design. The MIRROR model consists of two core components: a feature extractor and a Next-Character Decoder. Specifically, this paper proposes a Spatial Local Self-Attention Module to enhance the model’s ability to model long-distance dependencies in complex character sequences, addressing the problem of complex distributions in medium-to-long distance Chinese character sequences. The Character Refinement Module effectively captures multi-scale information, handles stroke feature differences, and resolves inter-class similarity issues. By combining multi-scale feature extraction with iterative optimization for feature refinement, the model identifies common features across different styles of the same character, solves the intra-class variation problem, and improves model robustness. In addition, this paper introduces a Three-Dimensional Weight Attention Module to refine the granularity of character features. Experiments show that MIRROR significantly outperforms baseline models on Chinese benchmark datasets. On scene datasets, performance improves by 3.08% (from 76.90% to 79.98%), on web datasets by 1.46% (from 70.43% to 71.89%), on document datasets by 0.38% (from 98.72% to 99.10%), and on handwriting datasets by 9.29% (from 50.26% to 59.55%).
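As a rough, hypothetical sketch of what self-attention restricted to local spatial windows looks like, the code below partitions a feature map into non-overlapping windows and attends within each window. The paper's Spatial Local Self-Attention Module will differ in detail (window size, positional encoding, and how windows interact are assumptions here).

```python
# Generic windowed self-attention over a feature map; purely illustrative.
import torch
import torch.nn as nn

class LocalWindowAttention(nn.Module):
    def __init__(self, dim: int, window: int = 4, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                       # x: (B, C, H, W), H and W divisible by window
        b, c, h, w = x.shape
        s = self.window
        # partition the feature map into (H/s * W/s) non-overlapping s x s windows of tokens
        xw = (x.reshape(b, c, h // s, s, w // s, s)
                .permute(0, 2, 4, 3, 5, 1)
                .reshape(b * (h // s) * (w // s), s * s, c))
        out, _ = self.attn(xw, xw, xw)          # self-attention among tokens of one window
        out = (out.reshape(b, h // s, w // s, s, s, c)
                  .permute(0, 5, 1, 3, 2, 4)
                  .reshape(b, c, h, w))
        return out

print(LocalWindowAttention(64)(torch.randn(2, 64, 16, 16)).shape)   # torch.Size([2, 64, 16, 16])
```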
{"title":"MIRROR: Multi-scale iterative refinement for robust chinese text recognition","authors":"Hengnian Qi , Qiuyi Xin , Jiabin Ye , Hao Yang , Kai Zhang , Chu Zhang , Qing Lang","doi":"10.1016/j.engappai.2025.110270","DOIUrl":"10.1016/j.engappai.2025.110270","url":null,"abstract":"<div><div>Text recognition has become a key area of research due to its wide applications in various fields. As an important branch of computer vision, Chinese text recognition has gained increasing research and practical value. However, the existing Chinese text recognition methods are still limited. This paper proposes an innovative Chinese text recognition method, <em><strong>M</strong>ulti-Scale <strong>I</strong>terative <strong>R</strong>efinement for <strong>Ro</strong>bust Chinese Text <strong>R</strong>ecognition</em> (MIRROR). The model significantly improves the recognition accuracy of Chinese text through advanced algorithms and structural design. The MIRROR model consists of two core components: a feature extractor and a Next-Character Decoder. Specifically, this paper proposes a Spatial Local Self-Attention Module to enhance the model’s ability to model long-distance dependencies in complex character sequences, addressing the problem of complex distributions in medium-to-long distance Chinese character sequences. The Character Refinement Module effectively captures multi-scale information, handles stroke feature differences, and resolves inter-class similarity issues. By combining multi-scale feature extraction with iterative optimization for feature refinement, the model identifies common features across different styles of the same character, solves the intra-class variation problem, and improves model robustness. In addition, this paper introduces a Three-Dimensional Weight Attention Module to refine the granularity of character features. Experiments show that MIRROR significantly outperforms baseline models on Chinese benchmark datasets. On scene datasets, performance improves by 3.08% (from 76.90% to 79.98%), on web datasets by 1.46% (from 70.43% to 71.89%), on document datasets by 0.38% (from 98.72% to 99.10%), and on handwriting datasets by 9.29% (from 50.26% to 59.55%).</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"146 ","pages":"Article 110270"},"PeriodicalIF":7.5,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143429486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Innovative integration of machine learning and colorimetry for precise potential of hydrogen monitoring in printed hydrogel sensors
Engineering Applications of Artificial Intelligence, vol. 146, Article 110293. Pub Date: 2025-02-18. DOI: 10.1016/j.engappai.2025.110293
Abdelrahman Sakr, Ahmed R. El shamy, Haider Butt
Proper monitoring of the potential of hydrogen (pH) finds wide application in environmental monitoring, clinical diagnostics, and a variety of industrial processes. However, traditional pH sensors typically present several challenges related to adaptability, portability, and environmental compatibility. Recently developed hydrogel-based sensors offer several advantages owing to the flexibility and biocompatibility of the material across a wide variety of applications. While much progress has been made in integration techniques, further improvements in precision and reliability are still needed. The present work describes a novel methodology for pH sensing that integrates hydrogel-based sensors with machine learning algorithms. pH-sensitive dye-impregnated hydrogel sensors were fabricated using three-dimensional (3D) printing technology, and colorimetric data analysis was combined with five machine learning models, namely Decision Trees, eXtreme Gradient Boosting, K-Nearest Neighbours, Random Forests, and Neural Networks, to classify pH from Red, Green, Blue (RGB) data. The designed sensor can detect pH values between 4 and 10 with high speed, stability, and reversibility. With precision, recall, and F1-scores all above 99%, these results demonstrate the efficiency of the RGB-based classification approach and highlight the potential of the developed sensors for real-time monitoring and diagnostic applications, making a significant contribution to the evolution of pH sensing and paving the way for smarter, more adaptable sensor solutions.
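A minimal sketch of the RGB-to-pH classification pipeline using one of the five model families listed above (a Random Forest from scikit-learn) follows; the synthetic data merely stands in for real colorimetric readings, and the hyperparameters are illustrative assumptions.

```python
# Train a Random Forest to map RGB readings to discrete pH classes and report per-class metrics.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
ph_classes = np.repeat(np.arange(4, 11), 100)                    # pH 4..10, 100 samples each
rgb = rng.normal(loc=ph_classes[:, None] * np.array([5, -3, 8]) + 120,
                 scale=4.0, size=(len(ph_classes), 3))           # synthetic RGB readings

X_train, X_test, y_train, y_test = train_test_split(
    rgb, ph_classes, test_size=0.25, stratify=ph_classes, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))        # per-class precision/recall/F1
```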
{"title":"Innovative integration of machine learning and colorimetry for precise potential of hydrogen monitoring in printed hydrogel sensors","authors":"Abdelrahman Sakr , Ahmed R. El shamy , Haider Butt","doi":"10.1016/j.engappai.2025.110293","DOIUrl":"10.1016/j.engappai.2025.110293","url":null,"abstract":"<div><div>Proper potential of hydrogen (pH) monitoring finds wide applications in environmental monitoring, clinical diagnostics, and a variety of industrial processes. However, traditional pH sensors normally present several challenges related to adaptability, portability, and environmental compatibility. In addition, the recently developed hydrogel-based sensors have manifested several advantages due to the flexibility and biocompatibility of the material in a wide variety of applications. While much advancement has been made in integration techniques, further advances need improvement in precision and reliability. The present work describes a novel methodology of pH sensing through integration of hydrogel-based sensors with machine learning algorithms. pH-sensitive dye-impregnated hydrogel sensors have been fabricated using three-Dimensional (3D) printing technology, whereby colorimetric data analysis is combined with five machine learning models, namely Decision Trees, eXtreme Gradient Boosting, K-Nearest Neighbours, Random Forests, and Neural Networks, in the classification of pH based on Red, Green, Blue (RGB) data. The sensor designed can detect pH between 4 and 10 pH with high speed, stability, and reversibility. With precision, recall, and F1-scores all above 99%, this shows how efficient the classification approach is based on RGB and gives weight to the potential of the developed sensors for real-time applications in monitoring and diagnostics, hence making a big contribution to the evolution of pH sensing and paving the way for smarter, more adaptable sensor solutions.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"146 ","pages":"Article 110293"},"PeriodicalIF":7.5,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143429487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic alarm monitoring with data-driven ellipsoidal threshold learning
Control Engineering Practice, vol. 158, Article 106282. Pub Date: 2025-02-18. DOI: 10.1016/j.conengprac.2025.106282
Kaixin Cui, Wenjing Wu, Jun Shang, Dawei Shi
Alarm systems are essential for the safety maintenance and health management of industrial systems. In this work, a dynamic alarm monitoring approach with data-driven ellipsoidal threshold learning is proposed, and an unknown system is directly learned using noisy data without model identification. An ellipsoid-based normal operating zone of the system variable is iteratively predicted based on system dynamics, and is updated as an external approximation of the intersection of a predicted ellipsoid and a measurement-based ellipsoid with an event-triggering condition. Then, the dynamic alarm limits are calculated for each dimension of the output by an ellipsoid-based quadratic equation, and a projection strategy from output points to the predicted ellipsoids is designed to have two different solutions to the equation. The effectiveness of the proposed dynamic alarm monitoring approach is illustrated by experimental results on the sensor fault and actuator fault detection of an ultrasonic motor with and without an event-triggering condition, respectively.
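As a worked example of reading per-dimension limits off an ellipsoidal normal operating zone {x : (x - c)^T P^{-1} (x - c) <= 1}, the projection of the ellipsoid onto axis i is the interval c_i +/- sqrt(P_ii); this is the standard bounding-box result, whereas the paper's specific quadratic-equation formulation and projection strategy may differ, so the sketch below is only a generic illustration.

```python
# Per-dimension alarm limits from an ellipsoidal operating zone defined by centre c and shape matrix P.
import numpy as np

def alarm_limits(center: np.ndarray, P: np.ndarray):
    """Return (low, high) alarm limits per output dimension for the ellipsoid (c, P)."""
    half_width = np.sqrt(np.diag(P))               # extent of the ellipsoid along each axis
    return center - half_width, center + half_width

c = np.array([1.0, -0.5])                          # predicted centre of the normal operating zone
P = np.array([[0.25, 0.05],                        # shape matrix of the ellipsoid
              [0.05, 0.09]])
low, high = alarm_limits(c, P)
print(low, high)                                   # [0.5 -0.8] [1.5 -0.2]
```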
{"title":"Dynamic alarm monitoring with data-driven ellipsoidal threshold learning","authors":"Kaixin Cui , Wenjing Wu , Jun Shang , Dawei Shi","doi":"10.1016/j.conengprac.2025.106282","DOIUrl":"10.1016/j.conengprac.2025.106282","url":null,"abstract":"<div><div>Alarm systems are essential for the safety maintenance and health management of industrial systems. In this work, a dynamic alarm monitoring approach with data-driven ellipsoidal threshold learning is proposed, and an unknown system is directly learned using noisy data without model identification. An ellipsoid-based normal operating zone of the system variable is iteratively predicted based on system dynamics, and is updated as an external approximation of the intersection of a predicted ellipsoid and a measurement-based ellipsoid with an event-triggering condition. Then, the dynamic alarm limits are calculated for each dimension of the output by an ellipsoid-based quadratic equation, and a projection strategy from output points to the predicted ellipsoids is designed to have two different solutions to the equation. The effectiveness of the proposed dynamic alarm monitoring approach is illustrated by experimental results on the sensor fault and actuator fault detection of an ultrasonic motor with and without an event-triggering condition, respectively.</div></div>","PeriodicalId":50615,"journal":{"name":"Control Engineering Practice","volume":"158 ","pages":"Article 106282"},"PeriodicalIF":5.4,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143429975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}