
2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC): Latest Papers

PARC: A Processing-in-CAM Architecture for Genomic Long Read Pairwise Alignment using ReRAM
Pub Date : 2020-01-01 DOI: 10.1109/ASP-DAC47756.2020.9045555
Fan Chen, Linghao Song, Hai Li, Yiran Chen
Technological advances in long read sequences have greatly facilitated the development of genomics. However, managing and analyzing the raw genomic data that outpaces Moore’s Law requires extremely high computational efficiency. On the one hand, existing software solutions can take hundreds of CPU hours to complete human genome alignment. On the other hand, the recently proposed hardware platforms achieve low processing throughput with significant overhead. In this paper, we propose PARC, a Processing-in-Memory architecture for long read pairwise alignment that leverages emerging resistive CAM (content-addressable memory) to accelerate the bottleneck chaining step in DNA alignment. Chaining takes 2-tuple anchors as inputs and identifies a set of correlated anchors as potential alignment candidates. Unlike traditional main memory, which organizes relational data structures in a linear address space, PARC stores tuples in two neighboring crossbar arrays with a shared row decoder, such that column-wise in-memory computational operations and row-wise memory accesses can be performed in-situ in a symmetric crossbar structure. Compared to both software tools and state-of-the-art accelerators, PARC shows significant improvement in alignment throughput and energy efficiency, thanks to the in-situ computation capability and optimized data mapping.
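The chaining step described in the abstract can be illustrated with a small software sketch. This is a simplified dynamic-programming formulation written for illustration only; it is not PARC's in-CAM implementation, and the function name `chain_anchors`, the `max_gap` limit, and the scoring constants are all assumptions that do not come from the paper.

```python
from typing import List, Tuple

Anchor = Tuple[int, int]  # (position in the first read, position in the second read)


def chain_anchors(anchors: List[Anchor], max_gap: int = 100) -> List[Anchor]:
    """Return the highest-scoring chain of colinear anchors (simplified DP)."""
    if not anchors:
        return []
    pts = sorted(anchors)
    score = [1.0] * len(pts)   # best chain score ending at anchor i
    prev = [-1] * len(pts)     # back-pointer to the previous anchor in that chain
    for i, (xi, yi) in enumerate(pts):
        for j in range(i):
            xj, yj = pts[j]
            dx, dy = xi - xj, yi - yj
            # Anchor j may precede anchor i only if both coordinates advance
            # and neither jump exceeds the (hypothetical) gap limit.
            if 0 < dx <= max_gap and 0 < dy <= max_gap:
                # Hypothetical scoring: +1 per anchor, small penalty for diagonal drift.
                cand = score[j] + 1.0 - 0.01 * abs(dx - dy)
                if cand > score[i]:
                    score[i], prev[i] = cand, j
    # Trace back from the best-scoring end anchor.
    i = max(range(len(pts)), key=score.__getitem__)
    chain = []
    while i != -1:
        chain.append(pts[i])
        i = prev[i]
    return chain[::-1]


anchors = [(10, 12), (20, 22), (30, 33), (500, 5), (40, 43)]
print(chain_anchors(anchors))
```

In this example the outlier anchor `(500, 5)` is excluded, and the four roughly colinear anchors form the reported chain. The inner loop over all earlier anchors is the kind of pairwise comparison that makes chaining a bottleneck in software.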
Citations: 13
Benchmark Non-volatile and Volatile Memory Based Hybrid Precision Synapses for In-situ Deep Neural Network Training
Pub Date : 2020-01-01 DOI: 10.1109/ASP-DAC47756.2020.9045288
Yandong Luo, Shimeng Yu
Compute-in-memory (CIM) with emerging non-volatile memories (eNVMs) is time and energy efficient for deep neural network (DNN) inference. However, challenges remain for in-situ DNN training with eNVMs due to the asymmetric weight update behavior, high programming latency and energy consumption. To overcome these challenges, a hybrid precision synapse combining eNVMs with a capacitor has been proposed. It leverages the symmetric and fast weight update in the volatile capacitor, as well as the non-volatility and large dynamic range of the eNVMs. In this paper, an in-situ DNN training architecture with hybrid precision synapses is proposed and benchmarked with the modified NeuroSim simulator. First, all the circuit modules required for in-situ training with hybrid precision synapses are designed. Then, the impact of weight transfer interval and limited capacitor retention time on training accuracy is investigated by incorporating hardware properties into TensorFlow simulation. Finally, a system-level benchmark is conducted for the hybrid precision synapse compared with a baseline design that is solely based on eNVMs.
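The interplay between the volatile capacitor, the eNVM, the weight transfer interval, and the retention time can be pictured with a toy model. The sketch below is an assumption-laden illustration, not the circuit or the NeuroSim model from the paper: the class name `HybridSynapse`, the exponential leakage model, the rounding-based transfer rule, and every parameter value are hypothetical.

```python
import math


class HybridSynapse:
    """Toy model: weight = coarse non-volatile part + fine volatile part."""

    def __init__(self, step=0.1, tau=1000.0, transfer_interval=4):
        self.step = step                    # hypothetical eNVM conductance step
        self.tau = tau                      # hypothetical retention time constant, in update steps
        self.transfer_interval = transfer_interval
        self.nvm = 0.0                      # coarse, non-volatile part
        self.cap = 0.0                      # fine, volatile part held on the capacitor
        self.updates = 0

    @property
    def weight(self):
        return self.nvm + self.cap

    def update(self, delta):
        self.cap *= math.exp(-1.0 / self.tau)   # capacitor leaks a little every step
        self.cap += delta                       # symmetric, fast update on the capacitor
        self.updates += 1
        if self.updates % self.transfer_interval == 0:
            self._transfer()

    def _transfer(self):
        levels = round(self.cap / self.step)    # whole eNVM steps to program
        self.nvm += levels * self.step
        self.cap -= levels * self.step          # residual stays on the capacitor


ideal = HybridSynapse(tau=float("inf"))   # no leakage
leaky = HybridSynapse(tau=10.0)           # short retention time
for _ in range(4):
    ideal.update(0.06)
    leaky.update(0.06)
print(ideal.weight, leaky.weight)
```

With no leakage the accumulated weight matches the sum of the updates, while the short-retention synapse loses part of each update before it is transferred. This is the qualitative effect whose impact on training accuracy the abstract says is investigated.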
Citations: 4
Emerging memories as enablers for in-memory layout transformation acceleration and virtualization
Pub Date : 2020-01-01 DOI: 10.1109/ASP-DAC47756.2020.9045410
M. Liao, J. Sampson
Recent works have shown that certain classes of emerging memory technologies lend themselves to organizations that offer equally dense access support for patterns with multiple strides, such as row-column memories. However, with few exceptions, these prior works have only considered such multi-orientation memories (MOMs) and MOM-caching techniques in the context of traditional processor architectures. In this work, we explore the potential for leveraging the capabilities of MOMs to present multiple concurrent views of data organization within the memory hierarchy as a means to offload and overlap inter-kernel marshalling, a range of data layout transformations, and even lazy construction of derivative data structures to work performed by the MOM-capable memories and caches themselves. We demonstrate the potential of MOM-offloading to improve performance and reduce data movement for select computation patterns, and describe the application of the approach to broader classes of processing-in-memory workloads.
Citations: 0
Adaptive Circuit Approaches to Low-Power Multi-Level/Cell FeFET Memory
Pub Date : 2020-01-01 DOI: 10.1109/ASP-DAC47756.2020.9045106
Juejian Wu, Yixin Xu, Bowen Xue, Yu Wang, Yongpan Liu, Huazhong Yang, Xueqing Li
Ferroelectric FETs (FeFETs) have emerged as a promising multi-level/cell (MLC) nonvolatile memory (NVM) candidate for low-power applications. This originates from the advantages of both efficient memory access and intrinsic device-level in-memory computing flexibilities. However, there still exist challenges for FeFET MLC NVM: (i) high power consumption in read operations due to the high-gain requirement for sense amplifiers during sensing, and (ii) high latency and energy consumption in write operations with conventional recursive program-and-verify. Targeting lower power, lower latency, and higher density, this work investigates and optimizes the read and write approaches to MLC FeFET NVM design: (i) Adaptive FeFET memory State Mapping (ASM) between the FeFET drain-source current and the digital states to increase the sensing margin; (ii) Adaptive FeFET Gate Biasing (AGB) read methods that adopt the optimized FeFET gate voltage to boost the sensible dynamic range and to store more levels of states per cell; (iii) Adaptive Prediction-based Direct (APD) write methods that minimize the program-and-verify activities. Evaluations show significant latency and energy improvement. Furthermore, the number of sensible levels of states per cell is also increased with an enhanced dynamic sensing range and an enhanced sensing margin.
Citations: 4
RRAM-VAC: A Variability-Aware Controller for RRAM-based Memory Architectures
Pub Date : 2020-01-01 DOI: 10.1109/ASP-DAC47756.2020.9045220
Shikhar Tuli, M. Rios, A. Levisse, David Atienza Alonso
The growing need for connected, smart and energy-efficient devices requires them to provide both ultra-low standby power and relatively high computing capabilities when awoken. In this context, emerging resistive memory technologies (RRAM) appear as a promising solution, as they enable cheap fine-grained technology co-integration with CMOS, fast switching and non-volatile storage. However, RRAM technologies suffer from fundamental flaws such as strong device-to-device and cycle-to-cycle variability, which is worsened by aging, forcing designers to consider worst-case design conditions. In this work, we propose, for the first time, a circuit that can take advantage of recently published Write Termination (WT) circuits from both the energy and performance points of view. The proposed RRAM Variability Aware Controller (RRAM-VAC) stores and then coalesces the write requests from the processor before triggering the actual write process. By doing so, it averages the RRAM variability and enables the system to run at the mean of the memory programming time distribution rather than at its worst-case tail. We explore the design space of the proposed solution for various RRAM variability specifications, benchmark the effect of the proposed memory controller with real application memory traces, and show (for the considered RRAM technology specifications) a 44% to 50% performance improvement and 10% to 85% energy gains, depending on the application memory access patterns.
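The claim about running at the distribution mean rather than the worst-case tail can be made concrete with a little arithmetic. The sketch below is only a toy model under stated assumptions: the per-write latencies and the worst-case bound are invented numbers, the function names are hypothetical, and the real controller's buffering and scheduling are not modeled.

```python
def worst_case_time(latencies_ns, worst_case_ns):
    """Total time when every write is held for the worst-case programming time."""
    return len(latencies_ns) * worst_case_ns


def coalesced_time(latencies_ns):
    """Total time when each write in a buffered batch ends as soon as its cell has switched.

    With write termination, a batch of N writes costs the sum of the actual
    programming times, which is N times the mean of the distribution.
    """
    return sum(latencies_ns)


# Hypothetical programming times for five buffered writes, in nanoseconds.
batch = [20, 35, 25, 90, 30]
print(worst_case_time(batch, worst_case_ns=100))   # 500
print(coalesced_time(batch))                       # 200
print(coalesced_time(batch) / len(batch))          # 40.0 ns per write on average
```

In this toy batch a single slow cell (90 ns) barely moves the average, whereas a worst-case design pays the 100 ns bound on every write.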
Citations: 7
Modulo Scheduling with Rational Initiation Intervals in Custom Hardware Design
Pub Date : 2020-01-01 DOI: 10.1109/ASP-DAC47756.2020.9045616
Patrick Sittel, John Wickerson, M. Kumm, P. Zipf
In modulo scheduling, the number of clock cycles between successive inputs (the initiation interval, II) is traditionally an integer, but in this paper, we explore the benefits of allowing it to be a rational number. This rational II can be interpreted as the average number of clock cycles between successive inputs. As the minimum rational II can be less than the minimum integer II, this translates to higher throughput. We formulate rational-II modulo scheduling as an integer linear programming (ILP) problem that is able to find latency-optimal schedules for a fixed rational II. We have applied our scheduler to a standard benchmark of hardware designs, and our results demonstrate a significant speedup compared to state-of-the-art integer-II and rational-II formulations.
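A short worked example shows why a rational II can raise throughput. The sketch below is illustrative only and is not the paper's ILP formulation: the value 5/3 is a made-up minimum II, the function names are hypothetical, and the evenly spread insertion pattern is just one simple way to realize an average of 5/3 cycles between inputs.

```python
from fractions import Fraction
from math import ceil


def throughput_gain(min_rational_ii: Fraction) -> Fraction:
    """Throughput at the rational II divided by throughput at the rounded-up integer II."""
    integer_ii = ceil(min_rational_ii)
    return Fraction(integer_ii) / min_rational_ii


def insertion_cycles(ii: Fraction, n_samples: int) -> list:
    """Clock cycles at which successive inputs start, spread evenly with average spacing ii."""
    return [(k * ii.numerator) // ii.denominator for k in range(n_samples)]


ii = Fraction(5, 3)                 # hypothetical minimum rational II: 3 inputs every 5 cycles
print(ceil(ii))                     # 2
print(throughput_gain(ii))          # 6/5
print(insertion_cycles(ii, 6))      # [0, 1, 3, 5, 6, 8]
```

Here the integer schedule must round up to II = 2, while the rational schedule accepts three inputs every five cycles, a 20% throughput gain. The gaps between successive inputs alternate between 1 and 2 cycles, which is what "average number of clock cycles between successive inputs" means in practice.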
Citations: 2
Maximizing the Communication Parallelism for Wavelength-Routed Optical Networks-On-Chips
Pub Date : 2020-01-01 DOI: 10.1109/ASP-DAC47756.2020.9045163
Mengchu Li, Tsun-Ming Tseng, Mahdi Tala, Ulf Schlichtmann
Enabled by recent developments in silicon photonics, wavelength-routed optical networks-on-chips (WRONoCs) emerge as an appealing next-generation architecture for communication in multiprocessor systems-on-chip. WRONoCs apply a passive routing mechanism that statically reserves all data transmission paths at design time, and are thus able to avoid the latency and energy overhead of arbitration, compared to other ONoC architectures. Current research mostly assumes that in a WRONoC topology, each initiator node sends one bit at a time to a target node. However, the communication parallelism can be increased by assigning multiple wavelengths to each path, which requires a systematic analysis of the physical parameters of the silicon microring resonators and the wavelength usage among different paths. This work proposes a mathematical modeling method to maximize the communication parallelism of a given WRONoC topology, which provides a foundation for exploiting the bandwidth potential of WRONoCs. Experimental results show that the proposed method significantly outperforms the state-of-the-art approach, and is especially suitable for application-specific WRONoC topologies.
Citations: 8
Investigating the Inherent Soft Error Resilience of Embedded Applications by Full-System Simulation
Pub Date : 2020-01-01 DOI: 10.1109/ASP-DAC47756.2020.9045132
Uzair Sharif, Daniel Mueller-Gritschneder, Ulf Schlichtmann
It has long been acknowledged that some applications feature inherent resilience against soft errors, e.g., the impact of soft errors on multimedia applications is often not visible to humans. In this paper, we investigate the inherent resilience of two typical embedded applications using case studies of a control system and a robot arm. Both studies were enabled by our mixed-mode fault injection simulator ETISS-ML, which allows RTL-accurate fault injection while being able to simulate very long scenarios, e.g., robot movements of several seconds. Our results indicate that full simulation of the embedded system and its environment is required to classify whether the system can tolerate the impact of a soft error. This is because it is hard to predict the impact of a certain output deviation without investigating the change in the system behavior, taking into account the control loop. Based on this classification method, we hope to be able to exploit this resilience for lowering the cost of error detection mechanisms in future research.
Citations: 0
iGPU Leak: An Information Leakage Vulnerability on Intel Integrated GPU
Pub Date : 2020-01-01 DOI: 10.1109/ASP-DAC47756.2020.9045745
Wenjian He, Wei Zhang, Sharad Sinha, Sanjeev Das
Hardware accelerators such as integrated graphics processing units (iGPUs) are increasingly prevalent in modern systems. They typically provide multiplexing support where several user applications can share the iGPU acceleration resources. However, security in this setting has not received sufficient consideration. In this work, we disclose a critical information leakage vulnerability due to defective GPU context management. In essence, residual register values and shared local memory in the iGPU are not cleared during a context switch. As a result, adversaries can recover the secret key of a cryptographic algorithm running on an iGPU from a single snapshot of the leaking channel. User privacy is also under threat, because browser activity can be eavesdropped through a website-fingerprinting attack with high accuracy and resolution. Moreover, this vulnerability can constitute a covert channel with a bandwidth of up to 8 Gbps.
集成图形处理单元(igpu)等硬件加速器在现代系统中越来越普遍。它们通常提供多路复用支持,其中多个用户应用程序可以共享iGPU加速资源。但是,这种情况下的安全问题没有得到充分的考虑。在这项工作中,我们揭示了由于GPU上下文管理缺陷而导致的关键信息泄露漏洞。实际上,在上下文切换期间,iGPU中的残留寄存器值和共享本地内存不会被清除。因此,攻击者可以从泄漏通道的单个快照中恢复在iGPU上运行的加密算法的密钥。通过高精度、高分辨率的网站指纹攻击对浏览器活动进行窃听,用户隐私也受到威胁。此外,此漏洞可以构成带宽高达8gbps的隐蔽通道。
Citations: 2
Chiplet-Package Co-Design For 2.5D Systems Using Standard ASIC CAD Tools
Pub Date : 2020-01-01 DOI: 10.1109/ASP-DAC47756.2020.9045734
M. Kabir, Yarui Peng
Chiplet integration using 2.5D packaging is gaining popularity because it enables features such as heterogeneous integration and a drop-in design method. In the traditional die-by-die approach to designing a 2.5D system, each chiplet is designed independently, without any knowledge of the package RDLs. In this paper, we propose a Chip-Package Co-Design flow for implementing 2.5D systems using existing commercial chip design tools. Our flow encompasses 2.5D-aware partitioning suitable for SoC design, Chip-Package Floorplanning, and post-design analysis and verification of the entire 2.5D system. We also designed our own package planners to route RDL layers on top of chiplet layers. We use an ARM Cortex-M0 SoC system to illustrate our flow and compare the analysis results with a monolithic 2D implementation of the same system. We also compare two different 2.5D implementations of the same SoC system following the drop-in approach. Alongside the traditional die-by-die approach, our holistic flow enables design efficiency and flexibility with accurate cross-boundary parasitic extraction and design verification.
Citations: 13