Embedding GPU Computations in Hadoop

Jie Zhu, Hai Jiang, Juanjuan Li, Erikson Hardesty, Kuan-Ching Li, Zhongwen Li
Int. J. Networked Distributed Comput., published 2014-11-01. DOI: https://doi.org/10.2991/ijndc.2014.2.4.2
Citations: 6

Abstract

As the size of high performance applications increases, four major challenges have arisen in the underlying distributed systems: heterogeneity, programmability, fault resilience, and energy efficiency. To tackle all of them without sacrificing performance, traditional approaches to resource utilization, task scheduling, and programming paradigms should be reconsidered. While Hadoop has handled data-intensive applications well in clouds, GPUs have demonstrated their effectiveness in accelerating computation-intensive ones. This paper addresses approaches that let Hadoop exploit both CPU and GPU resources effectively to handle the aforementioned challenges. Hadoop schedules MapReduce's Map and Reduce functions across multiple computing nodes through Java, whereas CUDA code further accelerates local computations on the attached GPUs, so that all available heterogeneous computational power is utilized. MapReduce in Hadoop eases the programming task by hiding communication and scheduling details. The Hadoop Distributed File System helps achieve data-level fault resilience. The energy efficiency of GPUs helps reduce the power consumption of the whole system. To utilize GPUs in Hadoop, four approaches, JCuda, JNI, Hadoop Streaming, and Hadoop Pipes, have been implemented and analyzed. Experimental results demonstrate and compare their effectiveness.
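Of the four approaches named in the abstract, Hadoop Streaming has the simplest contract: the mapper is any executable that reads records from standard input and writes `key<TAB>value` lines to standard output. The following is a minimal sketch of that contract, not the authors' implementation. The function names (`square_sum_batch`, `map_lines`, `reduce_pairs`), the `square_sum` key, the batch size, and the sum-of-squares workload are all assumptions made for illustration, and the batch computation runs on the CPU as a stand-in for the point where a CUDA kernel would be launched.

```python
import sys
from typing import Dict, Iterable, Iterator, List, Tuple


def square_sum_batch(values: List[float]) -> float:
    """CPU stand-in for the per-batch work (hypothetical workload).

    Assumption: in a GPU deployment this is where the batch would be copied
    to the device, processed by a CUDA kernel, and the result copied back.
    """
    return sum(v * v for v in values)


def map_lines(lines: Iterable[str], batch_size: int = 4) -> Iterator[Tuple[str, float]]:
    """Group numeric input lines into batches and emit one partial result per batch.

    Batching matters for an accelerator: one call per record would spend
    most of its time on transfer and launch overhead.
    """
    batch: List[float] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank records
        batch.append(float(line))
        if len(batch) == batch_size:
            yield "square_sum", square_sum_batch(batch)
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield "square_sum", square_sum_batch(batch)


def reduce_pairs(pairs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum the partial results per key, as a reducer would."""
    totals: Dict[str, float] = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0.0) + value
    return totals


def main() -> None:
    """Mapper entry point: stdin records in, tab-separated key/value lines out."""
    for key, value in map_lines(sys.stdin):
        print(f"{key}\t{value}")

# A script used as a streaming mapper would end by calling main().
```

Keeping the batch computation behind a single function means the same mapper logic can be exercised locally on plain lists before any GPU code is attached.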