Citation: FAN Chao, LING Jie. Study on Improving the Efficiency for Dealing with Hadoop Small Files[J]. Microelectronics & Computer, 2014, 31(7): 125-128,132.

Study on Improving the Efficiency for Dealing with Hadoop Small Files

  • Abstract: This paper proposes a method for improving the efficiency of file processing in Hadoop. A small file processing module (SFPM) is added to Hadoop. It builds a two-level index over massive numbers of small files according to their file names, and uses preloading to place the index in cache in advance, which improves the efficiency of file lookup and access. When files are merged, the leftover space at the end of a block is discarded so that no file is split across two blocks, which reduces the time overhead of file access. Experimental results show that the method effectively reduces the load on the NameNode and improves the read and write efficiency of small files.
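The abstract gives no implementation details for SFPM, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. The function names (pack_files, build_two_level_index, lookup), the hash-bucket first level of the index, and the in-memory dictionaries standing in for the preloaded cache are all hypothetical and are not taken from the paper. The sketch shows merging that starts a new block whenever a file would not fit in the space remaining in the current one, and a file-name-based index with two levels.

```python
import zlib


def pack_files(files, block_size):
    """Assign each (name, size) a (block, offset, size) slot.

    Assumption: every small file fits in one block. A file that does not
    fit in the space left in the current block is placed at the start of
    the next block, and the leftover space is simply abandoned.
    """
    placements = {}
    block, offset = 0, 0
    for name, size in files:
        if size > block_size:
            raise ValueError(f"{name} is larger than one block")
        if offset + size > block_size:
            # Give up the tail of the current block instead of splitting the file.
            block, offset = block + 1, 0
        placements[name] = (block, offset, size)
        offset += size
    return placements


def bucket_of(name, num_buckets):
    """First-level key: a hash bucket derived from the file name (assumed scheme)."""
    return zlib.crc32(name.encode("utf-8")) % num_buckets


def build_two_level_index(placements, num_buckets=16):
    """Level 1 maps bucket -> level-2 dict; level 2 maps file name -> location."""
    index = {}
    for name, location in placements.items():
        index.setdefault(bucket_of(name, num_buckets), {})[name] = location
    return index


def lookup(index, name, num_buckets=16):
    """Return (block, offset, size) for a file name, or None if it is not indexed."""
    return index.get(bucket_of(name, num_buckets), {}).get(name)


if __name__ == "__main__":
    files = [("a.txt", 40), ("b.txt", 50), ("c.txt", 30)]
    placements = pack_files(files, block_size=100)
    index = build_two_level_index(placements)
    for name, _ in files:
        print(name, lookup(index, name))
```

With a block size of 100, the third file would straddle the block boundary at offset 90, so it is placed at the start of the next block and the last 10 units of the first block go unused. That trades some storage for reading each file from a single block.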

     
