Using the HDFS Java API

mac2025-06-24 15

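1. Requirement
Read the text content of the HDFS file /tmp/tianliangedu/input.txt and print it.

2. Implementation steps

2.1 Resource preparation
Create a local file index.txt containing the text "HelloWorld Hadoop", then upload it to HDFS as /tmp/tianliangedu/input.txt.

2.2 Maven environment setup
Set up the Maven development environment: in the Eclipse IDE, create a new Maven project.

2.3 pom dependency configuration
Edit the new project's pom.xml and add the Hadoop dependency. The file does three things: it declares the jar coordinates (in effect, the "who am I" description), specifies the repository that dependencies are resolved from, and configures the packaging plugin.

3. Code implementation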

package com.tianliangedu.utils;

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.Logger;

/**
 * HDFS file utility: reads the text content of any HDFS file path.
 */
public class HdfsFileOperatorUtil {
    // Logger for this class
    private static final Logger logger = Logger.getLogger(HdfsFileOperatorUtil.class);
    // Loads the Hadoop configuration files into an in-memory object
    static Configuration hadoopConf = new Configuration();

    /** Reads a file from HDFS and returns its content as a string. */
    public static String readFromFile(String srcFile) throws Exception {
        // Reject a null or blank file path
        if (srcFile == null || srcFile.trim().length() == 0) {
            throw new Exception("Source file to read, " + srcFile + ", does not exist; please check!");
        }
        // Read the file content into a byte array
        byte[] byteArray = readFromFileToByteArray(srcFile);
        if (byteArray == null || byteArray.length == 0) {
            return null;
        }
        // The file is UTF-8 encoded, so decode the bytes as UTF-8
        return new String(byteArray, StandardCharsets.UTF_8);
    }

    /**
     * Reads the file at the given HDFS path into a byte array.
     *
     * @param srcFile HDFS path of the file to read
     * @return the complete file content as bytes
     */
    public static byte[] readFromFileToByteArray(String srcFile) throws Exception {
        if (srcFile == null || srcFile.trim().length() == 0) {
            throw new Exception("Source file to read, " + srcFile + ", does not exist; please check!");
        }
        // Get a reference to the HDFS cluster described by hadoopConf
        FileSystem fs = FileSystem.get(hadoopConf);
        // Wrap the given srcFile in an HDFS Path object
        Path hdfsPath = new Path(srcFile);
        // Collects every byte read from the stream
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // try-with-resources closes the HDFS input stream even if a read fails
        try (FSDataInputStream hdfsInStream = fs.open(hdfsPath)) {
            // 65536-byte buffer that holds the bytes returned by each read
            byte[] byteArray = new byte[65536];
            // Number of bytes actually read in the current pass
            int readLen;
            // Keep reading until the end of the stream (read returns -1)
            while ((readLen = hdfsInStream.read(byteArray)) != -1) {
                // Write only the bytes actually read, not the whole buffer;
                // otherwise the result is padded with unused buffer bytes
                bos.write(byteArray, 0, readLen);
            }
        }
        // Turn everything written to the output stream into one byte array
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // HDFS path of the file to read
        String hdfsFilePath = "/tmp/tianliangedu/input.txt";
        // Read the file from HDFS and convert it to a string
        String result = readFromFile(hdfsFilePath);
        // As the requirement states, print the string to the console
        System.out.println(result);
    }
}
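A note on the read loop: when a stream is copied through a fixed-size buffer, only the bytes that each read() call actually returned should be written out. Writing the whole buffer on every pass pads the result with unused bytes, which then show up as trailing garbage after decoding. The following is a minimal sketch of that difference, using in-memory java.io streams in place of HDFS so it runs without a cluster. The class name BufferedCopySketch and both method names are hypothetical and are not part of the original utility.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; an in-memory stream stands in for the HDFS input stream.
class BufferedCopySketch {

    /** Copies the stream, writing only the bytes each read() actually returned. */
    static byte[] readAll(InputStream in, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            int readLen;
            // read() returns -1 once the end of the stream is reached
            while ((readLen = in.read(buffer)) != -1) {
                out.write(buffer, 0, readLen);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** For comparison: writes the entire buffer on every pass, ignoring the read length. */
    static byte[] readAllWholeBuffer(InputStream in, int bufferSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            byte[] buffer = new byte[bufferSize];
            while (in.read(buffer) > 0) {
                out.write(buffer, 0, buffer.length);
                buffer = new byte[bufferSize];
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "HelloWorld Hadoop".getBytes(StandardCharsets.UTF_8);
        // A small buffer forces several passes, the last one only partly filled
        byte[] exact = readAll(new ByteArrayInputStream(data), 8);
        byte[] padded = readAllWholeBuffer(new ByteArrayInputStream(data), 8);
        System.out.println(exact.length);  // 17, the same as the input
        System.out.println(padded.length); // 24, three full 8-byte buffers
    }
}
```

With a 65536-byte buffer and a 17-byte file, the whole-buffer variant would return 65536 bytes, so the printed string would carry a long tail of NUL characters.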

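4. Maven packaging
Right-click the project and choose Run As -> Maven install. This packages the project, installs it into the local repository, and generates TlHadoopCore-jar-with-dependencies.jar in the target directory.

5. Publish the runnable jar to the Hadoop environment
Use the rz command to upload the generated TlHadoopCore-jar-with-dependencies.jar to a server in the Hadoop environment.

6. Test run on the cluster
yarn jar TlHadoopCore-jar-with-dependencies.jar com.tianliangedu.utils.HdfsFileOperatorUtil

7. Verify the result
If the run succeeds, the console should print the content of /tmp/tianliangedu/input.txt, which is "HelloWorld Hadoop" for the file prepared in step 2.1.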
