Hadoop High-Availability Cluster Setup


First, create four virtual machines, planned as follows:

| IP address | Hostname | Installed software | Running services |
| --- | --- | --- | --- |
| 192.168.16.134 | hadoop1 | jdk, hadoop | NameNode, JournalNode, ZKFC, ResourceManager |
| 192.168.16.135 | hadoop2 | jdk, hadoop, zookeeper | NameNode, DataNode, JournalNode, ZKFC, ResourceManager, ZooKeeper |
| 192.168.16.136 | hadoop3 | jdk, hadoop, zookeeper | DataNode, JournalNode, ZooKeeper |
| 192.168.16.137 | hadoop4 | jdk, hadoop, zookeeper | DataNode, ZooKeeper |

Next, disable the firewall and SELinux by running the following commands on every server (run on: all nodes):

sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
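Optionally, verify the result; getenforce should report Permissive now (Disabled after a reboot), and firewalld should report inactive:

getenforce
systemctl is-active firewalld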

Set each machine's hostname with the following command (run on: all nodes; use hadoop2, hadoop3, and hadoop4 on the respective hosts):

hostnamectl set-hostname hadoop1

Add hosts entries on every machine (run on: all nodes):

echo " 192.168.16.134 hadoop1 192.168.16.135 hadoop2 192.168.16.136 hadoop3 192.168.16.137 hadoop4">>/etc/hosts

Create the hadoop user on all machines (run on: all nodes):

useradd hadoop
passwd hadoop

Switch to the hadoop user and set up passwordless SSH (run on: hadoop1):

su - hadoop
# ssh-keygen: press Enter through every prompt
ssh-keygen
# copy the key to all four nodes, including hadoop1 itself
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop1
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop2
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop3
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop4
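As a quick check, the following loop should print each hostname without prompting for a password:

for h in hadoop1 hadoop2 hadoop3 hadoop4; do
    ssh hadoop@$h hostname
done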

Switch back to root, install the JDK (OpenJDK installs easily, so yum is used here), and create the /hadoop directory (run on: all nodes):

yum install -y java-1.8.0-openjdk.x86_64
# verify the installation
java -version
# JAVA_HOME is /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.222.b10-0.el7_6.x86_64/jre
mkdir /hadoop
chown -R hadoop:hadoop /hadoop

Set up ZooKeeper (run on: hadoop2, hadoop3, hadoop4). Download page: https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/

su - hadoop
cd /hadoop
curl -O https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.5.5/apache-zookeeper-3.5.5-bin.tar.gz
# unpack the archive
tar -zxvf apache-zookeeper-3.5.5-bin.tar.gz
cd apache-zookeeper-3.5.5-bin/conf/
# rename the sample config file
mv zoo_sample.cfg zoo.cfg
# create the ZooKeeper data directory
mkdir -p /hadoop/data/zookeeper

The zoo.cfg configuration file:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage
dataDir=/hadoop/data/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge:
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours; set to "0" to disable auto purge
#autopurge.purgeInterval=1
server.1=hadoop2:2888:3888
server.2=hadoop3:2888:3888
server.3=hadoop4:2888:3888

Create a myid file in the dataDir directory, with contents 1, 2, and 3 respectively:

# on hadoop2
echo 1 > /hadoop/data/zookeeper/myid
# on hadoop3
echo 2 > /hadoop/data/zookeeper/myid
# on hadoop4
echo 3 > /hadoop/data/zookeeper/myid
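Alternatively, a sketch that does the same from hadoop1 over SSH (assuming passwordless SSH is in place and the dataDir already exists on each node; the arithmetic follows the hadoop2→1, hadoop3→2, hadoop4→3 mapping above):

for i in 2 3 4; do
    ssh hadoop@hadoop$i "echo $((i-1)) > /hadoop/data/zookeeper/myid"
done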

Configure the environment variables:

vim ~/.bashrc

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# User specific aliases and functions
export ZOOKEEPER_HOME=/hadoop/apache-zookeeper-3.5.5-bin
PATH=$PATH:$ZOOKEEPER_HOME/bin

Then apply the changes:

source ~/.bashrc

Start the ZooKeeper cluster (run on: hadoop2, hadoop3, hadoop4):

zkServer.sh start
# check that it started
$ zkServer.sh status
/bin/java
ZooKeeper JMX enabled by default
Using config: /hadoop/apache-zookeeper-3.5.5-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader

Hadoop deployment (run on: hadoop1)

Download: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable2/hadoop-3.2.0.tar.gz

cd /hadoop
curl -O https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable2/hadoop-3.2.0.tar.gz
tar -zxvf hadoop-3.2.0.tar.gz

Add the environment variables to ~/.bashrc.

On hadoop1:

export HADOOP_HOME=/hadoop/hadoop-3.2.0
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

On hadoop2, hadoop3, and hadoop4:

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# User specific aliases and functions
export ZOOKEEPER_HOME=/hadoop/apache-zookeeper-3.5.5-bin
export HADOOP_HOME=/hadoop/hadoop-3.2.0
PATH=$PATH:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Edit the configuration files (under $HADOOP_HOME/etc/hadoop).

hadoop-env.sh — add the following line anywhere in the file:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.222.b10-0.el7_6.x86_64/jre

core-site.xml — add the following inside <configuration>:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/data</value>
</property>

hdfs-site.xml — add the following inside <configuration>:

<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
</property>
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hadoop1:8020</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>hadoop2:8020</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>hadoop1:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>hadoop2:50070</value>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/mycluster</value>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/hadoop/data</value>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
<property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop2:2181,hadoop3:2181,hadoop4:2181</value>
</property>

mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop1:19888</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Enable ResourceManager HA (off by default) -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- Declare the two ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>rmcluster</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop2</value>
    </property>
    <!-- The ZooKeeper ensemble address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop2:2181,hadoop3:2181,hadoop4:2181</value>
    </property>
    <!-- Enable recovery so that running jobs survive an RM failure (off by default) -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <!-- Store ResourceManager state in ZooKeeper instead of the default FileSystem store -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
</configuration>

workers

hadoop2
hadoop3
hadoop4

scp the hadoop directory to the other three machines, as sketched below.
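A minimal sketch, run as the hadoop user on hadoop1 (it relies on the passwordless SSH and the /hadoop ownership set up earlier):

for h in hadoop2 hadoop3 hadoop4; do
    scp -r /hadoop/hadoop-3.2.0 hadoop@$h:/hadoop/
done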

Then add yarn.resourcemanager.ha.id to yarn-site.xml on hadoop1 and hadoop2 respectively (rm1 on hadoop1, rm2 on hadoop2; leave it unset on hadoop3 and hadoop4):

<!-- on hadoop1 -->
<property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
</property>

<!-- on hadoop2 -->
<property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm2</value>
</property>

Start ZooKeeper (if it is not already running from the earlier step).

Start the JournalNodes on hadoop1, hadoop2, and hadoop3:

hadoop-daemon.sh start journalnode
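On each of those three nodes, jps should now list the process:

jps
# the output should include a JournalNode line, e.g. "10275 JournalNode" (the PID will differ)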

Format the NameNode on hadoop1:

hdfs namenode -format

Start the NameNode on hadoop1:

hadoop-daemon.sh start namenode

Sync the metadata from hadoop1 to hadoop2 by running the following on hadoop2:

hdfs namenode -bootstrapStandby

Start the NameNode on hadoop2:

hadoop-daemon.sh start namenode

Format the HA state in ZooKeeper; once it finishes you can connect to ZooKeeper to confirm (see the check below):

hdfs zkfc -formatZK
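A quick check from any ZooKeeper node: the HA znode should now exist, named after the dfs.nameservices value configured above:

zkCli.sh -server hadoop2:2181
ls /hadoop-ha
# expected output: [mycluster]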

Finally, on hadoop1, start HDFS and then YARN:

start-dfs.sh

 

start-yarn.sh
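Once both scripts finish, jps on each node should show the services from the planning table, and the HA state of each NameNode and ResourceManager can be queried (nn1/nn2 and rm1/rm2 are the IDs configured above; one of each pair should report active, the other standby):

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

The NameNode web UIs are at http://hadoop1:50070 and http://hadoop2:50070, per the dfs.namenode.http-address settings above.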

 

Reposted from: https://www.cnblogs.com/hope123/p/11274172.html
