虛擬機(jī)搭建Hadoop集群:操作系統(tǒng):ubuntu-16.04.3-desktop-amd64.iso
Virtualization software: VirtualBox
Installation packages: hadoop-3.0.0.tar.gz, jdk-8u161-linux-x64.tar.gz
1. Preparing the environment

Use VirtualBox and the downloaded Ubuntu image to create three Ubuntu virtual machines. On each machine, first install vim:

sudo apt-get install -y vim
執(zhí)行以下命令查看當(dāng)前網(wǎng)絡(luò)地址:ifconfig
Then configure a static IP for the current virtual machine in the /etc/network/interfaces file:

auto enp0s8
iface enp0s8 inet static
address 192.168.56.4
netmask 255.255.255.0
gateway 192.168.1.1
dns-nameservers 202.120.111.3
Configure the network on the other two virtual machines in the same way. Next, install the SSH server on every machine:

sudo apt-get install -y openssh-server
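To confirm the SSH daemon actually came up after installation (a quick check, not part of the original steps; the service is named ssh on Ubuntu 16.04):

sudo systemctl status ssh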
安裝完成后執(zhí)行如下命令生成ssh公鑰:ssh-keygen -t rsa
該命令執(zhí)行期間一直按回車即可。公鑰生成后會在用戶目錄~
下生成一個.ssh文件夾,里面包含id_rsa
和id_rsa.pub
兩個文件。id_rsa.pub
文件中的內(nèi)容都復(fù)制到同一個文件中,命名為authorized_keys,并且將該文件在每臺虛擬機(jī)的.ssh文件夾下都創(chuàng)建一份,該文件相當(dāng)于為其中的每個公鑰所指代的機(jī)器提供ssh登錄的權(quán)限,由于三臺虛擬機(jī)的authorized_keys都有各自的公鑰,因而其相互之間可以通過ssh免密登錄。示例authorized_keys文件如下:ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDFez1asGIktruVI53uJHT3s8UZHoIi3X98G5mFV/7+MAs8xXeXV7HbHfi2FfJnMl/qTY/W4VZWdoFLizDBrtUDHTtigVxs5uK4re8qlvSApmqy9Xi0c+qpLKHSeFBpCSqKgngrwE/+DOFnkkTSH/hv6bIpGPTYArpOXdY203vyt6/MM/HKed0WeAcDbCdfKjke4Q2IHi6APghwjML3oD1N0rNGU28SRc8iGdg+vGp6Ajkr034VZCx7fY/BmjYhxPvJ6c5hnVSwqik05xdw2Dh+6eLkiOOnO1LknFw7KdFqa1435sOxxHhar8+ELiKu/mYzVcZMizN0AiPQGxjP96fl hadoop01@hadoop01ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDDCXKskhH0VFzh8KrJt3PmbR/Yxbgv5le4iEdvIPWWXAC7XDuPGrz1XH/ZYlZWauyV/LsMN3qjbeHzyfeuuNuV6Skpy/lofsIO88/XH0NFYcAxQtIQfSLwbOGVWziibOPY+gI8Bnzeb7hAYk10V2cI26hKWMpEHxOu/lCxcNuM5Y+CBs2kx2KzzvwgUjF12P6Jz4+SguCERi+Cz1JQ0YuXHBRLXGgwXMRyYUlC3KxIvyeZzI0+Gpew4nTFFXBoDIEaWn9Ma8+AcHNm9ejnO9ChSCN3zXJf7nnaXKUmi5jyQu88e+qmhDt2Pzj0E/kaKRkxso7e+sgHMBp8eXpJu/eT hadoop02@hadoop02ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCm4yk0TVpfhU0jSf4PMH60fOhMYrxCI9DeG/tcs0LTAUHGatuY3XRd6X3B5tShUlCvr9M1DVRgszk0Nz9VOzgqFsIXUxJLAir4dQIj+nVY0QcyTwzwbqm93YDZfaoYrO9xgEriZ6XVK78bWc8bMWpc9z35Kp4U6ytTQUufVwnsVXgAcBN6rQ/ZZFiJvCwnsZDtNsT/zVNWdrnVMKFbm+0rQHzt+jQEgfunwQeEkj8G21iPMpG9MxuHLmzOx+7XaxNLl/P2oHto8lQJgm0DYLJy6JLPa3rkd+NuBxYoqRxr1A1eC/7f3480bz+HHym5e0dSh8HuG3XJihIoR1SLm1Sd hadoop03@hadoop03
With the above in place, you can log in from one virtual machine to the other two with the following commands (here hadoop02 and hadoop03 are the users on the target machines, and the part after @ is the target machine's IP address):

ssh hadoop02@192.168.56.3
ssh hadoop03@192.168.56.8
Note, however, that while this configuration already gives passwordless SSH logins in the form shown above, it is still not quite enough: when Hive and HBase are used later, their clusters log in internally like this:

ssh hadoop02
ssh hadoop03
This style of login is best understood as alias-based: hadoop02 stands for hadoop02@192.168.56.3. SSH aliases are defined in a config file, which simply goes into the .ssh folder; the configuration looks like this:

Host hadoop02
    hostname 192.168.56.3
    user hadoop02
Host hadoop03
    hostname 192.168.56.8
    user hadoop03
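SSH may refuse a config file with overly permissive modes, so it is worth tightening the permissions and testing an alias right away (a quick check beyond the original steps):

chmod 600 ~/.ssh/config
ssh hadoop02   # should reach 192.168.56.3 without a password prompt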
2. Installing the JDK and Hadoop

Extract the installation packages:

tar -zxvf jdk-8u161-linux-x64.tar.gz
tar -zxvf hadoop-3.0.0.tar.gz
Use the ln command to create symlinks to the jdk and hadoop extraction directories. The advantage of symlinks is that if the JDK or Hadoop version needs to change later, you only repoint the symlink at the new directory instead of hunting down environment variables everywhere, since all of the environment variables are configured against the symlinks:

ln -s jdk1.8.0_161 jdk
ln -s hadoop-3.0.0 hadoop
創(chuàng)建軟連接之后的目錄結(jié)構(gòu)如下所示:drwxrwxr-x 5 hadoop01 hadoop01 4096 2月 23 17:46 ./drwxr-xr-x 19 hadoop01 hadoop01 4096 2月 23 20:46 ../lrwxrwxrwx 1 hadoop01 hadoop01 12 2月 22 22:24 hadoop -> hadoop-3.0.0/drwxr-xr-x 12 hadoop01 hadoop01 4096 2月 23 10:24 hadoop-3.0.0/lrwxrwxrwx 1 hadoop01 hadoop01 12 2月 22 22:06 jdk -> jdk1.8.0_161/drwxr-xr-x 8 hadoop01 hadoop01 4096 12月 20 08:24 jdk1.8.0_161/
Edit the .profile file in the home directory and add the following:

export JAVA_HOME=/home/hadoop01/xufeng.zhang/jdk
export CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export HADOOP_HOME=/home/hadoop01/xufeng.zhang/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CLIENT_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
After saving, the edited environment variables only take effect once the file is sourced:

source ~/.profile
Use java -version to verify that the configured Java runtime works:

hadoop01@hadoop01:~/xufeng.zhang$ java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
Use hadoop version to verify that Hadoop is configured correctly:

hadoop01@hadoop01:~/xufeng.zhang$ hadoop version
Hadoop 3.0.0
Source code repository https://git-wip-us.apache.org/repos/asf/hadoop.git -r c25427ceca461ee979d30edd7a4b0f50718e6533
Compiled by andrew on 2017-12-08T19:16Z
Compiled with protoc 2.5.0
From source with checksum 397832cb5529187dc8cd74ad54ff22
This command was run using /home/hadoop01/xufeng.zhang/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar
The other two virtual machines can be configured in the same way.

3. Configuring Hadoop

Five files under $HADOOP_CONF_DIR need to be edited:

core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
hadoop-env.sh
core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.56.3:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop01/xufeng.zhang/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>
</configuration>
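Once the file is saved, the effective value can be read back with hdfs getconf (a quick check, assuming the environment variables set up earlier):

hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://192.168.56.3:9000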
hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>192.168.56.3:50070</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>192.168.56.4:50090</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop01/xufeng.zhang/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop01/xufeng.zhang/hadoop/data/datanode</value>
  </property>
</configuration>
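The name and data directories above can be created ahead of time on each machine (a precaution rather than a required step; the paths are the ones used throughout this guide):

mkdir -p /home/hadoop01/xufeng.zhang/hadoop/data/namenode
mkdir -p /home/hadoop01/xufeng.zhang/hadoop/data/datanode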
mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>192.168.56.3:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>192.168.56.3:19888</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
</configuration>
Note that the three parameters yarn.app.mapreduce.am.env, mapreduce.map.env and mapreduce.reduce.env must all be set; otherwise MapReduce jobs fail with the following error:

Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
yarn-site.xml:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.56.4</value>
  </property>
  <property>
    <name>yarn.web-proxy.address</name>
    <value>192.168.56.4:8888</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CLIENT_CONF_DIR,
      $HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,
      $HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,
      $HADOOP_HDFS_HOME/lib/*,
      $YARN_HOME/*,
      $YARN_HOME/lib/*
    </value>
  </property>
</configuration>
Do not forget hadoop-env.sh: it is the configuration that is easiest to miss, and without it, jobs fail at runtime with an error about a missing environment variable. The main thing to set in this file is the JAVA_HOME parameter, e.g.:

# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
export JAVA_HOME=/home/hadoop01/xufeng.zhang/jdk
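A quick grep confirms the setting took effect (an extra check beyond the original steps):

grep '^export JAVA_HOME' $HADOOP_CONF_DIR/hadoop-env.sh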
可以看到,在該文件中已經(jīng)說明了除了OS X操作系統(tǒng)可以部配該變量外,其余的操作系統(tǒng)都需要配置。core-site.xml
的hadoop.tmp.dir
更改為對應(yīng)虛擬機(jī)的路徑即可。hadoop namenode -format
After formatting completes, run the following command on each machine to start Hadoop. The script lives in Hadoop's sbin directory, which was already added to PATH when the environment variables were configured:

start-all.sh
如果需要關(guān)閉Hadoop,可執(zhí)行如下命令:stop-all.sh
Once startup finishes, the following command shows on each machine whether Hadoop started successfully:

jps
各節(jié)點的啟動信息如下:hadoop01@hadoop01:~/xufeng.zhang/hadoop/etc/hadoop$ jps3680 Jps3154 NodeManager2627 DataNode2788 SecondaryNameNode3381 WebAppProxyServer3038 ResourceManagerhadoop02@hadoop02:~$ jps2883 Jps2035 NameNode2659 NodeManager2152 DataNodehadoop03@hadoop03:~$ jps2083 DataNode2759 Jps2525 NodeManager
As the output shows, hadoop02 is the primary NameNode and hadoop01 the secondary NameNode; hadoop01, hadoop02 and hadoop03 all serve as DataNodes. The state of the individual nodes can also be viewed in a browser:

http://192.168.56.4:8088/
http://192.168.56.3:50070/
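As a final end-to-end check that HDFS is actually writable, a small round trip works well (the paths here are purely illustrative):

hdfs dfs -mkdir -p /tmp/smoketest
hdfs dfs -put ~/.profile /tmp/smoketest/
hdfs dfs -ls /tmp/smoketest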
關(guān)鍵詞:虛擬
微信公眾號
版權(quán)所有? 億企邦 1997-2025 保留一切法律許可權(quán)利。