2011/10/21
CDH3 Installation
Upgrading to CDH3 - Cloudera Support
$ hadoop version
Hadoop 0.20.2-cdh3u1
Subversion file:///tmp/nightly_2011-07-18_07-57-52_3/hadoop-0.20-0.20.2+923.97-1~maverick -r bdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638
Compiled by root on Mon Jul 18 09:40:07 PDT 2011
From source with checksum 3127e3d410455d2bacbff7673bf3284c
CDH3u1 is currently installed.
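Before touching anything, it may be worth confirming exactly which CDH packages are present. A minimal check, assuming dpkg on Ubuntu (the grep pattern is just an illustration); anything at 0.20.2-cdh3u1 here is what the steps below will move to cdh3u2:

$ dpkg -l | grep -E 'hadoop|cdh3'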
$ for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done
[sudo] password for h-akanuma:
Stopping Hadoop datanode daemon: no datanode to stop
hadoop-0.20-datanode.
Stopping Hadoop jobtracker daemon: no jobtracker to stop
hadoop-0.20-jobtracker.
Stopping Hadoop namenode daemon: no namenode to stop
hadoop-0.20-namenode.
Stopping Hadoop secondarynamenode daemon: no secondarynamenode to stop
hadoop-0.20-secondarynamenode.
Stopping Hadoop tasktracker daemon: no tasktracker to stop
hadoop-0.20-tasktracker.
Stopping Hadoop HBase master daemon: no master to stop because kill -0 of pid 2271 failed with status 1
hbase-master.
Stopping Hadoop HBase regionserver daemon: stopping regionserver........
hbase-regionserver.
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Stopping zookeeper ... STOPPED
$
$ jps
9534 Jps
$
$ ps aux | grep hadoop
1000      9544  0.0  0.0   5164   788 pts/0    S+   21:56   0:00 grep --color=auto hadoop
Stop all Hadoop-related processes.
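With everything stopped, Cloudera's upgrade guide also recommends backing up the NameNode metadata before upgrading. A minimal sketch; the dfs.name.dir path below is the CDH3 pseudo-distributed default and is only an assumption here, so check hdfs-site.xml first:

# Confirm the configured metadata directory (dfs.name.dir)
$ grep -A 1 dfs.name.dir /etc/hadoop-0.20/conf/hdfs-site.xml
# Back it up while the daemons are down (path is an assumption)
$ sudo tar czf ~/dfs-name-backup.tar.gz /var/lib/hadoop-0.20/cache/hadoop/dfs/name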
$ sudo dpkg -i ダウンロード/cdh3-repository_1.0_all.deb
Selecting previously unselected package cdh3-repository.
(Reading database ... 262400 files and directories currently installed.)
Unpacking cdh3-repository (from .../cdh3-repository_1.0_all.deb) ...
Setting up cdh3-repository (1.0) ...
gpg: keyring `/etc/apt/secring.gpg' created
gpg: keyring `/etc/apt/trusted.gpg.d/cloudera-cdh3.gpg' created
gpg: key 02A818DD: public key "Cloudera Apt Repository" imported
gpg: Total number processed: 1
gpg:               imported: 1
Install the downloaded package.
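As the output above shows, the cdh3-repository package only adds an APT source and imports Cloudera's GPG key. To see what it set up (the exact filename under sources.list.d is an assumption; adjust to whatever is actually there):

$ cat /etc/apt/sources.list.d/cloudera*.list
$ apt-key list | grep -i cloudera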
$ sudo apt-get update
...
Update the APT package index.
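At this point, apt-cache policy can confirm that the upgrade candidate really comes from the Cloudera repository rather than the Ubuntu archive:

$ apt-cache policy hadoop-0.20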
$ apt-cache search hadoop
ubuntu-orchestra-modules-hadoop - Modules mainly used by orchestra-management-server
flume - reliable, scalable, and manageable distributed data collection application
hadoop-0.20 - A software platform for processing vast amounts of data
hadoop-0.20-conf-pseudo - Pseudo-distributed Hadoop configuration
hadoop-0.20-datanode - Data Node for Hadoop
hadoop-0.20-doc - Documentation for Hadoop
hadoop-0.20-fuse - HDFS exposed over a Filesystem in Userspace
hadoop-0.20-jobtracker - Job Tracker for Hadoop
hadoop-0.20-namenode - Name Node for Hadoop
hadoop-0.20-native - Native libraries for Hadoop (e.g., compression)
hadoop-0.20-pipes - Interface to author Hadoop MapReduce jobs in C++
hadoop-0.20-sbin - Server-side binaries necessary for secured Hadoop clusters
hadoop-0.20-secondarynamenode - Secondary Name Node for Hadoop
hadoop-0.20-source - Source code for Hadoop
hadoop-0.20-tasktracker - Task Tracker for Hadoop
hadoop-hbase - HBase is the Hadoop database
hadoop-hbase-doc - Documentation for HBase
hadoop-hbase-master - HMaster is the "master server" for a HBase
hadoop-hbase-regionserver - HRegionServer makes a set of HRegions available to clients
hadoop-hbase-thrift - Provides an HBase Thrift service
hadoop-hive - A data warehouse infrastructure built on top of Hadoop
hadoop-hive-metastore - Shared metadata repository for Hive
hadoop-hive-server - Provides a Hive Thrift service
hadoop-pig - A platform for analyzing large data sets using Hadoop
hadoop-zookeeper - A high-performance coordination service for distributed applications.
hadoop-zookeeper-server - This runs the zookeeper server on startup.
hue-common - A browser-based desktop interface for Hadoop
hue-filebrowser - A UI for the Hadoop Distributed File System (HDFS)
hue-jobbrowser - A UI for viewing Hadoop map-reduce jobs
hue-jobsub - A UI for designing and submitting map-reduce jobs to Hadoop
hue-plugins - Plug-ins for Hadoop to enable integration with Hue
hue-shell - A shell for console based Hadoop applications
libhdfs0 - JNI Bindings to access Hadoop HDFS from C
libhdfs0-dev - Development support for libhdfs0
mahout - A set of Java libraries for scalable machine learning.
oozie - A workflow and coordinator sytem for Hadoop jobs.
sqoop - Tool for easy imports and exports of data sets between databases and HDFS
cdh3-repository - Cloudera's Distribution including Apache Hadoop
Search for the Hadoop packages.
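Incidentally, the list also shows hadoop-0.20-conf-pseudo; on a fresh machine that single package would pull in a pseudo-distributed configuration along with the core, though for this upgrade only the already-installed packages need updating:

# Alternative for a fresh pseudo-distributed setup (not needed for an upgrade)
$ sudo apt-get install hadoop-0.20-conf-pseudo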
$ sudo apt-get install hadoop-0.20
...
$ hadoop version
Hadoop 0.20.2-cdh3u2
Subversion file:///tmp/nightly_2011-10-13_20-02-02_3/hadoop-0.20-0.20.2+923.142-1~maverick -r 95a824e4005b2a94fe1c11f1ef9db4c672ba43cb
Compiled by root on Thu Oct 13 21:52:18 PDT 2011
From source with checksum 644e5db6c59d45bca96cec7f220dda51
Install the Hadoop core package. CDH3u2 is now installed, and the Hadoop daemon packages were updated along with it.
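To double-check that every daemon package moved to cdh3u2 together, listing the installed versions is a quick sanity check:

$ dpkg -l 'hadoop-0.20*' | grep '^ii'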
$ sudo apt-get install hadoop-hbase-master
...
$ sudo apt-get install hadoop-zookeeper-server
...
$ hbase shell
11/10/26 22:36:54 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.4-cdh3u2, r, Thu Oct 13 20:32:26 PDT 2011

hbase(main):001:0>
HBase and ZooKeeper were also updated, to CDH3u2.
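One quick way to confirm the upgraded ZooKeeper is actually answering is its built-in four-letter commands, assuming the default client port 2181:

# ZooKeeper replies "imok" when it is up and healthy
$ echo ruok | nc localhost 2181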
$ sudo /etc/init.d/hadoop-0.20-namenode start
Starting Hadoop namenode daemon: starting namenode, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-namenode-h-akanuma-CF-W4.out
hadoop-0.20-namenode.
$
$ sudo /etc/init.d/hadoop-0.20-datanode start
Starting Hadoop datanode daemon: starting datanode, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-datanode-h-akanuma-CF-W4.out
hadoop-0.20-datanode.
$
$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start
Starting Hadoop secondarynamenode daemon: starting secondarynamenode, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-secondarynamenode-h-akanuma-CF-W4.out
hadoop-0.20-secondarynamenode.
$
$ sudo /etc/init.d/hadoop-0.20-jobtracker start
Starting Hadoop jobtracker daemon: starting jobtracker, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-jobtracker-h-akanuma-CF-W4.out
hadoop-0.20-jobtracker.
$
$ sudo /etc/init.d/hadoop-0.20-tasktracker start
Starting Hadoop tasktracker daemon: starting tasktracker, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-tasktracker-h-akanuma-CF-W4.out
hadoop-0.20-tasktracker.
$
$ sudo jps
12799 SecondaryNameNode
12672 DataNode
12552 NameNode
12895 JobTracker
13029 Jps
11574 QuorumPeerMain
12996 TaskTracker
Start each of the Hadoop daemons.
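With the daemons up, a couple of standard commands give a quick view of HDFS health before running any jobs:

$ hadoop dfsadmin -report
$ hadoop fs -ls /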
$ hadoop jar /usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u2-*examples.jar pi 10 10000
Number of Maps  = 10
Samples per Map = 10000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
11/10/26 23:09:21 INFO mapred.FileInputFormat: Total input paths to process : 10
11/10/26 23:09:22 INFO mapred.JobClient: Running job: job_201110262307_0001
11/10/26 23:09:23 INFO mapred.JobClient:  map 0% reduce 0%
11/10/26 23:09:42 INFO mapred.JobClient:  map 20% reduce 0%
11/10/26 23:09:57 INFO mapred.JobClient:  map 40% reduce 0%
11/10/26 23:10:12 INFO mapred.JobClient:  map 60% reduce 0%
11/10/26 23:10:14 INFO mapred.JobClient:  map 60% reduce 13%
11/10/26 23:10:20 INFO mapred.JobClient:  map 80% reduce 20%
11/10/26 23:10:26 INFO mapred.JobClient:  map 100% reduce 20%
11/10/26 23:10:29 INFO mapred.JobClient:  map 100% reduce 33%
11/10/26 23:10:32 INFO mapred.JobClient:  map 100% reduce 100%
11/10/26 23:10:34 INFO mapred.JobClient: Job complete: job_201110262307_0001
11/10/26 23:10:35 INFO mapred.JobClient: Counters: 23
11/10/26 23:10:35 INFO mapred.JobClient:   Job Counters
11/10/26 23:10:35 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/26 23:10:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=113667
11/10/26 23:10:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/10/26 23:10:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/10/26 23:10:35 INFO mapred.JobClient:     Launched map tasks=10
11/10/26 23:10:35 INFO mapred.JobClient:     Data-local map tasks=10
11/10/26 23:10:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=49553
11/10/26 23:10:35 INFO mapred.JobClient:   FileSystemCounters
11/10/26 23:10:35 INFO mapred.JobClient:     FILE_BYTES_READ=226
11/10/26 23:10:35 INFO mapred.JobClient:     HDFS_BYTES_READ=2420
11/10/26 23:10:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=609632
11/10/26 23:10:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
11/10/26 23:10:35 INFO mapred.JobClient:   Map-Reduce Framework
11/10/26 23:10:35 INFO mapred.JobClient:     Reduce input groups=2
11/10/26 23:10:35 INFO mapred.JobClient:     Combine output records=0
11/10/26 23:10:35 INFO mapred.JobClient:     Map input records=10
11/10/26 23:10:35 INFO mapred.JobClient:     Reduce shuffle bytes=280
11/10/26 23:10:35 INFO mapred.JobClient:     Reduce output records=0
11/10/26 23:10:35 INFO mapred.JobClient:     Spilled Records=40
11/10/26 23:10:35 INFO mapred.JobClient:     Map output bytes=180
11/10/26 23:10:35 INFO mapred.JobClient:     Map input bytes=240
11/10/26 23:10:35 INFO mapred.JobClient:     Combine input records=0
11/10/26 23:10:35 INFO mapred.JobClient:     Map output records=20
11/10/26 23:10:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1240
11/10/26 23:10:35 INFO mapred.JobClient:     Reduce input records=20
Job Finished in 74.586 seconds
Estimated value of Pi is 3.14120000000000000000
Run a Hadoop job as a test. It completed successfully.
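Beyond the pi example, fsck is a cheap way to confirm that the upgraded HDFS reports no corrupt or under-replicated blocks:

$ hadoop fsck /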
$ sudo /etc/init.d/hadoop-hbase-master start
Starting Hadoop HBase master daemon: starting master, logging to /usr/lib/hbase/logs/hbase-hbase-master-h-akanuma-CF-W4.out
hbase-master.
$
$ sudo /etc/init.d/hadoop-hbase-regionserver start
Starting Hadoop HBase regionserver daemon: starting regionserver, logging to /usr/lib/hbase/logs/hbase-hbase-regionserver-h-akanuma-CF-W4.out
hbase-regionserver.
$
$ sudo jps
14202 Jps
12799 SecondaryNameNode
12672 DataNode
14134 HRegionServer
13996 HMaster
12552 NameNode
12895 JobTracker
11574 QuorumPeerMain
12996 TaskTracker
Start the HBase daemons as well. Since this is pseudo-distributed mode, I don't start ZooKeeper by hand (the hadoop-zookeeper-server package already runs it, visible as QuorumPeerMain in jps above).
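Before opening the shell, the hbase command itself can confirm the upgraded version:

$ hbase version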
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.4-cdh3u2, r, Thu Oct 13 20:32:26 PDT 2011

hbase(main):001:0>
hbase(main):002:0* list
TABLE
courses
scores
2 row(s) in 2.0210 seconds

hbase(main):003:0>
Verify operation with the list command in the HBase shell. This also works.
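As one more smoke test against the pre-existing data, counting rows in one of the tables confirms that the region server is actually serving reads (the table name is taken from the list output above):

hbase(main):003:0> count 'scores'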
posted by akanuma on Wed 26 Oct 2011 at 22:43