Hadoop Streaming
greglu edited this page Jun 18, 2011
·
12 revisions
Example command for running a Hadoop streaming job:
~/hadoop$ bin/hadoop jar contrib/streaming/hadoop-0.20.2-streaming.jar \
    -file ~/users/you/mapper.py -mapper ~/users/you/mapper.py \
    -file ~/users/you/reducer.py -reducer ~/users/you/reducer.py \
    -input /datasets/wikipedia/* -output job-output
- -mapper and -reducer are paths on the LOCAL filesystem of the master node
- -file is a repeated argument for each mapper and reducer script you have
- -input and -output are paths on the Hadoop filesystem (HDFS)
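The page does not say what mapper.py and reducer.py contain. As a rough illustration only, here is a minimal word-count sketch of the two roles, assuming the usual streaming convention of tab-separated key/value lines on stdin and stdout. The word-count task, the function names map_lines and reduce_lines, and the "map"/"reduce" command-line switch are all hypothetical; in a real job the two halves would normally live in separate scripts.

```python
#!/usr/bin/env python
"""Word-count sketch for Hadoop Streaming (illustrative; not from the original page)."""
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper role: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word, 1)


def reduce_lines(lines):
    """Reducer role: sum the counts for each key.

    Assumes the input is already sorted by key, which is how the
    streaming framework delivers records to a reducer.
    """
    # Skip blank lines, then split each record into [key, value].
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] in ("map", "reduce"):
    # Hypothetical switch: the first argument selects which role to run.
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Saved under a hypothetical name such as wordcount.py, the sketch can be checked locally without a cluster by piping a text file through the map role, then sort, then the reduce role; the sort stands in for the shuffle that Hadoop performs between the two phases.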
Culled from here: