greglu edited this page Jun 18, 2011 · 12 revisions

Example command for running a Hadoop streaming job:

~/hadoop$ bin/hadoop jar contrib/streaming/hadoop-0.20.2-streaming.jar \
    -file ~/users/you/mapper.py -mapper ~/users/you/mapper.py \
    -file ~/users/you/reducer.py -reducer ~/users/you/reducer.py \
    -input /datasets/wikipedia/* \
    -output job-output

Notes:

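  • -file, -mapper and -reducer take paths on the LOCAL filesystem of the master node, where the job is submitted; -file ships each script to the nodes that run the tasks
  • -input and -output are paths on the Hadoop filesystem (HDFS); the -output directory must not already exist, or the job will fail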
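
The page does not show what mapper.py and reducer.py contain. As a rough illustration only, here is a minimal word-count sketch of the logic such scripts might hold. The function names map_lines and reduce_lines are hypothetical, and word counting is an assumed task, not something taken from the original job. Streaming scripts read records from stdin and write tab-separated key/value lines to stdout, and the reducer receives its input sorted by key.

```python
# Hypothetical word-count logic for mapper.py and reducer.py
# (an assumed example; the original page does not show the scripts).
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word seen."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(lines):
    """Reducer: sum the counts for each word.

    Relies on the input being sorted by key, which is how Hadoop
    streaming delivers records to a reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


# In mapper.py one would then write:
#     import sys
#     for record in map_lines(sys.stdin):
#         print(record)
# and in reducer.py the same loop over reduce_lines(sys.stdin).
```

Scripts of this shape can usually be checked without a cluster by piping a sample file through the mapper, then sort, then the reducer, since sort stands in for Hadoop's shuffle phase. Each script also needs to be executable and start with an interpreter line for the command above to run it directly.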

Culled from here:
