Skip to content

mbhavik91/United-States-Census-Data-Analysis-Using-MapReduce_P1

Repository files navigation

#READ ME !!!

Dataset used is "unsorted-2"

Number of reducers used is 16. (change code if asked to use 32 reducers. change partitioner and main job)

Main Classname is MainJob

Package Structure: cs455.gigasort.hadoop

How to run? Please typr the following command:
$HADOOP_HOME/bin/hadoop jar workspace/Assignment3/dist/gigasort.jar cs455.gigasort.hadoop.MainJob /data/gigasort/unsorted-2 /home/output/final

Dataset Size = 19.9 GB

A shell script was developed to get the final results

The following commang was used to generate the hash value of each file
"openssl sha1 part-r-00006 | awk '{print $2}' >> hashvalue"

After that Generatehash.java was used to generate the finalHash in a recursive fashion

Merkle Tree Algorithm was used to generate the final hash value

About

Giga Sort

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages