GitHub - hovig/mapreduce: Alternative Mapreduce Simple Example

Mapreduce

MapReduce relies on <key, value> pairs when mapping. Each transition from the previous key to a new key is treated as a single, unique update of the value from its previous state to a new state, so the result depends on the cumulative value.

inputFile ->
    map()<k_origin, v_origin>
    combine()<k_next, value_next>
    reduce()<k_final, v_final>
-> outputFile
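
The flow above can be sketched in plain Python. This is a minimal in-memory illustration, not the code from this repository: the function names (map_records, combine, reduce_pairs) and the sample records are hypothetical, and the input is assumed to be "key<TAB>value" lines.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

def map_records(lines):
    """Map step: turn each 'key<TAB>value' line into a (key, value) pair."""
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        yield key, float(value)

def combine(pairs):
    """Combine step: pre-aggregate values per key into partial sums."""
    partial = defaultdict(float)
    for key, value in pairs:
        partial[key] += value
    # Sort by key so the reduce step sees equal keys next to each other.
    return sorted(partial.items())

def reduce_pairs(sorted_pairs):
    """Reduce step: sum the values of consecutive pairs that share a key."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(value for _, value in group)

# Made-up sample records, for illustration only.
sample = ["Toys\t10.00", "Consumer Electronics\t5.50", "Toys\t2.25"]
result = dict(reduce_pairs(combine(map_records(sample))))
print(result)
```

In a real Hadoop job the combine step runs on each mapper's local output and the framework does the sorting between map and reduce; here everything runs in one process to keep the sketch short.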

Unzip purchases.txt and use it as an input file for the mapper.

Because of permission issues that took time to diagnose, and that would have required core security changes to the permissions on the .ssh folder path, I ran the scripts locally from the Hadoop directory. Instead of feeding the data file to mapper.py through sys.stdin, I changed the scripts to open the input file directly and write the results to an output file.
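
A mapper that opens its input file itself, rather than reading sys.stdin, might look like the sketch below. It is not the mapper.py from this repository. The function name run_mapper is hypothetical, and the six tab-separated fields (date, time, store, category, cost, payment) are an assumption about the layout of purchases.txt that this README does not state.

```python
import os
import tempfile

def run_mapper(input_path, output_path):
    """Read purchase rows from a file and write 'category<TAB>cost' lines."""
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 6:
                # Skip rows that do not match the assumed six-field layout.
                continue
            _date, _time, _store, category, cost, _payment = fields
            dst.write(f"{category}\t{cost}\n")

# Demo on a throwaway file with made-up rows.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "purchases.txt")
out_path = os.path.join(workdir, "mapper_output.txt")
with open(in_path, "w") as f:
    f.write("2012-01-01\t09:00\tSan Jose\tToys\t25.38\tVisa\n")
    f.write("bad row\n")

run_mapper(in_path, out_path)
with open(out_path) as f:
    mapped = f.read()
print(mapped)
```

Reading and writing files directly means the script no longer plugs into Hadoop streaming as-is, since streaming passes records through standard input and output.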

hadoop jar hadoop-streaming-2.3.0-cdh5.1.0.jar \
    -input myinput \
    -output joboutput \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py

OR

cat purchases.txt | python mapper.py | sort -o mapper_output.txt mapper_output.txt | python reducer.py > joboutput

Both approaches are equally efficient: the same mapping, sorting, and reducing take place when running python mapper.py and python reducer.py. The difference is that, instead of relying on the key to retrieve the final value, the scripts store the results separately and act on them.
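
One reading of the paragraph above is the contrast sketched below: a reducer that emits a total whenever the key changes, versus one that keeps each key's running total in its own slot. Both functions are hypothetical illustrations, not the reducer.py from this repository, and the sample lines are made up.

```python
def reduce_by_key_change(sorted_lines):
    """Emit a total each time the key changes. Requires input sorted by key."""
    totals = []
    current_key, running = None, 0.0
    for line in sorted_lines:
        key, value = line.rstrip("\n").split("\t")
        if current_key is not None and key != current_key:
            totals.append((current_key, running))
            running = 0.0
        current_key = key
        running += float(value)
    if current_key is not None:
        totals.append((current_key, running))  # flush the last key
    return totals

def reduce_with_store(lines):
    """Keep one running total per key in a dict. Input order does not matter."""
    store = {}
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        store[key] = store.get(key, 0.0) + float(value)
    return sorted(store.items())

lines = ["Toys\t1.50", "Consumer Electronics\t4.00", "Toys\t2.50"]
by_key = reduce_by_key_change(sorted(lines))
by_store = reduce_with_store(lines)
print(by_key)
print(by_store)
```

The key-change version uses constant memory but depends on the sort step. The dict version needs memory proportional to the number of distinct keys, which is small when only a few categories are tracked.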

Check the results in joboutput for this sample, which finds the total sales values for toys and consumer electronics:

Toys Total = 57463477.11
Consumer Electronics Total = 57452374.13

