Cleaning and Profiling Code

Use only Hadoop MapReduce in this part of your project.

Do not use anything else.

You must write and submit 2 separate MapReduce jobs:


MR Job 1.

Data profiling – to explore your data

– Name the files: CountRecs.java, CountRecsMapper.java, CountRecsReducer.java

(Please use these exact names for your classes)

– This MR job counts the number of records in a dataset

– Run it on the original dataset, before cleaning, and output the number of records

– Run it on the cleaned dataset (the result of MR Job 2, described below) and output the number of records

– If the two record counts don’t match, figure out why

– Re-submit a schema if it has changed.
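
The counting logic behind MR Job 1 can be pictured with a plain-Java sketch. This is not the Hadoop code itself: the class name CountRecsSketch, the key string, and the helper methods are assumptions made for illustration, and the Hadoop Mapper, Reducer, and Context types are left out so the sketch runs on its own. In the real job, the map step would presumably live in CountRecsMapper and the reduce step in CountRecsReducer, each writing through the Hadoop Context rather than returning a value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the record-counting logic (no Hadoop dependency).
class CountRecsSketch {

    // Hypothetical constant key: every record maps to the same key,
    // so a single reduce call sees all the emitted 1s.
    static final String KEY = "Total records";

    // Map step: one input line (one record) contributes a count of 1.
    // In CountRecsMapper.map this would be a context.write of (KEY, 1).
    static int map(String line) {
        return 1;
    }

    // Reduce step: sum the 1s emitted for the key.
    // In CountRecsReducer.reduce the sum would be written out with the key.
    static long reduce(Iterable<Integer> counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    // Drives map then reduce over an in-memory list of lines.
    static long countRecords(List<String> lines) {
        List<Integer> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.add(map(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        // Made-up sample data: a header line plus two records.
        List<String> sample = Arrays.asList("id,name,city", "1,Ann,NYC", "2,Bob,LA");
        System.out.println(KEY + "\t" + countRecords(sample));
    }
}
```

In this sketch a header line, if the file has one, is counted like any other record. That is one possible reason the counts before and after cleaning could differ.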

MR Job 2.

Data cleaning – to avoid nasty exceptions later on in your analytic

– Name the files: Clean.java, CleanMapper.java, CleanReducer.java

(Please use these exact names for your classes)

– This MR job cleans the data – for example, by dropping columns you don’t need.

– It should write out a new file with only the columns you will use in your analytic.

– List the selected columns that make up your data schema
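
The per-record cleaning that MR Job 2 describes could look roughly like the sketch below. Everything specific in it is an assumption for illustration: the class name CleanSketch, a comma-separated input with five columns, the choice to keep columns 0, 2, and 4, and the rule that a record with a missing kept value is dropped. It is plain Java with no Hadoop dependency; in the real job this logic would presumably be called from CleanMapper.map, which would write each non-null result to the Context.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the per-record cleaning logic (no Hadoop dependency).
class CleanSketch {

    // Hypothetical schema: 5 comma-separated input columns, of which
    // columns 0, 2, and 4 are the ones used in the later analytic.
    static final int EXPECTED_COLUMNS = 5;
    static final int[] KEEP = {0, 2, 4};

    // Returns the cleaned line, or null when the record should be dropped.
    // Note: a plain split on "," does not handle quoted fields that
    // themselves contain commas.
    static String cleanLine(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length != EXPECTED_COLUMNS) {
            return null; // malformed record: wrong number of columns
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < KEEP.length; i++) {
            String value = fields[KEEP[i]].trim();
            if (value.isEmpty()) {
                return null; // a kept column is missing its value
            }
            if (i > 0) {
                out.append(',');
            }
            out.append(value);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample records.
        List<String> raw = Arrays.asList(
            "1,Ann,NYC,x,42",
            "2,Bob,,y,17",   // empty kept column, so dropped
            "3,Cy,LA");      // wrong column count, so dropped
        int dropped = 0;
        for (String line : raw) {
            String cleaned = cleanLine(line);
            if (cleaned == null) {
                dropped++;
            } else {
                System.out.println(cleaned);
            }
        }
        System.out.println("dropped=" + dropped);
    }
}
```

Tracking how many records are dropped, as the main method does here, gives a direct explanation for any difference between the record counts before and after cleaning.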

FOR FULL CREDIT, PROVIDE THE CLASSES FOR EACH JOB
