ISIT312 Big Data Management

School of Computing & Information Technology

Session: 4, 2024

SIM S4 2024

Assignment 1


The objectives of Assignment 1 include the implementation of HDFS applications, the implementation of simple MapReduce applications, and the description of an implementation of complex MapReduce applications.

This assignment is due on 20 October 2024 by 9:00 pm Singapore Time (SGT).

This assignment is worth 10% of the total evaluation in the subject.

The assignment consists of 3 tasks, and the specification of each task starts on a new page.

Only electronic submission through Moodle at:

https://moodle.uowplatform.edu.au/login/index.php

will be accepted. The submission procedure is explained at the end of Assignment 1.


A policy regarding late submissions is included in the subject outline. Only one submission of Assignment 1 is allowed and only one submission per student is accepted.

A late submission penalty (25% of the total mark) will be applied for every 24 hours late.

A submission that contains an incorrect file attached is treated as a correct submission, with all consequences coming from the evaluation of the file attached.

All files left on Moodle in the state "Draft (not submitted)" will not be evaluated.

An implementation that does not compile or run correctly due to one or more syntax and/or run-time errors scores no marks.

The first assignment is an individual assignment, and it is expected that all its tasks will be solved individually without any cooperation with other students. However, it is allowed to declare in the submission comments that a particular component or task of this assignment has been implemented in cooperation with another student. In such a case, the evaluation of the task or component may be shared with the other student. In all other cases, plagiarism will result in a FAIL grade being recorded for the entire assignment. If you have any doubts or questions, please consult your lecturer or tutor during laboratory/tutorial classes or by e-mail.

Task 1 (3 marks)

Implementation of HDFS application

Implement an HDFS application that merges two files located in HDFS into one file also located in HDFS.

The application must have the following parameters.

(1) A path to, and a name of, the first input file in HDFS.

(2) A path to, and a name of, the second input file in HDFS.

(3) A path to, and a new name of, an output file to be created in HDFS. The file is supposed to contain the contents of the first input file followed by the contents of the second input file.

Perform the following steps.

Implement the application and save its source code in a file solution1.java.

Upload two files to HDFS. The contents, the names, and the locations of the files in HDFS are up to you.

When ready, compile, create a jar file, and process your application. Display the results created by the application.

Use Hadoop to provide a piece of evidence that the two files uploaded into HDFS have been successfully merged into one file in HDFS.
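The merge required in this task amounts to copying the first input stream and then the second input stream into a single output stream. The following is a minimal sketch of that copying logic only, written against plain java.io streams so that it runs without a Hadoop installation. The class name MergeSketch and the method names copy and merge are hypothetical. In an HDFS application the streams would be obtained from org.apache.hadoop.fs.FileSystem, with open() for the inputs and create() for the output. That mapping is an assumption about how the sketch could be adapted, not a prescribed solution.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the assignment specification.
class MergeSketch {

    // Copies every byte from 'in' to 'out' in fixed-size chunks.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Writes the contents of 'first' followed by the contents of 'second'.
    // Assumption: in HDFS, 'first' and 'second' would come from FileSystem.open()
    // and 'out' from FileSystem.create().
    static void merge(InputStream first, InputStream second, OutputStream out)
            throws IOException {
        copy(first, out);
        copy(second, out);
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for the two input files (illustrative data only).
        InputStream first =
            new ByteArrayInputStream("first file\n".getBytes(StandardCharsets.UTF_8));
        InputStream second =
            new ByteArrayInputStream("second file\n".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        merge(first, second, out);
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

The order of the two copy calls is what guarantees that the contents of the first file precede the contents of the second file in the output.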


Deliverables

A file solution1.java with the source code of the application that merges two HDFS files.

A file solution1.pdf that contains the contents of the Terminal window with a report from the compilation, the creation of the jar file, the uploading of two small files to HDFS for testing, the processing of the application, and evidence that the two files uploaded into HDFS have been successfully merged into one file in HDFS, with an explanation of how the statements work.

Task 2 (4 marks)

Implementation of MapReduce application

Assume that a speed camera records the speed of passing cars and saves the measurements in a text file. The speed of each car is measured in kilometres per hour. A single row in the file contains a car registration number, the location of the camera, the date when the speed was measured, and the speed of the car with the recorded registration number. The values are always separated by a single blank.

For example, a sample file (SpeedCamera.txt) with the speed measurements contains the following lines:

PKR856 AYE 14-NOV-2021 80

UPS234 CTE 20-FEB-2022 110

PKR856 PIE 20-MAR-2020 90

PKR856 PIE 17-JUN-2021 100

UPS234 CTE 22-SEP-2022 100

UPS234 CTE 03-AUG-2020 90

Assume that the speed limit at the location of the speed camera is 90 kilometres per hour.

Your task is to implement a MapReduce application that finds the average speed of all cars that exceeded the speed limit at the location of the speed camera.

An input file with the speed measurements must include the lines listed above and it must contain at least 20 measurements. All additional measurements are up to you.

Save your solution in a file solution2.java.

When ready, compile, create a jar file, and process your application. Display the results created by the application. The result of your application includes (1) the contents of your input file, and (2) the car registration number, the location of the camera, and the average of the speeds that exceed the speed limit. When finished, copy and paste the messages from the Terminal screen into a file solution2.pdf.
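The computation described in this task can be pictured as a filter in the map step followed by an average in the reduce step. The following is a minimal sketch in plain Java that simulates both steps in memory, so it runs without Hadoop. It rests on assumptions that the specification does not state explicitly: "exceeded" is read as strictly greater than 90, the results are keyed by registration number together with camera location, and every input line is well formed. The class name SpeedSketch and its method names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; simulates map and reduce without Hadoop.
class SpeedSketch {
    static final int SPEED_LIMIT = 90; // kilometres per hour, from the task text

    // "Map" step: emits (registration + location, speed) for a speeding record,
    // or null when the record does not exceed the limit (assumed strict ">").
    static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.trim().split(" "); // values separated by one blank
        int speed = Integer.parseInt(fields[3]);
        if (speed <= SPEED_LIMIT) {
            return null;
        }
        return new AbstractMap.SimpleEntry<>(fields[0] + " " + fields[1], speed);
    }

    // "Reduce" step: averages all speeds collected for one key.
    static double reduce(List<Integer> speeds) {
        long sum = 0;
        for (int s : speeds) {
            sum += s;
        }
        return (double) sum / speeds.size();
    }

    // Groups the mapped pairs by key (the "shuffle") and reduces each group.
    static Map<String, Double> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = map(line);
            if (pair != null) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            averages.put(e.getKey(), reduce(e.getValue()));
        }
        return averages;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "PKR856 AYE 14-NOV-2021 80",
            "UPS234 CTE 20-FEB-2022 110",
            "PKR856 PIE 20-MAR-2020 90",
            "PKR856 PIE 17-JUN-2021 100",
            "UPS234 CTE 22-SEP-2022 100",
            "UPS234 CTE 03-AUG-2020 90");
        for (Map.Entry<String, Double> e : run(sample).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Under the stated assumptions, the six sample lines yield two result rows: PKR856 PIE 100.0 and UPS234 CTE 105.0. A different reading of the task, for example one average per location, would change only the key built in the map step.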

A sample output of the application is as follows:


Deliverables

A file solution2.java with the source code of the application that implements the functionality of the problem statement specified above.

A file solution2.pdf with a report from the compilation of your code, the creation of the jar file, the processing of your application, the listing of your input file with the speed measurements, and the results of processing solution2.java.

Task 3 (3 marks)

Implementation of MapReduce application

An application MinMax described in Exercise 2 has the same functionality as the following SQL statement.

SELECT key, MIN(value), MAX(value)
FROM Sequence-of-key-value-pairs
GROUP BY key;


Extend the Java code of the application so that it implements the same functionality as the following SQL statement.

SELECT key, MAX(value), MIN(value), AVG(value), SUM(value)
FROM Sequence-of-key-value-pairs
GROUP BY key;
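All four aggregates in the statement above can be computed in a single pass over the values that belong to one key, which is the shape of work a reducer performs. The following is a minimal sketch of that pass in plain Java, so it runs without Hadoop. The class name StatsSketch, the key "bolt", the sample values, and the output layout are all hypothetical and chosen only for illustration. They are not taken from MinMax, from sales.txt, or from the expected sample output.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration class; mirrors the per-key loop of a reducer.
class StatsSketch {

    // Computes MAX, MIN, AVG and SUM for the values of one key in one pass.
    // Assumes 'values' is non-empty, as a reducer is called only for keys
    // that have at least one value.
    static String reduce(String key, Iterable<Integer> values) {
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        long sum = 0;
        int count = 0;
        for (int v : values) {
            max = Math.max(max, v);
            min = Math.min(min, v);
            sum += v;
            count++;
        }
        double avg = (double) sum / count; // AVG is SUM divided by the row count
        return String.format(Locale.ROOT, "%s max=%d min=%d avg=%.2f sum=%d",
                             key, max, min, avg, sum);
    }

    public static void main(String[] args) {
        // Invented values for a single key, for illustration only.
        System.out.println(reduce("bolt", Arrays.asList(10, 20, 60)));
    }
}
```

Note that an average cannot be obtained by averaging partial averages, so a design that pre-aggregates values before the final reduce step would need to carry the sum and the count separately.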


Save your solution in a file solution3.java.

When ready, compile, create the jar file, and process your application. To test your application, you can use the file sales.txt included in the zipped file of this specification.

Display the results created by the application. When finished, copy and paste the messages from the Terminal screen into a file solution3.pdf.

A sample output of the application is as follows:


Deliverables

A file solution3.java with the source code of the application that implements the functionality of the SELECT statement given above.

A file solution3.pdf with a report from the compilation, the creation of the jar file, the processing of your application, and screen captures of the results of processing solution3.java.

Submission of Assignment 1

Note that you have only one submission. So, make absolutely sure that you submit the correct files with the correct contents.

Please submit an Academic Consideration in SOLS if an extension (1 week maximum) is required.

Please first combine the files solution1.pdf, solution2.pdf, and solution3.pdf into a single PDF file (solutions.pdf), then zip the files solutions.pdf, solution1.java, solution2.java, and solution3.java into a single zipped file (A1-solutions.zip). Please submit the zipped file through Moodle in the following way:

(1) Access Moodle at http://moodle.uowplatform.edu.au/

(2) To log in, use the Login link located in the upper right corner of the Web page or in the middle of the bottom of the Web page.

(3) When logged in, select the site ISIT312 (SP424) Big Data Management.

(4) Scroll down to the section SUBMISSIONS.

(5) Click the Assignment 1 link.

(6) Click the button Add Submission.

(7) Move the zipped file A1-solutions.zip into the area You can drag and drop files here to add them. You can also use the link Add…

(9) Click the button Save changes.

(10) Click the button Submit assignment.

(11) Click the checkbox with the text attached: By checking this box, I confirm that this submission is my own work, … in order to confirm authorship of your submission.

(12) Click the button Continue.

End of specification

