School of Natural and Computing Sciences
Department of Computing Science
2024 – 2025
Programming assignment – Individually Assessed (no teamwork)
Title: JC4001 – Distributed Systems
Note: This assignment accounts for 30% of your total mark of the course.
Learning Outcomes
On successful completion of this component, a student will have demonstrated the ability to:
- Understand the principles of federated learning (FL) in distributed systems and how it differs from centralized machine learning.
- Implement a basic federated learning system for image classification using the MNIST dataset.
- Simulate a federated learning environment in distributed systems where multiple clients independently train models and the server aggregates them.
- Explore the effects of model aggregation and compare with centralized training.
- Evaluate the performance of the FL model under different conditions, such as non-IID data distribution and varying number of clients.
Information for Plagiarism and Collusion: The source code and your report may be submitted for a plagiarism check. Please refer to the slides available at MyAberdeen for more information about avoiding plagiarism before you start working on the assessment. The use of large language models, such as ChatGPT, for writing the code or the report can also be considered as plagiarism. In addition, submitting similar work with another student can be considered as collusion. Also read the following information provided by the university:
Introduction
In this assignment, your task is to build a federated learning (FL) algorithm in a distributed system. FL is a distributed approach to train machine learning models, designed to guarantee local data privacy by training learning models without centralized datasets. As shown in Fig. 1, the FL structure should include two parts. The first part is an edge server for model aggregation. The second part [...] each device transmits the updated local model to the edge server for local model aggregation.
Figure 1. Illustration of the FL structure.
General Guidance and Requirements
Your assignment code and report must conform to the requirements given below and include the required content as outlined in each section. You must supply a written report, along with the corresponding code, containing all distinct sections/subtasks that provide a full critical and reflective account of the processes undertaken.
This assignment can be done in Python/PyCharm on your own device. If you work on your own device, then be sure to move your files to MyAberdeen regularly, so that we can run the application and mark it.
Note that it is your responsibility to ensure that your code runs on Python/PyCharm. By default, your code should run by directly clicking the “run” button. If your implementation uses some other command to start the code, it must be mentioned in the report.
Submission Guideline. After you finish your assignment, please compress all your files into a single compressed file and submit it in MyAberdeen (Content -> Assignment Submit -> View Instructions -> Submission (Drag and drop files here)).
Part 1: Understanding Federated Learning [5 points]
- Read the Research Paper: You should read a foundational paper on federated learning, such as Communication-Efficient Learning of Deep Networks from Decentralized Data by McMahan et al. (2017).
- Summary Task: Write a 500-word summary explaining the key components of federated learning (client-server architecture, data privacy, and challenges like non-IID data). [5 points]
Part 2: Centralized Learning Baseline [15 points]
- Implement Centralized Training: You should implement a simple neural network using a
centralized approach for classifying digits in the MNIST dataset. This will serve as a
baseline.
o Input: MNIST dataset. [5 points]
o Model: A basic neural network with several hidden layers. [5 points]
o Task: Train the model and evaluate its accuracy. [5 points]
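As a rough illustration of the baseline described above, the following is a minimal NumPy sketch of a multi-layer perceptron with ReLU hidden layers, a softmax output, and full-batch gradient descent on cross-entropy loss. The function names (init_mlp, forward, train_step, accuracy), the layer sizes, and the learning rate are illustrative assumptions, not part of the assignment. For MNIST, one plausible setup would flatten each 28x28 image to 784 inputs and use 10 outputs.

```python
import numpy as np


def init_mlp(sizes, rng):
    """Create (weights, bias) pairs for consecutive layer sizes, e.g. [784, 128, 64, 10]."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """Return the activations of every layer: ReLU on hidden layers, softmax on the output."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        if i < len(params) - 1:
            acts.append(np.maximum(z, 0.0))
        else:
            z = z - z.max(axis=1, keepdims=True)  # stabilise the softmax
            e = np.exp(z)
            acts.append(e / e.sum(axis=1, keepdims=True))
    return acts


def train_step(params, x, y, lr):
    """Apply one full-batch gradient step in place; return the loss before the update."""
    acts = forward(params, x)
    probs = acts[-1]
    n = x.shape[0]
    loss = -np.log(probs[np.arange(n), y] + 1e-12).mean()
    # Gradient of mean cross-entropy with respect to the output logits.
    delta = probs.copy()
    delta[np.arange(n), y] -= 1.0
    delta /= n
    for i in reversed(range(len(params))):
        w, b = params[i]
        grad_w = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:
            # Backpropagate through the ReLU of the previous layer.
            delta = (delta @ w.T) * (acts[i] > 0)
        params[i] = (w - lr * grad_w, b - lr * grad_b)
    return loss


def accuracy(params, x, y):
    """Fraction of samples whose arg-max prediction matches the label."""
    return float((forward(params, x)[-1].argmax(axis=1) == y).mean())
```

A mini-batch loop over shuffled data would normally replace the single full-batch step; it is omitted here to keep the sketch short.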
Part 3: Federated Learning Implementation [30 points]
- Simulate Clients: Split the MNIST dataset into several partitions to represent data stored locally at different clients. Implement a Python class that simulates clients, each holding a subset of the data. [10 points]
o Task: Implement a function to partition the data in both IID (independent and identically distributed) and non-IID ways.
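One possible way to realise the two partitioning schemes is sketched below with NumPy. The IID split shuffles all sample indices and deals them out evenly. The non-IID split sorts samples by label, cuts them into shards, and gives each client a few shards, which is similar in spirit to the shard-based split in McMahan et al. The function names and the shards_per_client parameter are assumptions made for illustration.

```python
import numpy as np


def partition_iid(labels, num_clients, seed=0):
    """Shuffle all sample indices and split them evenly across clients."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(labels))
    return [np.sort(part) for part in np.array_split(indices, num_clients)]


def partition_non_iid(labels, num_clients, shards_per_client=2, seed=0):
    """Label-skewed split: sort by label, cut into shards, deal shards to clients."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    order = np.argsort(labels, kind="stable")  # indices grouped by class
    shards = np.array_split(order, num_clients * shards_per_client)
    shard_ids = rng.permutation(len(shards))
    parts = []
    for c in range(num_clients):
        chosen = shard_ids[c * shards_per_client:(c + 1) * shards_per_client]
        parts.append(np.concatenate([shards[s] for s in chosen]))
    return parts
```

Each returned array holds the sample indices owned by one client, so a client object can index into the full image and label arrays without copying the dataset up front. With few shards per client, most clients see only a small number of digit classes.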
- Model Training on Clients: Modify the centralized neural network code so that each client trains its model independently using its local data. [5 points]
- Server-Side Aggregation: Implement a simple parameter server that aggregates model updates sent by clients. Use the Federated Averaging (FedAvg) algorithm: [10 points]
o Each client sends its model parameters to the server after training on local data.
o The server aggregates these parameters (weighted by the number of samples each client has) and updates the global model.
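The aggregation step described above amounts to a sample-weighted average: with client k holding n_k samples and parameters w_k, the global parameters become the sum over k of (n_k / n) * w_k, where n is the total number of samples. A minimal NumPy sketch follows, assuming each client reports its parameters as a list of arrays with matching shapes. The function name fedavg and this list-of-arrays representation are illustrative assumptions.

```python
import numpy as np


def fedavg(client_params, client_sizes):
    """Average per-layer parameters across clients, weighted by local sample count.

    client_params: one list of NumPy arrays per client (same shapes for all clients).
    client_sizes:  number of local training samples held by each client.
    """
    if len(client_params) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = float(sum(client_sizes))
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    num_layers = len(client_params[0])
    return [
        sum((n / total) * params[k] for params, n in zip(client_params, client_sizes))
        for k in range(num_layers)
    ]
```

For example, if one client with 1 sample reports a parameter of 1.0 and another with 2 samples reports 4.0, the weighted average is (1/3) * 1.0 + (2/3) * 4.0 = 3.0, so the client with more data pulls the global model further towards its own parameters.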
- Communication Rounds: Implement a loop where clients train their local models and the server aggregates them over multiple communication rounds. [5 points]
Part 4: Experimentation and Analysis [20 points]
- Experiment 1 - Impact of Number of Clients: [10 points]
o Vary the number of clients (e.g., 5, 10, 20) and evaluate the accuracy of the final federated model.
o Plot the training accuracy and loss over communication rounds for each case.
- Experiment 2 - Non-IID Data: [10 points]
o Modify the data distribution across clients to simulate a non-IID scenario (where clients have biased or skewed subsets of the data).
o Compare the performance of the federated learning model when clients have IID data vs. non-IID data. Plot the accuracy and loss over communication rounds for both cases.
Part 5: Performance Comparison with Centralized Learning [5 points]
- Compare the federated learning model (both IID and non-IID) to the centralized learning
Requirements and Marking Criteria for the Project Report [25 points]
You should write a report. Your report should describe the overall design of the federated learning system, as well as the challenges faced while implementing federated learning.
The marking criteria for the report are the following:
- Structure and completeness (all the aspects are covered) [5 points].
- Clarity and readability (the language is understandable) [5 points].
- Design explained [5 points].
- Challenges discussed [5 points].
- References to the sources [5 points].
Submission
You should submit the code and the report in MyAberdeen, using the Assignment Submit link in MyAberdeen for the coursework assignment. The deadline is 22 December 2024. Please do not submit later than the deadline.