AM05 AUT24 Final Project Assignment: Outfit Of The Day Recommendation System
Introduction
Welcome to your final project for the Data Management course. This project is designed to integrate and apply the skills you've acquired throughout the course, including data acquisition, web scraping, ETL processes, SQL database management, automation with Bash scripts, and API development using R.
You will create an Outfit Of The Day Recommendation System (Outfit RecSys) that recommends daily outfits based on the current weather in London. The system will scrape clothing items from websites, store them in a database, retrieve weather data from a public API, and provide outfit recommendations through an API endpoint.
Project Overview
The Outfit RecSys should:
Database: Contain a database of at least 25 clothing items, scraped from an appropriate fashion website, including:
- 5 pairs of shoes
- 5 bottoms (e.g., pants, skirts)
- 5 tops (e.g., shirts, blouses)
- 5 coats or jackets
- 5 accessories (e.g., umbrellas, sunglasses)
API Endpoint: Be accessible through an API endpoint using Plumber in R.
Functionality: When the API is called, it should:
- Check the current weather in London.
- Generate an outfit from the closet database using simple rules.
- Create a plot showing the weather forecast and images of the recommended outfit.
Project Components
This project consists of several interconnected components:
- Data Acquisition and Scraping: Scrape at least 25 clothing items, including 5 from each category (shoes, bottoms, tops, coats, and accessories).
- Data Processing and ETL: Clean and store the scraped data into a SQL database using the provided schema.
- Weather Data Integration: Use the Weatherstack API to get current weather data for London and integrate it into your recommendation system.
- Recommendation System: Build a simple rules-based recommender system that generates an outfit based on the weather conditions in London.
- API Development: Implement an API using Plumber in R with two endpoints: /ootd to get the outfit recommendation and /rawdata to return all product data.
- Automation: Automate the entire workflow using Bash scripts.
Detailed Instructions
Use the following names for your scripts:
- product_scraping.R
- weatherstack_api.R
- etl.R
- ootd_api.R
- run_ootd_api.R
- run_pipeline.sh
- Data Collection and Web Scraping
Objective: Scrape product images and information to populate your closet database.
Instructions:
Choose a Website: Select an online clothing retailer that allows web scraping (ensure compliance with the website's terms of service).
Scrape Data: Collect at least 25 items covering the categories mentioned above.
For each item, collect the following information:
- Product Name
- Category (e.g., shoes, tops)
- Image URL
Download Images: Save the product images locally in a folder named images, located in your project folder.
Example Code Snippet:
# product_scraping.R
library(rvest)  # also provides the %>% pipe
# Example: Scraping product names and image URLs
url <- "https://www.example.com/clothing"
webpage <- read_html(url)
product_names <- webpage %>% html_nodes(".product-name") %>% html_text()
image_urls <- webpage %>% html_nodes(".product-image") %>% html_attr("src")
# Make sure the images folder exists before downloading
dir.create("images", showWarnings = FALSE)
# Download images
for (i in seq_along(image_urls)) {
  download.file(image_urls[i], destfile = paste0("images/", product_names[i], ".jpg"), mode = "wb")
}
# Create a data frame
products <- data.frame(
  name = product_names,
  category = ..., # Extract category (placeholder: replace with your own extraction code)
  image_path = paste0("images/", product_names, ".jpg"),
  stringsAsFactors = FALSE
)
# Save data frame for ETL process
write.csv(products, "products_raw.csv", row.names = FALSE)
Note: Replace selectors like ".product-name" with the actual CSS selectors from the chosen website.
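Product names often contain spaces, slashes, or apostrophes, which make poor file names when pasted directly into an image path. The following is a minimal sketch of one way to handle this, not part of the assignment: safe_image_path() is a hypothetical helper, and the product names are made up for illustration.

```r
# Hypothetical helper (an assumption, not required by the brief):
# build a safe image path from a product name.
safe_image_path <- function(product_name, dir = "images", ext = ".jpg") {
  slug <- tolower(trimws(product_name))
  # Replace every run of characters that is not a-z or 0-9 with "_"
  slug <- gsub("[^a-z0-9]+", "_", slug)
  # Drop underscores left at the start or end
  slug <- gsub("^_+|_+$", "", slug)
  # make.unique() appends .1, .2, ... to repeated names
  slug <- make.unique(slug)
  file.path(dir, paste0(slug, ext))
}

# Made-up product names, including a duplicate
paths <- safe_image_path(c("Men's Rain Jacket / Navy", "Slim Jeans", "Slim Jeans"))
print(paths)
```

If you adopt something like this, use the same paths for both download.file() and the image_path column so the database points at files that exist.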
- Weather Data Acquisition
Objective: Retrieve current weather data for London using the Weatherstack API.
Instructions:
You should already have a Weatherstack API account and API key from Assignment #1. Otherwise follow the instructions below:
Sign Up: Register for a free API key at Weatherstack.
Store API Key: Save your API key in an environment variable named YOUR_ACCESS_KEY.
Access Weather Data:
Example Code Snippet:
# weatherstack_api.R
library(httr)
library(jsonlite)
# Retrieve API key from environment variable
api_key <- Sys.getenv("YOUR_ACCESS_KEY")
# Construct API request
response <- GET(
  url = "http://api.weatherstack.com/current",
  query = list(
    access_key = api_key,
    query = "London"
  )
)
# Parse response (nested calls, because neither httr nor jsonlite provides %>%)
weather_data <- fromJSON(content(response, as = "text", encoding = "UTF-8"), flatten = TRUE)
# Extract relevant information
current_temperature <- weather_data$current$temperature
weather_descriptions <- weather_data$current$weather_descriptions
# Save weather data for use in recommendation logic
saveRDS(weather_data, "weather_data.rds")
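Before saving the weather data, it can help to check that the response is usable, so a bad key fails here rather than later in the API. The sketch below is one possible check. check_weather() is a hypothetical helper, the mock responses are invented, and the error shape (a success flag plus an error object with code and info) is an assumption you should confirm against the Weatherstack documentation.

```r
# Hypothetical helper: stop early if the parsed response is not usable.
# Assumption: a failed request yields success = FALSE and an error object.
check_weather <- function(weather_data) {
  if (isFALSE(weather_data$success)) {
    stop("Weatherstack error ", weather_data$error$code, ": ",
         weather_data$error$info, call. = FALSE)
  }
  if (is.null(weather_data$current$temperature)) {
    stop("Response has no current temperature.", call. = FALSE)
  }
  invisible(weather_data)
}

# Mock parsed responses (made up for illustration)
ok <- list(current = list(temperature = 12, weather_descriptions = "Light Rain"))
bad <- list(success = FALSE, error = list(code = 101, info = "Invalid access key."))

checked <- check_weather(ok)
result <- tryCatch(check_weather(bad), error = function(e) conditionMessage(e))
print(result)
```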
- ETL Process and Database Management
Objective: Clean and store product data into a SQL database.
Instructions:
Create a Database: Use SQLite for simplicity (no server setup required).
Define Schema: Ensure all students use the same schema.
Schema:
CREATE TABLE closet (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
category TEXT,
image_path TEXT
);
ETL Process:
- Read the raw product data from products_raw.csv.
- Clean the data (e.g., handle missing values).
- Insert the cleaned data into the closet table.
Example Code Snippet:
# etl.R
library(RSQLite)
library(dplyr)
# Read raw data
products <- read.csv("products_raw.csv", stringsAsFactors = FALSE)
# Data cleaning (example)
products_clean <- products %>%
  filter(!is.na(name), !is.na(category), !is.na(image_path))
# Connect to SQLite database
conn <- dbConnect(SQLite(), dbname = "closet.db")
# Write data to database
# Note: overwrite = TRUE replaces any existing closet table, so the table is rebuilt from the data frame's columns
dbWriteTable(conn, "closet", products_clean, overwrite = TRUE, row.names = FALSE)
# Disconnect
dbDisconnect(conn)
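If you want the closet table to keep the id column from the schema, one option is to create the table first and then append rows to it. The following is a sketch under stated assumptions, not the required implementation: it uses an in-memory database and made-up rows in place of products_raw.csv, and the extra cleaning steps (trimming, lower-casing, de-duplicating) are illustrative choices.

```r
library(DBI)
library(RSQLite)

# Made-up raw data standing in for products_raw.csv
products <- data.frame(
  name = c(" Rain Jacket ", "Slim Jeans", NA, "Slim Jeans"),
  category = c("Coats", "bottoms", "tops", "bottoms"),
  image_path = c("images/rain_jacket.jpg", "images/slim_jeans.jpg",
                 "images/unknown.jpg", "images/slim_jeans.jpg"),
  stringsAsFactors = FALSE
)

# Clean: trim names, lower-case categories, drop incomplete and duplicate rows
products$name <- trimws(products$name)
products$category <- tolower(trimws(products$category))
products_clean <- unique(products[complete.cases(products), ])

# In-memory database so the sketch leaves no file behind
conn <- dbConnect(SQLite(), dbname = ":memory:")

# Create the table with the schema given in the brief
dbExecute(conn, "CREATE TABLE closet (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT,
  category TEXT,
  image_path TEXT
)")

# append = TRUE inserts into the existing table; SQLite assigns the id values
dbWriteTable(conn, "closet", products_clean, append = TRUE, row.names = FALSE)

closet <- dbGetQuery(conn, "SELECT * FROM closet ORDER BY id")
print(closet)
dbDisconnect(conn)
```

With a file-backed closet.db, re-running an append would add the same rows again, so you would also need to clear or recreate the table at the start of each run.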
- Outfit Recommendation Logic
Objective: Implement rules-based logic to recommend outfits based on weather conditions.
Instructions:
Define Rules:
- Temperature > 25°C: Light clothing (e.g., t-shirts, shorts, sandals).
- Temperature 15°C - 25°C: Comfortable clothing (e.g., long-sleeve tops, jeans, sneakers).
- Temperature < 15°C: Warm clothing (e.g., jackets, sweaters, boots).
- Rain Forecast: Include a raincoat or umbrella.
- Sunny: Suggest sunglasses.
Implement Logic: Use R to query the database and select items matching the rules.
Example Code Snippet (within ootd_api.R):
# ... within the /ootd endpoint function
# Load weather data
weather_data <- readRDS("weather_data.rds")
temperature <- weather_data$current$temperature
weather_desc <- weather_data$current$weather_descriptions
# Connect to database
conn <- dbConnect(SQLite(), dbname = "closet.db")
# Initialize outfit list
outfit <- list()
# Apply rules
if (temperature > 25) {
  # Select light clothing
  outfit$top <- dbGetQuery(conn, "SELECT * FROM closet WHERE category = 't-shirt' LIMIT 1")
  # Category 'shorts' is an assumption taken from the rules above
  outfit$bottom <- dbGetQuery(conn, "SELECT * FROM closet WHERE category = 'shorts' LIMIT 1")
  outfit$shoes <- dbGetQuery(conn, "SELECT * FROM closet WHERE category = 'sandals' LIMIT 1")
} else if (temperature >= 15 && temperature <= 25) {
  # Select comfortable clothing
} else {
  # Select warm clothing
}
# Check for rain (any() because weather_descriptions can hold several values)
if (any(grepl("rain", weather_desc, ignore.case = TRUE))) {
  outfit$accessory <- dbGetQuery(conn, "SELECT * FROM closet WHERE category = 'umbrella' LIMIT 1")
}
# Disconnect
dbDisconnect(conn)
# Proceed to create the plot with selected items
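The rules can also be kept in a small function that is separate from the database queries, which makes them easy to test without a database or a live weather call. The sketch below shows one possible shape. choose_outfit_rules() and the item types it returns are hypothetical names based on the examples in the rules above, and you would need to map them onto the categories stored in your own closet table.

```r
# Hypothetical helper: map the weather to the item types to look for.
choose_outfit_rules <- function(temperature, weather_desc) {
  if (temperature > 25) {
    rules <- list(top = "t-shirt", bottom = "shorts", shoes = "sandals")
  } else if (temperature >= 15) {
    rules <- list(top = "long-sleeve top", bottom = "jeans", shoes = "sneakers")
  } else {
    rules <- list(top = "sweater", outerwear = "jacket", shoes = "boots")
  }
  # weather_descriptions may contain several entries, so join them first
  desc <- tolower(paste(weather_desc, collapse = " "))
  if (grepl("rain", desc, fixed = TRUE)) {
    rules$accessory <- "umbrella"
  } else if (grepl("sunny", desc, fixed = TRUE)) {
    rules$accessory <- "sunglasses"
  }
  rules
}

hot <- choose_outfit_rules(28, "Sunny")
cold <- choose_outfit_rules(8, c("Light Rain", "Mist"))
mild <- choose_outfit_rules(20, "Overcast")
str(hot)
str(cold)
```

Each returned value can then be passed to a database query for the matching item.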
- API Development with Plumber
Objective: Develop two API endpoints using Plumber in R.
Endpoints:
/ootd: Returns a plot showing the outfit recommendation.
/rawdata: Returns all product data as a JSON object.
Instructions:
Setup Plumber: Install and load the plumber package.
Define Endpoints:
Example Code Snippet:
# ootd_api.R
library(plumber)
library(DBI)
library(RSQLite)
library(jsonlite)
#* @apiTitle Outfit Recommendation API
#* Get Outfit of the Day
#* @serializer png
#* @get /ootd
function() {
  # Implement recommendation logic (as per previous section)
  # Create a plot
  plot.new()
  # Example plot code:
  plot.window(xlim = c(0, 1), ylim = c(0, 1))
  text(0.5, 0.9, paste("Date:", Sys.Date()), cex = 1.5)
  text(0.5, 0.8, paste("Weather:", weather_desc), cex = 1.2)
  # Add images (this is a placeholder, you need to use functions like rasterImage)
  # Return the plot: the png serializer sends whatever was drawn as the response image
}
#* Get Raw Product Data
#* @get /rawdata
function() {
  conn <- dbConnect(SQLite(), dbname = "closet.db")
  data <- dbGetQuery(conn, "SELECT * FROM closet")
  dbDisconnect(conn)
  # Plumber's default serializer converts the data frame to JSON,
  # so return it directly rather than calling toJSON() first
  data
}
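The image placeholder in the /ootd example can be filled in with base graphics. The following is a minimal sketch, assuming each item image is already available as a raster object: draw_outfit() is a hypothetical helper, and solid-colour rasters stand in for real product photos, which you could read with a package such as png or jpeg.

```r
# Hypothetical sketch: draw a header plus one image per item, side by side.
draw_outfit <- function(images, header, file = "ootd_sketch.png") {
  png(file, width = 200 * length(images), height = 300)
  on.exit(dev.off(), add = TRUE)   # close the device even if drawing fails
  par(mar = c(0, 0, 0, 0))
  plot.new()
  plot.window(xlim = c(0, length(images)), ylim = c(0, 1.5))
  text(length(images) / 2, 1.3, header, cex = 1.5)
  for (i in seq_along(images)) {
    # Each image sits in its own 1 x 1 cell with a small margin
    rasterImage(images[[i]], i - 0.95, 0.05, i - 0.05, 0.95)
    text(i - 0.5, 1.05, names(images)[i])
  }
  invisible(file)
}

# Solid-colour stand-ins for product photos (assumption for the sketch)
images <- list(
  top = as.raster(matrix("steelblue", 10, 10)),
  bottom = as.raster(matrix("grey30", 10, 10)),
  shoes = as.raster(matrix("sienna", 10, 10))
)
out <- draw_outfit(images, paste("Date:", Sys.Date(), "| Weather: Sunny"))
```

Inside a Plumber endpoint that uses the png serializer you would leave out the png() and dev.off() calls, since Plumber opens and closes the graphics device itself.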
- Guidance on the Outfit of the Day Format
You are required to generate an outfit recommendation output that presents the selected items in a clear and visually appealing manner. This output will be a key component of your project's deliverables, particularly when testing your /ootd API endpoint. Below are the guidelines to help you create an effective recommendation output.
Essential Components
Your /ootd recommendation output is an image that must include the following elements:
- Date and Weather Forecast:
Today's Date: Display the current date prominently.
Weather Forecast: Include a brief description of the weather conditions, such as temperature, weather descriptions (e.g., sunny, rainy), and any other relevant details retrieved from the Weatherstack API.
- Outfit Images:
Clothing Categories: The outfit must consist of images representing
each of the following categories:
Shoes
Bottom (e.g., trousers, jeans, skirts)
Top (e.g., shirts, sweaters, blouses)
Outerwear (e.g., jackets, coats)
Accessory (e.g., sunglasses, umbrella, bag)
Image Quality: Ensure that the images are clear and of high quality so that the details of each item are visible.
Layout and Presentation
You have creative freedom in how you present the outfit images, but your layout should adhere to the following guidelines:
Clarity and Visibility:
- Arrange the images in a way that each item is fully visible and not obscured by other elements.
- Avoid overlapping images unless it enhances the presentation without compromising clarity.
Layout Options:
- Mosaic/Grid Layout: Place the images in a grid format, aligning them neatly in rows and columns. This approach ensures that each item has its own space.
- Stylistic Overlay: If you prefer a more creative approach, you can overlay the images to mimic how the outfit would look when worn together. Ensure that this method still allows each item to be distinctly identified.
Labels and Annotations (Optional):
- You may include labels or brief descriptions next to each item to indicate the category or any special features.
- Use legible fonts and colours that contrast well with the background and images.
Example Approaches
Here are some ideas on how you might structure your output:
- Mosaic/Grid Example:
- Stylistic Overlay Example:
Technical Implementation Tips
Image Processing with magick:
- Use the magick package in R to manipulate and combine images.
- Ensure that all images are resized proportionally to maintain aspect ratios.
- Use image_append() or image_montage() functions to arrange images in a grid.
- For overlays, use image_composite() with appropriate gravity and offsets.
Adding Text Annotations:
- Use image_annotate() to add the date and weather information at the top or bottom of the output image.
- Choose font sizes and styles that are readable and professional.
File Formats and Sizes:
- Save the final output as a PNG or JPEG file.
- Optimise the image size to balance quality and file size.
Testing Your Output
Visual Inspection:
- Open the generated image to ensure that all elements are displayed correctly.
- Check for any distortions or misalignments.
Consistency with Recommendation Logic:
- Verify that the selected items align with your recommendation logic based on the weather data.
- Ensure that accessories like umbrellas are included on rainy days.
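The magick tips above can be combined into a simple grid layout. The following is a sketch only, assuming the magick package is installed: blank coloured tiles stand in for product photos, and the tile size, header height, and output file name are arbitrary choices.

```r
library(magick)

# Blank coloured tiles stand in for product photos (assumption for the sketch)
tiles <- c(
  image_blank(300, 400, color = "steelblue"),
  image_blank(200, 200, color = "grey30"),
  image_blank(400, 300, color = "sienna")
)

# Fit each tile inside 200x200 (aspect ratio kept), then pad to exactly 200x200
tiles <- image_scale(tiles, "200x200")
tiles <- image_extent(tiles, "200x200", color = "white")

# One row of three tiles
row <- image_append(tiles)

# Header strip with the date and a made-up weather description
header <- image_blank(600, 60, color = "white")
header <- image_annotate(header, paste("Date:", Sys.Date(), "| Weather: Sunny"),
                         gravity = "center", size = 24, color = "black")

# Stack the header above the row and save the result
ootd <- image_append(c(header, row), stack = TRUE)
image_write(ootd, "ootd_sketch_magick.png", format = "png")
info <- image_info(ootd)
print(info[, c("width", "height")])
```

With real photos you would replace the image_blank() calls with image_read() on the paths stored in the closet table.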
- Automation with Bash Scripts
Objective: Automate the entire pipeline so that the assessor can run your Bash script and retrieve the outfit recommendation.
Instructions:
Create a Bash Script: Name it run_pipeline.sh.
Script Requirements: Accept an input variable for the Weatherstack access key.
Example:
#!/bin/bash
# Usage: ./run_pipeline.sh YOUR_ACCESS_KEY
YOUR_ACCESS_KEY=$1
export YOUR_ACCESS_KEY
# Run R scripts
Rscript product_scraping.R
Rscript weatherstack_api.R
Rscript etl.R
Rscript run_ootd_api.R &
# Wait for API to start
sleep 5
# Call the /ootd endpoint
curl "http://localhost:8000/ootd" --output ootd_plot.png
echo "Outfit of the Day plot saved as ootd_plot.png"
Run OOTD API: The run_ootd_api.R script should start the Plumber API on port 8000.
Example Code Snippet:
# run_ootd_api.R
library(plumber)
# Load the API
r <- plumb("ootd_api.R")
# Run the API on port 8000
r$run(port = 8000)
Deliverables
Project Folder: A zipped folder named win-123456.zip or mac-123456.zip, where 123456 is your student number.
- Bash Script:
A script named run_pipeline.sh that:
Takes an input variable for the Weatherstack access key (YOUR_ACCESS_KEY).
Loads and runs all relevant R scripts.
Makes a call to the /ootd endpoint using curl to produce the plot of the Outfit of the Day.
- R Scripts:
product_scraping.R
weatherstack_api.R
etl.R
ootd_api.R
run_ootd_api.R
- Product Images:
A folder named images containing the product images associated with your closet database.
Example outfit image
- Outfit generated from your /ootd endpoint with file name ootd_plot.png.
Readme file
README Updates: In your README.md, add a section that explains how the recommendation output is generated. Provide any instructions necessary to reproduce the output.
Important Notes
Environment Variables: Ensure your API key is retrieved from an environment variable that is passed to the bash script from the command line.
Example:
#!/bin/bash
# example_script.sh
# Usage: ./example_script.sh YOUR_API_KEY
# Check if the API key is provided
if [ -z "$1" ]; then
  echo "Usage: $0 YOUR_API_KEY"
  exit 1
fi
# Get the API key from the command-line argument
YOUR_API_KEY=$1
# Export the API key as an environment variable
export YOUR_API_KEY
# Now you can use the API key in your script or in scri
echo "API key has been set as an environment variable."
# Example usage within the script
echo "Using the API key in the script:"
echo "The API key is: $YOUR_API_KEY"
# Example of running another script that uses the API key
# Assuming you have a script called api_call_script.sh
# ./api_call_script.sh
# Alternatively, run an R script that uses the API key
# Rscript my_r_script.R
Suppose your API key is abcd1234. You would run the script as follows:
./example_script.sh abcd1234
Using the API Key in an R Script (e.g., my_r_script.R):
# my_r_script.R
# Retrieve the API key from the environment variable
api_key <- Sys.getenv("YOUR_API_KEY")
if (api_key == "") {
  stop("API key not found. Please ensure YOUR_API_KEY is set as an environment variable.")
}
# Use the API key in your API calls
# For example:
library(httr)
response <- GET("https://api.example.com/data", add_head
# Process the response as needed
Port Configuration: The API should run on port 8000. Ensure no other services are using this port.
Dependencies: List any R packages required in a README.md file.
Testing: Verify that your entire pipeline works on a different machine to ensure it runs outside of your development environment.
Assessment Criteria (Total: 100 points)
- Data Collection and Scraping (15 points)
Quality and completeness of the web scraping script (10 points).
Variety and coverage of items across different categories (5 points).
- Database Design and Implementation (10 points)
Correct SQL database design according to the specified schema (5 points).
Successful population of the database with scraped items (5 points).
- Weather Integration (10 points)
Successful integration and automation of weather data retrieval (5 points).
Correct usage and storage of weather data in the system (5 points).
- Outfit Recommender System (20 points)
Effectiveness of the recommendation logic (10 points).
Proper implementation using R (10 points).
- Automation and Workflow (15 points)
Use of Bash scripts to automate tasks (10 points).
Correct execution of the entire pipeline from the script (5 points).
- Code Quality and Documentation (10 points)
Code readability and adherence to best practices (5 points).
Clear documentation and instructions in a README.md file (5 points).
- OOTD Endpoint Functionality (20 points)
/ootd endpoint returns a plot showing date, weather forecast, and outfit images (10 points).
/rawdata endpoint returns all products in the closet database as JSON (10 points).
Bonus steps / functionality (10 bonus points)
50+ products are added to your closet database (5 points).
/ootd endpoint has additional functionality to produce two or more outfit choices for each call rather than 1 outfit (5 points).
Submission Instructions
Deadline: See Canvas assignment page.
File Naming: Ensure your zipped folder follows the naming convention (win-123456.zip or mac-123456.zip).
Tips and Best Practices
Testing: Run your Bash script from start to finish to ensure all components
work seamlessly.
Error Handling: Include error checks in your scripts to handle potential
issues (e.g., missing data, API errors).
Comments: Comment your code to explain the logic and flow.
Dependencies: Use renv or list your packages to ensure the assessor can install them easily.
Security: Do not hardcode your API keys in the scripts; always use environment variables.
Data Privacy: Ensure compliance with data scraping regulations and respect website terms of service.
Getting Started
- Set Up Your Environment:
Install necessary R packages:
rvest
httr
jsonlite
DBI
RSQLite
plumber
Ensure you have curl installed for making HTTP requests in the Bash script.
- Plan Your Approach:
Review the requirements and plan each step.
Start by setting up your database schema.
- Incremental Development:
Test each component individually before integrating.
Use print statements or logs to debug.
- Consult Course Materials:
Revisit workshops and assignments related to each component.
Support
If you have any questions or need clarification, please reach out during office
hours or via email at [email protected].
Good luck with your project!
APPENDIX 1.0 - Guidelines for Your README File
Your README.md file is a crucial part of your project submission. It should provide clear instructions and information to help others understand and run your project without any confusion. Below are some key points you should include:
Project Title and Description:
Clearly state the name of your project.
Provide a brief overview of the project's purpose and functionality.
Table of Contents (Optional for Longer READMEs):
If your README is extensive, include a table of contents to help readers
navigate the document.
Prerequisites and Dependencies:
List all software, packages, and libraries required to run your project.
For example: R (version X.X.X), SQLite, Bash shell, rvest, httr, jsonlite, etc.
Include any system requirements or platform-specific instructions.
Provide commands or steps to install these dependencies.
Installation and Setup Instructions:
Step-by-step guidance on how to set up the project environment.
Cloning or downloading the project repository.
Setting up directories and files.
Instructions on obtaining and setting up the Weatherstack API key.
How to export the API key as an environment variable if needed.
Project Structure Overview:
Briefly describe the purpose of each major script and file in your project.
product_scraping.R: Scrapes product data and images from the web.
weatherstack_api.R: Fetches current weather data using the Weatherstack API.
etl.R: Cleans data and populates the SQLite database.
ootd_api.R: Defines the API endpoints using Plumber.
run_ootd_api.R: Runs the API server.
run_pipeline.sh: Bash script that automates the entire pipeline.
images/: Directory containing product images.
closet.db: SQLite database file containing the closet data.
Mention any additional files or directories, such as logs or outputs.
Usage Instructions:
How to run the entire pipeline using the Bash script.
Example command:
./run_pipeline.sh YOUR_ACCESS_KEY
Instructions on how to start the API server independently if needed.
Example command:
Rscript run_ootd_api.R
How to access the API endpoints.
Accessing /ootd and /rawdata via a web browser or using curl.
Example:
curl "http://localhost:8000/ootd" --output ootd_plot.png
Any additional steps required to generate the outputs.
Recommendation Logic Explanation:
Describe how the weather data influences the outfit recommendation.
Temperature thresholds and corresponding clothing choices.
Handling of specific weather conditions (e.g., rain).
Any additional logic or rules implemented.
Output Description:
Details about the generated outputs, such as the outfit plot image.
Explain the contents and format of ootd_plot.png.
Includes date, weather forecast, and images of the outfit items.
Mention any other output files and their purposes.
Additional Features (Bonus Implementations):
Describe any extra items added to the closet beyond the required 25.
Detail any additional API endpoints you have created.
Their purposes and how to access them.
Explain if you have implemented multiple outfit suggestions.
Troubleshooting and FAQs:
Common issues that might arise and their solutions.
API key errors.
Missing dependencies.
Port conflicts if the API server doesn't start.
Tips for ensuring the scripts run smoothly.
Dependencies and Package Installation:
Provide a list of R packages and how to install them.
Example:
install.packages(c("rvest", "httr", "jsonlite", "DBI", "RSQLite",
"plumber", "dplyr", "magick"))
Instructions for installing any system-level dependencies if applicable.
License Information (Optional):
Specify any licenses if you are using third-party code or resources.
Contact Information (Optional):
Your name and email address for any questions or feedback.
Acknowledgments (Optional):
Credit any resources, tutorials, or individuals that helped you.
Formatting Tips:
Use Markdown syntax to structure your README:
Headings (#, ##, ###) for sections and subsections.
Bullet points and numbered lists for clarity.
Backticks for inline code (`code`) and triple backticks for code blocks.
Hyperlinks for referencing external resources or documentation.
Example of a Command:
./run_pipeline.sh YOUR_ACCESS_KEY
Example of Inline Code:
To install packages:
install.packages("package_name")
Final Checklist:
Clarity and Conciseness:
Ensure instructions are easy to follow and free of jargon.
Keep sentences and paragraphs short and to the point.
Completeness:
Double-check that all required sections are included.
Verify that all instructions are accurate and up-to-date.
Proofreading:
Check for spelling and grammatical errors.
Ensure consistent formatting throughout the document.
APPENDIX 2.0 - Passing Variables, Data, and Files Between Scripts in a Pipeline
In a data processing pipeline, it's essential to pass variables, data, and files
from one script to another to ensure seamless execution and maintain
modularity. This practice allows different components of the pipeline to
communicate and share necessary information without tightly coupling the
scripts. Below are various methods to achieve this, along with explanations of
their importance and examples based on the Final Project Assignment:
Personal Outfit Recommendation System.
- Command-Line Arguments
Explanation:
Scripts can accept input parameters directly from the command line when
they are executed.
This method allows you to pass variables, such as API keys or file paths,
dynamically.
Why It's Important:
Flexibility: Users can specify different inputs without modifying the script
code.
Security: Sensitive information like API keys can be passed at runtime
instead of hardcoding them.
Example from the Project:
Passing the Weatherstack API Key:
In the Bash script run_pipeline.sh, the API key is passed as a command-line argument:
./run_pipeline.sh YOUR_ACCESS_KEY
Within run_pipeline.sh, the API key is captured and exported:
#!/bin/bash
# Check if the API key is provided
if [ -z "$1" ]; then
echo "Usage: $0 YOUR_ACCESS_KEY"
exit 1
fi
# Export the API key as an environment variable
export YOUR_ACCESS_KEY=$1
Each R script can then access the API key from the environment variable.
- Environment Variables
Explanation:
Environment variables are key-value pairs available to all processes in the
shell session.
Scripts can read environment variables to obtain necessary information.
Why It's Important:
Security: Keeps sensitive data out of the codebase.
Consistency: Ensures that all scripts access the same variable values.
Portability: Environment variables can be easily configured on different
systems.
Example from the Project:
Accessing the API Key in R Scripts:
In weatherstack_api.R, the API key is retrieved from the environment:
# Retrieve the API key from the environment variable
api_key <- Sys.getenv("YOUR_ACCESS_KEY")
if (api_key == "") {
  stop("API key not found. Please ensure YOUR_ACCESS_KEY is set as an environment variable.")
}
- Reading and Writing Files
Explanation:
Scripts can write data to files, which subsequent scripts read and process.
Common file formats include CSV, JSON, RDS (R's binary format), and
databases.
Why It's Important:
Data Persistence: Stores intermediate results that can be reused or
inspected.
Decoupling: Allows scripts to operate independently, focusing on specific
tasks.
Debugging: Facilitates troubleshooting by examining intermediate files.
Example from the Project:
Sharing Scraped Data:
product_scraping.R: Scrapes product data and saves it to a CSV file.
# Save raw product data
write.csv(products, "products_raw.csv", row.names = FALSE)
etl.R: Reads the CSV file for data cleaning and loading into the database.
# Read the raw product data
products <- read.csv("products_raw.csv", stringsAsFactors = FALSE)
Storing Weather Data:
weatherstack_api.R: Fetches weather data and saves it as an RDS file.
# Save weather data to an RDS file
saveRDS(weather_data, "weather_data.rds")
ootd_api.R: Reads the weather data for generating outfit recommendations.
# Load weather data
weather_data <- readRDS("weather_data.rds")
- Using Databases
Explanation:
Databases provide a structured way to store and retrieve data.
Scripts can insert data into a database, which other scripts can query as
needed.
Why It's Important:
Data Integrity: Enforces data types and constraints.
Concurrency: Allows multiple scripts to access data without conflicts.
Scalability: Handles larger datasets efficiently.
Example from the Project:
Centralized Data Storage:
etl.R: Inserts cleaned product data into a SQLite database.
# Connect to the SQLite database
conn <- dbConnect(SQLite(), dbname = "closet.db")
# Write data to the 'closet' table
dbWriteTable(conn, "closet", products_clean, append = TRUE, row.names = FALSE)
ootd_api.R: Queries the database to select items for the outfit.
# Connect to the SQLite database
conn <- dbConnect(SQLite(), dbname = "closet.db")
# Query for outfit items based on category
outfit_item <- dbGetQuery(conn, "SELECT * FROM closet WHERE category = 'tops' ORDER BY RANDOM() LIMIT 1")
- Standard Input and Output (Pipes)
Explanation:
Scripts can read from standard input (stdin) and write to standard output (stdout).
Allows chaining commands using pipes (|), where the output of one command serves as input to another.
Why It's Important:
Stream Processing: Useful for processing data streams or large datasets.
Flexibility: Enables quick data transformations without intermediate files.
Example from the Project:
Chaining Commands (Hypothetical):
While not explicitly used in the project, you could use pipes in the command line:
# Pass the output of one script to another
Rscript script1.R | Rscript script2.R
- Function Calls Between Scripts (Sourcing)
Explanation:
One script can source another, effectively importing its functions and variables.
In R, source("script.R") runs the code from the sourced script in the current environment.
Why It's Important:
Code Reusability: Share common functions without duplicating code.
Organisation: Keep code modular and maintainable.
Example from the Project:
Shared Functions (Hypothetical):
If you have utility functions used across scripts:
# In 'utils.R'
calculate_temperature_category <- function(temp) {
  if (temp > 25) {
    return("hot")
  } else if (temp >= 15) {
    return("mild")
  } else {
    return("cold")
  }
}
# In 'ootd_api.R'
source("utils.R")
# Use the function
temp_category <- calculate_temperature_category(temperature)
Conclusion: Choose the method that best fits the data's nature, the scripts' requirements, and the project's complexity. Combining these methods often yields the best results in a real-world application.