MAT188: Homework 5
Background
Below is an illustration of the southwestern portion of the great province of British Columbia. Cities are labelled in blue, and red circles indicate the locations of public weather stations. Included with this assignment is temperature data from each of these weather stations. Our goal is to use linear algebra, with the help of MATLAB, to filter noise from the data and to detect which weather stations were influenced by ocean temperatures and which were influenced by forest fires.
Part 1
Step 1: Import weather station data
Begin by loading the temperature data into a MATLAB script:
- Temperature data from all weather stations, recorded daily in Celsius, are available at https://github.com/dtxe/mat188_datasets/raw/refs/heads/main/bcweather/temperature.csv
- The locations of each weather station (latitude and longitude coordinates) are available at https://github.com/dtxe/mat188_datasets/raw/refs/heads/main/bcweather/stations.csv
- Use the readtable command to import both CSV files
Hint: readtable is able to import data directly from the internet, without needing to download it first.
    my_table_name = readtable('mydataset.csv'); % import CSV file from current folder
    my_table_name = readtable('https://data.sets/mydataset.csv'); % import CSV file from an internet source
    % Import the data below
Step 2: Preview and understand the dataset
To get an idea of what the data in the table looks like, use the summary function to quickly summarize the dataset.
    summary(T) % print summary of table T
You can also open the table in the MATLAB GUI by double-clicking on the variable name in your "Workspace".
    % Generate summaries of both tables here
To get a sense of the imported data, we can also preview the first or last few rows of the table using the head or tail commands.
    head(T) % return the first 8 rows of table T
    head(T,20) % return the first 20 rows of table T
    tail(T) % return the last 8 rows of table T
Show the first 5 rows of both the temperature and station data tables.
Let's consider if the imported data makes sense:
- Are the values consistent with the dataset description? That is, is the temperature data consistent with your expectation of BC temperatures (e.g. daily values in Celsius)?
- What is the date range of the data?
- How many weather stations do we have data for?
- In the temperature table, what do rows represent? What do columns represent?
Step 3: Extract temperature data into matrices
In MATLAB, a table and a matrix are different objects. Our data are in table format. In this section we convert the data into matrices.
We can index into MATLAB tables using the bracket notation.
    % Get data by column name
    vals = T.Station_15; % get data from the column named Station_15
    % Get data by column index
    vals = T(:,10); % get the 10th column from table T
    vals = T(:,10:12); % get data from the 10th to 12th columns, inclusive
    % Get data by row index
    vals = T(1:10,:); % get the rows 1-10 from table T
    % Table to matrix
    mat = table2array(T); % convert table T into a matrix
Let's first extract the date values:
Next, let's extract all the temperature values from all stations into one matrix:
- From the code examples above, how can we subset a table to all rows of multiple columns by index?
- With a subset table, how can we convert this into a matrix?
Quick sense check: Verify your variable types in the Workspace. Are they matrices (NxM double) or tables (NxM table)?
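The difference between a sub-table and a matrix can be seen on a small made-up table. This is only a sketch: the table T_demo, its column names, and its values are invented for illustration and are not the assignment data.

```matlab
% Toy table standing in for the real data (names and values are made up)
Date = datetime(2023, 1, 1) + days(0:3)';
Station_1 = [1.5; 2.0; 0.5; -1.0];
Station_2 = [3.0; 3.5; 2.5; 1.0];
T_demo = table(Date, Station_1, Station_2);

dates_demo = T_demo.Date;            % dot notation returns the column's data (4x1 datetime)
sub_demo   = T_demo(:, 2:3);         % parentheses return a smaller table (4x2 table)
temps_demo = table2array(sub_demo);  % convert the numeric sub-table to a 4x2 double

class(sub_demo)    % displays 'table'
class(temps_demo)  % displays 'double'
```

Checking class (or the Workspace panel) after each step is a quick way to confirm which kind of object a variable holds.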
Step 4: Initial visualization
For a better sense of what the temperature dataset looks like, let's plot all the temperature values from every weather station as a function of time.
- Plot date on the x-axis. Label the axis "Date".
- Plot temperature on the y-axis. Label the axis "Temperature (C)".
- Each weather station should be an individual line on your line plot.
    % recall, to make a plot in MATLAB
    figure % to initialize a new blank figure
    % then, to generate the plots:
    plot(x, y) % make a line plot, with variable x on the x-axis, and variable(s) y on the y-axis
    scatter(x, y) % make a scatter plot
    % plots with customized markers
    scatter(x, y, 'r') % scatter plot with red markers
    scatter(x, y, '+r') % scatter plot with red plus markers
    scatter(x, y, 'xr') % scatter plot with red x markers
    % label axes and add title
    xlabel('Variable (unit)')
    ylabel('Variable (unit)')
    title('My plot')
    % we've also provided a showmap command to render a map of BC with coordinates roughly to scale
    figure
    showmap
    hold on
    scatter(x, y, 'xr')
Let's also plot the location (Longitude and Latitude) of all the weather stations as a scatter plot. Longitude is the x-axis and Latitude the y-axis.
- See above for hints on how to customize your plot markers
- Longitude should be on the x-axis, and Latitude should be on the y-axis
- Recall how to access values from a table by variable name! (See Step 3 above)
- Ensure your markers' shape, size, and colour are easy to see! You can use marker 'xr'.
Step 5: Find principal components using eigenvector decomposition
First, we need to mean-center and scale our data:
- For each weather station, subtract the mean temperature from the temperature timeseries
- Divide the timeseries by the standard deviation of the timeseries
    mean(X) % compute the mean for each column of matrix X (1st dimension)
    mean(X, 1) % compute the mean of matrix X along the 1st dimension
    mean(X, 2) % compute the mean of matrix X along the 2nd dimension
    std(X) % compute the standard deviation for each column of X (1st dimension)
    std(X, [], 1) % compute the st dev of X along 1st dimension
    std(X, [], 2) % compute the st dev of X along 2nd dimension
    C = cov(X) % compute the covariance matrix for all pairs of columns in X
    [V,D] = eig(X) % compute the Eigen decomposition of X
Repeat this for all weather stations.
Hint: What dimension of your matrix corresponds to time? Which dimension do we need to compute the mean and std along?
Find the covariance matrix describing the relationship between the station and the date for the temperature.
Find the eigenvalues and eigenvectors of the covariance matrix.
Hint: the 'diag' command may be useful for isolating the eigenvalues.
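As a rough illustration of how these commands chain together, the sketch below standardizes a small made-up matrix and decomposes its covariance matrix. It assumes rows are time points and columns are stations; X_demo and every value in it are invented, so the orientation should be checked against the real data before reusing the pattern.

```matlab
% Toy data: 6 time points (rows) x 3 hypothetical stations (columns)
X_demo = [ 2  1 10;
           4  3  9;
           6  2  8;
           8  5  7;
          10  4  6;
          12  6  5];

mu    = mean(X_demo, 1);            % 1x3 row of column means
sigma = std(X_demo, [], 1);         % 1x3 row of column standard deviations
Z     = (X_demo - mu) ./ sigma;     % implicit expansion (R2016b+) applies the row to every row

C_demo = cov(Z);                    % 3x3 covariance matrix of the standardized columns
[V_demo, D_demo] = eig(C_demo);     % columns of V_demo are eigenvectors
lambda_demo = diag(D_demo);         % eigenvalues pulled out as a column vector
```

Because each standardized column has unit variance, the diagonal of C_demo is all ones and the eigenvalues sum to the number of columns, which makes a handy sanity check.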
Step 6: Let's do some sense-checks
Recall the principal components Eigen decomposition:
Which MATLAB variable above corresponds to...
- The matrix V
- The matrix
- The matrix A
Recall that if the matrix A is symmetric, the eigenbasis can be chosen to be orthonormal, and hence V becomes an orthogonal matrix (its columns are orthonormal). Let's verify this with MATLAB.
    % ... the x and y axes equally spaced, so the matrix visualization is also square
    colorbar % show the colorbar for interpretation
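One way such a check might look is sketched below on a small symmetric matrix. The matrix A_demo, the use of imagesc and axis square, the product V' * V, and the tolerance 1e-10 are all assumptions made for this illustration; the assignment's own formula and helper function are not reproduced here.

```matlab
% Hypothetical symmetric matrix (not the assignment's covariance matrix)
A_demo = [4 1 0;
          1 3 1;
          0 1 2];
[V_demo, D_demo] = eig(A_demo);

G = V_demo' * V_demo;               % should be close to the 3x3 identity

figure
imagesc(G)                          % display the matrix as a colour image
axis square                         % equal-length axes so the image is square
colorbar                            % show the colour scale

% Floating-point round-off means entries are rarely exactly zero,
% so compare against a small tolerance (1e-10 is an arbitrary choice).
dev = norm(G - eye(3), 'fro');
is_orthonormal = dev < 1e-10;

% For a symmetric matrix, V * D * V' should rebuild the original matrix
recon_err = norm(V_demo * D_demo * V_demo' - A_demo, 'fro');
```

A bright diagonal on a dark background in the image is the visual signature of orthonormal columns.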
Let's also confirm this numerically.
Compute, then check that all values in this matrix are zero using the provided iszero function.
    zeros(10) % generate a 10 x 10 matrix of zeros
    eye(10) % generate a 10 x 10 identity matrix
    iszero(zeros(10)) % check if a matrix of zeros consists of all zeros
    ans = logical 1
    iszero(eye(10)) % check if the identity matrix consists of all zeros
    ans = logical 0
Step 7: Identifying the directions of maximum variance
Normalize the eigenvalues by dividing each eigenvalue by the total sum.
- These are the eigenvalues from a decomposition of the covariance matrix.
- The normalized eigenvalues can be interpreted as the fraction of variance in the data captured by each eigenvector (i.e. component).
We want to identify the components that encompass the most variance in the data.
Sort the eigenvalues from highest to lowest. Then arrange the eigenvectors correspondingly.
    [sorted_values, sort_idx] = sort(values, 'descend'); % sort values from highest to lowest
    sorted_columns = X(:, sort_idx); % rearrange columns of another matrix in the same order
Plot all the normalized and sorted eigenvalues as a line plot.
- The index of the eigenvalues should be on the x-axis
- The normalized eigenvalue should be on the y-axis
Let's zoom into the elbow and take a closer look...
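The normalize, sort, and reorder sequence can be tried on made-up numbers first. In this sketch values_demo and vectors_demo are invented stand-ins for the eigenvalues and eigenvectors, chosen so the reordering is easy to follow by eye.

```matlab
% Made-up eigenvalues and matching eigenvector columns (for illustration only)
values_demo  = [0.5; 3.0; 1.5];
vectors_demo = [1 0 0;
                0 1 0;
                0 0 1];

frac = values_demo / sum(values_demo);            % fraction of variance per component
[frac_sorted, sort_idx] = sort(frac, 'descend');  % largest fraction first
vectors_sorted = vectors_demo(:, sort_idx);       % reorder the columns the same way

figure
plot(1:numel(frac_sorted), frac_sorted, '-o')
xlabel('Eigenvalue index')
ylabel('Normalized eigenvalue')
```

Reusing the same sort_idx for both the eigenvalues and the eigenvector columns keeps each eigenvector paired with its own eigenvalue.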
Project the temperature measurements into the Eigenspace of the covariance matrix
- The Eigenspace of the covariance matrix projects the data along directions in which the temperature measurements are maximally linearly correlated
- Currently, for one day's temperature measurements, each weather station can be considered a dimension/axis
- We will now project every day's temperature measurements into a new space, where the dimensions/axes correspond to a group of weather stations that tend to vary together
For the temperature measurements for each day, project the temperature measurement timeseries into the eigenbasis by computing:
Now, let's plot the temperature data along the first 5 dimensions over time.
- Be sure to add a title, legend, and label for the x-axis.
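A projection of this kind might be sketched as follows. The formula is not shown in the text above, so this assumes the common convention of multiplying the centered data (one row per day) by the matrix of sorted eigenvectors; Z_demo and its values are made up, and only two components exist in this toy case.

```matlab
% Toy centered data: 4 days (rows) x 2 hypothetical stations (columns)
Z_demo = [ 1.0  0.9;
          -1.0 -1.1;
           0.5  0.4;
          -0.5 -0.2];

C_demo = cov(Z_demo);
[V_demo, D_demo] = eig(C_demo);
[~, idx] = sort(diag(D_demo), 'descend');
V_demo = V_demo(:, idx);                 % leading component first

% Each row of P_demo holds one day's coordinates in the eigenbasis
P_demo = Z_demo * V_demo;                % (days x stations) * (stations x components)

figure
plot(1:size(P_demo, 1), P_demo(:, 1:2))  % one line per component
xlabel('Day index')
ylabel('Projected value')
title('Toy data in the eigenbasis')
legend('PC1', 'PC2')
```

After projection the columns of P_demo are uncorrelated, so the off-diagonal entries of cov(P_demo) should be close to zero.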
Which weather stations contribute to each principal component?
Visualize the spatial influence of each principal component by plotting the PCA loadings (weights) for each weather station on a geographic map.
Explanation:
After PCA, the loadings for each principal component represent the influence (or "weight") that each weather station has on that component. Higher loadings mean that the temperature variations at that station significantly contribute to the patterns captured by that component.
Plotting these loadings on a map helps identify if certain geographical areas, like coastal or inland regions, are more strongly associated with specific components.
Instructions:
- Use the latitude and longitude values in the stations.csv data.
- For each component (e.g., PC1, PC2, etc.), create a scatter plot where each weather station's coordinates are marked, and the color of each point represents the magnitude of the loading for that component. For example, use a color gradient (e.g., blue to red) to show low to high loadings.
- This visualization will help you interpret the spatial patterns captured by each component.
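The instructions above could be sketched like this for a single component. The coordinates and loadings are invented, the marker size of 80 and the jet colormap are arbitrary choices, and the colour limits of [-1 1] are an assumption that suits unit-length eigenvectors.

```matlab
% Hypothetical station coordinates and loadings (all values are made up)
lon_demo = [-123.1; -122.3; -121.5; -120.4];
lat_demo = [  49.3;   49.1;   49.9;   50.7];
loadings_demo = [0.62; 0.55; -0.10; -0.48];   % e.g. one column of the eigenvector matrix

figure
scatter(lon_demo, lat_demo, 80, loadings_demo, 'filled')  % marker colour follows the loading
colormap(jet)                     % blue (low) to red (high)
cb = colorbar;
cb.Label.String = 'Loading';
set(gca, 'CLim', [-1 1])          % symmetric limits so zero sits mid-scale
xlabel('Longitude')
ylabel('Latitude')
title('Toy loadings for one component')
```

In the assignment itself, calling the provided showmap command followed by hold on before scatter would draw the same points over the map of BC.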
Interpretation
Weather patterns are complex and influenced by several factors, including incident sunlight, prevailing winds, sea surface temperature, and atmospheric particulates (like smoke from wildfires).
- Which principal components are capturing general seasonal variations?
- Which principal components capture the effect of sea surface temperature? (Hint: consider the geographical distribution, and that ocean temperature lags seasonal air temperatures.)
- Which principal components capture the effect of forest fires? Can you identify where the forest fires are burning?
- Which principal components capture only noise?
Problem 2: Using OLS to estimate the trend of the salmon population and wildfire likelihood based on yearly sea surface temperatures
In a dataset separate from Problem 1, scientists have compiled yearly data on three interconnected environmental variables:
- Estimated Salmon Population
- Yearly Trends in Forest Area Burned by Wildfires
- Mean Ocean Temperature for the Same Year
Your task is to analyze this dataset and explore the relationships between these variables. Using the ordinary least squares (OLS) method, you will model these relationships and interpret the ecological insights your analysis provides.
Part 1: Import and understand the dataset
Use the readtable command to download the weather trends dataset from https://github.com/dtxe/mat188_datasets/raw/refs/heads/main/bcweather/trends.csv.
Find the array size of the data (number of rows) using the size function.
To get an idea of what the data in the table looks like, you can open the table in the "Workspace" by double-clicking the name you gave it. Or you can print out a summary of the table in the "Command Window" by using the function summary() with your chosen table name in the parentheses.
Part 2: Initial visualization
Plot how the salmon population and temperature change throughout the years. Use yyaxis left and yyaxis right to specify the left (salmon population) and right (temperature) axes on the same plot.
Hints for plotting:
- Use xlabel and ylabel to set axis titles for your plot
- You can use xlim and ylim to change the limits of the x axis and y axis values if required
- Use legend to label the graphs corresponding to either Temperature or Salmon Population
- Plotting options such as line color and thickness can be added following the first two entries. For example, plot(x,y,"LineWidth",1.5) will change the thickness of the lines to 1.5.
    figure
    yyaxis left
    plot(x, y1) % plot y1 against x
    yyaxis right
    plot(x, y2) % with a different y-axis on the right, plot y2 against x
    % Retrieve variables from a table
    plot(T.Var1, T.Var2) % plot column Var1 from table T on the x-axis, against Var2 on the y-axis
Now, plot how the wildfire changes with temperature throughout the years.
Part 3: Use projections to perform Ordinary Least Squares to estimate the 1st order (linear) and 2nd order (quadratic) models
Let's start with estimating how the salmon population trend changes with yearly sea temperature.
Recall the least squares approximation to a solution learned in class that looks like the following:
Consider the matrix in this formula: how can you construct it given the variable temp for both a linear and a quadratic approximation? Hint: concatenate a ones column vector with the appropriate variable(s).
Hint: you can use the function inv to take the inverse of a matrix, and a matrix transpose is denoted by the prime A'. Remember, you can use the "Workspace" or "Command Window" to visually see how your matrices look and if their sizes are compatible to perform matrix multiplication.
    % Perform OLS to find 1st order (Linear model)
    % Perform OLS to find 2nd order (Quadratic model)
Use the linspace function to create a column vector of 100 linearly spaced temperature values (these will be the x values on your plot).
    xpts = linspace(min(temp),max(temp),100)'; % (100 points is default value)
Complete the linear and quadratic model matrices using the temp values from the above step. Recall, we are looking for a linear function (y = mx + b) and a quadratic function (y = ax^2 + bx + c) that can model the salmon population trend with sea temperature.
    % Linear model y values (y = mx+b)
    % Quadratic model y values (y = ax^2 + bx + c)
Part 4: Plot the raw data, and your fitted curves
Graph the salmon population (as the dependent variable) vs the temperature (as the independent variable) as a scatter plot using the scatter command. On the same plot, graph the linear and quadratic models.
Hints for plotting: To ensure all three graphs are on one plot, make sure to use the hold on/hold off commands.
    % Plot salmon population vs temperature data, linear and quadratic model
By visual inspection, which model provides the best fit for the data? In other words, what's the minimum order of the model (1st, or 2nd order) that adequately captures most of the dependent variation in the data based on the plot? Why do you think so?
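The fitting and plotting workflow of Parts 3 and 4 can be rehearsed on made-up numbers. The least squares formula itself is not reproduced in the text above, so this sketch assumes the standard normal-equations form, coefficients = inv(A' * A) * A' * y, which is consistent with the inv and transpose hints. The data in temp_demo and y_demo are invented and are not taken from trends.csv.

```matlab
% Made-up data for illustration (not the trends.csv values)
temp_demo = [10; 11; 12; 13; 14];
y_demo    = [ 9;  7;  6;  4;  1];

n  = numel(temp_demo);
A1 = [ones(n, 1), temp_demo];                 % linear design: columns [1, x]
A2 = [ones(n, 1), temp_demo, temp_demo.^2];   % quadratic design: columns [1, x, x^2]

coef1 = inv(A1' * A1) * A1' * y_demo;         % [b; m] for y = m*x + b
coef2 = inv(A2' * A2) * A2' * y_demo;         % [c; b; a] for y = a*x^2 + b*x + c

% Evaluate both models on a fine grid for plotting
xpts  = linspace(min(temp_demo), max(temp_demo), 100)';
yfit1 = [ones(100, 1), xpts] * coef1;
yfit2 = [ones(100, 1), xpts, xpts.^2] * coef2;

figure
scatter(temp_demo, y_demo)
hold on
plot(xpts, yfit1)
plot(xpts, yfit2)
hold off
legend('Data', 'Linear fit', 'Quadratic fit')
```

The order of the coefficients follows the order of the columns in the design matrix, so swapping columns swaps the meaning of each entry. Outside of a class exercise, the backslash operator (A1 \ y_demo) is generally preferred over inv for numerical stability.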
Part 5: Analyze trends for wildfires
Repeat the above steps, except now replace the salmon population data with the wildfire data.
    % Perform OLS to find 1st order (Linear model)
    % Perform OLS to find 2nd order (Quadratic model)
    % Create a column of 100 linearly spaced x values for plotting (ensure it
    % encompasses all of the data)
    % Linear model y values (y = mx+b)
    % Quadratic model y values (y = ax^2 + bx + c)
    % Plot wildfire vs temperature data, linear and quadratic model
By visual inspection, which model provides the best fit for the data? In other words, what's the minimum order of the model (1st, or 2nd order) that adequately captures most of the dependent variation in the data based on the plot? Why do you think so?
Well done! You've successfully completed Homework 5!
We hope this provided a demonstration of the power of linear algebra and computational software in working with, manipulating, isolating trends in, and extracting insights from large datasets.
Submission instructions
- Run this Live Script from top to bottom to verify the correctness of your code. (Home tab > Clear Workspace; Live Editor tab > Run)
- Export this Live Script as a PDF file (Live Editor > Export)
- Upload both the MLX and PDF files to Gradescope
Helper functions
⚠ Do not delete the code below. ⚠
These functions exist to reduce the complexity of the assignment above, by abstracting away concepts that aren't core to the course curriculum. However, feel free to take a look if you're curious!
    function [] = showmap()