[Python] Data Analysis, Section 6.4: Heatmaps | from Coursera "Applied Data Science with Python"

Posted: 2024-05-23 23:01:19
Tags: Applied Heatmaps Python two sample -- so Date data

Heatmaps are a way to visualize three dimensions of data and to take advantage of the spatial proximity of those dimensions.

In making revisions to this course I was really tempted to get rid of the section on heatmaps, as I've seen enough bad heatmaps to last me a lifetime. The problem is heatmaps are really quite powerful when you have the right data. Weather data is a great example. You have two dimensions, latitude and longitude, and then we can overlay on top of this a third dimension, say, temperature or rainfall amounts and use color to indicate its intensity.

In fact, anything with a two-dimensional spatial aspect can make for a natural heatmap. As an example, eye fixation points captured through gaze detection are used regularly by researchers and marketing experts to understand what people are looking at on websites. But where heatmaps break down is when there's no continuous, or at least ordinal, relationship between dimensions. Using a heatmap for categorical data, for instance, is just plain wrong. It misleads the viewer into looking for patterns and ordering through spatial proximity, and any such patterns would be purely spurious.

But I decided to keep this in the course, because it can be useful, and I've put together a new example using ordered data, so let's talk about the techniques. In matplotlib, a heatmap is simply a two-dimensional histogram where the x and the y values indicate potential points and the color plotted is the frequency of the observation.

# Let's bring in matplotlib and numpy, as well as pandas and some date time functionality
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datetime import datetime

# In this example I want to show you some traffic data from New York City, which is
# all available from NYC open data portal.
df=pd.read_csv("../assets/NYC hourly traffic.csv")

# I'm going to convert the date column into a date and time. Passing the
# whole column to pd.to_datetime is much faster than applying it row by row
df["Date"]=pd.to_datetime(df["Date"])

df.head()

# That might take a bit to load if you are following along on Coursera, it's a big 
# dataset. Let's pare it down in size and do some basic exploratory data analysis 
# with histograms. Let's say I'm interested in a single plaza (camera location) and 
# dates for the early part of 2017. I'm going to write a pandas query to do that
sample=df.query("`Plaza ID`==5 & Date>'2016-12-30' & Date<'2017-05-01'")
sample

This syntax might look a bit different to you from our first course. This is an alternative way to query a dataframe: it takes the query as a string and evaluates it against the dataframe, using a library called numexpr when that is installed. It's a bit like SQL in its syntax, if you are familiar with that, but it comes with a lot of caveats. I wanted to expose you to it so you were aware of it, but it's completely possible for you to use the regular boolean masking method I showed you in course 1. Now would be a great time to pause the video, open the notebook, and see if you could rewrite this query using the knowledge you already have on boolean masking.
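If you want to check your answer to that exercise, here is one possible sketch. It runs on a tiny made-up dataframe rather than the real traffic file, so the name `toy` and the values in it are my own assumptions; only the column names `Plaza ID` and `Date` come from the dataset above.

```python
import pandas as pd

# Made-up rows standing in for the NYC traffic data (an assumption for
# illustration; only the column names match the real dataset)
toy = pd.DataFrame({
    "Plaza ID": [5, 5, 5, 7, 5],
    "Date": pd.to_datetime(["2016-12-30", "2017-01-15", "2017-04-30",
                            "2017-02-01", "2017-05-01"]),
})

# The query-string version from the lecture
queried = toy.query("`Plaza ID`==5 & Date>'2016-12-30' & Date<'2017-05-01'")

# The same filter written as a boolean mask. Each comparison needs its
# own parentheses because & binds more tightly than == and <
mask = (
    (toy["Plaza ID"] == 5)
    & (toy["Date"] > pd.Timestamp("2016-12-30"))
    & (toy["Date"] < pd.Timestamp("2017-05-01"))
)
masked = toy[mask]

# Both approaches should select the same rows
print(masked.equals(queried))
```

Notice that both date bounds are strict, so the rows dated exactly 2016-12-30 and 2017-05-01 are left out in both versions.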

We have the hour of day here, so let's take a look at a histogram of activity over a day.

# since we have 24 hours in a day I'll set the bins there, and I want
# to see our frequency -- the weights for each bin -- as the number of
# vehicles which have the E-ZPass system. This system automatically bills
# drivers for using the road, and has already been aggregated (summed)
# for us from individual observations
plt.hist(sample["Hour"],bins=24,weights=sample["# Vehicles - E-ZPass"]);

Ok, what do you notice here? I see two spikes, mornings around 7:30 and afternoons starting at about 3 until 6. Sounds like rush hour!
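If the `weights` parameter feels a bit mysterious, here is a small sketch of what it does, using NumPy's `np.histogram` on made-up numbers. The dataframe `toy` and its `vehicles` column are hypothetical stand-ins for the E-ZPass counts. Without weights each row adds 1 to its bin; with weights each row adds its vehicle count, which should match a plain group-and-sum.

```python
import numpy as np
import pandas as pd

# Made-up, pre-aggregated observations: several rows can share an hour
toy = pd.DataFrame({
    "Hour":     [0, 0, 1, 2, 2, 2],
    "vehicles": [10, 5, 20, 1, 2, 3],
})

# Unweighted: how many rows fall into each of the three hour bins
row_counts, edges = np.histogram(toy["Hour"].to_numpy(), bins=3)

# Weighted: the sum of "vehicles" over the rows in each bin
vehicle_totals, _ = np.histogram(
    toy["Hour"].to_numpy(), bins=3, weights=toy["vehicles"].to_numpy()
)

# The weighted histogram agrees with a group-by sum over the hour
by_hour = toy.groupby("Hour")["vehicles"].sum()

print(row_counts)      # rows per bin: 2, 1, 3
print(vehicle_totals)  # vehicles per bin: 15, 20, 6
print(by_hour.tolist())
```

So in the traffic plot the height of each bar is the total number of E-ZPass vehicles seen in that hour, not the number of rows in the file.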

In this example our hours are ordered, so it makes sense to view the data in this way. But we also have days of the week which are ordered. Let's extract the day of the week and look at a histogram of that.

# We can extract the day of the week from the Date column using the
# pandas date time features. The Series object in pandas has an
# attribute "dt" which stores numerous date time transformations for
# us because it's such a common need. In this case we just take the
# Date column (which is a Series object) and get the .dt.dayofweek
# from it.
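# sample came out of a query on df, so we take a copy first; otherwise
# pandas may raise a SettingWithCopyWarning when we add a column
sample=sample.copy()
sample["Day of Week"]=sample["Date"].dt.dayofweek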

# Once we have done that we can just look at a histogram
plt.hist(sample["Day of Week"],bins=7,weights=sample["# Vehicles - E-ZPass"]);

Ok, so we can see that traffic flow is pretty steady except for days 5 and 6 in the week. Pandas numbers the days starting from Monday as 0, so these are Saturday and Sunday, the weekend. Now, we could isolate those days and look at the individual histograms for hourly traffic, but we can also look at a joint histogram -- or a heatmap -- for both the hourly and daily variables. When we do this we set one variable to be the x axis, another to be the y axis, and then we render our frequency (our weights) as different colors showing the third dimension.

# While it sounds like a lot of work, it isn't really in matplotlib!
# The API looks almost the same as a regular histogram, but in this
# case we have to specify the number of bins for each axis
plt.figure(figsize=(12,8)) # make a slightly bigger figure
plt.hist2d(sample["Hour"],
           sample["Day of Week"],
           bins=[24,7],
           weights=sample["# Vehicles - E-ZPass"])

# This next part is optional, but adds a colorbar: a legend telling you
# which value each color in the histogram stands for
plt.colorbar();

Great! So let's dissect this. First, we see that across all days (our y axis) the first four or so hours of the day are dark blue (this is the first four columns or so), indicating relatively little traffic. Then we see that for days 0 through 4 we get a spike in traffic, shown as brighter, more yellow cells in the image, but this isn't really true for the last two days of the week (the weekend, which is the top two rows).

Time data is an interesting case for using heat maps, because we often have cycles of activity within a time period -- in this case weeks -- and that allows us to have two ordered dimensions.
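To see the numbers behind a plot like this, you can compute the same grid without drawing anything. This is a minimal sketch on made-up data using NumPy's `np.histogram2d`; the dataframe `toy` and the column `vehicles` are assumptions for illustration. It also checks the result against a pandas pivot table, which is another way to think about a heatmap of pre-aggregated data.

```python
import numpy as np
import pandas as pd

# Made-up observations: two hours, two days, a vehicle count per row
toy = pd.DataFrame({
    "Hour":        [0, 0, 1, 1, 1],
    "Day of Week": [0, 1, 0, 1, 1],
    "vehicles":    [3, 4, 10, 20, 5],
})

# totals[i, j] is the summed weight for hour bin i and day bin j
totals, hour_edges, day_edges = np.histogram2d(
    toy["Hour"].to_numpy(),
    toy["Day of Week"].to_numpy(),
    bins=[2, 2],
    weights=toy["vehicles"].to_numpy(),
)

# The same table built by summing vehicles for each (hour, day) pair
pivot = toy.pivot_table(index="Hour", columns="Day of Week",
                        values="vehicles", aggfunc="sum")

print(totals)
print(pivot)
```

One thing to watch for: the first index of the returned array is the x variable (hours here), so each row of `totals` is an hour, while in the picture hours run along the horizontal axis.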

Now would be a good time for you to grab control and see if you could look at a different dimension -- months. Could you plot a heat map where one axis is months and the other is, say, week of the month?
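If you are not sure where to start, here is a sketch of how the two columns could be derived, shown on a handful of made-up dates. "Week of the month" has no single standard definition; this sketch assumes one choice, counting seven-day blocks from the first of the month (days 1 to 7 are week 0, days 8 to 14 are week 1, and so on).

```python
import pandas as pd

# A few made-up dates standing in for the Date column
dates = pd.Series(pd.to_datetime(
    ["2017-01-01", "2017-01-08", "2017-01-31", "2017-02-14"]
))

# Month number, 1 through 12, straight from the .dt accessor
month = dates.dt.month

# Assumed definition of week of the month: seven-day blocks counted
# from day 1, giving values 0 through 4
week_of_month = (dates.dt.day - 1) // 7

print(month.tolist())          # [1, 1, 1, 2]
print(week_of_month.tolist())  # [0, 1, 4, 1]
```

With these two series in hand, the plotting call has the same shape as before: pass them as the x and y values to `plt.hist2d`, choose the number of bins to match the months and weeks your sample covers, and keep the vehicle counts as the weights.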

From: https://blog.csdn.net/Yqalu/article/details/139159316
