Nutrition of Common Foods -- Big Data Analysis

Date: 2023-06-01 21:25:00
Tags: data analysis, Category, 20, --, nutrients, Fat, values, fig, nutrition


1. Background

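A healthy diet maintained throughout life helps prevent all forms of malnutrition as well as a range of noncommunicable diseases and conditions. The nutrients in food are how we obtain fuel: they supply the body with energy, and they have to be replenished every day. Fat, protein, and carbohydrates are all essential. Nutrition science explains how the nutrients and other substances in food relate to an organism's maintenance, growth, reproduction, health, and disease; it covers ingestion, absorption, assimilation, biosynthesis, catabolism, and excretion. However, the growing volume of processed food, rapid urbanization, and changing lifestyles have shifted dietary patterns. People now eat more foods high in energy, fats, free sugars, and salt/sodium, and many do not eat enough fruit, vegetables, and other sources of dietary fiber such as whole grains. The exact makeup of a diversified, balanced, and healthy diet varies with individual characteristics (for example age, sex, lifestyle, and level of physical activity), cultural context, locally available foods, and dietary customs, but the basic principles of a healthy diet stay the same.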

2. Big data analysis plan

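After downloading the data from the source site, import pandas, plotly, and related libraries in a Python environment to organize it. The data is cleaned and checked, then visualized to identify the nutritional value of common foods and complete the analysis.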

Dataset source: www.kaggle.com

Reference: Fang Hongsheng (方红生), data analysis and visualization; big data analysis of common food nutrition

3. Data analysis steps

(1) Data cleaning

# Import libraries

import pandas as pd

import plotly.express as px

from plotly.subplots import make_subplots

import plotly.graph_objects as go

# Read the file

nutrients=pd.read_csv("D:/文档/python高级应用/nutrients_csvfile.csv")

nutrients.head()

# Replace t with 0; t marks a trace amount in the food

nutrients = nutrients.replace("t", 0)

nutrients = nutrients.replace("t'", 0)

nutrients.head()
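To see what the two `replace` calls do, here is a minimal sketch on a made-up three-row frame (the rows are illustrative and not taken from the dataset). Without `regex=True`, `replace` only matches whole cell values, so a food name that merely contains the letter t is left alone.

```python
import pandas as pd

# Toy frame; the rows are invented for illustration only.
toy = pd.DataFrame({
    "Food": ["Butter", "Lettuce", "Tea"],
    "Fat": ["11", "t", "t'"],
})

# Whole-cell match: only cells equal to "t" or "t'" become 0.
toy = toy.replace("t", 0).replace("t'", 0)

print(toy["Fat"].tolist())
print(toy["Food"].tolist())
```

The `Food` column is unchanged because none of its cells is exactly `t` or `t'`.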

# Inspect the dataset and its size (display() requires Jupyter/IPython)

display(nutrients)

# Strip commas so numeric strings can be converted to int or float (this also removes the commas inside Category names)

nutrients = nutrients.replace(",","", regex=True)

nutrients['Protein'] = nutrients['Protein'].replace("-1","", regex=True)

nutrients['Fiber'] = nutrients['Fiber'].replace("a","", regex=True)

nutrients.loc[91, 'Calories'] = (8+44)/2
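The assignment above pins the fix to row 91, which works for this one file but would silently hit the wrong row if the data were reordered. As a sketch of a more general approach, the hypothetical helper `range_midpoint` below (not part of the original post) turns any `low-high` string into its midpoint and passes every other value through. It assumes the problematic cell is a range such as `8-44`, which the `(8+44)/2` expression suggests.

```python
import re

import pandas as pd

# Matches strings like "8-44" or "1.5 - 3" (two non-negative numbers).
_RANGE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*$")


def range_midpoint(value):
    """Return the midpoint of a 'low-high' string; pass other values through."""
    if isinstance(value, str):
        match = _RANGE.match(value)
        if match:
            low, high = map(float, match.groups())
            return (low + high) / 2
    return value


# Toy column; the values are invented for illustration.
calories = pd.Series(["100", "8-44", 250])
cleaned = pd.to_numeric(calories.map(range_midpoint))
print(cleaned.tolist())  # [100.0, 26.0, 250.0]
```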

# Convert Grams, Calories, Protein, Fat, Sat.Fat, Fiber, and Carbs to numeric types

nutrients['Grams'] = pd.to_numeric(nutrients['Grams'])

nutrients['Calories'] = pd.to_numeric(nutrients['Calories'])

nutrients['Protein'] = pd.to_numeric(nutrients['Protein'])

nutrients['Fat'] = pd.to_numeric(nutrients['Fat'])

nutrients['Sat.Fat'] = pd.to_numeric(nutrients['Sat.Fat'])

nutrients['Fiber'] = pd.to_numeric(nutrients['Fiber'])

nutrients['Carbs'] = pd.to_numeric(nutrients['Carbs'])
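The seven conversions can also be written as one step. The sketch below is an alternative, not the post's method: it applies `pd.to_numeric` to a list of columns and uses `errors="coerce"` so that any cell that still cannot be parsed becomes NaN instead of raising. The toy rows and the name `NUMERIC_COLS` are assumptions made for illustration.

```python
import pandas as pd

NUMERIC_COLS = ["Grams", "Calories", "Protein", "Fat", "Sat.Fat", "Fiber", "Carbs"]

# Toy frame with one unparseable cell ("n/a"); values are invented.
toy = pd.DataFrame({
    "Food": ["Milk", "Mystery"],
    "Grams": ["976", "100"],
    "Calories": ["660", "n/a"],
    "Protein": ["32", "1"],
    "Fat": ["40", 0],
    "Sat.Fat": ["36", 0],
    "Fiber": ["0", "2"],
    "Carbs": ["48", "3"],
})

# errors="coerce" turns cells that cannot be parsed into NaN.
toy[NUMERIC_COLS] = toy[NUMERIC_COLS].apply(pd.to_numeric, errors="coerce")

print(toy.dtypes)
print(toy["Calories"].tolist())
```

Coerced NaN values are then caught by the null check and removed by `dropna()` in the following steps.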

# Check the resulting data types

nutrients.dtypes

# Check data quality

print(nutrients.isnull().any())

print('-'*245)

print(nutrients.describe())

print('-'*245)

# Drop rows with null values

nutrients = nutrients.dropna()

display(nutrients)

# Simplify the categories

nutrients['Category'] = nutrients['Category'].replace('DrinksAlcohol Beverages', 'Drinks, Alcohol, Beverages', regex=True)

nutrients['Category'] = nutrients['Category'].replace('Fats Oils Shortenings', 'Fats, Oils, Shortenings', regex=True)

nutrients['Category'] = nutrients['Category'].replace('Fish Seafood', 'Fish, Seafood', regex=True)

nutrients['Category'] = nutrients['Category'].replace('Meat Poultry', 'Meat, Poultry', regex=True)

nutrients['Category'] = nutrients['Category'].replace(['Breads cereals fastfoodgrains', 'Seeds and Nuts'], 'Grains', regex=True)

nutrients['Category'] = nutrients['Category'].replace(['Desserts sweets', 'Jams Jellies'], 'Desserts', regex=True)

nutrients['Category'] = nutrients['Category'].replace(['Fruits A-F', 'Fruits G-P', 'Fruits R-Z'], 'Fruits', regex=True)

nutrients['Category'] = nutrients['Category'].replace(['Vegetables A-E', 'Vegetables F-P', 'Vegetables R-Z'], 'Vegetables', regex=True)
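The same renaming can be expressed as a single lookup table. This is a sketch of an alternative, with `CATEGORY_MAP` as a hypothetical name. One difference is that a dict passed to `Series.replace` matches whole cell values, whereas the `regex=True` calls above match substrings. A category missing from the map (such as `Soups` here) is left unchanged.

```python
import pandas as pd

# Old name -> simplified name, taken from the replace calls in the post.
CATEGORY_MAP = {
    "DrinksAlcohol Beverages": "Drinks, Alcohol, Beverages",
    "Fats Oils Shortenings": "Fats, Oils, Shortenings",
    "Fish Seafood": "Fish, Seafood",
    "Meat Poultry": "Meat, Poultry",
    "Breads cereals fastfoodgrains": "Grains",
    "Seeds and Nuts": "Grains",
    "Desserts sweets": "Desserts",
    "Jams Jellies": "Desserts",
    "Fruits A-F": "Fruits",
    "Fruits G-P": "Fruits",
    "Fruits R-Z": "Fruits",
    "Vegetables A-E": "Vegetables",
    "Vegetables F-P": "Vegetables",
    "Vegetables R-Z": "Vegetables",
}

# Toy column for illustration.
categories = pd.Series(["Fruits A-F", "Seeds and Nuts", "Soups"])
simplified = categories.replace(CATEGORY_MAP)
print(simplified.tolist())  # ['Fruits', 'Grains', 'Soups']
```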

# Convert Calories, Protein, Fat, Sat.Fat, Fiber, and Carbs to per-gram values

nutrients['Calories'] = nutrients['Calories'] / nutrients['Grams']

nutrients['Protein'] = nutrients['Protein'] / nutrients['Grams']

nutrients['Fat'] = nutrients['Fat'] / nutrients['Grams']

nutrients['Sat.Fat'] = nutrients['Sat.Fat'] / nutrients['Grams']

nutrients['Fiber'] = nutrients['Fiber'] / nutrients['Grams']

nutrients['Carbs'] = nutrients['Carbs'] / nutrients['Grams']

# Check the final result: mean of each numeric column per category

category_dist = nutrients.groupby(['Category']).mean(numeric_only=True)

category_dist
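As a small worked example of the per-gram normalization and the category means, the sketch below uses an invented three-row frame. It divides the nutrient columns by `Grams` in one call with `DataFrame.div`, and selects the numeric columns before averaging so that text columns such as `Food` never reach `mean()`.

```python
import pandas as pd

# Toy data; the numbers are invented for illustration.
toy = pd.DataFrame({
    "Food": ["A", "B", "C"],
    "Category": ["Fruits", "Fruits", "Grains"],
    "Grams": [100.0, 200.0, 50.0],
    "Calories": [50.0, 200.0, 200.0],
    "Protein": [1.0, 4.0, 5.0],
})

value_cols = ["Calories", "Protein"]

# Divide every nutrient column by Grams, row by row.
toy[value_cols] = toy[value_cols].div(toy["Grams"], axis=0)

# Average the per-gram values within each category.
per_category = toy.groupby("Category")[value_cols].mean()
print(per_category)
```

Here the two fruits have 0.5 and 1.0 calories per gram, so the `Fruits` mean is 0.75. Each food counts equally regardless of serving size.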

(2) Data visualization and analysis

# Category distribution for every metric

fig = make_subplots(

rows=2, cols=3,

specs=[[{"type": "domain"},{"type": "domain"},{"type": "domain"}],

[{"type": "domain"},{"type": "domain"},{"type": "domain"}]])

fig.add_trace(go.Pie(values=category_dist['Calories'].values,title='CALORIES',labels=category_dist.index,marker=dict(colors=['#100b','#f00560'],line=dict(color='#FFFFFF',width=2.5))),row=1, col=1)

fig.add_trace(go.Pie(values=category_dist['Protein'].values,title='PROTEIN',labels=category_dist.index,marker=dict(colors=['#100b','#f00560'], line=dict(color='#FFFFFF',width=2.5))),row=1, col=2)

fig.add_trace(go.Pie(values=category_dist['Fat'].values,title='FAT',labels=category_dist.index,marker=dict(colors=['#100b','#f00560'], line=dict(color='#FFFFFF', width=2.5))),row=1, col=3)

fig.add_trace(go.Pie(values=category_dist['Sat.Fat'].values,title='SAT.FAT',labels=category_dist.index,marker=dict(colors=['#100b','#f00560'], line=dict(color='#FFFFFF', width=2.5))),row=2, col=1)

fig.add_trace(go.Pie(values=category_dist['Fiber'].values,title='FIBER',labels=category_dist.index,marker=dict(colors=['#100b','#f00560'], line=dict(color='#FFFFFF', width=2.5))),row=2, col=2)

fig.add_trace(go.Pie(values=category_dist['Carbs'].values,title='CARBS',labels=category_dist.index,marker=dict(colors=['#100b','#f00560'], line=dict(color='#FFFFFF', width=2.5))),row=2, col=3)

fig.update_layout(title_text="Category distribution for every metric",height=700, width=1000)

fig.show()

# Find the 20 foods richest in each nutrient

calories = nutrients.sort_values(by='Calories', ascending= False)

protein = nutrients.sort_values(by='Protein', ascending= False)

fat = nutrients.sort_values(by='Fat', ascending= False)

sat_fat = nutrients.sort_values(by='Sat.Fat', ascending= False)

fiber = nutrients.sort_values(by='Fiber', ascending= False)

carbs = nutrients.sort_values(by='Carbs', ascending= False)

top_20_calories = calories.head(20)

top_20_protein = protein.head(20)

top_20_fat = fat.head(20)

top_20_sat_fat = sat_fat.head(20)

top_20_fiber = fiber.head(20)

top_20_carbs = carbs.head(20)
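Sorting and then taking `head(20)` can be folded into one call with `DataFrame.nlargest`. The helper name `top_n` and the toy rows below are assumptions for illustration. It is a sketch of an equivalent shortcut, not the code the post uses.

```python
import pandas as pd


def top_n(frame, column, n=20):
    """Return the n rows with the largest values in `column`."""
    return frame.nlargest(n, column)


# Toy per-gram fat values; invented for illustration.
toy = pd.DataFrame({
    "Food": ["Oil", "Butter", "Apple", "Rice"],
    "Fat": [1.0, 0.8, 0.002, 0.01],
})

print(top_n(toy, "Fat", n=2)["Food"].tolist())  # ['Oil', 'Butter']
```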

#Top 20 Calories

fig = px.bar(top_20_calories, x='Food', y='Calories', color='Calories', title=' Top 20 Calories Rich Foods', template = 'plotly_white')

fig.show()

#Top 20 Protein

fig = px.bar(top_20_protein, x='Food', y='Protein', color='Protein', title=' Top 20 Protein Rich Foods', template = 'plotly_white')

fig.show()

#Top 20 Fat

fig = px.bar(top_20_fat, x='Food', y='Fat', color='Fat', title=' Top 20 Fat Rich Foods', template = 'plotly_white')

fig.show()

#Top 20 Sat.Fat

fig = px.bar(top_20_sat_fat, x='Food', y='Sat.Fat', color='Sat.Fat', title=' Top 20 Sat.Fat Rich Foods', template = 'plotly_white')

fig.show()

#Top 20 Fiber

fig = px.bar(top_20_fiber, x='Food', y='Fiber', color='Fiber', title=' Top 20 Fiber Rich Foods', template = 'plotly_white')

fig.show()

#Top 20 Carbs

fig = px.bar(top_20_carbs, x='Food', y='Carbs', color='Carbs', title=' Top 20 Carbs Rich Foods', template = 'plotly_white')

fig.show()

# Relationship between fat and saturated fat (trendline='lowess' requires statsmodels)

fig = px.scatter(nutrients, x = 'Fat', y = 'Sat.Fat', trendline = 'lowess', color = 'Fat', hover_name='Food', template = 'plotly_white', title = 'Relationship between fat and saturated fat')

fig.show()
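The scatter plot shows the relationship visually. A single number can complement it: the Pearson correlation between the two columns, computed with `Series.corr`. The sketch below uses invented values in which saturated fat is exactly half of total fat, so the correlation is 1. On the real data the value would have to be computed from `nutrients` itself.

```python
import pandas as pd

# Toy per-gram values; Sat.Fat is exactly 0.5 * Fat here.
toy = pd.DataFrame({
    "Fat": [0.0, 0.1, 0.2, 0.4],
    "Sat.Fat": [0.0, 0.05, 0.1, 0.2],
})

# Pearson correlation coefficient (the default method of Series.corr).
r = toy["Fat"].corr(toy["Sat.Fat"])
print(round(r, 3))  # 1.0
```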

# Compare food categories by protein content

fig = go.Figure(go.Pie(values=category_dist['Protein'].values, text=category_dist.index, labels=category_dist.index,marker=dict(colors=['#100b','#f00560'], line=dict(color='#FFFFFF', width=2.5))))

fig.update_layout(title_text="Food categories compared by protein content",height=600, width=800)

fig.show()

# Top 10 meats by protein content

meats = nutrients[nutrients['Category'].isin(['Fish, Seafood', 'Meat, Poultry'])]

meats_protein=meats.sort_values(by='Protein', ascending= False)

meats_protein=meats_protein.head(10)

fig = go.Figure(go.Pie(values=meats_protein['Protein'].values, text=meats_protein['Food'],marker = {"colors": ['#100b','#f00560'],"line": {"color": '#FFFFFF', "width" : 2.5}}))

fig.update_layout(title_text="High-protein meats",height=500, width=800)

fig.show()

4. Summary

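People today pay increasing attention to the nutritional value and composition of what they eat. Visualizing nutrition data has become an important way to understand that value and to balance the intake of different nutrients. Typical visualizations chart the content of calories, fat, protein, carbohydrates, fiber, vitamins, minerals, and similar components. Such charts help people understand what a food provides and adjust their diet to what the body needs. With nutrition data for many foods visualized side by side, it becomes easier to look up and compare nutritional composition and so to plan meals better. In short, nutrition data visualization is a useful tool for understanding the nutritional value of food and improving diet quality.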

 


From: https://www.cnblogs.com/yaozong/p/17450219.html
