[Coursera Study Notes] Executive Data Science (A Crash Course in Data Science)

Date: 2023-06-20 11:34:37
Tags: What, Crash, science, question, data, Science, Data



Table of Contents

  • 1. What is statistics good for?
  • 1.1 Statistics
  • 2. What is machine learning?
  • 2.1 Two main activities of machine learning
  • 2.2 Some characteristics of ML
  • 3. What is Software Engineering for Data Science?
  • 3.1 Types of Software
  • 4. The Structure of a Data Science Project
  • 4.1 Five phases of a data science project
  • 4.2 Two main goals to exploratory data analysis
  • 4.3 There is another approach that can be taken
  • 5. The outputs of a data science experiment
  • 5.1 The type of the output
  • 5.2 A few hallmarks of a good data science report
  • 6. The four secrets of a successful data science experiment
  • 7. Data Scientist Toolbox
  • 8. Separating Hype from Value


1. What is statistics good for?

1.1 Statistics

  1. Descriptive statistics
    Descriptive statistics includes exploratory data analysis, unsupervised learning, clustering and basic data summaries.
  2. Inference
    Inference is the process of making conclusions about populations from samples.
  3. Prediction
    Prediction overlaps quite a bit with inference, but modern prediction tends to have a different mindset.
  4. Experimental Design
    Experimental design is the act of controlling your experimental process to optimize the chance of arriving at sound conclusions.
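
As a small illustration of inference (drawing a conclusion about a population from a sample), here is a minimal sketch in Python. The population, the sample size, and the helper name `mean_confidence_interval` are all assumptions made up for this example, and the interval uses the usual normal approximation rather than any method prescribed by the course.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Return (mean, low, high): the sample mean and an approximate 95% interval."""
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, mean - z * standard_error, mean + z * standard_error


random.seed(0)  # fixed seed so the sketch is repeatable
# Hypothetical population with true mean 50 and standard deviation 10.
population = [random.gauss(50, 10) for _ in range(10_000)]
# In practice we only get to observe a sample, here 200 units.
sample = random.sample(population, 200)

mean, low, high = mean_confidence_interval(sample)
print(f"sample mean = {mean:.2f}, approximate 95% CI = ({low:.2f}, {high:.2f})")
```

The interval is a statement about the unobserved population mean, which is what separates inference from a purely descriptive summary of the 200 sampled values.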

2. What is machine learning?

2.1 Two main activities of machine learning

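  • Unsupervised learning: trying to uncover unobserved factors in the data.
  • Supervised learning: using a collection of predictors and some observed outcomes to build an algorithm that predicts the outcome when it is not observed.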

2.2 Some characteristics of ML

  • the emphasis on predictions;
  • evaluating results via prediction performance;
  • having concern for overfitting but not model complexity per se;
  • emphasis on performance;
  • obtaining generalizability through performance on novel datasets;
  • usually no superpopulation model specified;
  • concern over performance and robustness.
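
The idea of judging a model by its prediction performance on novel data can be sketched as follows. This is only an illustration under assumed conditions: the data are simulated, and the two models (a least-squares line and a 1-nearest-neighbour "memorizer") are chosen for this example, not taken from the course. The memorizer reproduces its training data perfectly, which is exactly why training error alone says little about generalizability.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predict, xs, ys):
    """Mean squared prediction error of `predict` on the given data."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


random.seed(1)
# Simulated (hypothetical) data: y = 2x + 1 plus noise.
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + random.gauss(0, 1) for x in xs]

# Hold out the last 50 points as "novel" data the models never see.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

slope, intercept = fit_line(train_x, train_y)


def line(x):
    return slope * x + intercept


def nearest_neighbor(x):
    """Predict the outcome of the closest training point (memorizes the data)."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


for name, model in [("line", line), ("1-NN", nearest_neighbor)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 3),
          "test MSE:", round(mse(model, test_x, test_y), 3))
```

Only the test-set numbers speak to how each model would do on data it has not seen; the memorizer's zero training error is a symptom of overfitting, not evidence of quality.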

3. What is Software Engineering for Data Science?

3.1 Types of Software

  • Just some code
    • that you wrote code at all is the first step;
    • encapsulating automation with a loop or similar.
  • Some sort of function
    • first level of abstraction; defined "interface".
  • Software package
    • API + convenience for user.
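
The progression above can be sketched in a few lines of Python. The datasets and the function name `mean_by_name` are hypothetical, invented only to show the same computation at each level.

```python
# Level 1: "just some code" that handles one hypothetical dataset.
values = [3.0, 5.0, 10.0]
print(sum(values) / len(values))

# Level 2: the same computation automated with a loop over several datasets.
datasets = {"a": [3.0, 5.0, 10.0], "b": [1.0, 2.0]}
for name, vals in datasets.items():
    print(name, sum(vals) / len(vals))


# Level 3: a function, the first level of abstraction with a defined interface.
def mean_by_name(datasets):
    """Map each dataset name to the mean of its values.

    Input: dict of name -> non-empty list of numbers.
    Output: dict of name -> mean.
    """
    return {name: sum(vals) / len(vals) for name, vals in datasets.items()}


print(mean_by_name(datasets))
```

A software package would go one step further by bundling functions like this behind a documented API, with installation and documentation handled for the user.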

4. The Structure of a Data Science Project

4.1 Five phases of a data science project

  • question
  • exploratory data analysis
  • formal modeling
  • interpretation
  • communication.

4.2 Two main goals to exploratory data analysis

  • Are the data suitable for the question?
  • Sketch the solution.

4.3 There is another approach that can be taken

Often a data set is available, but it is not immediately clear what it will be useful for. In that case it can help to start with exploratory data analysis: look at the data, summarize it, make some plots, and see what is there, then generate interesting questions from the data. This is sometimes called hypothesis generating, because it produces new questions rather than answering questions that were already there.
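
A rough sketch of this kind of first look, using only the Python standard library. The records, the column names, and the helpers `summarize` and `pearson` are hypothetical and exist only for illustration; a real analysis would usually also include plots.

```python
import math
import statistics

# Hypothetical data set that arrived without a specific question attached.
records = [
    {"hours_studied": 1.0, "score": 52.0},
    {"hours_studied": 2.0, "score": 55.0},
    {"hours_studied": 3.0, "score": 61.0},
    {"hours_studied": 4.0, "score": 70.0},
    {"hours_studied": 5.0, "score": 74.0},
]


def summarize(records, column):
    """Basic summary of one numeric column."""
    values = [r[column] for r in records]
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


def pearson(records, a, b):
    """Pearson correlation between two numeric columns."""
    xs = [r[a] for r in records]
    ys = [r[b] for r in records]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


for column in ("hours_studied", "score"):
    print(column, summarize(records, column))
print("correlation:", round(pearson(records, "hours_studied", "score"), 3))
```

A strong association in a summary like this does not answer anything by itself; it suggests a question (for example, whether study time affects scores) that would then need its own data and design.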


5. The outputs of a data science experiment

5.1 The type of the output

  1. Reports
  2. Presentations
  3. Interactive web pages
  4. Apps

5.2 A few hallmarks of a good data science report

  • Be clearly written
  • Involve a narrative around the data
  • Discuss the creation of the analytic dataset
  • Have concise conclusions
  • Omit unnecessary details
  • Be reproducible

6. The four secrets of a successful data science experiment

  1. New knowledge is created.
  2. Decisions or policies are made based on the outcome of the experiment.
  3. A report, presentation or app with impact is created.
  4. It is learned that the data can’t answer the question being asked of it.

7. Data Scientist Toolbox

  • Large-scale data sets
    • Hadoop
    • Spark
  • Communicate with others
    • Slack
  • Solve questions
    • Stack Overflow
  • Reproducible or literate documentation
    • R Markdown
    • IPython notebooks
  • Build data products quickly
    • Shiny

8. Separating Hype from Value

  • What is the question you are trying to answer with the data?
  • Do you have the data to actually answer that question?
  • If you could answer the question, could you use the answer?


From: https://blog.51cto.com/u_16165815/6521639
