
INFS5710 Information Business Analytics

Posted: 2024-07-19 15:51:33

INFS5710 Information Technology Infrastructure for Business Analytics

Project Statement

(Due by 12 noon on Monday 29 July 2024 via Moodle)

• This project accounts for 25% of the total marks for this course.

• The deliverables are a project report (PDF) and a PowerPoint file.

Bike sharing has become increasingly popular across the globe. Today, such programs operate in more than 1,000 cities, with more than half a million bicycles in use. The principle of bike sharing is simple: individuals use bicycles on an as-needed basis without the costs and responsibilities of bike ownership. It is short-term bicycle access, which provides its users with an environmentally friendly form of public transportation. This flexible scheme targets daily mobility and allows users to access public bicycles at unattended bike stations; bicycle reservations, pickup, and drop-off are all self-service. Commonly concentrated in urban settings, bike sharing programs also provide multiple bike station locations that enable users to pick up and return bicycles to different stations.

This project is about the Metro Bike Share (Metro) in the metropolitan area of Los Angeles (LA), a large city in the US with a population of several million. You are a business consultant working for the bike-sharing program.

Bike-sharing data

Your manager has asked you to download historical bike-sharing data from the following site:

https://bikeshare.metro.net/about/data/, which contains data on millions of bike trips from 2016 Q3 (starting July 2016) to 2024 Q1. Since the data come from the US, please be aware of the difference in date formats between the US (mm/dd/yyyy) and Australia (dd/mm/yyyy).
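The date-format difference can be handled explicitly when preparing the data. The following is a minimal Python sketch, not part of the required SAS workflow. It assumes the timestamps look like 7/19/2016 14:05 (month first, 24-hour time); the actual files may use other layouts, so check each file before relying on this format string. The function names are hypothetical.

```python
from datetime import datetime

# Assumed layout: month/day/year hour:minute. Verify against each data file.
US_FORMAT = "%m/%d/%Y %H:%M"

def parse_us_datetime(text):
    """Parse a US-style timestamp (month first) into a datetime."""
    return datetime.strptime(text, US_FORMAT)

def to_iso_date(text):
    """Return the unambiguous ISO date (yyyy-mm-dd) for a US-style timestamp."""
    return parse_us_datetime(text).date().isoformat()

# 3/4/2017 is 4 March 2017 in the US format, not 3 April 2017.
print(to_iso_date("3/4/2017 08:30"))  # prints 2017-03-04
```

Storing dates in the ISO form avoids any ambiguity when the tables are later loaded into another tool.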

The locations of the bike stations are given in GPS coordinates. For example, the coordinate of a station is (x, y), where x is the longitude and y is the latitude. The following link explains the GPS coordinate system in more detail: https://www.ubergizmo.com/how-to/read-gps-coordinates/. You can also locate a place on Google Maps by its latitude and longitude; for details, see https://support.google.com/maps/answer/18539. If you are interested in estimating the distance traveled for a ride, assuming that a bike rental starts from (x1, y1) and ends at (x2, y2), it is recommended that you estimate it using the so-called taxicab distance, which is |x1 − x2| + |y1 − y2|. See the following figure for interpretation. For more information, please see https://study.com/academy/lesson/taxicab-geometry-history-formula.html. Note that whether the distance is in degrees (without any conversion from longitude-latitude coordinates), miles, or km, your findings, interpretations or insights should not change. It is recommended that you just quote the distances in degrees.
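The taxicab distance described above can be sketched in a few lines of Python. This is only an illustration; the coordinates below are made up and the function name is hypothetical. The result is in degrees, as recommended.

```python
def taxicab_distance(x1, y1, x2, y2):
    """Taxicab distance |x1 - x2| + |y1 - y2|, in the units of the inputs.

    x is the longitude and y is the latitude, so the result is in degrees.
    """
    return abs(x1 - x2) + abs(y1 - y2)

# Made-up start and end coordinates (longitude, latitude) for one ride.
start = (-118.25, 34.05)
end = (-118.26, 34.04)

distance = taxicab_distance(start[0], start[1], end[0], end[1])
print(round(distance, 4))  # prints 0.02
```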

Weather data

Weather plays an important role when people decide whether or not to use bike-sharing. You are required to explore the relationship between weather (e.g., temperature, wind speed and humidity) and the bike-sharing rentals in this project. There are two known ways to download free historical weather data. The first way is to manually capture weather data month by month from Weather Underground (wunderground.com).

•    First visit https://www.wunderground.com/ and search for the weather conditions in LA. (You may also try other locations where there are many bike stations.)

•    You will be led to the site of a nearby weather station, which may vary from time to time. For example, consider the following site about Los Angeles (at some station):

https://www.wunderground.com/weather/us/ca/los-angeles

•    Click the History tab on the page, and then choose to view Monthly weather data. Once you choose a month, click View.  For example, the following link shows the weather data of June 2020 measured at the Burbank station (near Los Angeles):

https://www.wunderground.com/history/monthly/us/ca/burbank/KBUR/date/2020-6

•    Scroll down the page, and you will see the table of Daily Observations. Use your mouse to copy the table and paste it into an Excel spreadsheet.

•    Copy only the data required for this project, i.e., 2016 Q3 (starting July 2016) to 2024 Q1.

Another way, as suggested by a former student, is to download weather data from NOAA. Try:

https://www.ncdc.noaa.gov/cdo-web/search. Search for "Daily Summaries" at relevant weather stations for a time period, then click "Add to Cart". Note that this is a free service, but you will have to enter an email address so that you receive the data download link once the request has been processed.

Holiday data

Another factor that influences bike-sharing rentals is holidays. You can easily search the dates of the US federal holidays and/or California state holidays each year.
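If you prefer to generate the holiday dates rather than look them up year by year, the following Python sketch computes the US federal holidays from their calendar rules. It is a simplified illustration with hypothetical function names: it returns the nominal dates only and ignores the "observed" weekday when a holiday falls on a weekend, and it does not include California state holidays, which you would need to add yourself.

```python
from datetime import date, timedelta

def nth_weekday(year, month, weekday, n):
    """Date of the n-th given weekday (Monday=0 ... Sunday=6) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

def last_monday_of_may(year):
    """Memorial Day falls on the last Monday of May."""
    last = date(year, 5, 31)
    return last - timedelta(days=last.weekday())

def us_federal_holidays(year):
    """Nominal US federal holiday dates for one year (no observed-day shifts)."""
    holidays = {
        date(year, 1, 1): "New Year's Day",
        nth_weekday(year, 1, 0, 3): "Martin Luther King Jr. Day",
        nth_weekday(year, 2, 0, 3): "Washington's Birthday",
        last_monday_of_may(year): "Memorial Day",
        date(year, 7, 4): "Independence Day",
        nth_weekday(year, 9, 0, 1): "Labor Day",
        nth_weekday(year, 10, 0, 2): "Columbus Day",
        date(year, 11, 11): "Veterans Day",
        nth_weekday(year, 11, 3, 4): "Thanksgiving Day",
        date(year, 12, 25): "Christmas Day",
    }
    if year >= 2021:  # Juneteenth became a federal holiday in 2021.
        holidays[date(year, 6, 19)] = "Juneteenth"
    return holidays

for day, name in sorted(us_federal_holidays(2023).items()):
    print(day.isoformat(), name)
```

The resulting dates can be saved as a small holiday table and joined to the trip data by date.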

The Task

Your manager asked you to collect and analyze the data to “let the data speak.”  You understand that the company wants to further grow the market and attract more users. To do so, having business insights from the data is crucial.

In this project, you are expected to manage and clean the data collected; some of it may contain missing values, inconsistent formatting, and incomplete information. The goal is to overcome such obstacles, which are commonly encountered when dealing with real data, while deriving business insights from the datasets that can be used to promote the Metro bike-sharing business.

Borrowing terms from Data Warehouse (to be covered in Week 6), the following are some “dimensions” for the analysis in this project: station(s), time (including holidays), weather, membership, region, and bike type. There is one obvious “measure” in this context, which is the bike use, i.e., the number of rides, or the demand. We define an “analysis topic” as one that studies how a measure changes according to one or multiple dimensions. For example, you may study the daily demand pattern and how it has changed over the past 10 years, under different weather conditions and/or depending on whether the day is a holiday. In this example analysis topic, weather data and holiday data are utilized. Make sure that your analysis topic is meaningful.
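To make the idea of a measure and its dimensions concrete, the following Python sketch counts rides (the measure) by year and by whether the day is a holiday (two dimensions). The trip rows, the field name ride_date and the holiday set are all made up for illustration; in the project itself this aggregation would be done with a query in SAS.

```python
from collections import Counter
from datetime import date

# Made-up cleaned trips: one row per ride.
trips = [
    {"ride_date": date(2019, 7, 4)},
    {"ride_date": date(2019, 7, 4)},
    {"ride_date": date(2019, 7, 5)},
    {"ride_date": date(2023, 7, 4)},
    {"ride_date": date(2023, 7, 6)},
    {"ride_date": date(2023, 7, 6)},
]

# Made-up holiday dimension: the set of holiday dates.
holidays = {date(2019, 7, 4), date(2023, 7, 4)}

def rides_by_year_and_holiday(trips, holidays):
    """Count rides for each (year, is_holiday) combination."""
    counts = Counter()
    for trip in trips:
        day = trip["ride_date"]
        counts[(day.year, day in holidays)] += 1
    return counts

for (year, is_holiday), rides in sorted(rides_by_year_and_holiday(trips, holidays).items()):
    print(year, "holiday" if is_holiday else "non-holiday", rides)
```

Adding another dimension, such as a weather category, only requires extending the key used for counting.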

For this project, you are expected to choose no more than three analysis topics to study. It is preferable that you study one topic in depth, rather than multiple ones superficially. There are two constraints for your study: (i) You must conduct a chronological analysis for each topic. That is, one dimension must be the time horizon from the distant past to the recent past (at least across multiple years). For example, the introduction of motor bikes in late 2020 and the COVID pandemic must have impacted the customers’ demand for bike sharing. Their impact can only be seen from a chronological analysis. (ii) You must utilize the weather and holiday data in your study. You do not need to use both in each analysis topic, but ultimately each of them must be used in at least one of your analysis topics. Utilizing the regional data is optional, but doing so may help you receive a higher mark to reward your additional effort.

You are expected to use SAS Enterprise Guide (EG) for this project. To begin the ETL (extract, transform, and load) process, you need to prepare your data in proper tables that will go into SAS, just as you uploaded CSV data files to SAS in Homework 1. That is, you need to create tables in the SAS environment.
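Before uploading a CSV file to SAS, it can help to check it for missing values. The following Python sketch shows one possible cleaning policy: dropping rows in which a required field is empty. The column names and sample rows are assumptions for illustration only, and dropping rows is just one option; whatever policy you choose should be documented as an assumption in your report.

```python
import csv
import io

# Made-up sample standing in for a downloaded trip file.
RAW = """trip_id,start_time,start_lat,start_lon,end_lat,end_lon
1,7/7/2016 4:17,34.0566,-118.2372,34.0566,-118.2372
2,7/7/2016 6:00,,,34.0528,-118.2416
3,7/7/2016 10:32,34.0528,-118.2416,34.0499,-118.2559
"""

# Hypothetical list of columns that must be present for the analysis.
REQUIRED = ["start_time", "start_lat", "start_lon", "end_lat", "end_lon"]

def clean_rows(lines):
    """Keep rows whose required fields are all non-empty; count the rest."""
    kept, dropped = [], 0
    for row in csv.DictReader(lines):
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            dropped += 1
            continue
        kept.append(row)
    return kept, dropped

kept, dropped = clean_rows(io.StringIO(RAW))
print(len(kept), "rows kept;", dropped, "dropped")  # prints 2 rows kept; 1 dropped
```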

Whenever you want to conduct an analysis, you must write a query that selects the relevant attributes by properly joining multiple tables, so as to obtain a resultant table for that specific analysis. Please properly document your queries in the appendix of your report. The Appendix of this statement shows you some common data analysis and visualization functions of SAS EG. More features of SAS EG will be introduced in a tutorial session later. (Note: you may not be able to plot your desired graphs using SAS EG. If necessary, you may use other software such as Excel for graphing.)
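The kind of join query described above can be tried out on a small scale before writing it in SAS. The following sketch uses Python's built-in sqlite3 module as a stand-in; it is not SAS code, and the exact SQL syntax in SAS may differ. The table names, column names and values are made up. The query joins trips with daily weather by date and saves the result as a new table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up tables standing in for the cleaned trip and weather data.
    CREATE TABLE trips (trip_id INTEGER, ride_date TEXT);
    CREATE TABLE weather (obs_date TEXT, avg_temp_f REAL);

    INSERT INTO trips VALUES (1, '2020-06-01'), (2, '2020-06-01'), (3, '2020-06-02');
    INSERT INTO weather VALUES ('2020-06-01', 68.0), ('2020-06-02', 71.5);

    -- Save the joined, aggregated result as a table for later graphing.
    CREATE TABLE daily_demand AS
    SELECT t.ride_date, COUNT(*) AS rides, w.avg_temp_f
    FROM trips AS t
    JOIN weather AS w ON w.obs_date = t.ride_date
    GROUP BY t.ride_date, w.avg_temp_f;
""")

rows = conn.execute(
    "SELECT ride_date, rides, avg_temp_f FROM daily_demand ORDER BY ride_date"
).fetchall()
for row in rows:
    print(row)
conn.close()
```

Note that an inner join silently drops trips on dates with no weather record, so it is worth comparing row counts before and after the join.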

Finally, please note that the management (or the LIC) does not know anything beyond this project statement. Therefore, you need to use your own judgement and make necessary and reasonable assumptions when doing this project. Make sure to document all assumptions made in your report.

Project Deliverable

There are two required deliverables for this project: a report of no more than 10 pages (in pdf format) and a PowerPoint file. Their suggested structures are detailed below.

Report structure

The following is a suggested structure of your report:

•    Cover page for your group project, including group number, e.g., T15A Group 2, and each member’s name and photo (not included in the 10-page limit).

•    Fill out a table that contains summary information of your analysis topics. The following is merely an example (in red):

Summary of Analyses

No. | Topic Description | Chronological Analysis | Weather Analysis | Holiday Analysis | Note (e.g., special efforts that you want to let the marker know)
1 | We study the weekly demand pattern and how it changes over the past 10 years, under different weather conditions. | | | |
2 | We study how different holidays influence the demand over the past 10 years. | | | | Our result is unique, as special efforts were spent on xxxx.
3 | N/A | | | |


o Column 1, No.: no more than 3 topics should be presented.

o Column 2, Topic Description: briefly describe what you do in this topic

o Columns 3 - 6: tick if the corresponding analysis is involved in your topic

o Column 7, Note: If you have anything that you want the marker to know (e.g., special efforts to gain additional marks), please write here.

• Introduction

•    Detail how you prepare the data for analysis. This includes how you clean data, manage missing information, and how you organize tables that go into SAS.

•    Detail how you perform data analysis using SAS Enterprise Guide (referring to the queries that you used for analysis in the appendix). It is suggested that this section contains subsections such that each subsection presents an analysis topic, the corresponding research question (e.g., what station is most popular?) and the analysis result.

•    Business insights and recommendation

•    An appendix that contains the queries used or functions from Enterprise Guide in your project.

The appendix has no page limits.

Slide structure

•   You will need to give a presentation in your tutorial session in Week 10. Your presentation should be limited to 8 minutes, with 4 static slides that contain no animations or 'movement' of any description. These 4 slides should have the following headings: “Group Information” (1st slide, including group number, e.g., T15A Group 2, and each member’s name and photo), “Major findings – I”, “Major findings – II”, and “Major findings – III” (2nd – 4th slides).

•   Name your PPT file using your group number, e.g., T15A_Group2.pptx, and upload it via Moodle by the due time.

Marking guideline

Item (%) | Description
Data preparation (25%) | Do you properly manage missing data? Do you properly preprocess the tables used by SAS?
Quality of the data analysis (40%) | Are your analysis topics interesting and not trivial? Are your analysis topics meaningful to the Metro business? Have you properly analyzed the data with the right functions or steps? Have you provided proper data visualization (for example, table or graph) to present and support your analysis? Are there special efforts invested in processing or analyzing some data?
Quality of business insights obtained and recommendation (25%) | Do you obtain business insights from the data? Are your obtained insights helpful for business? Do you provide proper recommendations to make use of the obtained insights?
Presentation and recording quality (10%) | Is your presentation clear and effective to professional standards?
Total (100%) |

 

Appendix: Using Enterprise Guide for Data Analysis and Visualization

Given a data file opened in SAS Enterprise Guide, you can see some analysis and visualization functions available (from the tool bar below).

 

Most functions are straightforward to use. Graphs can be found under Graph; some useful analysis tools can be found under Analyze in the tool bar. You are expected to try them by yourself.

Note that the data visualization functions apply only to SAS data files. When you write a query, you need to save the result table as a SAS data file using the “create” statement, which has been introduced previously, before you can graph the query outcome.

Graphing:

Line Chart

 

Bar Chart

 

Histogram

If you are not familiar with the concept of a histogram, please read about histograms first. To plot a histogram, choose Bar Chart Wizard. In Step 2 of 4, choose Percentage for the Bar height.
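To see what a histogram with percentage bar heights represents, the following Python sketch groups values into equal-width bins and reports the share of observations in each bin. The ride durations are made up, and the function name and bin width are illustrative choices, not settings taken from SAS EG.

```python
from collections import Counter

def percentage_histogram(values, bin_width):
    """Map each bin's lower edge to the percentage of values falling in it."""
    if not values:
        return {}
    bins = Counter((v // bin_width) * bin_width for v in values)
    total = len(values)
    return {start: 100.0 * count / total for start, count in sorted(bins.items())}

# Made-up ride durations in minutes.
durations = [5, 7, 12, 14, 18, 25, 31, 33]

for start, pct in percentage_histogram(durations, 10).items():
    print(f"{start}-{start + 10} min: {pct:.1f}%")
```

The percentages always sum to 100, which makes histograms from years with very different ride counts directly comparable.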

Correlation Analysis

You may first plot a 2D scatter chart for the two variables whose correlation you want to study.

 

If a correlation is revealed from the scatter chart, you may also calculate the exact correlation between these two variables. Assume these two variables are “Amount” and “Visits”. The following figures show how their correlation can be calculated.

 

 

Drag Amount and Visits from the left pane to the right pane.
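For reference, the quantity usually reported as the correlation between two variables is the Pearson correlation coefficient. The following Python sketch computes it directly from its definition, under the assumption that this is the statistic you want; check the SAS output to confirm which statistic it reports. The “Amount” and “Visits” values below are made up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Made-up values for the two variables.
amount = [120.0, 80.0, 150.0, 60.0, 200.0]
visits = [10, 7, 12, 5, 15]

print(round(pearson_r(amount, visits), 3))
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship; it does not by itself show that one variable causes the other.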

 

From: https://www.cnblogs.com/qq-99515681/p/18311630
