IS3240-A1 October 12, 2024 IS3240 Homework Assignment #1
1 Q1: Numpy Basics [15%]
- Import Numpy and Pandas [1%]
- Create a random number generator, with seed being 99 [2%]
- Generate a Numpy array (named data) with two dimensions in the shape of [10, 4]. All values should be random integers between 0 and 100 (inclusive). [7%]
- Sort the Numpy array based on the 2nd dimension, from the largest to the smallest. [5%]
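
The Q1 steps can be sketched as follows. This is one possible reading, not a model answer: it assumes "the 2nd dimension" means axis 1 (so each row is sorted independently), and the name `sorted_desc` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Random number generator with a fixed seed so results are reproducible.
rng = np.random.default_rng(99)

# endpoint=True makes the upper bound (100) inclusive.
data = rng.integers(0, 100, size=(10, 4), endpoint=True)

# Assumption: sorting "based on the 2nd dimension" means sorting along axis 1.
# np.sort is ascending, so reverse each row to get largest-to-smallest.
sorted_desc = np.sort(data, axis=1)[:, ::-1]

print(data.shape)
print(sorted_desc)
```

If the intended meaning is instead to reorder whole rows by the values in the second column, `data[np.argsort(data[:, 1])[::-1]]` would be the corresponding expression.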
2 Q2: Pandas Basics [25%]
- Read the data HKO.csv (data source: https://www.hko.gov.hk/en/abouthko/opendata_intro.htm) [2%]
- Make the first three columns all integers.
- Create a new column date using pd.to_datetime so that it contains the year/month/day. (You need to combine values from three separate columns). [8%]
- When was the coldest? When was the hottest? [4%]
- Create a new column season so that Mar-May should be labelled as Spring; Jun-Aug as Summer; Sep-Nov as Fall; Dec-Feb as Winter. [6%]
- Compute the average values and standard deviations for the four seasons. [5%]
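
The date and season steps can be illustrated on a small stand-in table. The layout of HKO.csv is not given here, so the column names `Year`, `Month`, `Day` and `Value`, and the five rows below, are hypothetical placeholders. The real file may need extra cleaning before the type conversion works.

```python
import pandas as pd

# Hypothetical stand-in for HKO.csv; real column names and values may differ.
df = pd.DataFrame({
    "Year": [2023.0, 2023.0, 2023.0, 2023.0, 2024.0],
    "Month": [1.0, 4.0, 7.0, 10.0, 1.0],
    "Day": [15.0, 15.0, 15.0, 15.0, 15.0],
    "Value": [16.0, 23.5, 29.8, 25.1, 15.0],
})

# Make the first three columns integers.
df = df.astype({"Year": "int64", "Month": "int64", "Day": "int64"})

# pd.to_datetime can assemble dates from columns named year/month/day.
parts = df[["Year", "Month", "Day"]].rename(
    columns={"Year": "year", "Month": "month", "Day": "day"}
)
df["date"] = pd.to_datetime(parts)

# Dates of the lowest and highest recorded values.
coldest = df.loc[df["Value"].idxmin(), "date"]
hottest = df.loc[df["Value"].idxmax(), "date"]

# Map each month number to its season label.
season_of_month = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}
df["season"] = df["Month"].map(season_of_month)

# Mean and standard deviation per season.
season_stats = df.groupby("season")["Value"].agg(["mean", "std"])
print(coldest, hottest)
print(season_stats)
```

A dictionary lookup keeps the month-to-season rule explicit. Note that seasons with a single row in this toy table have an undefined (NaN) standard deviation.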
3 Q3: Finding News on Tesla using APIs [60%]
You are a financial analyst. You are asked by your manager to identify 50 news articles related to Tesla in the NY Times. Please use the Article Search API for this purpose. Specifically, the following information is needed.
- pub_date: The publication time of this article. Convert this to the datetime data type.
- abstract: The abstract of this article
- lead_paragraph: The lead paragraph of this article
- web_url: the hyperlink pointing to the article information (e.g., link)
Please achieve the following.
- Collect 50 news articles. [25%]
- Which one is the most recent news article? Show the abstract and date of that article. [5%]
- Find the news articles which have the longest/shortest abstracts respectively. [10%]
- Across all the news abstracts, find the top 10 frequent words. [20%]
- HINT: You need to combine all abstracts, remove all punctuation, and split the text by space to get a list of words.
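
The hint can be sketched on a few made-up abstracts. The strings below are hypothetical and are not real NY Times content. Lowercasing the text before counting is an extra assumption that the hint does not mention.

```python
import string
from collections import Counter

# Hypothetical abstracts standing in for the 50 collected from the API.
abstracts = [
    "Tesla shares rose, again.",
    "Tesla's new model: faster, cheaper!",
    "Shares of Tesla fell.",
]

# Step 1: combine all abstracts into one string.
combined = " ".join(abstracts)

# Step 2: remove every ASCII punctuation character.
cleaned = combined.translate(str.maketrans("", "", string.punctuation))

# Step 3: split on whitespace. Lowercasing is an assumption so that
# "Shares" and "shares" are counted as the same word.
words = cleaned.lower().split()

# Step 4: count occurrences and keep the 10 most frequent words.
top10 = Counter(words).most_common(10)
print(top10)
```

Removing punctuation this way turns "Tesla's" into "teslas", which is then counted separately from "tesla". Typographic (non-ASCII) quotes and dashes are not in `string.punctuation` and would need separate handling.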