Complete the following exercises using the code discussed during the computer lab. Save your work in an R script as well as a Word document containing the necessary output and comments. Be sure to use comments in the script to justify any computations. If you have any questions, do not hesitate to ask.
1 Probability Distributions
- Generate four vectors of binomial random numbers with sample sizes of 10, 100, 1000, and 10000, using n = 50, p = 0.4, and seed 5. Find the mean and standard deviation of each vector and compare them to the theoretical mean and standard deviation. What do you see as the sample size increases?
- Generate the same four vectors as in the previous exercise. Print out four histograms to represent the data graphically. What distribution do the histograms appear to approach as the sample size increases?
- Generate a vector of 1000 random numbers from a χ² distribution with 8 degrees of freedom using seed 100. Find the five-number summary, mean, and standard deviation. Represent the vector graphically using a probability histogram with the pdf overlaid on the same graph. Assess the normality of the sample data.
- Clearly the sample data from the F distribution generated earlier in this chapter was not normal. To assess the fit of a random variable to the proper distribution, one uses a quantile-quantile (QQ) plot. Using seed 1, generate 300 random numbers from an F distribution with 5 and 10 degrees of freedom. Create a QQ plot by finding the theoretical quantiles of F as well as the sample quantiles of the random data. Plot these vectors against each other to see if the random number generator is indeed providing sample data from an F distribution. Hint: This problem requires the use of the quantile() function to find the sample quantiles of a data set.
- Find the following probabilities: (a) P(B = 5) where B ∼ Binom(12, 0.6) (b) P(B ≥ 5) where B ∼ Binom(12, 0.6) (c) P(Z < 1.12) where Z ∼ N(0, 1) (d) P(6.5 < X) where X ∼ N(7, 4) (e) P(−1021 < t < −664) where t ∼ t(1) (f) P(t > 1.96) where t ∼ t(500)
- Find the following quantiles: (a) 30th percentile for Z ∼ N(0, 1) (b) 30th percentile for X ∼ N(7, 4) (c) 95th percentile for t ∼ t(1) (d) 95th percentile for t ∼ t(500) (e) Q1, Q2, and Q3 for F(5, 10).
2 Representing Categorical Data
A rehabilitation study for cocaine users included administering two drugs and a placebo to determine effectiveness. There were 24 subjects in each group. Fourteen of the users given Desipramine relapsed, 18 of the users given Lithium relapsed, and 20 of the placebo group relapsed. Create two tables, one containing the counts and the other containing the marginal distributions for each drug. Print the tables and represent the data graphically. Use a bar graph with bars for both outcomes as well as two pie charts, one for each outcome.
3 Exploratory Data Analysis
- Using the ‘datasets’ library in R, save the mtcars data set as a matrix named cars. Find the summary statistics of the mpg column and create a boxplot of it. Then create a boxplot of the mpg column by the cylinder column. The output should have three plots on the same set of axes. Summarize the boxplot in words.
- Refer to the previous exercise. Check the normality of the mpg column. Perform pairwise hypothesis tests to determine if the average mpg differs depending on the number of cylinders. Use both methods discussed in this section. Based on the normality assessment, which testing method should be used?
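One possible way to set up the two exercises above is sketched below. It is only a starting point under stated assumptions: the exercise does not name the "two methods discussed in this section", so the pairwise t test and the pairwise Wilcoxon rank-sum test are used here as stand-ins, and the Shapiro-Wilk test is one of several reasonable normality checks.

```r
library(datasets)

# Store mtcars as a matrix named cars (this masks the built-in `cars` data set)
cars <- as.matrix(mtcars)
mpg <- cars[, "mpg"]
cyl <- factor(cars[, "cyl"])  # 4, 6, and 8 cylinders

# Summary statistics and an overall boxplot of mpg
print(summary(mpg))
boxplot(mpg, main = "mpg", ylab = "Miles per gallon")

# Three side-by-side boxplots of mpg, one per cylinder count
boxplot(mpg ~ cyl, xlab = "Cylinders", ylab = "Miles per gallon")

# Normality check: normal QQ plot plus a Shapiro-Wilk test
qqnorm(mpg)
qqline(mpg)
print(shapiro.test(mpg))

# Assumed pair of methods: parametric and rank-based pairwise comparisons
print(pairwise.t.test(mpg, cyl))
print(pairwise.wilcox.test(mpg, cyl, exact = FALSE))  # exact = FALSE because mpg has ties
```

Both pairwise functions apply the Holm p-value adjustment by default; pass `p.adjust.method` to change it.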
Day 1a Lab Activities - Solutions
Probability Distributions
1.
Sample Size | µ | σ | x̄ | s |
10 | 20 | 3.4641 | 20.5 | 3.6591 |
100 | 20 | 3.4641 | 20.24 | 3.5337 |
1000 | 20 | 3.4641 | 19.994 | 3.4912 |
10000 | 20 | 3.4641 | 20.0253 | 3.4795 |
As the sample size increases, the sample standard deviation s approaches the theoretical standard deviation σ. The sample mean also approaches the theoretical mean µ, and it gets close at a much smaller sample size than the standard deviation does.
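A comparison of this kind can be produced with a short script such as the sketch below. It assumes the seed is reset to 5 before each vector is drawn; the lab code may have set the seed only once, in which case the sample values would differ. The variable names are illustrative.

```r
# Binomial parameters from the exercise: n = 50 trials, success probability p = 0.4
size <- 50
prob <- 0.4
sample_sizes <- c(10, 100, 1000, 10000)

# Theoretical mean and standard deviation of Binom(n, p)
mu <- size * prob
sigma <- sqrt(size * prob * (1 - prob))

# Assumption: the seed is reset before every draw
samples <- lapply(sample_sizes, function(k) {
  set.seed(5)
  rbinom(k, size = size, prob = prob)
})

# One row per sample size: theoretical values next to sample estimates
results <- data.frame(
  sample_size = sample_sizes,
  mu = mu,
  sigma = sigma,
  xbar = sapply(samples, mean),
  s = sapply(samples, sd)
)
print(results)
```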
2. The histograms appear to be approaching a normal distribution with mean 20.
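The four histograms can be drawn on one page with a loop like the following sketch. As before, resetting the seed before each draw is an assumption, and the titles and axis labels are illustrative choices.

```r
size <- 50
prob <- 0.4
sample_sizes <- c(10, 100, 1000, 10000)

# Arrange the four histograms in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
for (k in sample_sizes) {
  set.seed(5)  # assumption: seed reset before each draw
  x <- rbinom(k, size = size, prob = prob)
  hist(x, main = paste("Sample size", k), xlab = "Number of successes")
}
par(op)  # restore the previous graphics settings
```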
3.
Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | St. Dev. |
0.7603 | 4.8880 | 7.4250 | 7.9070 | 10.1300 | 26.2100 | 3.942555 |
The normal quantile plot is not linear; therefore, the data are not normally distributed.
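A sketch of the steps behind this answer is given below. The number of histogram bins and the plot labels are arbitrary choices, not taken from the lab.

```r
set.seed(100)
x <- rchisq(1000, df = 8)

# Five-number summary with the mean (summary) and the standard deviation
print(summary(x))
print(sd(x))

# Probability histogram (freq = FALSE puts density on the y-axis)
hist(x, freq = FALSE, breaks = 30,
     main = "Chi-square sample, 8 df", xlab = "x")

# Overlay the chi-square(8) pdf on the same axes
curve(dchisq(x, df = 8), add = TRUE, lwd = 2)

# Normal quantile plot: points bending away from the line suggest non-normality
qqnorm(x)
qqline(x)
```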
4. The random numbers appear to follow an F distribution, with the exception of the 7 largest values.
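The QQ plot for this answer can be built by hand as in the sketch below. The choice of probability grid is an assumption: `ppoints()` is used here, and the lab may have used a different set of probabilities.

```r
set.seed(1)
x <- rf(300, df1 = 5, df2 = 10)

# Assumption: evenly spaced plotting positions strictly between 0 and 1
probs <- ppoints(300)

# Theoretical quantiles of F(5, 10) and sample quantiles of the random data
theoretical_q <- qf(probs, df1 = 5, df2 = 10)
sample_q <- quantile(x, probs = probs)

# Points near the line y = x indicate a good fit to F(5, 10)
plot(theoretical_q, sample_q,
     xlab = "Theoretical F(5, 10) quantiles", ylab = "Sample quantiles")
abline(0, 1)
```

Departures from the reference line are expected to be largest in the upper tail, where the F distribution is heavily right-skewed and the largest order statistics vary the most.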
5. a) 0.1009024
b) 0.9426901
c) 0.8686431
d) 0.5497382
e) 0.0001676192
f) 0.02527539
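These probabilities can be computed with R's distribution functions, as in the sketch below. Two points are assumptions or easy to get wrong: N(7, 4) is read here as mean 7 and standard deviation 4, and for the discrete binomial P(B ≥ 5) equals 1 − P(B ≤ 4), so `pbinom()` is evaluated at 4 (evaluating it at 5 would give P(B > 5) instead).

```r
# (a) P(B = 5), B ~ Binom(12, 0.6)
p_a <- dbinom(5, size = 12, prob = 0.6)

# (b) P(B >= 5) = 1 - P(B <= 4)
p_b <- 1 - pbinom(4, size = 12, prob = 0.6)

# (c) P(Z < 1.12), Z ~ N(0, 1)
p_c <- pnorm(1.12)

# (d) P(X > 6.5), X ~ N(7, 4); assumption: 4 is the standard deviation
p_d <- pnorm(6.5, mean = 7, sd = 4, lower.tail = FALSE)

# (e) P(-1021 < t < -664), t ~ t(1)
p_e <- pt(-664, df = 1) - pt(-1021, df = 1)

# (f) P(t > 1.96), t ~ t(500)
p_f <- pt(1.96, df = 500, lower.tail = FALSE)

print(c(a = p_a, b = p_b, c = p_c, d = p_d, e = p_e, f = p_f))
```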
6. a) -0.5244005
b) 4.902398
c) 6.313752
d) 1.647907
e) 0.5291417, 0.9319332, 1.5853233
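The quantiles come from the corresponding `q*` functions, as sketched below. As in the probability exercise, reading N(7, 4) as mean 7 and standard deviation 4 is an assumption.

```r
# (a) 30th percentile of N(0, 1)
q_a <- qnorm(0.30)

# (b) 30th percentile of N(7, 4); assumption: 4 is the standard deviation
q_b <- qnorm(0.30, mean = 7, sd = 4)

# (c), (d) 95th percentiles of t(1) and t(500)
q_c <- qt(0.95, df = 1)
q_d <- qt(0.95, df = 500)

# (e) Q1, Q2, and Q3 of F(5, 10)
q_e <- qf(c(0.25, 0.50, 0.75), df1 = 5, df2 = 10)

print(list(a = q_a, b = q_b, c = q_c, d = q_d, e = q_e))
```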
Representing Categorical Data
1. Count Data
Relapsed | Desipramine | Lithium | Placebo |
Yes | 14 | 18 | 20 |
No | 10 | 6 | 4 |
Marginal Distributions
Relapsed | Desipramine | Lithium | Placebo |
Yes | 0.5833333 | 0.75 | 0.8333333 |
No | 0.4166667 | 0.25 | 0.1666667 |
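The two tables and the requested graphs can be produced along the lines of the sketch below. The object names, dimension labels, and plot titles are illustrative choices, not the lab's own.

```r
# Counts: rows are relapse outcomes, columns are treatment groups (24 subjects each)
counts <- matrix(
  c(14, 10, 18, 6, 20, 4),
  nrow = 2,
  dimnames = list(
    Relapsed = c("Yes", "No"),
    Drug = c("Desipramine", "Lithium", "Placebo")
  )
)

# Proportions within each drug (each column sums to 1)
props <- prop.table(counts, margin = 2)
print(counts)
print(props)

# Bar graph with side-by-side bars for both outcomes
barplot(counts, beside = TRUE, legend.text = TRUE,
        xlab = "Drug", ylab = "Number of subjects")

# Two pie charts, one per outcome
op <- par(mfrow = c(1, 2))
pie(counts["Yes", ], main = "Relapsed")
pie(counts["No", ], main = "Did not relapse")
par(op)
```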