1. Vitamin C and tooth growth
Lack of vitamin C leads to severe health issues. It is not produced in the human body and must be supplied with food. However, people with limited access to fresh vegetables (sailors, astronauts, travelers, etc.) may not get enough of this compound from their food. Thus, a vitamin C formulation that preserves its properties for a long time is greatly needed.
Researchers developed such a formulation, and in vitro tests showed its efficacy. They have now performed an in vivo trial. Guinea pigs received either the newly developed formulation of vitamin C or fresh orange juice (normalized according to the concentration of vitamin C) in addition to their standard diet (supp). Each supplement type was given at three concentrations (dose) of vitamin C: 0.5, 1, and 2 mg/ml. The measured outcome is the tooth length (len) in mm (the stem cells that become teeth are sensitive to vitamin C).
Import, check, and organize the data appropriately. Reformat columns if needed.
library(dplyr)
teeth <- read.csv("teeth.csv")
teeth <- teeth %>%
  mutate(dose = factor(dose, levels = c(0.5, 1, 2), ordered = TRUE),
         supp = as.factor(supp)) %>%
  relocate(supp, dose)
summary(teeth) # check the data
head(teeth)
unique(teeth$dose)
unique(teeth$supp)
Plot the data in a useful way.
library(ggplot2)
# Assumes teeth is the data frame imported above
p1 <- ggplot(teeth, aes(y = len, x = supp, fill = factor(dose))) +
  geom_boxplot(width = 0.5, alpha = 0.5) + # draw the box plots
  geom_point(position = position_jitter(width = 0.25), alpha = 0.75, size = 0.5) + # overlay the points, jittered slightly to avoid overlap
  labs(title = "Tooth Length by Supplement Type and Dose",
       x = "Supplement Type", y = "Tooth Length (mm)") +
  theme_minimal() +
  facet_wrap(~dose)
p1
Choose, justify, state the statistical hypotheses, and carry out an appropriate test to answer whether the vitamin C formula is useful.
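Use a two-way ANOVA: the outcome (len) is continuous and there are two categorical factors, supp and dose.
H0: The mean tooth length is the same for both supp groups.
H1: The mean tooth length differs between the supp groups.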
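For reference, a two-way ANOVA with interaction (the model that the formula len ~ supp * dose requests) can be written as follows. The symbols are notation introduced here for illustration: $\alpha_i$ is the effect of supplement type $i$, $\beta_j$ the effect of dose $j$, and $(\alpha\beta)_{ij}$ their interaction.

$$
len_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \sim N(0, \sigma^2)
$$

Besides the supp hypothesis ($H_0$: all $\alpha_i = 0$), the same model also tests the dose effect ($H_0$: all $\beta_j = 0$) and the interaction ($H_0$: all $(\alpha\beta)_{ij} = 0$).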
aov_model <- aov(len ~ supp * dose, data = teeth)
# Normality of residuals
plot(aov_model,2)
# Equality of variance
plot(aov_model,1)
summary(aov_model)
posthoc_res <- TukeyHSD(aov_model)
posthoc_res
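The diagnostic plots above can be complemented with formal tests. This is a minimal sketch, assuming R's built-in ToothGrowth data set as a stand-in for teeth.csv (it has the same len, supp, and dose columns, but its values may differ from the file used here). The object names are illustrative.

```r
# Stand-in data: built-in ToothGrowth, NOT the teeth.csv used above
tg <- transform(ToothGrowth, dose = factor(dose))
tg_model <- aov(len ~ supp * dose, data = tg)

# Shapiro-Wilk test on the residuals (H0: residuals are normally distributed)
shapiro_res <- shapiro.test(residuals(tg_model))

# Bartlett test across the six supp-by-dose cells (H0: equal variances)
bartlett_res <- bartlett.test(len ~ interaction(supp, dose), data = tg)

shapiro_res
bartlett_res
```

A small p-value in either test would suggest that the corresponding ANOVA assumption is doubtful and that a transformation or a non-parametric alternative may be worth considering.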
Present and discuss your results. Is this novel formula useful? What would you suggest doing next?
The ANOVA results indicate that the formulations and their concentrations differ in their effect on tooth growth.
The new formulation generally performs worse than orange juice (a difference of about 1.7 mm, p < 0.0001).
\newpage
2. Mutation and survival
You work on the mutation of a certain gene (Gene_X) that likely causes developmental abnormalities in humans but is quite rare, and the precise role of the mutation is not known. You created a mouse model by introducing a similar mutation in a similar location within the murine genome.
You set up several breeding pairs and crossed mice as Gene_X^WT/mut^ $\times$ Gene_X^WT/mut^. You recorded the genotype of the newborn mice. Your genotyping record (genotype.csv) includes mouse_ID, birth date (BD), sex, and genotype.
Answer the questions below, provide your analysis, and explain your results. Given the genotyping records you got, what can you say about the studied mutation?
Import and organize the data.
genotype <- read.csv("genotype.csv")
genotype <- genotype %>% mutate(genotype= as.factor(genotype), sex = as.factor(sex))
head(genotype)
anyNA(genotype)
anyDuplicated(genotype)
summary(genotype)
unique(genotype$genotype)
Describe the data in a useful way.
p2 <- ggplot(genotype,aes(x = genotype, fill = sex)) +
geom_bar(position = "stack")
p2
What would you expect under Mendelian inheritance?
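Under Mendelian inheritance, a cross between two heterozygous parents is expected to give WT, het, and mut offspring in a 1:2:1 ratio, and females and males in a 1:1 ratio.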
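tab_o <- table(genotype$sex, genotype$genotype)
tab_o
# Expected proportions: 1:2:1 (WT:het:mut) within each sex, sexes 1:1
# NOTE: assumes the genotype levels are labelled "WT", "het", and "mut"
p_geno <- c(WT = 1/4, het = 1/2, mut = 1/4)
tab_e <- outer(rep(1/2, nrow(tab_o)), p_geno[colnames(tab_o)])
dimnames(tab_e) <- dimnames(tab_o)
tab_e
# chisq.test() ignores p when given a matrix (it runs an independence test),
# so pass both tables as vectors to get a goodness-of-fit test
chi_res <- chisq.test(as.vector(tab_o), p = as.vector(tab_e))
chi_res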
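For reference, the goodness-of-fit statistic behind this kind of comparison can be written out. The symbols are introduced here for illustration: $N$ is the total number of genotyped pups, $O_{ij}$ the observed count for sex $i$ and genotype $j$, and $p_j$ the Mendelian fraction of genotype $j$.

$$
E_{ij} = N \cdot \tfrac{1}{2} \cdot p_j, \qquad (p_{WT},\, p_{het},\, p_{mut}) = \left(\tfrac{1}{4},\, \tfrac{1}{2},\, \tfrac{1}{4}\right)
$$

$$
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad df = 6 - 1 = 5
$$

The degrees of freedom are the number of cells minus one because all expected proportions are fixed in advance rather than estimated from the data.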
Choose and justify the appropriate statistical test, state the statistical hypotheses, and carry out the test on whether the mutation affects the survival of mice.
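One possible approach is a chi-squared goodness-of-fit test of the genotype counts against the 1:2:1 Mendelian expectation: a shortage of one genotype among newborns would be consistent with reduced survival before birth. This is a minimal sketch with HYPOTHETICAL counts, not the values from genotype.csv. The object names and the level labels "WT", "het", and "mut" are assumptions.

```r
# HYPOTHETICAL genotype counts for illustration only (not from genotype.csv)
obs <- c(WT = 30, het = 60, mut = 10)

# Mendelian 1:2:1 expectation for a het x het cross
p_mendel <- c(WT = 1/4, het = 1/2, mut = 1/4)

# Chi-squared goodness-of-fit test
# H0: genotype frequencies follow 1:2:1; H1: they deviate from 1:2:1
gof_res <- chisq.test(obs, p = p_mendel)
gof_res$expected   # expected counts under H0
gof_res
```

With the real data, obs would be replaced by table(genotype$genotype), reordered so that its names match p_mendel.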
Present and discuss your results. What would you suggest doing next?
\newpage
3. Coffee shop opening hours
A new coffee shop has opened on campus. Hooray! Coffee shops are normally open from 6am-5pm, but the owners are aware that students often sleep later than other members of society. After being open for one month, they run a month-long trial opening 10am-9pm to see if students prefer these times. They leave an iPad at the serving counter where customers can record if they are 'satisfied' or 'unsatisfied' with the opening times.
During the 6am-5pm opening times, the iPad records 864 presses of the 'satisfied' button by customers and 714 presses of the 'unsatisfied' button. When they change these times to 10am-9pm, they receive 980 'satisfied' presses and 473 'unsatisfied' presses.
What would be a suitable statistical test for these data and why?
early_s <- 864
early_u <- 714
late_s <- 980
late_u <- 473
n_boot <- 1000
early_bootstrap <- numeric(n_boot)
late_bootstrap <- numeric(n_boot)
# Individual responses: 1 = 'satisfied', 0 = 'unsatisfied'
a_res <- c(rep(1, early_s), rep(0, early_u))
b_res <- c(rep(1, late_s), rep(0, late_u))
for (i in seq_len(n_boot)) {
  # Resample each period with replacement and store the proportion satisfied
  early_bootstrap[i] <- mean(sample(a_res, length(a_res), replace = TRUE))
  late_bootstrap[i] <- mean(sample(b_res, length(b_res), replace = TRUE))
}
# Upper bound of the 95% bootstrap interval for the early opening hours
first_upper <- quantile(early_bootstrap, probs = 0.975)
boxplot(early_bootstrap, late_bootstrap)
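Besides the bootstrap comparison above, the two satisfaction proportions can also be compared with a classical test on the 2 x 2 table of counts. This is a minimal sketch using the counts reported above. Treating each button press as an independent observation is an assumption (the same customer may press more than once), and the object names are illustrative.

```r
# Counts reported for each opening-hours period
satisfaction <- matrix(c(864, 714,
                         980, 473),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(hours = c("6am-5pm", "10am-9pm"),
                                       response = c("satisfied", "unsatisfied")))

# Proportion of 'satisfied' presses in each period
prop_satisfied <- satisfaction[, "satisfied"] / rowSums(satisfaction)
prop_satisfied

# Chi-squared test of independence
# H0: satisfaction does not depend on the opening hours
chisq_hours <- chisq.test(satisfaction)
chisq_hours
```

A chi-squared test suits these data because both variables are categorical and all cell counts are large.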