
ADS R Knowledge Summary

Date: 2024-05-30 13:21:39
Tags: summary, shown, H1, ADS, knowledge, ANOVA, factor, test, data

Knit skills

eval=FALSE: the chunk is not run, so no output is shown

hist(rnorm(10000), col = 'tomato') # eval

echo=FALSE: the code is hidden; only its output is shown

hist(rnorm(10000), col = 'tomato') # echo

include=FALSE: the chunk is run, but neither its code nor its output is shown

hist(rnorm(10000), col = 'tomato') # include

warning=FALSE: warnings are not shown

library(ggplot2)

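error=FALSE: errors are not shown in the output; knitting stops at the first error instead (error=TRUE prints the error and keeps going)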

#num <- "ADS" + 100

Calling a function as package::function() can help when knitting fails because the package is not attached:

#knitr::kable(summary(data))

Use xelatex as the LaTeX engine for PDF output (YAML header):

output: 
  pdf_document:
    latex_engine: xelatex

Data process

import data

data <- read.csv("path", header = TRUE, sep = ",")
# data <- read.table("path", header = TRUE) # for .txt files
head(data)

basic examination

summary(data)
unique(data$col)
str(data)

examine NA

is.na(data)
# summary(is.na(data))
# sum(is.na(data))
# anyNA(data)
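# sum(!complete.cases(data)) # rows with at least one NA; complete.cases() is in base R (stats)

data <- na.omit(data)
# data <- tidyr::drop_na(data, case_id, date_onset, age) # drop rows with NA in the selected columns only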
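
As a minimal sketch of the NA checks above, the following runs them on a toy data frame. The data frame `df` and its columns are made up for illustration and are not part of the original notes.

```r
# Hypothetical toy data with missing values (illustration only)
df <- data.frame(
  id    = 1:5,
  age   = c(23, NA, 31, 40, NA),
  score = c(88, 92, NA, 75, 60)
)

anyNA(df)                                  # is there any NA at all?
na_per_column <- colSums(is.na(df))        # number of NAs in each column
n_incomplete  <- sum(!complete.cases(df))  # rows with at least one NA

df_clean <- na.omit(df)                    # keep only the complete rows
nrow(df_clean)
```

Counting NAs per column before calling na.omit() shows how many rows will be lost and which column is responsible.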

examine duplication

duplicated(data)

data <- distinct(data) # dplyr
# data <- unique(data)
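
A small sketch of the duplicate check in base R, assuming a made-up data frame `df`. duplicated() flags a row only when it repeats an earlier row in full.

```r
# Hypothetical toy data: rows 3 and 5 repeat earlier rows (illustration only)
df <- data.frame(
  id    = c(1, 2, 2, 3, 3, 3),
  value = c("a", "b", "b", "c", "c", "d")
)

n_dup <- sum(duplicated(df))  # how many rows are exact repeats
df[duplicated(df), ]          # inspect the repeated rows before dropping them

df_unique <- unique(df)       # base-R equivalent of dplyr::distinct(df)
nrow(df_unique)
```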

to factor

library(dplyr)
teeth <- teeth %>%
  mutate(dose = factor(dose, levels = c(0.5, 1, 2), ordered = TRUE),
         supp = as.factor(supp)) %>%
  relocate(supp, dose)
str(teeth)

wide to long

library(tidyr)
data <- data.frame( 
	ID = 1:10, 
	Before = c(5, 3, 6, 7, 4, 6, 8, 7, 5, 6), 
	After = c(7, 4, 6, 9, 5, 8, 9, 8, 7, 7))
data_long <- data %>% 
	pivot_longer(cols = c(Before, After), names_to = "Time", values_to = "Value")

long to wide

data_wide <- data_long %>% 
	pivot_wider(names_from = Time, values_from = Value)

Visualization

Boxplot with three variables (x factor, numeric y, fill group):

ggplot(data, aes(x = factor_variable,
                 y = numeric_variable,
                 fill = group_variable)) +
  geom_boxplot() +
  labs(title = "Boxplot of Numeric Variable by Factor and Group Variables",
       x = "Factor Variable", y = "Numeric Variable") +
  scale_fill_brewer(palette = "Set3") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Statistics

t-test:

t.test(variable1, variable2, var.equal = TRUE)
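
A hedged sketch of how the call above might be used in practice. The vectors `group_a` and `group_b` are invented numbers, and using var.test() to decide on `var.equal` is one common convention, not something the notes prescribe.

```r
# Hypothetical measurements for two independent groups (illustration only)
group_a <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 5.3, 6.1)
group_b <- c(6.8, 7.1, 6.5, 7.4, 6.9, 7.0, 6.6, 7.3)

# F test for equal variances; a large p-value gives no evidence against equality
var_check <- var.test(group_a, group_b)
equal_var <- var_check$p.value > 0.05

# Student's t-test if variances look equal, Welch's t-test otherwise
res <- t.test(group_a, group_b, var.equal = equal_var)

res$estimate   # the two sample means
res$statistic  # t value
res$p.value
res$conf.int   # 95% CI for the difference in means
```

With `var.equal = FALSE` (the default), t.test() runs Welch's test, which is the safer choice when the variances are in doubt.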

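U test (Mann-Whitney U / Wilcoxon rank-sum):

  • H0: the two groups have the same distribution
  • HA: the two groups have different distributions
wilcox.test(variable1, variable2)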
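
A short sketch of the rank-sum test on invented data. `group_a` contains an outlier (40) to show why a rank-based test can be preferable to a t-test here; both vectors are hypothetical.

```r
# Hypothetical samples without ties (illustration only)
group_a <- c(12, 15, 11, 19, 14, 13, 40)
group_b <- c(22, 25, 21, 28, 24, 35, 23)

res <- wilcox.test(group_a, group_b)

# W counts the (a, b) pairs in which the value from group_a is larger
res$statistic
res$p.value
```

Because the test uses ranks only, the outlier adds just a handful of "larger than" pairs and does not dominate the result.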

ANOVA:

  1. Choose: state how many groups and how many factors there are, then decide which ANOVA (or the Kruskal-Wallis test) to use.
anova_result <- aov(variable1 ~ group1 * group2, data = data)
  2. Justify: assumptions for a two-way ANOVA are:
    1. Independence of observations.
      1. Can usually be assumed from the study design.
    2. Normality of residuals:
      1. Can be checked visually (plot(anova_result, 2)) or with a suitable hypothesis test (e.g. shapiro.test(residuals(anova_result)))
    3. Equality of variance:
      1. Can be checked visually (plot(anova_result, 1)) or with a suitable hypothesis test (e.g. bartlett.test())
    4. Equal group sizes (an unbalanced design needs a different type of sum of squares for the ANOVA table):
      1. Group sizes can be read off in the data diagnosis step (e.g. table(data$group1, data$group2))
    5. If the assumptions hold, a parametric ANOVA can be used.
  3. Statistical hypotheses (one pair per main effect and one for the interaction; shown here for supp):
    1. H0: the means of the supp groups are the same
    2. H1: the means of the supp groups are NOT all the same
  4. Carry out the test:
summary(anova_result)
TukeyHSD(anova_result)
  5. Suggestions:
    1. There is no untreated (control) group
    2. Other biological considerations
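
The steps above can be sketched end to end as follows. This assumes the `teeth` data used in these notes is R's built-in ToothGrowth dataset (columns len, supp, dose), which the notes do not state explicitly. The choice of shapiro.test() and bartlett.test() as the "suitable hypothesis tests" is likewise an assumption.

```r
# Assumption: `teeth` corresponds to the built-in ToothGrowth data
teeth <- datasets::ToothGrowth
teeth$dose <- factor(teeth$dose, levels = c(0.5, 1, 2))
teeth$supp <- as.factor(teeth$supp)

# Data diagnosis: group sizes per supp x dose cell (balanced design?)
cell_sizes <- table(teeth$supp, teeth$dose)

# Two-way ANOVA with interaction
anova_result <- aov(len ~ supp * dose, data = teeth)

# Assumption checks by hypothesis test (plots: plot(anova_result, 1) and plot(anova_result, 2))
shapiro_res  <- shapiro.test(residuals(anova_result))                   # normality of residuals
bartlett_res <- bartlett.test(len ~ interaction(supp, dose), data = teeth)  # equal variances

# ANOVA table and post-hoc pairwise comparisons for dose
summary(anova_result)
tukey_res <- TukeyHSD(anova_result, which = "dose")
tukey_res
```

If the normality or variance checks fail clearly, a non-parametric alternative such as kruskal.test() is the usual fallback, as step 1 suggests.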

Chi-square:

Justify: pick the variant that matches the question:

  • Chi-square goodness-of-fit test
  • Chi-square test for homogeneity
  • Chi-square test for independence

Assumptions:

  • The variables must be categorical.
    – Fits.
  • Observations must be independent.
    – Can be assumed from the task. Fits.
  • Cells in the contingency table are mutually exclusive.
    – Fits.
  • The expected count should be 5 or greater in at least 80% of cells.
    – See the table. Fits.
chisq.test(x = observed_counts, p = expected_probs) # goodness-of-fit; p must sum to 1
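
A minimal sketch of two of the variants with invented counts. `observed_counts`, `expected_probs` and the 2x2 table `tab` are hypothetical. The expected-count assumption is checked from the `expected` element that chisq.test() returns.

```r
# Goodness-of-fit: do the observed counts match the hypothesised proportions?
observed_counts <- c(A = 45, B = 35, C = 20)  # hypothetical counts
expected_probs  <- c(0.5, 0.3, 0.2)           # hypothesised proportions, sum to 1

gof <- chisq.test(x = observed_counts, p = expected_probs)
gof$expected   # expected counts, used to check the ">= 5" assumption
gof$statistic
gof$p.value

# Independence / homogeneity: pass a contingency table instead of x and p
tab <- matrix(c(30, 20,
                15, 35),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("g1", "g2"), outcome = c("yes", "no")))

ind <- chisq.test(tab)   # applies Yates' continuity correction by default for 2x2 tables
ind$expected
ind$p.value
```

In R the independence and homogeneity tests use the same call; they differ only in how the data were sampled and in the wording of the hypotheses.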

Bootstrapping

  • if data are categorical but seriously lacking in independence
first_satisfied <- 864
first_unsatisfied <- 714
second_satisfied <- 980
second_unsatisfied <- 473
first_bootstraps <- vector()
second_bootstraps <- vector()
# 1 = satisfied, 0 = unsatisfied
first_results <- c(rep(1, first_satisfied), rep(0, first_unsatisfied))
second_results <- c(rep(1, second_satisfied), rep(0, second_unsatisfied))
for (a in 1:100) {
  # resample each group with replacement and record the proportion satisfied
  first_sample <-
    mean(sample(first_results, length(first_results), replace = TRUE))
  second_sample <-
    mean(sample(second_results, length(second_results), replace = TRUE))

  first_bootstraps <- c(first_bootstraps, first_sample)
  second_bootstraps <- c(second_bootstraps, second_sample)
}
first_upper <- quantile(first_bootstraps, probs = c(0.975))
second_lower <- quantile(second_bootstraps, probs = c(0.025))
boxplot(
  first_bootstraps,
  second_bootstraps,
  notch = TRUE,
  names = c('early', 'late'),
  ylab = 'Prop. of satisfied button presses')

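Then check whether the two intervals overlap by comparing the upper bound of one group with the lower bound of the other (TRUE means no overlap):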

first_upper < second_lower
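
An alternative sketch, not from the original notes: rather than comparing two separate intervals, bootstrap the difference in proportions directly and check whether its percentile interval contains 0. It uses the same counts as above. The seed and the number of replicates (`n_boot`) are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Same counts as in the notes: 1 = satisfied, 0 = unsatisfied
first_results  <- c(rep(1, 864), rep(0, 714))
second_results <- c(rep(1, 980), rep(0, 473))

n_boot <- 2000  # assumed number of bootstrap replicates

# Each replicate resamples both groups with replacement and stores
# the difference in the proportion satisfied (second - first)
boot_diff <- replicate(n_boot, {
  mean(sample(second_results, replace = TRUE)) -
    mean(sample(first_results, replace = TRUE))
})

# 95% percentile interval for the difference
ci_diff <- quantile(boot_diff, probs = c(0.025, 0.975))
ci_diff
```

If the interval excludes 0, the two proportions differ at roughly the 5% level. Non-overlap of two separate 95% intervals is a stricter criterion than this.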

Bayes

P(A|B) = P(B|A)*P(A)/P(B)

# compare two hypotheses:
P(H1|D)/P(H2|D) = P(D|H1)/P(D|H2) * P(H1)/P(H2)

# Bayes Factor:
P(D|H1)/P(D|H2)
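
A worked numeric sketch of the formulas above. All probabilities are invented for illustration, and H1 and H2 are assumed to be the only two possibilities, so that P(D) can be expanded over them.

```r
# Hypothetical inputs (illustration only)
p_h1   <- 0.01   # prior P(H1)
p_h2   <- 0.99   # prior P(H2) = 1 - P(H1)
p_d_h1 <- 0.95   # likelihood P(D|H1)
p_d_h2 <- 0.05   # likelihood P(D|H2)

# Bayes' theorem: P(H1|D) = P(D|H1) * P(H1) / P(D)
p_d          <- p_d_h1 * p_h1 + p_d_h2 * p_h2
posterior_h1 <- p_d_h1 * p_h1 / p_d

# Odds form: posterior odds = Bayes factor * prior odds
bayes_factor   <- p_d_h1 / p_d_h2
prior_odds     <- p_h1 / p_h2
posterior_odds <- bayes_factor * prior_odds

posterior_h1
posterior_odds / (1 + posterior_odds)  # same posterior, recovered from the odds
```

The two routes agree because P(D) cancels in the ratio, which is why the Bayes factor can be computed without knowing P(D).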

K-means

visualization

ggplot(data, aes(x = ln_hap1, y = hap2, color = type)) +
# to plot a clustering result, map color to the cluster label instead of type
  geom_point() +
  theme_minimal() +
  labs(title = "xxx") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

clustering

features <- data[, c("hap1", "hap2")]
set.seed(123)
k <- 5
kmeans_result <- kmeans(features, centers = k, nstart = 1)
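
A hedged sketch of one way to choose k (the elbow method) before running the clustering above. The simulated `features` stand in for the real hap1/hap2 columns, and both the scaling step and `nstart = 25` are assumptions rather than part of the original notes.

```r
set.seed(123)

# Simulated stand-in for the hap1/hap2 features: three separated groups
features <- data.frame(
  hap1 = c(rnorm(30, 0, 0.5), rnorm(30, 4, 0.5), rnorm(30, 8, 0.5)),
  hap2 = c(rnorm(30, 0, 0.5), rnorm(30, 4, 0.5), rnorm(30, 0, 0.5))
)

# k-means is distance-based, so put both columns on the same scale
features_scaled <- scale(features)

# Elbow method: total within-cluster sum of squares for each candidate k
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(features_scaled, centers = k, nstart = 25)$tot.withinss
})
plot(k_values, wss, type = "b", xlab = "k", ylab = "Total within-cluster SS")

# Final fit with the chosen k; several random starts reduce the risk of a poor local optimum
kmeans_result <- kmeans(features_scaled, centers = 3, nstart = 25)
table(kmeans_result$cluster)
```

The cluster labels in `kmeans_result$cluster` can then be added to the data frame as a factor and mapped to color in the scatter plot.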

Finally:

Sys.time()

From: https://www.cnblogs.com/Qbio/p/18222152
