coplot(weight~height|sex,col="blue")#不同性别,体重与身高的散点图>coplot(weight~height|age,col="blue")#不同年龄,体重与身高" />

工作环境仍是linux。
第二章答案:
Ex2.1
x<-c(1,2,3)
y<-c(4,5,6)
e<-c(1,1,1)
z=2*x+y+e
z1=crossprod(x,y)#z1为x1与x2的内积 或者 x%*%y
z2=tcrossprod(x,y)#z1为x1与x2的外积 或者 x%o%y
z;z1;z2
要点:基本的列表赋值方法,内积和外积概念。内积为标量,外积为矩阵。
Ex2.2
A<-matrix(1:20,c(4,5));A
B<-matrix(1:20,nrow=4,byrow=TRUE);B
C=A+B;C
#不存在AB这种写法
E=A*B;E
F<-A[1:3,1:3];F
H<-matrix(c(1,2,4,5),nrow=1);H
#H起过渡作用,不规则的数组下标
G<-B[,H];G
要点:矩阵赋值方法。默认是byrow=FALSE,数据按列放置。
取出部分数据的方法。可以用数组作为数组的下标取出数组元素。
Ex2.3
x<-c(rep(1,times=5),rep(2,times=3),rep(3,times=4),rep(4,times=2));x #或者省略times=,如下面的形式
x<-c(rep(1,5),rep(2,3),rep(3,4),rep(4,2));x
要点:rep()的使用方法。rep(a,b)即将a重复b次
Ex2.4
n <- 5; H<-array(0,dim=c(n,n))
for (i in 1:n){for (j in 1:n){H[i,j]<-1/(i+j-1)}};H
G <- solve(H);G #求H的逆矩阵
ev <- eigen(H);ev #求H的特征值和特征向量
要点:数组初始化;for循环的使用
待解决:如何将很长的命令(如for循环)用几行打出来再执行?每次想换行的时候一按回车就执行了还没打完的命令...
Ex2.5
StudentData<-data.frame(name=c("zhangsangyi"),sex=c("F
要点:数据框的使用
待解决:SSH登陆linux服务器中文显示乱码。此处用英文代替。
Ex2.6
write.table(StudentData,file="studentdata.txt")
#把数据框StudentData在工作目录里输出,输出的文件名为studentdata.txt.
StudentData_a<-read.table("studentdata.txt");StudentData_a
#以数据框的形式读取文档studentdata.txt,存入数据框StudentData_a中。
write.csv(StudentData_a,"studentdata.csv")
#把数据框StudentData_a在工作目录里输出,输出的文件名为studentdata.csv,可用Excel打开.
要点:读写文件。read.table("file")
write.table(Rdata,"file")
read.csv("file")
write.csv(Rdata,"file")
外部文件,不论是待读入或是要写出的,命令中都得加双引号。
Ex2.7
Fun<-function(n){
if(n <= 0)
list(fail="please input a integer above 0!")
else{
repeat{
if(n==1) break
else if(n%%2==0){n<-n/2}
else n<- 3*n+1
}
list("sucess!")
}
}
在linux下新建一个R文件,输入上述代码,保存为"2.7.R"
然后在当前目录下进入R环境,输入source("2.7.R"),即打开了这个程序脚本。
然后就可以执行函数了。
输入Fun(67),显示
"sucess!"
输入Fun(-1),显示
$fail
[1] "please input a integer above 0!"
待解决:source("*.R")是可以理解为载入这个R文件吧?如何在R环境下关闭R文件呢?
OK,自己写的第一个R程序~~
第二章答案:
Ex2.1
x<-c(1,2,3)
y<-c(4,5,6)
e<-c(1,1,1)
z=2*x+y+e
z1=crossprod(x,y)#z1为x1与x2的内积或者 x%*%y
z2=tcrossprod(x,y)#z1为x1与x2的外积或者 x%o%y
z;z1;z2
要点:基本的列表赋值方法,内积和外积概念。内积为标量,外积为矩阵。
Ex2.2
A<-matrix(1:20,c(4,5));A
B<-matrix(1:20,nrow=4,byrow=TRUE);B
C=A+B;C
#不存在AB这种写法
E=A*B;E
F<-A[1:3,1:3];F
H<-matrix(c(1,2,4,5),nrow=1);H
#H起过渡作用,不规则的数组下标
G<-B[,H];G
要点:矩阵赋值方法。默认是byrow=FALSE,数据按列放置。
取出部分数据的方法。可以用数组作为数组的下标取出数组元素。
Ex2.3
x<-c(rep(1,times=5),rep(2,times=3),rep(3,times=4),rep(4,times=2));x #或者省略times=,如下面的形式
x<-c(rep(1,5),rep(2,3),rep(3,4),rep(4,2));x
要点:rep()的使用方法。rep(a,b)即将a重复b次
Ex2.4
n <- 5; H<-array(0,dim=c(n,n))
for (i in 1:n){for (j in 1:n){H[i,j]<-1/(i+j-1)}};H
G <- solve(H);G #求H的逆矩阵
ev <- eigen(H);ev #求H的特征值和特征向量
要点:数组初始化;for循环的使用
待解决:如何将很长的命令(如for循环)用几行打出来再执行?每次想换行的时候一按回车就执行了还没打完的命令...
Ex2.5
StudentData<-data.frame(name=c("zhangsan
要点:数据框的使用
待解决:SSH登陆linux服务器中文显示乱码。此处用英文代替。
Ex2.6
write.table(StudentData,file="studentdata.txt")
#把数据框StudentData在工作目录里输出,输出的文件名为studentdata.txt.
StudentData_a<-read.table("studentdata.txt");StudentData_a
#以数据框的形式读取文档studentdata.txt,存入数据框StudentData_a中。
write.csv(StudentData_a,"studentdata.csv")
#把数据框StudentData_a在工作目录里输出,输出的文件名为studentdata.csv,可用Excel打开.
要点:读写文件。read.table("file")
write.table(Rdata,"file")
read.csv("file")
write.csv(Rdata,"file")
外部文件,不论是待读入或是要写出的,命令中都得加双引号。
Ex2.7
Fun<-function(n){
if(n <= 0)
list(fail="please input a integer above 0!")
else{
repeat{
if(n==1) break
else if(n%%2==0){n<-n/2}
else n<- 3*n+1
}
list("sucess!")
}
}
在linux下新建一个R文件,输入上述代码,保存为"2.7.R"
然后在当前目录下进入R环境,输入source("2.7.R"),即打开了这个程序脚本。
然后就可以执行函数了。
输入Fun(67),显示
"sucess!"
输入Fun(-1),显示
$fail
[1] "please input a integer above 0!"
待解决:source("*.R")是可以理解为载入这个R文件吧?如何在R环境下关闭R文件呢?
Ex3.1
新建txt文件如下:3.1.txt
74.3 79.5 75.0 73.5 75.8 74.0 73.5 67.2 75.8 73.5 78.8 75.6 73.5 75.0 75.8
72.0 79.5 76.5 73.5 79.5 68.8 75.0 78.8 72.0 68.8 76.5 73.5 72.7 75.0 70.4
78.0 78.8 74.3 .3 76.5 74.3 74.7 70.4 72.7 76.5 70.4 72.0 75.8 75.8 70.4
76.5 65.0 77.2 73.5 72.7 80.5 72.0 65.0 80.3 71.2 77.6 76.5 68.8 73.5 77.2
80.5 72.0 74.3 69.7 81.2 67.3 81.6 67.3 72.7 84.3 69.7 74.3 71.2 74.3 75.0
72.0 75.4 67.3 81.6 75.0 71.2 71.2 69.7 73.5 70.4 75.0 72.7 67.3 70.3 76.5
73.5 72.0 68.0 73.5 68.0 74.3 72.7 72.7 74.3 70.4
编写一个函数(程序名为data_outline.R)描述样本的各种描述性统计量。
data_outline<-function(x){
n<-length(x)
m<-mean(x)
v<-var(x)
s<-sd(x)
me<-median(x)
cv<-100*s/m
css<-sum((x-m)^2)
uss<-sum(x^2)
R <- max(x)-min(x)
R1 <-quantile(x,3/4)-quantile(x,1/4)
sm <-s/sqrt(n)
g1 <-n/((n-1)*(n-2))*sum((x-m)^3)/s^3
g2 <-((n*(n+1))/((n-1)*(n-2)*(n-3))*sum((x-m)^4)/s^4-(3*(n-1)^2)/((n-2)*(n-3)))
data.frame(N=n,Mean=m,Var=v,std_dev=s,Median=me,std_mean=sm,CV=cv,CSS=css,USS=uss,R=R,R1=R1,Skewness=g1,Kurtosis=g2,row.names=1)
}
进入R,
source("data_outline.R") #将程序调入内存
serumdata<-scan("3.1.txt");serumdata #将数据读入向量serumdata。
data_outline(serumdata)
结果如下:
N Mean Var std_dev Median std_mean CV CSS USS R
1 100 73.696 15.41675 3.9217 73.5 0.39217 5.327857 1526.258 544636.3 20
R1 Skewness Kurtosis
1 4.6 0.03854249 0.07051809
要点:read.table()用于读表格形式的文件。上述形式的数据由于第七行缺几个数据,故用read.table()不能读入。 scan()可以直接读纯文本文件。scan()和matrix()连用还可以将数据存放成矩阵形式。 X<-matrix(scan("3.1.txt",0),ncol=10,byrow=TRUE) #将上述数据放置成10*10的矩阵。
scan()还可以从屏幕上直接输入数据。
Y<-scan()
然后按提示输入即可。结束输入时按回车即可。
Ex3.2
>hist(serumdata,freq=FALSE,col="purple",border="red",density=3,angle=60,main=paste("the histogram of serumdata"),xlab="age",ylab="frequency")#直方图。col是填充颜色。默认空白。border是边框的颜色,默认前景色。density是在图上画条纹阴影,默认不画。angle是条纹阴影的倾斜角度(逆时针方向),默认45度。main, xlab, ylab是标题,x和y坐标轴名称。
>lines(density(serumdata),col="blue")#密度估计曲线。
>x<-:85
> lines(x,dnorm(x,mean(serumdata),sd(serumdata)),col="green") #正态分布的概率密度曲线
> plot(ecdf(serumdata),verticals=TRUE,do.p=FALSE) #绘制经验分布图
> lines(x,pnorm(x,mean(serumdata),sd(serumdata)),col="blue") #正态经验分布
> qqnorm(serumdata,col="purple") #绘制QQ图
> qqline(serumdata,col="red") #绘制QQ直线
Ex3.3
> stem(serumdata,scale=1) #作茎叶图。原始数据小数点后数值四舍五入。
The decimal point is at the |
| 300
66 | 23333
68 | 00888777
70 | 34444442222
72 | 0000000777777755555555555
74 | 033333333700000004688888
76 | 5555555226
78 | 0888555
80 | 355266
82 |
84 | 3
>boxplot(serumdata,col="lightblue",notch=T) #作箱线图。notch表示带有缺口。
> fivenum(serumdata) #五数总结
[1] .3 71.2 73.5 75.8 84.3
Ex3.4
> shapiro.test(serumdata) #正态性Shapori-Wilk检验方法
Shapiro-Wilk normality test
data: serumdata
W = 0.97, p-value = 0.37
结论:p值>0.05,可认为来自正态分布的总体。
> ks.test(serumdata,"pnorm",mean(serumdata),sd(serumdata)) #Kolmogrov-Smirnov检验,正态性
One-sample Kolmogorov-Smirnov test
data: serumdata
D = 0.0701, p-value = 0.7097
alternative hypothesis: two-sided
Warning message:
In ks.test(serumdata, "pnorm", mean(serumdata), sd(serumdata)) :
cannot compute correct p-values with ties
结论:p值>0.05,可认为来自正态分布的总体。
注意,这里的警告信息,是因为数据中有重复的数值,ks检验要求待检数据时连续的,不允许重复值。
Ex3.5
> y<-c(2,4,3,2,4,7,7,2,2,5,4,5,6,8,5,10,7,12,12,6,6,7,11,6,6,7,9,5,5,10,6,3,10) #输入数据
> f<-factor(c(rep(1,11),rep(2,10),rep(3,12))) #因子分类
> plot(f,y,col="lightgreen") #plot()生成箱线图
> x<-c(2,4,3,2,4,7,7,2,2,5,4)
> y<-c(5,6,8,5,10,7,12,12,6,6)
> z<-c(7,11,6,6,7,9,5,5,10,6,3,10)
> boxplot(x,y,z,names=c("1
结论:第2和第3组没有显著差异。第1组合其他两组有显著差异。
Ex3.6
数据太多,懒得录入。离散图应该用plot即可。
Ex3.7
> studata<-read.table("3.7.txt") #读入数据
> data.frame(studata) #转化为数据框
V1 V2 V3 V4 V5 V6
1 1 alice f 13 56.5 84.0
2 2 becka f 13 65.3 98.0
3 3 gail f 14 .3 90.0
4 4 karen f 12 56.3 77.0
5 5 kathy f 12 59.8 84.5
6 6 mary f 15 66.5 112.0
7 7 sandy f 11 51.3 50.5
8 8 sharon f 15 62.5 112.5
9 9 tammy f 14 62.8 102.5
10 10 alfred m 14 69.0 112.5
11 11 duke m 14 63.5 102.5
12 12 guido m 15 67.0 133.0
13 13 james m 12 57.3 83.0
14 14 jeffery m 13 62.5 84.0
15 15 john m 12 59.0 99.5
16 16 philip m 16 72.0 150.0
17 17 robert m 12 .8 128.0
18 18 thomas m 11 57.5 85.0
19 19 william m 15 66.5 112.0
> names(studata)<-c("stuno
stuno name sex age height weight
1 1 alice f 13 56.5 84.0
2 2 becka f 13 65.3 98.0
3 3 gail f 14 .3 90.0
...
> attach(studata) #将数据框调入内存
> plot(weight~height,col="red") #体重对于身高的散点图
> coplot(weight~height|sex,col="blue") #不同性别,体重与身高的散点图
> coplot(weight~height|age,col="blue") #不同年龄,体重与身高的散点图
> coplot(weight~height|age+sex,col="blue") #不同年龄和性别,体重与身高的散点图
Ex3.8
> x<-seq(-2,3,0.05)
> y<-seq(-1,7,0.05)
> f<-function(x,y) x^4-2*x^2*y+x^2-2*x*y+2*y^2+4.5*x-4*y+4
> z<-outer(x,y,f) #必须做外积运算才能绘出三维图形
> contour(x,y,z,levels=c(0,1,2,3,4,5,10,15,20,30,40,50,60,80,100),col="blue") #二维等值线
> persp(x,y,z,theta=120,phi=0,expand=0.7,col="lightblue") #三位网格曲面
Ex3.9
> attach(studata)
> cor.test(height,weight) #Pearson相关性检验
Pearson's product-moment correlation
data: height and weight
t = 7.5549, df = 17, p-value = 7.887e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7044314 0.9523101
sample estimates:
cor
0.8777852
由此可见身高和体重是相关的。
Ex4.2
指数分布,λ的极大似然估计是n/sum(Xi)
> x<-c(rep(5,365),rep(15,245),rep(25,150),rep(35,100),rep(45,70),rep(55,45),rep(65,25))
> lamda<-length(x)/sum(x);lamda
[1] 0.05
Ex4.3
Poisson分布P(x=k)=λ^k/k!*e^(-λ)
其均数和方差相等,均为λ,其含义为平均每升水中大肠杆菌个数。
取均值即可。
> x<-c(rep(0,17),rep(1,20),rep(2,10),rep(3,2),rep(4,1))
> mean(x)
[1] 1
平均为1个。
Ex4.4
> obj<-function(x){f<-c(-13+x[1]+((5-x[2])*x[2]-2)*x[2],-29+x[1]+((x[2]+1)*x[2]-14)*x[2]) ;sum(f^2)} #其实我也不知道这是在干什么。所谓的无约束优化问题。
> x0<-c(0.5,-2)
>nlm(obj,x0)
$minimum
[1] 48.98425
$estimate
[1] 11.4127791 -0.68052
$gradient
[1] 1.411401e-08 -1.493206e-07
$code
[1] 1
$iterations
[1] 16
Ex4.5
> x<-c(54,67,68,78,70,66,67,70,65,69)
> t.test(x) #t.test()做单样本正态分布区间估计
One Sample t-test
data: x
t = 35.947, df = 9, p-value = 4.938e-11
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
63.1585 71.15
sample estimates:
mean of x
67.4
平均脉搏点估计为 67.4 ,95%区间估计为 63.1585 71.15 。
> t.test(x,alternative="less",mu=72) #t.test()做单样本正态分布单侧区间估计
One Sample t-test
data: x
t = -2.4534, df = 9, p-value = 0.01828
alternative hypothesis: true mean is less than 72
95 percent confidence interval:
-Inf 70.83705
sample estimates:
mean of x
67.4
p值小于0.05,拒绝原假设,平均脉搏低于常人。
要点:t.test()函数的用法。本例为单样本;可做双边和单侧检验。
Ex4.6
> x<-c(140,137,136,140,145,148,140,135,144,141);x
[1] 140 137 136 140 145 148 140 135 144 141
> y<-c(135,118,115,140,128,131,130,115,131,125);y
[1] 135 118 115 140 128 131 130 115 131 125
> t.test(x,y,var.equal=TRUE)
Two Sample t-test
data: x and y
t = 4.6287, df = 18, p-value = 0.0002087
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
7.53626 20.06374
sample estimates:
mean of x mean of y
140.6 126.8
期望差的95%置信区间为 7.53626 20.06374 。
要点:t.test()可做两正态样本均值差估计。此例认为两样本方差相等。
ps:我怎么觉得这题应该用配对t检验?
Ex4.7
> x<-c(0.143,0.142,0.143,0.137)
> y<-c(0.140,0.142,0.136,0.138,0.140)
> t.test(x,y,var.equal=TRUE)
Two Sample t-test
data: x and y
t = 1.198, df = 7, p-value = 0.2699
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.001996351 0.006096351
sample estimates:
mean of x mean of y
0.14125 0.13920
期望差的95%的区间估计为-0.001996351 0.006096351
Ex4.8
接Ex4.6
> var.test(x,y)
F test to compare two variances
data: x and y
F = 0.2353, num df = 9, denom df = 9, p-value = 0.04229
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.05845276 0.94743902
sample estimates:
ratio of variances
0.2353305
要点:var.test可做两样本方差比的估计。基于此结果可认为方差不等。
因此,在Ex4.6中,计算期望差时应该采取方差不等的参数。
> t.test(x,y)
Welch Two Sample t-test
data: x and y
t = 4.6287, df = 13.014, p-value = 0.0004712
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
7.359713 20.240287
sample estimates:
mean of x mean of y
140.6 126.8
期望差的95%置信区间为 7.359713 20.240287 。
要点:t.test(x,y,var.equal=TRUE)做方差相等的两正态样本的均值差估计
t.test(x,y)做方差不等的两正态样本的均值差估计
Ex4.9
> x<-c(rep(0,7),rep(1,10),rep(2,12),rep(3,8),rep(4,3),rep(5,2))
> n<-length(x)
> tmp<-sd(x)/sqrt(n)*qnorm(1-0.05/2)
> mean(x)
[1] 1.904762
> mean(x)-tmp;mean(x)+tmp
[1] 1.494041
[1] 2.315483
平均呼唤次数为1.9
0.95的置信区间为1.49,2,32
Ex4.10
> x<-c(1067,919,1196,785,1126,936,918,1156,920,948)
> t.test(x,alternative="greater")
One Sample t-test
data: x
t = 23.9693, df = 9, p-value = 9.148e-10
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
920.8443 Inf
sample estimates:
mean of x
997.1
灯泡平均寿命置信度95%的单侧置信下限为 920.8443
要点:t.test()做单侧置信区间估计
统计建模与R软件第五章习题答案(假设检验)
Ex5.1
> x<-c(220, 188, 162, 230, 145, 160, 238, 188, 247, 113, 126, 245, 1, 231, 256, 183, 190, 158, 224, 175)
> t.test(x,mu=225)
One Sample t-test
data: x
t = -3.4783, df = 19, p-value = 0.002516
alternative hypothesis: true mean is not equal to 225
95 percent confidence interval:
172.3827 211.9173
sample estimates:
mean of x
192.15
原假设:油漆工人的血小板计数与正常成年男子无差异。
备择假设:油漆工人的血小板计数与正常成年男子有差异。
p值小于0.05,拒绝原假设,认为油漆工人的血小板计数与正常成年男子有差异。
上述检验是双边检验。也可采用单边检验。备择假设:油漆工人的血小板计数小于正常成年男子。
> t.test(x,mu=225,alternative="less")
One Sample t-test
data: x
t = -3.4783, df = 19, p-value = 0.001258
alternative hypothesis: true mean is less than 225
95 percent confidence interval:
-Inf 208.4806
sample estimates:
mean of x
192.15
同样可得出油漆工人的血小板计数小于正常成年男子的结论。
Ex5.2
> pnorm(1000,mean(x),sd(x))
[1] 0.5087941
> x
[1] 1067 919 1196 785 1126 936 918 1156 920 948
> pnorm(1000,mean(x),sd(x))
[1] 0.5087941
x<=1000的概率为0.509,故x大于1000的概率为0.491.
要点:pnorm计算正态分布的分布函数。在R软件中,计算值均为下分位点。
Ex5.3
> A<-c(113,120,138,120,100,118,138,123)
> B<-c(138,116,125,136,110,132,130,110)
> t.test(A,B,paired=TRUE)
Paired t-test
data: A and B
t = -0.6513, df = 7, p-value = 0.5357
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-15.628 8.878
sample estimates:
mean of the differences
-3.375
p值大于0.05,接受原假设,两种方法治疗无差异。
Ex5.4
(1)
正态性W检验:
>x<-c(-0.7,-5.6,2,2.8,0.7,3.5,4,5.8,7.1,-0.5,2.5,-1.6,1.7,3,0.4,4.5,4.6,2.5,6,-1.4)
>y<-c(3.7,6.5,5,5.2,0.8,0.2,0.6,3.4,6.6,-1.1,6,3.8,2,1.6,2,2.2,1.2,3.1,1.7,-2)
> shapiro.test(x)
Shapiro-Wilk normality test
data: x
W = 0.9699, p-value = 0.7527
> shapiro.test(y)
Shapiro-Wilk normality test
data: y
W = 0.971, p-value = 0.7754
ks检验:
> ks.test(x,"pnorm",mean(x),sd(x))
One-sample Kolmogorov-Smirnov test
data: x
D = 0.1065, p-value = 0.977
alternative hypothesis: two-sided
Warning message:
In ks.test(x, "pnorm", mean(x), sd(x)) :
cannot compute correct p-values with ties
> ks.test(y,"pnorm",mean(y),sd(y))
One-sample Kolmogorov-Smirnov test
data: y
D = 0.1197, p-value = 0.9368
alternative hypothesis: two-sided
Warning message:
In ks.test(y, "pnorm", mean(y), sd(y)) :
cannot compute correct p-values with ties
pearson拟合优度检验,以x为例。
> sort(x)
[1] -5.6 -1.6 -1.4 -0.7 -0.5 0.4 0.7 1.7 2.0 2.5 2.5 2.8 3.0 3.5 4.0
[16] 4.5 4.6 5.8 6.0 7.1
> x1<-table(cut(x,br=c(-6,-3,0,3,6,9)))
> p<-pnorm(c(-3,0,3,6,9),mean(x),sd(x))
> p
[1] 0.044712 0.24990009 0.62002288 0.90075856 0.98828138
> p<-c(p[1],p[2]-p[1],p[3]-p[2],p[4]-p[3],1-p[4]);p
[1] 0.044712 0.20095298 0.37012278 0.28073568 0.09924144
> chisq.test(x1,p=p)
Chi-squared test for given probabilities
data: x1
X-squared = 0.5639, df = 4, p-value = 0.967
Warning message:
In chisq.test(x1, p = p) : Chi-squared approximation may be incorrect
p值为0.967,接受原假设,x符合正态分布。
(2)
方差相同模型t检验:
> t.test(x,y,var.equal=TRUE)
Two Sample t-test
data: x and y
t = -0.19, df = 38, p-value = 0.5248
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.326179 1.206179
sample estimates:
mean of x mean of y
2.065 2.625
方差不同模型t检验:
> t.test(x,y)
Welch Two Sample t-test
data: x and y
t = -0.19, df = 36.086, p-value = 0.525
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.32926 1.20926
sample estimates:
mean of x mean of y
2.065 2.625
配对t检验:
> t.test(x,y,paired=TRUE)
Paired t-test
data: x and y
t = -0., df = 19, p-value = 0.5257
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.373146 1.253146
sample estimates:
mean of the differences
-0.56
三种检验的结果都显示两组数据均值无差异。
(3)
方差检验:
> var.test(x,y)
F test to compare two variances
data: x and y
F = 1.5984, num df = 19, denom df = 19, p-value = 0.3153
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.6326505 4.0381795
sample estimates:
ratio of variances
1.598361
接受原假设,两组数据方差相同。
Ex5.5
> a <- c(126,125,136,128,123,138,142,116,110,108,115,140)
> b <- c(162,172,177,170,175,152,157,159,160,162)
正态性检验,采用ks检验:
> ks.test(a,"pnorm",mean(a),sd(a))
One-sample Kolmogorov-Smirnov test
data: a
D = 0.14, p-value = 0.9266
alternative hypothesis: two-sided
> ks.test(b,"pnorm",mean(b),sd(b))
One-sample Kolmogorov-Smirnov test
data: b
D = 0.2222, p-value = 0.707
alternative hypothesis: two-sided
Warning message:
In ks.test(b, "pnorm", mean(b), sd(b)) :
cannot compute correct p-values with ties
a和b都服从正态分布。
方差齐性检验:
> var.test(a,b)
F test to compare two variances
data: a and b
F = 1.96, num df = 11, denom df = 9, p-value = 0.3200
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.5021943 7.0488630
sample estimates:
ratio of variances
1.9622
可认为a和b的方差相同。
选用方差相同模型t检验:
> t.test(a,b,var.equal=TRUE)
Two Sample t-test
data: a and b
t = -8.8148, df = 20, p-value = 2.524e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-48.24975 -29.78358
sample estimates:
mean of x mean of y
125.5833 1.6000
可认为两者有差别。
Ex5.6
二项分布总体的假设检验:
> binom.test(57,400,p=0.147)
Exact binomial test
data: 57 and 400
number of successes = 57, number of trials = 400, p-value = 0.8876
alternative hypothesis: true probability of success is not equal to 0.147
95 percent confidence interval:
0.1097477 0.1806511
sample estimates:
probability of success
0.1425
P 值>0.05,故接受原假设,表示调查结果支持该市老年人口的看法
Ex5.7
二项分布总体的假设检验:
> binom.test(178,328,p=0.5,alternative="greater")
Exact binomial test
data: 178 and 328
number of successes = 178, number of trials = 328, p-value = 0.06794
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
0.4957616 1.0000000
sample estimates:
probability of success
0.5426829
不能认为这种处理能增加母鸡的比例。
Ex5.8
利用pearson卡方检验是否符合特定分布:
> chisq.test(c(315,101,108,32),p=c(9,3,3,1)/16)
Chi-squared test for given probabilities
data: c(315, 101, 108, 32)
X-squared = 0.47, df = 3, p-value = 0.9254
接受原假设,符合自由组合定律。
Ex5.9
利用pearson卡方检验是否符合泊松分布:
> n<-length(z)
> y<-c(92,68,28,11,1,0)
> x<-0:5
> q<-ppois(x,mean(rep(x,y)));n<-length(y)
> p[1]<-q[1];p[n]=1-q[n-1]
> chisq.test(y,p=p)
Chi-squared test for given probabilities
data: y
X-squared = 2.1596, df = 5, p-value = 0.8267
Warning message:
In chisq.test(y, p = p) : Chi-squared approximation may be incorrect
重新分组,合并频数小于5的组:
> z<-c(92,68,28,12)
> n<-length(z);p<-p[1:n-1];p[n]<-1-q[n-1]
> chisq.test(z,p=p)
Chi-squared test for given probabilities
data: z
X-squared = 0.9113, df = 3, p-value = 0.8227
可认为数据服从泊松分布。
Ex5.10
ks检验 两个分布是否相同:
> x<-c(2.36,3.14,752,3.48,2.76,5.43,6.54,7.41)
> y<-c(4.38,4.25,6.53,3.28,7.21,6.55)
> ks.test(x,y)
Two-sample Kolmogorov-Smirnov test
data: x and y
D = 0.375, p-value = 0.6374
alternative hypothesis: two-sided
Ex5.11
列联数据的性检验:
> x <- c(358,2492,229,2745)
> dim(x)<-c(2,2)
> chisq.test(x)
Pearson's Chi-squared test with Yates' continuity correction
data: x
X-squared = 37.4143, df = 1, p-value = 9.552e-10
P 值<0.05 ,拒绝原假设,有影响。
Ex5.12
列联数据的性检验:
> y
[,1] [,2] [,3]
[1,] 45 12 10
[2,] 46 20 28
[3,] 28 23 30
[4,] 11 12 35
> chisq.test(y)
Pearson's Chi-squared test
data: y
X-squared = 40.401, df = 6, p-value = 3.799e-07
P 值<0.05 ,拒绝原假设,不,有关系。
Ex5.13
因有的格子的频数小于5,故采用fiser确切概率法检验性。
> fisher.test(x)
Fisher's Exact Test for Count Data
data: x
p-value = 0.6372
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.04624382 5.13272210
sample estimates:
odds ratio
0.521271
p值大于0.05,两变量,两种工艺对产品的质量没有影响。
Ex5.14
由于是在相同个体上的两次试验,故采用McNemar检验。
> mcnemar.test(x)
McNemar's Chi-squared test
data: x
McNemar's chi-squared = 2.8561, df = 3, p-value = 0.4144
p值大于0.05,不能认定两种方法测定结果不同。
Ex5.15
符号检验:
H0:中位数>=14.6;
H1: 中位数<14.6
> x<-c(13.32,13.06,14.02,11.86,13.58,13.77,13.51,14.42,14.44,15.43)
> binom.test(sum(x)>14.6,length(x),al="l")
Exact binomial test
data: sum(x) > 14.6 and length(x)
number of successes = 1, number of trials = 10, p-value = 0.01074
alternative hypothesis: true probability of success is less than 0.5
95 percent confidence interval:
0.0000000 0.3941633
sample estimates:
probability of success
0.1
拒绝原假设,中位数小于14.6
Wilcoxon符号秩检验:
> wilcox.test(x,mu=14.6,al="l",exact=F)
Wilcoxon signed rank test with continuity correction
data: x
V = 4.5, p-value = 0.01087
alternative hypothesis: true location is less than 14.6
拒绝原假设,中位数小于14.6
Ex5.16
符号检验法:
> x<-c(48,33,37.5,48,42.5,40,42,36,11.3,22,36,27.3,14.2,32.1,52,38,17.3,20,21,46.1)
> y<-c(37,41,23.4,17,31.5,40,31,36,5.7,11.5,21,6.1,26.5,21.3,44.5,28,22.6,20,11,22.3)
> binom.test(sum(x>y),length(x))
Exact binomial test
data: sum(x > y) and length(x)
number of successes = 14, number of trials = 20, p-value = 0.1153
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.4572108 0.8810684
sample estimates:
probability of success
0.7
接受原假设,无差别。
Wilcoxon符号秩检验:
> wilcox.test(x,y,paired=TRUE,exact=FALSE)
Wilcoxon signed rank test with continuity correction
data: x and y
V = 136, p-value = 0.005191
alternative hypothesis: true location shift is not equal to 0
拒绝原假设,有差别。
Wilcoxon秩和检验:
> wilcox.test(x,y,exact=FALSE)
Wilcoxon rank sum test with continuity correction
data: x and y
W = 274.5, p-value = 0.04524
alternative hypothesis: true location shift is not equal to 0
拒绝原假设,有差别。
正态性检验:
> ks.test(x,"pnorm",mean(x),sd(x))
One-sample Kolmogorov-Smirnov test
data: x
D = 0.1407, p-value = 0.8235
alternative hypothesis: two-sided
Warning message:
In ks.test(x, "pnorm", mean(x), sd(x)) :
cannot compute correct p-values with ties
> ks.test(y,"pnorm",mean(y),sd(y))
One-sample Kolmogorov-Smirnov test
data: y
D = 0.1014, p-value = 0.973
alternative hypothesis: two-sided
两组数据均服从正态分布。
方差齐性检验:
> var.test(x,y)
F test to compare two variances
data: x and y
F = 1.1406, num df = 19, denom df = 19, p-value = 0.7772
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.4514788 2.88176
sample estimates:
ratio of variances
1.140639
可认为两组数据方差相同。
综上,该数据可做t检验。
t检验:
> t.test(x,y,var.equal=TRUE)
Two Sample t-test
data: x and y
t = 2.2428, df = 38, p-value = 0.03082
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.812553 15.877447
sample estimates:
mean of x mean of y
33.215 24.870
拒绝原假设,有差别。
综上所述,Wilcoxon符号秩检验的差异检出能力最强,符号检验的差异检出最弱。
Ex5.17
spearman秩相关检验:
> x<-c(24,17,20,41,52,23,46,18,15,20)
> y<-c(8,1,4,7,9,5,10,3,2,6)
> cor.test(x,y,method="spearman",exact=F)
Spearman's rank correlation rho
data: x and y
S = 9.5282, p-value = 4.536e-05
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.9422536
kendall秩相关检验:
> cor.test(x,y,method="kendall",exact=F)
Kendall's rank correlation tau
data: x and y
z = 3.2329, p-value = 0.001225
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.8090398
二者有关系,呈正相关。
Ex5.18
> x<-rep(1:5,c(0,1,9,7,3));y<-rep(1:5,c(2,2,11,4,1))
> wilcox.test(x,y,exact=F)
Wilcoxon rank sum test with continuity correction
data: x and y
W = 266, p-value = 0.05509
alternative hypothesis: true location shift is not equal to 0
p值大于0.05,不能拒绝原假设,尚不能认为新方法的疗效显著优于原疗法。
