
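⏹ The dependent variable Y is binary, taking the values 0 and 1.
⏹ Independent variables: X1, X2, …, Xm.
⏹ P denotes the probability that the event occurs given the m independent variables.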
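In this notation, the logistic regression model links P to the covariates through the logit. The coefficient symbols $\beta_0, \dots, \beta_m$ are standard notation introduced here for illustration. They are not taken from the original figure.

```latex
\operatorname{logit}(P) = \ln\frac{P}{1-P} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m,
\qquad
P = \frac{\exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_m X_m)}{1 + \exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_m X_m)}
```

Under this form, $e^{\beta_j}$ is the odds ratio for a one-unit increase in $X_j$ when the other covariates are held fixed.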
Program:
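data ceshi;
input x1-x18 y;          /* 18 candidate predictors and a binary response y (0/1) */
cards;
……
;
proc logistic data=ceshi descending;   /* DESCENDING: model Pr(y=1) */
model y=x1-x18/selection=stepwise;     /* stepwise variable selection */
run;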
Example:
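Three drugs (drug, coded 0-2). Disease severity (degree) has two classes, severe and mild (coded 0-1). The dependent variable response is the treatment outcome, effective or not effective (coded 1-0).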
Data ex12_1;
Input drug degree response count;
Datalines;
0 1 1 38
0 1 0 64
0 0 1 10
0 0 0 82
1 1 1 95
1 1 0 18
1 0 1 50
1 0 0 35
2 1 1 88
2 1 0 26
2 0 1 34
2 0 0 37
;
Proc logistic data=ex12_1 descending;
Freq count;
Class drug/param=ref descending;
Model response=drug degree/rsq scale=n aggregate;
Run;
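RSQ displays the generalized R² and the max-rescaled R².
SCALE= specifies the method used to correct for overdispersion. SCALE=N (none) means that no correction is applied.
AGGREGATE computes the chi-square goodness-of-fit statistics (deviance and Pearson) over the subpopulations defined by the unique covariate profiles.
The CLASS statement converts the categorical variable drug into dummy variables. The three drugs are represented by two dummy variables.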
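PARAM=REF combined with the DESCENDING option on the CLASS statement makes drug=0 the reference level. The Class Level Information table in the output confirms this with design row 0 0. The resulting model is sketched below. $D_1$ and $D_2$ are hypothetical names for the two design variables, and the $\beta$ symbols are illustrative.

```latex
\operatorname{logit}\Pr(\text{response}=1) = \beta_0 + \beta_1 D_2 + \beta_2 D_1 + \beta_3\,\text{degree},
\qquad
D_2 = \mathbf{1}[\text{drug}=2],\quad D_1 = \mathbf{1}[\text{drug}=1]
```

Drug 0 corresponds to $D_1 = D_2 = 0$, so $\beta_1$ and $\beta_2$ are the log odds ratios of drug 2 and drug 1 versus drug 0.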
The LOGISTIC Procedure
Model Information
Data Set WORK.EX12_1
Response Variable response
Number of Response Levels 2
Frequency Variable count
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 12
Number of Observations Used 12
Sum of Frequencies Read 577
Sum of Frequencies Used 577
Response Profile
Ordered Value response Total Frequency
1 1 315
2 0 262
Probability modeled is response=1.
Class Level Information
Class Value Design Variables
drug 2 1 0
1 0 1
0 0 0
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Deviance and Pearson Goodness-of-Fit Statistics
Criterion Value DF Value/DF Pr > ChiSq
Deviance 0.3749 2 0.1874 0.8291
Pearson 0.36 2 0.1844 0.8316
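Goodness-of-fit tests of the model. Both p-values (0.8291 and 0.8316) are large, so there is no evidence of lack of fit.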
Number of unique profiles: 6
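The degrees of freedom of these goodness-of-fit statistics follow from the AGGREGATE grouping. With a binary response, each of the 6 unique covariate profiles contributes one independent proportion. The model estimates 4 parameters (intercept, two drug dummies, and degree).

```latex
\text{DF} = 6 \times (2 - 1) - 4 = 2,
\qquad
\frac{\text{Deviance}}{\text{DF}} = \frac{0.3749}{2} \approx 0.187
```

A Value/DF ratio well above 1 would suggest overdispersion. The value here is far below 1, which is consistent with requesting no correction (SCALE=N).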
Model Fit Statistics
Criterion Intercept Only Intercept and Covariates
AIC 797.017 641.326
SC 801.375 658.757
-2 Log L 795.017 633.326
R-Square 0.2444 Max-rescaled R-Square 0.3268
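Both R² values can be reproduced from the -2 Log L row and N = 577 (the sum of frequencies used). The sketch below assumes the generalized (Cox-Snell) R² and its max-rescaled (Nagelkerke) version, and the arithmetic matches the printed values.

```latex
\begin{aligned}
R^2 &= 1 - \exp\!\left(-\tfrac{795.017 - 633.326}{577}\right) = 1 - e^{-0.2802} \approx 0.2444,\\
R^2_{\max} &= 1 - \exp\!\left(-\tfrac{795.017}{577}\right) \approx 0.7479,\\
\tilde R^2 &= R^2 / R^2_{\max} \approx 0.2444 / 0.7479 \approx 0.3268.
\end{aligned}
```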
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 161.6907 3 <.0001
Score 148.1598 3 <.0001
Wald 118.1394 3 <.0001
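These tests check the null hypothesis that all model coefficients (other than the intercept) are 0. Rejecting it means the model is meaningful, that is, at least one covariate has an effect.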
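The likelihood ratio statistic is the drop in -2 Log L reported in the Model Fit Statistics table. Its degrees of freedom equal the number of non-intercept parameters: two drug dummies plus degree.

```latex
\chi^2_{\text{LR}} = \left(-2\log L_{\text{intercept only}}\right) - \left(-2\log L_{\text{full}}\right) = 795.017 - 633.326 = 161.691,
\qquad \text{DF} = 3
```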
Type 3 Analysis of Effects
Effect DF Wald Chi-Square Pr > ChiSq
drug 2 95.0859 <.0001
degree 1 47.4607 <.0001
Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard Error Wald Chi-Square Pr > ChiSq
Intercept 1 -1.9594 0.2229 77.2441 <.0001
drug 2 1 1.8342 0.2406 58.0936 <.0001
drug 1 1 2.2850 0.2479 84.9472 <.0001
degree 1 1.3806 0.2004 47.4607 <.0001
Parameter estimates and their significance tests.
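Substituting the estimates into the model gives the fitted equation below. $D_1$ and $D_2$ are hypothetical names for the drug 1 and drug 2 indicators, with drug 0 as the reference. Each 1-DF Wald chi-square is the squared ratio of the estimate to its standard error, shown here for degree.

```latex
\begin{aligned}
\widehat{\operatorname{logit}}\,\Pr(\text{response}=1) &= -1.9594 + 1.8342\,D_2 + 2.2850\,D_1 + 1.3806\,\text{degree},\\
\chi^2_{\text{Wald}}(\text{degree}) &= (1.3806 / 0.2004)^2 \approx 6.889^2 \approx 47.46.
\end{aligned}
```

Degree has 1 DF, so this value also equals its Type 3 Wald chi-square (47.4607), up to rounding of the printed estimate and standard error.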
Odds Ratio Estimates
Effect Point Estimate 95% Wald Confidence Limits
drug 2 vs 0 6.260 3.906 10.033
drug 1 vs 0 9.826 6.044 15.974
degree 3.977 2.685 5.891
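Each odds ratio is $e^{\hat\beta}$. The Wald limits exponentiate $\hat\beta \pm 1.96\,\text{SE}$. The calculation is shown for degree, using its estimate and standard error from the table of maximum likelihood estimates.

```latex
\begin{aligned}
\widehat{\text{OR}}_{\text{degree}} &= e^{1.3806} \approx 3.977,\\
\text{95\% CI} &= \exp(1.3806 \pm 1.96 \times 0.2004) = \left(e^{0.9878},\ e^{1.7734}\right) \approx (2.685,\ 5.891).
\end{aligned}
```

In the same way, $e^{2.2850} \approx 9.826$ gives drug 1 vs 0 and $e^{1.8342} \approx 6.260$ gives drug 2 vs 0.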
Association of Predicted Probabilities and Observed Responses
Percent Concordant 72.2 Somers' D 0.568
Percent Discordant 15.4 Gamma 0.9
Percent Tied 12.4 Tau-a 0.282
Pairs 82530 c 0.784
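The association measures are computed over every pair made of one response=1 and one response=0 observation. In the sketch below, $n_c$, $n_d$, $n_t$ denote the concordant, discordant, and tied pairs, written as proportions of $t$. Because the percentages are rounded, Gamma is shown only to two digits.

```latex
\begin{aligned}
t &= 315 \times 262 = 82530,\\
\text{Somers' } D &= \frac{n_c - n_d}{t} = 0.722 - 0.154 = 0.568,\\
c &= \frac{n_c + 0.5\,n_t}{t} = 0.722 + 0.5 \times 0.124 = 0.784,\\
\text{Gamma} &= \frac{n_c - n_d}{n_c + n_d} = \frac{0.568}{0.876} \approx 0.65,\\
\tau_a &= \frac{n_c - n_d}{N(N-1)/2} = \frac{0.568 \times 82530}{577 \times 576 / 2} \approx 0.282.
\end{aligned}
```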
data ingots;
input Heat Soak r n @@;
datalines;
7 1.0 0 10 14 1.0 0 31 27 1.0 1 56 51 1.0 3 13
7 1.7 0 17 14 1.7 0 43 27 1.7 4 44 51 1.7 0 1
7 2.2 0 7 14 2.2 2 33 27 2.2 0 21 51 2.2 0 1
7 2.8 0 12 14 2.8 0 31 27 2.8 1 22 51 4.0 0 1
7 4.0 0 9 14 4.0 0 19 27 4.0 1 16
;
proc logistic data=ingots;
model r/n=Heat Soak;
run;
The LOGISTIC Procedure
Model Information
Data Set WORK.INGOTS
Response Variable (Events) r
Response Variable (Trials) n
Model binary logit
Optimization Technique Fisher's scoring
Here n is the number of trials and r is the number of events.
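With the events/trials syntax (model r/n=...), each of the 19 rows counts as $r_i$ events in $n_i$ binomial trials. This is why Sum of Frequencies is 387, the total of n. The log-likelihood being maximized is sketched below, up to a constant that does not involve the coefficients. The $\beta$ symbols are illustrative.

```latex
\ell(\beta) = \sum_{i=1}^{19} \left[ r_i \ln p_i + (n_i - r_i) \ln(1 - p_i) \right],
\qquad
\operatorname{logit}(p_i) = \beta_0 + \beta_1\,\text{Heat}_i + \beta_2\,\text{Soak}_i
```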
Number of Observations Read 19
Number of Observations Used 19
Sum of Frequencies Read 387
Sum of Frequencies Used 387
Response Profile
Ordered Value Binary Outcome Total Frequency
1 Event 12
2 Nonevent 375
Response profile: the event occurred 12 times and did not occur 375 times.
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
Criterion Intercept Only Intercept and Covariates
AIC 108.988 101.346
SC 112.947 113.221
-2 Log L 106.988 95.346
AIC and SC are used to select the best model. Smaller values indicate a better model.
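Both criteria penalize -2 Log L by the number of parameters $p$. In the full model, $p = 3$ (intercept, Heat, Soak), and $N = 387$ is the sum of frequencies. The printed values can be reproduced as follows.

```latex
\begin{aligned}
\text{AIC} &= -2\log L + 2p = 95.346 + 2 \times 3 = 101.346,\\
\text{SC} &= -2\log L + p \ln N = 95.346 + 3 \ln 387 \approx 95.346 + 17.875 = 113.221.
\end{aligned}
```

The two criteria disagree in this example. AIC favors the model with covariates (101.346 < 108.988), while SC slightly favors the intercept-only model (113.221 > 112.947). SC's penalty per parameter is $\ln 387 \approx 5.96$, compared with AIC's 2.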
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 11.642 2 0.0030
Score 15.1091 2 0.0005
Wald 13.0315 2 0.0015
Model tests: three tests of the global null hypothesis are reported: the likelihood ratio test, the score test, and the Wald test.
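The likelihood ratio statistic is the difference between the two -2 Log L values in the Model Fit Statistics table. Its 2 DF correspond to Heat and Soak. For a chi-square distribution with 2 DF, the upper-tail probability is exactly $e^{-x/2}$.

```latex
\chi^2_{\text{LR}} = 106.988 - 95.346 = 11.642,
\qquad \text{DF} = 2,
\qquad
\Pr\!\left(\chi^2_2 > 11.642\right) = e^{-11.642/2} \approx 0.0030
```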
Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard Error Wald Chi-Square Pr > ChiSq
Intercept 1 -5.5592 1.1197 24.6503 <.0001
Heat 1 0.0820 0.0237 11.9454 0.0005
Soak 1 0.0568 0.3312 0.0294 0.8639
Tests of the individual coefficients.
Odds Ratio Estimates
Effect Point Estimate 95% Wald Confidence Limits
Heat 1.085 1.036 1.137
Soak 1.058 0.553 2.026
Association of Predicted Probabilities and Observed Responses
Percent Concordant 64.4 Somers' D 0.460
Percent Discordant 18.4 Gamma 0.555
Percent Tied 17.2 Tau-a 0.028
Pairs 4500 c 0.730
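$$\operatorname{logit}(p) = \log\frac{p}{1-p} = -5.5592 + 0.0820 \times \text{Heat} + 0.0568 \times \text{Soak}$$
If Heat = 7 and Soak = 1, then logit(p) = -5.5592 + 0.0820 × 7 + 0.0568 × 1 = -4.9284. Using this logit estimate, you can calculate p as follows:
$$p = \frac{1}{1 + e^{4.9284}} = 0.0072$$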
Y indicates whether a person commutes by bicycle (Y=1: bike, Y=0: bus). X1 is age, X2 is monthly income, and X3 is gender (1 = male, 0 = female).
| X3 | X1 | X2 | y |
|---|---|---|---|
| 0 | 18 | 850 | 0 |
| 0 | 21 | 1200 | 0 |
| 0 | 23 | 850 | 1 |
| 0 | 23 | 950 | 1 |
| 0 | 28 | 1200 | 1 |
| 0 | 31 | 850 | 0 |
| 0 | 36 | 1500 | 1 |
| 0 | 42 | 1000 | 1 |
| 0 | 46 | 950 | 1 |
| 0 | 48 | 1200 | 0 |
| 0 | 55 | 1800 | 1 |
| 0 | 56 | 2100 | 1 |
| 0 | 58 | 1800 | 1 |
| 1 | 18 | 850 | 0 |
| 1 | 20 | 1000 | 0 |
| 1 | 25 | 1200 | 0 |
| 1 | 27 | 1300 | 0 |
| 1 | 28 | 1500 | 0 |
| 1 | 30 | 950 | 1 |
| 1 | 32 | 1000 | 0 |
| 1 | 33 | 1800 | 0 |
| 1 | 33 | 1000 | 0 |
| 1 | 38 | 1200 | 0 |
| 1 | 41 | 1500 | 0 |
| 1 | 45 | 1800 | 1 |
| 1 | 48 | 1000 | 0 |
| 1 | 52 | 1500 | 1 |
| 1 | 56 | 1800 | 1 |
Data p256;
Input X3 X1 X2 y;
Datalines;
0 18 850 0
0 21 1200 0
0 23 850 1
0 23 950 1
0 28 1200 1
0 31 850 0
0 36 1500 1
0 42 1000 1
0 46 950 1
0 48 1200 0
0 55 1800 1
0 56 2100 1
0 58 1800 1
1 18 850 0
1 20 1000 0
1 25 1200 0
1 27 1300 0
1 28 1500 0
1 30 950 1
1 32 1000 0
1 33 1800 0
1 33 1000 0
1 38 1200 0
1 41 1500 0
1 45 1800 1
1 48 1000 0
1 52 1500 1
1 56 1800 1
;
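Proc logistic data=p256;   /* default order models Pr(y=0), as in the output below; add DESCENDING to model Pr(y=1) */
Model y=x1-x3;
output out=pred p=phat lower=lcl upper=ucl   /* predicted probability and its 95% confidence limits */
predprobs=(individual crossvalidate);        /* individual and cross-validated predicted probabilities */
run;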
proc print data=pred;
run;
The LOGISTIC Procedure
Model Information
Data Set WORK.P256
Response Variable y
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 28
Number of Observations Used 28
Response Profile
Ordered Value y Total Frequency
1 0 15
2 1 13
Probability modeled is y=0.
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
Criterion Intercept Only Intercept and Covariates
AIC 40.673 33.971
SC 42.005 39.299
-2 Log L 38.673 25.971
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 12.7026 3 0.0053
Score 10.4135 3 0.0154
Wald 6.5331 3 0.0884
Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard Error Wald Chi-Square Pr > ChiSq
Intercept 1 3.6547 2.0911 3.0545 0.0805
X1 1 -0.0822 0.0521 2.4853 0.1149
X2 1 -0.00152 0.00187 0.6613 0.4161
X3 1 2.5016 1.1578 4.66 0.0307
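The Response Profile shows that the modeled probability is y=0 (bus), so these estimates describe the log-odds of taking the bus. The sketch below plugs the first observation (X3=0, X1=18, X2=850) into the fitted equation using the rounded estimates printed above.

```latex
\begin{aligned}
\widehat{\operatorname{logit}}\,\Pr(y=0) &= 3.6547 - 0.0822\,X_1 - 0.00152\,X_2 + 2.5016\,X_3,\\
\text{observation 1:}\quad 3.6547 - 0.0822 \times 18 - 0.00152 \times 850 + 0 &= 0.8831,\\
\widehat{\Pr}(y=0) &= \frac{1}{1 + e^{-0.8831}} \approx 0.71.
\end{aligned}
```

The coefficients are rounded, so this value may differ slightly from phat in the PRED data set. For X3, the odds ratio $e^{2.5016} \approx 12.20$ means that, at the same age and income, the estimated odds of taking the bus are about 12 times higher for men than for women.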
Odds Ratio Estimates
Effect Point Estimate 95% Wald Confidence Limits
X1 0.921 0.832 1.020
X2 0.998 0.995 1.002
X3 12.203 1.262 118.014
Association of Predicted Probabilities and Observed Responses
Percent Concordant 87.2 Somers' D 0.744
Percent Discordant 12.8 Gamma 0.744
Percent Tied 0.0 Tau-a 0.384
Pairs 195 c 0.872
| No. | Number sampled (W) | Number owning a house | Income (thousand yuan) |
|---|---|---|---|
| 1 | 10.0 | 1.5 | 2.0 |
| 2 | 20.0 | 3.2 | 3.0 |
| 3 | 25.0 | 4.0 | 4.0 |
| 4 | 30.0 | 5.0 | 5.0 |
| 5 | 40.0 | 8.0 | 6.0 |
| 6 | 50.0 | 12.0 | 8.0 |
| 7 | 60.0 | 18.0 | 10.0 |
| 8 | 80.0 | 28.0 | 13.0 |
| 9 | 100.0 | 45.0 | 15.0 |
| 10 | 70.0 | 36.0 | 20.0 |
| 11 | 65.0 | 39.0 | 25.0 |
| 12 | 50.0 | 33.0 | 30.0 |
| 13 | 40.0 | 30.0 | 35.0 |
| 14 | 25.0 | 20.0 | 40.0 |
| 15 | 30.0 | 27.0 | 50.0 |
| 16 | 40.0 | 38.0 | 60.0 |
| 17 | 50.0 | 48.0 | 70.0 |
| 18 | 60.0 | 58.0 | 80.0 |
Data ex1;
Input no n n1 x;
Datalines;
1 10.0 1.5 2.0
2 20.0 3.2 3.0
3 25.0 4.0 4.0
4 30.0 5.0 5.0
5 40.0 8.0 6.0
6 50.0 12.0 8.0
7 60.0 18.0 10.0
8 80.0 28.0 13.0
9 100.0 45.0 15.0
10 70.0 36.0 20.0
11 65.0 39.0 25.0
12 50.0 33.0 30.0
13 40.0 30.0 35.0
14 25.0 20.0 40.0
15 30.0 27.0 50.0
16 40.0 38.0 60.0
17 50.0 48.0 70.0
18 60.0 58.0 80.0
;
Proc logistic data=ex1;
Model n1/n=x;
Run;
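In this grouped layout, income group $i$ contributes $n1_i$ house owners out of $n_i$ sampled households. The events/trials model fitted by model n1/n=x therefore has the form below. The $\beta$ symbols are assumed for illustration, and the observed proportions come directly from the table.

```latex
\operatorname{logit}(p_i) = \ln\frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i,
\qquad
\frac{n1_i}{n_i}:\ \frac{1.5}{10} = 0.15 \ \text{(group 1)}, \quad \frac{58}{60} \approx 0.967 \ \text{(group 18)}
```

The observed proportions rise steadily with income. A positive $\beta_1$ would capture this pattern.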
