

Logistic Regression Continued

Psy 524, Ainsworth


Equations

Regression Equation

$$\hat{Y}_i = \frac{e^{A + B_1 X_1 + B_2 X_2 + B_3 X_3}}{1 + e^{A + B_1 X_1 + B_2 X_2 + B_3 X_3}}$$
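As a quick illustration of the equation, here is a minimal Python sketch; the intercept A, slopes B, and predictor values x below are hypothetical placeholders, not values fitted in this lecture.

```python
import numpy as np

def predicted_probability(A, B, x):
    """Yhat = e^(A + B1*X1 + B2*X2 + B3*X3) / (1 + e^(A + B1*X1 + B2*X2 + B3*X3))."""
    linear_part = A + np.dot(B, x)
    return np.exp(linear_part) / (1 + np.exp(linear_part))

# Hypothetical intercept, coefficients, and one case's predictor values
print(predicted_probability(A=-1.0, B=np.array([0.5, 0.8, -0.3]), x=np.array([2.0, 1.0, 0.0])))
```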


Equations

The linear part of the logistic regression equation is used to find the probability of being in a category based on the combination of predictors. Programs like SPSS and SAS separate discrete predictors with more than two levels into multiple dichotomous variables.


Equations

Fall (0 = no, 1 = yes); Difficulty is continuous; Season (1 = autumn, 2 = winter, 3 = spring)

Fall  Difficulty  Season
1     3           1
1     1           1
0     1           3
1     2           3
1     3           2
0     2           2
0     1           2
1     3           1
1     2           3
1     2           1
0     2           2
0     2           3
1     3           2
1     2           2
0     3           1


Equations

Season is a discrete variable with three levels that would be turned into 2 separate variables, Season1 and Season2. Season1 is coded 1 for autumn and 0 otherwise; Season2 is coded 1 for winter and 0 otherwise; spring is coded when both are 0.


Fall  Difficulty  Season  Season1  Season2
1     3           1       1        0
1     1           1       1        0
0     1           3       0        0
1     2           3       0        0
1     3           2       0        1
0     2           2       0        1
0     1           2       0        1
1     3           1       1        0
1     2           3       0        0
1     2           1       1        0
0     2           2       0        1
0     2           3       0        0
1     3           2       0        1
1     2           2       0        1
0     3           1       1        0


Interpreting coefficients

Good news: regression coefficients and their standard errors are found through advanced calculus methods of maximum likelihood (e.g. derivatives, etc.), so we're not getting into it.


Interpreting coefficients

Each coefficient is evaluated using a Wald test (really just a z-test):

$$W_j = \frac{B_j}{SE_{B_j}}$$
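For a concrete (if unofficial) version of this outside SPSS, a statsmodels sketch that fits the example data and forms the Wald z for each coefficient; statsmodels and the exact fitted numbers are assumptions, not part of the lecture:

```python
import numpy as np
import statsmodels.api as sm

# Example data from the slides (Fall outcome; Difficulty plus the two Season dummies)
fall       = np.array([1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0])
difficulty = np.array([3, 1, 1, 2, 3, 2, 1, 3, 2, 2, 2, 2, 3, 2, 3])
season1    = np.array([1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1])
season2    = np.array([0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0])

X = sm.add_constant(np.column_stack([difficulty, season1, season2]))
result = sm.Logit(fall, X).fit(disp=False)

# Wald test for each coefficient: W_j = B_j / SE(B_j), evaluated as a z-score
wald_z = result.params / result.bse
print(result.params)  # B_j
print(result.bse)     # SE(B_j)
print(wald_z)         # the values statsmodels reports as z in result.summary()
```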


Interpreting coefficients

Term         Coefficient    Standard Error   Wald Test (Z)
(Constant)   -.776          .89              -.88
Difficulty   (not legible)  (not legible)    (not legible)
Season(1)    .9_7           .59              ._4
Season(2)    -.4_8          ._9              -._9

(Underscores mark digits that are not legible in the source.)


Interpreting coefficients

The tests of the coefficients are approximate z-scores, so they are tested as z-scores. None of the coefficients are significant in the sample data.

The coefficients are placed into the model, as in regular multiple regression, in order to predict individual subjects' probabilities.


Goodness of fit

Log-likelihood

$$\text{log-likelihood} = \sum_{i=1}^{N}\left[\, Y_i \ln(\hat{Y}_i) + (1 - Y_i)\ln(1 - \hat{Y}_i) \,\right]$$
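A minimal numpy sketch of this sum; the outcome vector y and predicted probabilities p below are hypothetical:

```python
import numpy as np

def log_likelihood(y, p):
    """Sum over cases of Y*ln(Yhat) + (1 - Y)*ln(1 - Yhat)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical observed outcomes and model-predicted probabilities
print(log_likelihood([1, 0, 1, 1, 0], [0.8, 0.3, 0.6, 0.9, 0.2]))
```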


Goodness of fit

Models are compared by taking 2 times the difference between the models' log-likelihoods.

$$G^2 = 2[(\text{log-likelihood for bigger model}) - (\text{log-likelihood for smaller model})]$$

Note: models must be nested in order to be compared. Nested means that all components of the smaller model must be in the larger model.


Goodness of fit

Often a model with intercept and predictors is compared to an intercept-only model to test whether the predictors add over and above the intercept alone. This is usually noted as

$$\chi^2 = 2[LL(B) - LL(0)]$$


Goodness of Fit

2[-8.74 - (-10.11)] = 2.74

The constant-only model has one degree of freedom (for the constant) and the full model has 4 degrees of freedom (1 for the constant and one for each predictor), so the DF for the test is 4 - 1 = 3. The test of the chi-square is not significant at 3 DFs, so the null is retained. Models with different numbers of predictors (nested) can also be compared in the same fashion.
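A small Python/scipy sketch of this comparison, using the log-likelihoods from the slide (-8.74 and -10.11) and 3 degrees of freedom; scipy is an assumption, not part of the lecture:

```python
from scipy.stats import chi2

ll_full = -8.74       # log-likelihood of the constant + predictors model
ll_constant = -10.11  # log-likelihood of the constant-only model

chi_square = 2 * (ll_full - ll_constant)  # 2 * 1.37 = 2.74
df = 4 - 1                                # 4 parameters in the full model, 1 in the constant-only model
p_value = chi2.sf(chi_square, df)

print(chi_square, df, p_value)  # p is well above .05, so the null is retained
```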


Standardized Residuals

Given a model you can calculate the standardized residual of each person's predicted probability (using the rather scary matrix formula on page 527). You can have SPSS save the standardized residuals, and once this is done you can analyze them to see if any are above 3.3; if they are, the subject is an outlier according to the given model.
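As a rough stand-in for the SPSS output, a numpy sketch of the simple Pearson residual, (Y - Yhat) / sqrt(Yhat(1 - Yhat)); note this is not the full leverage-adjusted matrix formula the text cites, just an approximation for flagging candidates:

```python
import numpy as np

def pearson_residuals(y, p):
    """Raw residual divided by its binomial standard deviation for each case."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return (y - p) / np.sqrt(p * (1 - p))

# Hypothetical outcomes and fitted probabilities; |residual| > 3.3 flags a potential outlier
resid = pearson_residuals([1, 0, 1, 0], [0.90, 0.20, 0.05, 0.50])
print(resid)
print(np.abs(resid) > 3.3)
```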


Types of Logistic Regression

Direct or Simultaneous
Sequential or User defined
Stepwise or Statistical
Probit vs. Logistic
    Logistic assumes a categorical (qualitative) underlying distribution.
    Probit assumes a normal distribution and uses z-scores to estimate the proportion under the curve.
    Near .5 the analyses are similar; they only differ at the extremes.


Inferential Tests

Assessing goodness of fit for the model: there are many goodness-of-fit indices, so you need to keep in mind what is being compared to know whether a significant difference is good or not. For some tests significance means fit, and for others significance means lack of fit.


Inferential Tests

Also consider sample size when evaluating goodness of fit. Chi-square statistics are heavily influenced by sample size, so with a very large sample even minute differences will be significant.

If the sample size is large and the chi-square is significant, this may not be important; though if there is significance and the sample is relatively small, then the effect is notable.


Inferential Tests

Constant only vs. full model: here you want there to be a significant improvement in prediction when all of the predictors are added to the model.

Perfect model vs. proposed model: some programs test the proposed model against a perfect model (one that predicts perfectly); in this case you want the chi-square to be non-significant.


Inferential Tests

Deciles of risk:
Step 1: Subjects are ordered on their predicted probability.
Step 2: Subjects are divided into 10 groups based on the probabilities (all subjects with .1 or lower in the lowest decile, .9 or higher in the highest decile, etc.).
Step 3: Divide subjects into groups according to their actual outcome (e.g. fall or no fall), creating a 2 x 10 matrix of observed frequencies for the example data.
Step 4: Expected frequencies are calculated and the observed frequencies are compared to the expected frequencies in a chi-square test. Fit is indicated by a non-significant chi-square.
In SPSS this is given by the Hosmer-Lemeshow test; a rough sketch of the same idea appears below.
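A rough numpy/scipy sketch of the deciles-of-risk logic; the grouping by fixed .1-wide probability bins follows the slide's description, and the outcome and probability vectors are hypothetical, so this only approximates what SPSS reports as the Hosmer-Lemeshow test:

```python
import numpy as np
from scipy.stats import chi2

def deciles_of_risk_test(y, p):
    """Chi-square comparing observed and expected frequencies across 10 probability bins."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    bins = np.minimum((p * 10).astype(int), 9)  # Steps 1-2: bin cases by predicted probability
    chi_sq, used_bins = 0.0, 0
    for g in range(10):
        in_bin = bins == g
        n = in_bin.sum()
        if n == 0:
            continue
        used_bins += 1
        obs_events = y[in_bin].sum()   # Step 3: observed falls in this bin
        exp_events = p[in_bin].sum()   # Step 4: expected falls = sum of predicted probabilities
        obs_non, exp_non = n - obs_events, n - exp_events
        chi_sq += (obs_events - exp_events) ** 2 / exp_events
        chi_sq += (obs_non - exp_non) ** 2 / exp_non
    df = used_bins - 2                  # conventional df for a Hosmer-Lemeshow-style statistic
    return chi_sq, chi2.sf(chi_sq, df)  # fit is indicated by a non-significant chi-square

# Hypothetical observed outcomes and predicted probabilities
y = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
p = [0.85, 0.15, 0.70, 0.90, 0.30, 0.25, 0.60, 0.40, 0.75, 0.20, 0.55, 0.80]
print(deciles_of_risk_test(y, p))
```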


Test of individual predictors

The Wald test is usually used to assess the significance of prediction of each predictor. The Wald test is known to be overly conservative (increased Type II error), and when a predictor is multinomial it does not give a test of the whole predictor but only of the dummy-coded versions of the predictor.


Number and type of outcomes

Logistic regression with more than two outcome categories:

If the responses are ordered polytomous, then k - 1 equations are made (k being the number of categories), each predicting the probability that a case is above a given category.

This defines thresholds, points in the data that separate category one from two, two from three, etc., and calculates the probability that a person passes a given threshold. This is done for all categories except the last, because the probability of being in a category above the highest is zero.


Number and type of outcomes

If the responses are non-ordered multinomial, then again k - 1 equations are created, but the equations predict whether a person belongs to a category or not. An equation is made for all categories except the last.

SPSS ordinal (PLUM) is used for ordered polytomous data and SPSS multinomial (NOMREG) is used for un-ordered multinomial data.
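Outside SPSS, a brief statsmodels sketch of the unordered (multinomial) case; the data are randomly generated placeholders, and MNLogit fits k - 1 equations, each contrasting one category with a reference category:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: a 3-category unordered outcome (0, 1, 2) and two continuous predictors
rng = np.random.default_rng(524)
X = sm.add_constant(rng.normal(size=(100, 2)))
outcome = rng.integers(0, 3, size=100)

# k - 1 = 2 equations are estimated, one per non-reference category
result = sm.MNLogit(outcome, X).fit(disp=False)
print(result.params)  # one column of coefficients per non-reference category
```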


Strength of association (pseudo R-square)

There are several measures intended to mimic the R-squared analysis, but none of them is an R-squared. The interpretation is not the same, but they can be interpreted as an approximate proportion of variance in the outcome accounted for by the model.


Strength of association (pseudo R-square)

McFadden's ρ²: this value tends to be smaller than R-square, and values of .2 to .4 are considered highly satisfactory.

$$\rho^2 = 1 - \frac{LL(B)}{LL(0)}$$


Strength of association (pseudo R-square)

Cox and Snell is also based on the log-likelihood, but it takes the sample size into account:

$$R^2_{CS} = 1 - \exp\left[-\frac{2}{n}\big(LL(B) - LL(0)\big)\right]$$

But it cannot reach a maximum of 1 like we would like, so...


Strength of association (pseudo R-square)

The Nagelkerke measure adjusts the Cox and Snell measure for the maximum value so that 1 can be achieved:

$$R^2_N = \frac{R^2_{CS}}{R^2_{MAX}}, \quad \text{where } R^2_{MAX} = 1 - \exp\left[2\left(n^{-1}\right)LL(0)\right]$$
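A minimal numpy sketch that computes all three pseudo R-square measures from the example log-likelihoods used earlier (LL(B) = -8.74, LL(0) = -10.11, n = 15); the resulting values are only illustrative:

```python
import numpy as np

ll_model = -8.74   # LL(B): constant + predictors
ll_null = -10.11   # LL(0): constant-only model
n = 15             # sample size of the example data

mcfadden = 1 - ll_model / ll_null                        # McFadden's rho-squared
cox_snell = 1 - np.exp(-(2 / n) * (ll_model - ll_null))  # Cox and Snell R-squared
r2_max = 1 - np.exp((2 / n) * ll_null)                   # largest value Cox and Snell can reach
nagelkerke = cox_snell / r2_max                          # Nagelkerke R-squared

print(mcfadden, cox_snell, nagelkerke)
```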