
    Appendix D

    Statistics for Performance Evaluation


    D.1 Single-Valued Summary Statistics

The mean, or average value, is computed by summing the data and dividing the sum by the number of data items. Specifically,

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i    (D.1)

where
x̄ is the mean value
n is the number of data items
xi is the ith data item

The formula for computing variance is shown in Equation D.2.

\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2    (D.2)

where
σ² is the variance
μ is the population mean
n is the number of data items
xi is the ith data item

When calculating the variance for a sampling of data, a better result is obtained by dividing the sum of squares by n - 1 rather than by n. The proof of this is beyond the scope of this book; however, it suffices to say that when the division is by n - 1, the sample variance is an unbiased estimator of the population variance. An unbiased estimator has the characteristic that the average value of the estimator, taken over all possible samples, is equal to the parameter being estimated.
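As a quick check on Equations D.1 and D.2 and the n - 1 adjustment, the short Python sketch below computes the mean, the population-style variance (dividing by n), and the unbiased sample variance (dividing by n - 1). The data values and names are illustrative placeholders, and the sample mean stands in for μ.

    # Illustrative sketch of Equations D.1 and D.2; the data values are placeholders.
    data = [4.0, 7.0, 6.0, 3.0, 5.0]
    n = len(data)

    mean = sum(data) / n                                            # Equation D.1
    variance_pop = sum((x - mean) ** 2 for x in data) / n           # Equation D.2, dividing by n
    variance_sample = sum((x - mean) ** 2 for x in data) / (n - 1)  # unbiased estimate, dividing by n - 1

    print(mean, variance_pop, variance_sample)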

D.2 The Normal Distribution

The equation for the normal, or bell-shaped, curve is shown in Equation D.3.

f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2 / (2\sigma^2)}    (D.3)

where
f(x) is the height of the curve corresponding to values of x
e is the base of natural logarithms, approximated by 2.718282


μ is the arithmetic mean for the data
σ is the standard deviation
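Equation D.3 is straightforward to evaluate directly; the small Python sketch below (parameter values are placeholders) returns the height of the curve at a given x.

    import math

    def normal_density(x, mu, sigma):
        # Equation D.3: height of the normal curve at x
        return (1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)) * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

    # For the standard normal (mu = 0, sigma = 1), the height at x = 0 is about 0.3989.
    print(normal_density(0.0, mu=0.0, sigma=1.0))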

    D.3 Comparing Supervised Learner Models

    In Chapter 7 we described a general technique for comparing two supervised learner

    models using the same test dataset. Here we provide two additional techniques for

    comparing supervised models. In both cases, model test set error rate is treated as a

    sample mean.

    Comparing Models with Independent Test Data

With two independent test sets, we simply compute the variance for each model and apply the classical hypothesis testing procedure. An outline of the technique follows.

1. Given

   Two models, M1 and M2, built with the same training data

   Two independent test sets: set A containing n1 elements and set B containing n2 elements

   Error rate E1 and variance v1 for model M1 on test set A

   Error rate E2 and variance v2 for model M2 on test set B

2. Compute

   P = \frac{|E_1 - E_2|}{\sqrt{v_1/n_1 + v_2/n_2}}    (D.4)

3. Conclude

   If P >= 2, the difference in the test set performance of model M1 and model M2 is significant.

Let's look at an example. Suppose we wish to compare the test set performance of learner models M1 and M2. We test M1 on test set A and M2 on test set B. Each test set contains 100 instances. M1 achieves an 80% classification accuracy with set A, and M2 obtains a 70% accuracy with test set B. We wish to know if model M1 has performed significantly better than model M2.

For model M1:

E1 = 0.20
v1 = 0.2(1 - 0.2) = 0.16


For model M2:

E2 = 0.30
v2 = 0.3(1 - 0.3) = 0.21

The computation for P is:

P = \frac{|0.20 - 0.30|}{\sqrt{0.16/100 + 0.21/100}}

As P is approximately 1.4714 and therefore less than 2, the difference in model performance is not considered to be significant. We can increase our confidence in the result by switching the two test sets and repeating the experiment. This is especially important if a significant difference is seen with the initial test set selection. The average of the two values for P is then used for the significance test.
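This comparison is easy to script. The sketch below (function and variable names are assumptions of mine, not from the text) implements Equation D.4 with the E(1 - E) variance estimate and applies the P >= 2 criterion.

    import math

    def compare_independent(e1, n1, e2, n2):
        # Equation D.4: P = |E1 - E2| / sqrt(v1/n1 + v2/n2), with v = E(1 - E)
        v1 = e1 * (1.0 - e1)
        v2 = e2 * (1.0 - e2)
        return abs(e1 - e2) / math.sqrt(v1 / n1 + v2 / n2)

    # Example values from the text: error rates 0.20 and 0.30, 100 instances per test set.
    p = compare_independent(0.20, 100, 0.30, 100)
    print("significant" if p >= 2 else "not significant")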

    Pairwise Comparison with a Single Test Set

When the same test set is applied to both models, one option is to perform an instance-by-instance pairwise matching of the test set results. With an instance-based comparison, a single variance score based on pairwise differences is computed. The formula for calculating the joint variance is shown in Equation D.5.

V_{12} = \frac{1}{n - 1} \sum_{i=1}^{n} [(e_{1i} - e_{2i}) - (E_1 - E_2)]^2    (D.5)

where
V12 is the joint variance
e1i is the classifier error on the ith instance for learner model M1
e2i is the classifier error on the ith instance for learner model M2
E1 - E2 is the overall classifier error rate for model M1 minus the classifier error rate for model M2
n is the total number of test set instances

When test set error rate is the measure by which two models are compared, the output attribute is categorical. Therefore, for any instance i contained in class j, eij is 0 if the classification is correct and 1 if the classification is in error. When the output attribute is numeric, eij represents the absolute difference between the computed and


actual output value. With the revised formula for computing joint variance, the equation to test for a significant difference in model performance becomes

P = \frac{|E_1 - E_2|}{\sqrt{V_{12}/n}}    (D.6)

Once again, a 95% confidence level for a significant difference in model test set performance is seen if P >= 2. This technique is appropriate only if an instance-based pairwise comparison of model performance is possible. In the next section we address the case where an instance-based comparison is not possible.
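A minimal Python sketch of the pairwise test follows (names are mine); it computes the joint variance of Equation D.5 from two per-instance 0/1 error vectors and then P from Equation D.6.

    import math

    def joint_variance(err1, err2):
        # Equation D.5: V12 = 1/(n - 1) * sum of [(e1i - e2i) - (E1 - E2)]^2
        n = len(err1)
        diff_rates = sum(err1) / n - sum(err2) / n
        return sum(((a - b) - diff_rates) ** 2 for a, b in zip(err1, err2)) / (n - 1)

    def compare_paired(err1, err2):
        # Equation D.6: P = |E1 - E2| / sqrt(V12 / n)
        n = len(err1)
        e1_rate, e2_rate = sum(err1) / n, sum(err2) / n
        return abs(e1_rate - e2_rate) / math.sqrt(joint_variance(err1, err2) / n)

    # err1[i] and err2[i] hold 0 for a correct classification and 1 for an error (placeholder data).
    err1 = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
    err2 = [1, 1, 0, 1, 1, 0, 0, 1, 1, 0]
    p = compare_paired(err1, err2)
    print("significant" if p >= 2 else "not significant")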

    D.4 Confidence Intervals for Numeric Output

Just as when the output was categorical, we are interested in computing confidence intervals for one or more numeric measures. For purposes of illustration, we use mean absolute error (mae). As with classifier error rate, mean absolute error is treated as a sample mean. The sample variance is given by the formula:

variance(mae) = \frac{1}{n - 1} \sum_{i=1}^{n} (e_i - mae)^2    (D.7)

where
ei is the absolute error for the ith instance
n is the number of instances

Let's look at an example using the data in Table 7.2. To determine a confidence interval for the mae computed for the data in Table 7.2, we first calculate the variance. Specifically,

variance(mae) = \frac{1}{15 - 1} [(0.024 - 0.0604)^2 + (0.002 - 0.0604)^2 + \cdots + (0.001 - 0.0604)^2] \approx 0.0092

Next, as with the classifier error rate, we compute the standard error for the mae as the square root of the variance divided by the number of sample instances.

SE = \sqrt{0.0092 / 15} \approx 0.0248

Finally, we calculate the 95% confidence interval by respectively subtracting and adding two standard errors to the computed mae of 0.0604. This tells us that we can be 95% confident that the actual mae falls somewhere between 0.0108 and 0.1100.
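The same calculation works for any list of absolute errors. The sketch below (the error values shown are placeholders, not the actual Table 7.2 data) computes the mae, the Equation D.7 variance, the standard error, and the approximate 95% interval.

    import math

    def mae_confidence_interval(abs_errors):
        n = len(abs_errors)
        mae = sum(abs_errors) / n
        variance = sum((e - mae) ** 2 for e in abs_errors) / (n - 1)  # Equation D.7
        se = math.sqrt(variance / n)                                  # standard error of the mae
        return mae - 2 * se, mae + 2 * se                             # approximate 95% confidence interval

    # Placeholder absolute errors; substitute the fifteen Table 7.2 values to reproduce the text's numbers.
    print(mae_confidence_interval([0.024, 0.002, 0.001, 0.050, 0.110, 0.030]))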


    D.5 Comparing Models with Numeric Output

The procedure for comparing models giving numeric output is identical to that for models with categorical output. In the case where two independent test sets are available and mae measures model performance, the classical hypothesis testing model takes the form:

P = \frac{|mae_1 - mae_2|}{\sqrt{v_1/n_1 + v_2/n_2}}    (D.8)

where
mae1 is the mean absolute error for model M1
mae2 is the mean absolute error for model M2
v1 and v2 are the variance scores associated with M1 and M2
n1 and n2 are the number of instances within each respective test set
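A brief sketch of Equation D.8 follows (names are mine); each model's variance is taken from its own absolute errors using Equation D.7.

    import math

    def compare_mae_independent(abs_errors_1, abs_errors_2):
        # Equation D.8: P = |mae1 - mae2| / sqrt(v1/n1 + v2/n2)
        n1, n2 = len(abs_errors_1), len(abs_errors_2)
        mae1 = sum(abs_errors_1) / n1
        mae2 = sum(abs_errors_2) / n2
        v1 = sum((e - mae1) ** 2 for e in abs_errors_1) / (n1 - 1)  # Equation D.7 for model M1
        v2 = sum((e - mae2) ** 2 for e in abs_errors_2) / (n2 - 1)  # Equation D.7 for model M2
        return abs(mae1 - mae2) / math.sqrt(v1 / n1 + v2 / n2)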

When the models are tested on the same data and a pairwise comparison is possible, we use the formula:

P = \frac{|mae_1 - mae_2|}{\sqrt{v_{12}/n}}    (D.9)

where
mae1 is the mean absolute error for model M1
mae2 is the mean absolute error for model M2
v12 is the joint variance computed with the formula defined in Equation D.5
n is the number of test set instances
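The only change from the categorical pairwise test is that each per-instance error is now an absolute difference rather than 0 or 1; a minimal sketch of Equation D.9 (names are mine) is shown below.

    import math

    def compare_mae_paired(abs_errors_1, abs_errors_2):
        # Equation D.9: P = |mae1 - mae2| / sqrt(v12 / n), with v12 from Equation D.5
        n = len(abs_errors_1)
        mae1, mae2 = sum(abs_errors_1) / n, sum(abs_errors_2) / n
        diff = mae1 - mae2
        v12 = sum(((a - b) - diff) ** 2 for a, b in zip(abs_errors_1, abs_errors_2)) / (n - 1)
        return abs(diff) / math.sqrt(v12 / n)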

When the same test data is applied but a pairwise comparison is not possible, the most straightforward approach is to compute the variance associated with the mae for each model using the equation:

variance(mae_j) = \frac{1}{n - 1} \sum_{i=1}^{n} (e_{ji} - mae_j)^2    (D.10)

where
maej is the mean absolute error for model j
eji is the absolute value of the computed value minus the actual value for instance i
n is the number of test set instances

    The hypothesis of no significant difference is then tested with Equation D.11.

P = \frac{|mae_1 - mae_2|}{\sqrt{v(2/n)}}    (D.11)


where
v is either the average or the larger of the variance scores for the two models
n is the total number of test set instances

As is the case when the output attribute is categorical, using the larger of the two variance scores provides the stronger test.
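A sketch of Equations D.10 and D.11 together (names are mine) might look like this; passing use_larger_variance=True applies the stronger test the text recommends.

    import math

    def mae_and_variance(abs_errors):
        # Equation D.10: mean absolute error and its variance for one model
        n = len(abs_errors)
        mae = sum(abs_errors) / n
        variance = sum((e - mae) ** 2 for e in abs_errors) / (n - 1)
        return mae, variance

    def compare_mae_unpaired(abs_errors_1, abs_errors_2, use_larger_variance=True):
        # Equation D.11: P = |mae1 - mae2| / sqrt(v * (2/n))
        n = len(abs_errors_1)  # both models are scored on the same n test instances
        mae1, v1 = mae_and_variance(abs_errors_1)
        mae2, v2 = mae_and_variance(abs_errors_2)
        v = max(v1, v2) if use_larger_variance else (v1 + v2) / 2.0
        return abs(mae1 - mae2) / math.sqrt(v * (2.0 / n))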
