When a good fit can be bad


Mark A. Pitt* and In Jae Myung
Dept of Psychology, Ohio State University, 1885 Neil Avenue, Columbus, Ohio 43210-1222, USA.
*e-mail: [email protected]

Trends in Cognitive Sciences Vol. 6 No. 10, October 2002

How should we select among computational models of cognition? Although it is commonplace to measure how well each model fits the data, this is insufficient. Good fits can be misleading because they can result from properties of the model that have nothing to do with it being a close approximation to the cognitive process of interest (e.g. overfitting). Selection methods are introduced that factor in these properties when measuring fit. Their success in outperforming standard goodness-of-fit measures stems from a focus on measuring the generalizability of a model's data-fitting abilities, which should be the goal of model selection.

The explosion of interest in modeling cognitive processes over the past 20 years has fueled the cognitive sciences in many ways. Not only has it opened up new ways of thinking about research problems and possible solutions, but it has also enabled researchers to gain a better understanding of their theories by simulating a computational instantiation of them. Modeling is now sufficiently mainstream that one can get the impression that the models themselves are replacing the theories from which they evolved.

What has not kept pace with the advances and interest in modeling is the development of methods for evaluating and testing the models themselves. A model is not interchangeable with a theory, but only one of many possible quantitative representations of it. A thorough evaluation of a model requires methods that are sensitive to its quantitative form. Criteria used for evaluating theories [1], such as testing their performance in an experimental setting, do not speak to the quality of the choices that are made in building their quantitative counterparts (i.e. choice of parameters, how they are combined) or their ramifications. The paucity of such model selection methods is surprising given the centrality of the problem itself. What could be more fundamental than deciding between two alternative explanations of a cognitive process?

How not to compare models

Mathematical models are frequently tested against one another by evaluating how well each fits the data generated in an experiment or simulation. Such a test makes sense given that one criterion of model performance is that it reproduce the data. A goodness-of-fit measure (GOF; see Glossary) is invariably used to measure their adequacy in achieving this goal. What is measured is how much a model's predictions deviate from the observed data [2,3]. The model that provides the best fit (i.e. smallest deviation) is favored. The logic of this choice rests on the assumption that the model that provides the best fit to all data must be a closer approximation to the cognitive process under investigation than its competitors [4].

Such a conclusion would be reasonable if measurements were made in a noise-free (i.e. errorless) system. One of the biggest challenges faced by cognitive scientists is that human and animal data are noisy. Error arises from several sources, such as the imprecision of our measurement tools and variation in participants and their performance over time.


The problem of random error, and the lengths researchers go to in order to combat it, are evident in the experimental and statistical methods used in the field (e.g. repeated measurement, inferential statistics). These serve to hold error in check so that the variation we are really interested in, that due to the mental process under study, will be visible in the data.

Noisy data make GOF by itself a poor method of model selection. As the simulation in Box 1 illustrates, a GOF measure such as the Root Mean Squared Error (RMSE) is insensitive to the different sources of variation in the data, whether random error or the cognitive process of interest. This could result in the selection of a model that overfits the data, which may not be the model that best approximates the cognitive process under study.
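To see this concretely, here is a minimal Python sketch (ours, not the authors'; the generating function, noise level, and polynomial family are illustrative assumptions) of how raw RMSE rewards flexibility: fitted to the same noisy sample, the more flexible model never scores worse, and with enough parameters it fits the noise exactly.

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0.1, 8.1, 5)                        # five design points, echoing Box 1
    y = (1 + t) ** -0.4 + rng.normal(0, 0.02, t.size)   # simple process plus sampling error

    def rmse(pred, obs):
        return np.sqrt(np.mean((pred - obs) ** 2))

    for degree in (1, 2, 4):                            # increasingly flexible polynomials
        fit = np.polyval(np.polyfit(t, y, degree), t)
        print(degree, rmse(fit, y))                     # RMSE only shrinks as flexibility grows

With five data points, the fourth-order polynomial interpolates the sample exactly, noise and all; by the GOF criterion alone it is the 'best' model.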

How to compare models

Because it is impossible to eliminate error from data, efforts have focused on improving model selection in other ways. The preferred solution has been to redefine the problem as one of assessing how well a model's fit to one data sample generalizes to future samples generated by that same process [5]. GENERALIZABILITY (see Glossary) uses the data and information about the model itself to make a best-guess estimate as to how likely it is that the model could have generated the data sample in hand. In this approach, a good fit is a necessary but not sufficient condition for a model, because many models are capable of fitting a data set reasonably well. Because the set of such candidate models is potentially quite large, we can only ever infer the likelihood with which each model under consideration generated the data. In this regard, generalizability is a statistical inference problem. It is no different conceptually from estimating the replicability of an experimental result using inferential statistics, or inferring the characteristics of a population from a sample.

A handful of generalizability measures have been proposed in the last 30 years [6]. By necessity, all include a measure of GOF to assess a model's fit to the data (see Box 2). Terms encoding information about the model itself are included to level the playing field among models, so that one model, by virtue of its design choices (i.e. its mathematical instantiation), does not have an inherent advantage over its competitors in fitting the data best, and thus in being selected. By nullifying such model-specific properties, the model that is the best approximation to the cognitive process under study, and not simply the one that absorbs the most variation in the data, will be selected.

Measures of generalizability

Early measures of generalizability such as the Akaike information criterion (AIC) [7,8] and Bayesian information criterion (BIC) [9] addressed the most salient difference among models: the number of free parameters. As is generally well known, a model with many free parameters can provide a better fit to a data sample than a model with few parameters, even if the latter generated the data. The second term in these measures includes a count of the number of parameters (k). AIC and BIC penalize a model more as the number of parameters increases. To be selected, the model with more parameters must overcome this penalty and provide a substantially better fit than a model with fewer parameters. That is, the superior fit obtained with the extra parameters must justify the necessity of those parameters in fully capturing the cognitive process.
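As a hedged illustration (the log-likelihood values below are hypothetical, not from any fit reported here), the criterion equations from Box 2 reduce to a few lines of Python; note how the flexible model's better fit must overcome its larger penalty before it is chosen.

    import math

    def aic(logL, k):
        # Akaike information criterion: -2 ln f(y|theta0) + 2k
        return -2.0 * logL + 2 * k

    def bic(logL, k, n):
        # Bayesian information criterion: -2 ln f(y|theta0) + k ln(n)
        return -2.0 * logL + k * math.log(n)

    logL_simple, logL_flex = -52.3, -49.8   # hypothetical maximized log-likelihoods
    print(aic(logL_simple, k=1), aic(logL_flex, k=3))              # 106.6 vs 105.6: flexible wins
    print(bic(logL_simple, k=1, n=50), bic(logL_flex, k=3, n=50))  # 108.5 vs 111.3: simple wins

The two criteria can disagree on the same fits, because BIC's penalty grows with sample size while AIC's does not.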

An equally salient but much less tangible dimension along which models also differ is their functional form, which refers to the way in which the parameters are combined in a model's equation. More sophisticated selection methods, such as Bayesian model selection (BMS) [10] and MINIMUM DESCRIPTION LENGTH (MDL) [11,12], are sensitive to a model's functional form as well as to the number of parameters. Functional form is taken into account in the third term of the MDL measure. In BMS, both are hidden in the integral. (Cross-validation, although not listed, is another selection method that is thought to be sensitive to both dimensions of model COMPLEXITY. It involves applying GOF in a non-standard way [13].) The second and third terms in MDL together provide a measure of a model's complexity.
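Cross-validation is easy to sketch. In the minimal split-sample version below (our illustration; the model, noise level, and even split are assumptions, and scipy's curve_fit stands in for whatever fitting routine one prefers), the model is calibrated on half the data and scored, with the same GOF measure, on the unseen half, so fitting noise in the calibration half no longer pays.

    import numpy as np
    from scipy.optimize import curve_fit

    def m_a(t, a):                      # the one-parameter model from Box 1
        return (1 + t) ** -a

    rng = np.random.default_rng(1)
    t = np.linspace(0.1, 8.1, 20)
    y = m_a(t, 0.4) + rng.normal(0, 0.02, t.size)

    train = np.arange(0, t.size, 2)     # interleaved calibration half
    test = np.arange(1, t.size, 2)      # held-out half
    params, _ = curve_fit(m_a, t[train], y[train], p0=[0.5])
    cv_rmse = np.sqrt(np.mean((m_a(t[test], *params) - y[test]) ** 2))
    print(params[0], cv_rmse)           # error on data the fit never saw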


Glossary

Complexity: the property of a model that enables it to fit diverse patterns of data; it is the flexibility of a model. Although the number of parameters in a model and its functional form can be useful for gauging its complexity, a more accurate and intuitive measure is the number of distinct probability distributions that the model can generate by varying its parameters over their entire range. Details of this geometric complexity measure can be found in [a].

Functional form: the way in which the parameters (θ) and data (x) are combined in a model's equation: y = θx and y = θ + x have the same number of parameters but different functional forms (multiplicative versus additive).

Generalizability: the ability of a model to fit all data samples generated by the same cognitive process, not just the currently observed sample (i.e. the model's expected GOF with respect to new data samples). Generalizability is estimated by combining a model's GOF with a measure of its complexity.

Goodness of fit (GOF): the precision with which a model fits a particular sample of observed data. The predictions of the model are compared with the observed data. The discrepancy between the two is measured in a number of ways, such as calculating the root mean squared error between them.

Minimum Description Length (MDL): a versatile measure of generalizability. MDL was developed within algorithmic coding theory in computer science [b], where the goal of model selection is to choose the model that permits the greatest compression of data in its description. Regularities in the data are assumed to imply redundancy. The more the data can be compressed by the model by extracting this redundancy, the more that is learned about the cognitive process.

Overfitting: the case where, in addition to fitting the main trends in the data, a model also fits the microvariation from this main trend at each data point. Compare the middle and right graph inserts in Fig. 1.

Parameters: variables in a model's equation that represent mental constructs or processes; they are adjusted to improve a model's fit to data. For example, in the model y = θx, θ is a parameter.

Probability density function: a function that specifies the probability of observing each outcome of a random variable given the value of a model's parameter.

References
a Myung, I.J. et al. (2000) Counting probability distributions: differential geometry and model selection. Proc. Natl. Acad. Sci. U. S. A. 97, 11170-11175
b Li, M. and Vitanyi, P. (1997) An Introduction to Kolmogorov Complexity and its Applications, Springer-Verlag


Box 1. Why GOF alone is a poor measure of model selection

The ability of a model to fit data is a necessary condition that all models must satisfy to be taken seriously. GOF measures such as RMSE are inappropriate as model selection methods because all they do is assess fit. This myopic focus is problematic when variation in the data can be due to other factors (e.g. random sampling, individual differences) as well as the cognitive process under study.

The severity of the problem is shown in Table I, which contains the results of a model recovery simulation using RMSE. Four datasets were generated from a combination of two models (MA and MB), defined as follows: MA: y = (1+t)^-a; MB: y = (b+ct)^-a, where a, b, c > 0. Datasets were generated from each in the frequencies shown in the four conditions (rows). In the first condition, all 100 samples were generated by MA with a = 0.4 and only sampling error introduced as noise. In the second condition, variation due to individual differences was also added to the data by using a different parameter value (a = 0.6) half of the time. In the third condition, half of the data were generated by model MA and half by MB. Condition four is the reverse of condition one, with the data plus sampling error being generated by MB. Models MA and MB were then fitted to the data in each condition using RMSE. The mean RMSE fits, along with the percentage of time each model provided the best fit, are shown in the two right-most columns of the Table.

A good model selection method must be able to ignore irrelevant variation in the data (e.g. sampling error, individual differences that are not being modeled) and recover the model that generated the data. That is, the selection method must be capable of differentiating between the variation that the model was designed to capture and the variation due to noise. RMSE fails miserably at this task. MB was chosen 100% of the time in all four conditions, and its mean fit is substantially better than that of MA. MA never provided a better fit than MB, even when some or all of the data were generated by MA (conditions 1-3). This is why a good fit can be bad.

Readers who have some familiarity with modeling might not be surprised by the results, given that MB has two more parameters than MA. A model with more parameters will always provide a better fit, all other things being equal [a]. The typical solution to this problem is to control for the number of parameters, but there are at least two reasons why this fix is unsatisfactory. The most obvious is that it limits model comparison to those situations in which the number of parameters is equated across models. The diversity of models in the cognitive sciences can make this a significant impediment in doing research. Less obvious, although even more important, is that a model's data-fitting abilities are also affected by other properties of the model, such as its functional form [b,c,d]. Unless they are taken into account by the selection method, simply equating for number of parameters will not place the models on an equal footing.

In summary, it may make perfect sense to use GOF to determine whether a given model can even pass the test of fitting a dataset reasonably well (i.e. capturing the main trends), but going beyond this and comparing such fits between models, although intuitive, is risky.

References
a Linhart, H. and Zucchini, W. (1986) Model Selection, Wiley
b Myung, I.J. et al. (2000) Special issue on model selection. J. Math. Psychol. 44 (1-2)
c Li, S.C. et al. (1996) Using parameter sensitivity and interdependence to predict model scope and falsifiability. J. Exp. Psychol. Gen. 125, 360-369
d Myung, I.J. and Pitt, M.A. (1997) Applying Occam's razor in modeling cognition: a Bayesian approach. Psychon. Bull. Rev. 4, 79-95

Box 2. Model selection criteria as measures of generalizability

Listed in Table II are two GOF measures (RMSE, PVAF) and four generalizability measures (AIC, BIC, BMS, MDL). Except for BMS, in which the likelihood function is integrated over the parameter space, the measures of generalizability use the maximized log-likelihood, ln f(y|θ₀), as a GOF measure, the minus of which represents lack of fit. This fit index is combined with a measure of model complexity to yield a generalizability measure. The four generalizability criteria differ from one another in their conceptualization of model complexity. In AIC, the number of parameters (k) is the only dimension of complexity that is considered, whereas BIC also considers sample size (n). BMS and MDL go one step further and also take into account the functional form of a model's equation. In MDL, this is reflected through the third term of the criterion equation, whereas in BMS it is hidden in the integral. These selection methods, except for PVAF, prescribe that the model that minimizes the given criterion should be chosen.

Reference
a Schervish, M.J. (1995) The Theory of Statistics, pp. 111-115, Springer-Verlag

Table II. Two GOF measures, four generalizability measures, and the dimensions of complexity to which each is sensitive

Selection method                 Criterion equation                                        Dimensions of complexity considered
Root Mean Squared Error          RMSE = (SSE/N)^(1/2)                                      None
Percent Variance Accounted For   PVAF = 100(1 - SSE/SST)                                   None
Akaike Information Criterion     AIC = -2 ln f(y|θ₀) + 2k                                  Number of parameters
Bayesian Information Criterion   BIC = -2 ln f(y|θ₀) + k ln(n)                             Number of parameters, sample size
Bayesian Model Selection         BMS = -ln ∫ f(y|θ) π(θ) dθ                                Number of parameters, sample size, functional form
Minimum Description Length       MDL = -ln f(y|θ₀) + (k/2) ln(n/2π) + ln ∫ √(det I(θ)) dθ  Number of parameters, sample size, functional form

In the equations above, y denotes the observed data, θ is the model's parameter, θ₀ is the parameter value that maximizes the likelihood function f(y|θ), k is the number of parameters, n is the sample size, N is the number of data points fitted, SSE is the minimized sum of squared errors between observations and predictions, SST is the sum of squares total, π(θ) is the parameter prior density, I(θ) is the Fisher information matrix in mathematical statistics [a], det denotes the determinant of a matrix, and ln denotes the natural logarithm (base e).
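The third term of MDL, ln ∫ √(det I(θ)) dθ, can be evaluated numerically once the Fisher information is known. As a hedged illustration (our example, not one from the article): for a one-parameter Bernoulli model, the per-observation Fisher information is I(θ) = 1/(θ(1-θ)), the integral comes out to π, and the complexity term is ln π.

    import numpy as np
    from scipy.integrate import quad

    # Geometric complexity term ln ∫ sqrt(det I(theta)) d(theta) for a Bernoulli model,
    # where I(theta) = 1/(theta(1-theta)) per observation.
    integral, _ = quad(lambda th: np.sqrt(1.0 / (th * (1.0 - th))), 0.0, 1.0)
    print(np.log(integral), np.log(np.pi))   # both approximately 1.1447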

Table I. Results of a model recovery simulation in which a GOF measure (RMSE) was used to discriminate models when the source of the error was varied

                                              Model the data were generated from    Model fitted: mean RMSE (% best fit)
Condition (sources of variation)              MA (a=0.4)   MA (a=0.6)   MB          MA            MB
(1) Sampling error                            100          -            -           0.040 (0%)    0.029 (100%)
(2) Sampling error + individual differences   50           50           -           0.041 (0%)    0.029 (100%)
(3) Different models                          50           -            50          0.075 (0%)    0.029 (100%)
(4) Sampling error                            -            -            100         0.079 (0%)    0.029 (100%)
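A run in the spirit of condition 1 takes only a few lines of Python (our reconstruction under stated assumptions: least-squares fitting via scipy's curve_fit, a guessed noise level of 0.02, and positivity bounds on the parameters; the article does not publish its fitting code). Because MB reduces to MA when b = c = 1, its extra flexibility essentially guarantees the smaller RMSE.

    import numpy as np
    from scipy.optimize import curve_fit

    def m_a(t, a):                       # MA: y = (1+t)^-a
        return (1 + t) ** -a

    def m_b(t, a, b, c):                 # MB: y = (b+ct)^-a
        return (b + c * t) ** -a

    def rmse(f, p, t, y):
        return np.sqrt(np.mean((f(t, *p) - y) ** 2))

    rng = np.random.default_rng(2)
    t = np.linspace(0.1, 8.1, 5)
    wins_b = 0
    for _ in range(100):
        y = m_a(t, 0.4) + rng.normal(0, 0.02, t.size)          # data truly from MA
        pa, _ = curve_fit(m_a, t, y, p0=[0.5], bounds=(0.01, 10))
        pb, _ = curve_fit(m_b, t, y, p0=[0.5, 1.0, 1.0], bounds=(0.01, 10))
        wins_b += rmse(m_b, pb, t, y) < rmse(m_a, pa, t, y)
    print(wins_b, "of 100 samples fitted best by MB")          # expect (nearly) all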


Conceptually, complexity refers to that characteristic of a model that makes it flexible and easily able to fit diverse patterns of data, usually by a small adjustment in one of its parameters. In Fig. 1, it is what enables the model (wavy line) in the lower right graph to provide a better fit to the data (dots) than that in the middle and left graphs. Both FUNCTIONAL FORM and the number of PARAMETERS contribute to model complexity.

How complexity is related to generalizability and GOF: the dilemma of Occam's razor

The notion of model complexity is a good vehicle with which to illustrate further the goal of generalizability and distinguish it from GOF. As demonstrated in Box 1, fit is maximized with GOF. Because additional complexity will improve fit, the two are positively related; this is depicted by the top function in Fig. 1. Generalizability, on the other hand, is not related to complexity so straightforwardly. Rather, its function follows the same trajectory as that of GOF up to a certain point, after which fit declines as complexity increases.

Why do the two functions diverge? The data being fitted have a certain degree of complexity that reflects the operation of the cognitive process. This point corresponds to the peak of the generalizability function. Any additional complexity beyond that needed to capture the underlying process (to the right of the peak) will cause the model to overfit the data by also capturing the microvariation due to random error, and thus reduce generalizability. The reason both functions overlap to the left of the peak is that the model itself must be sufficiently complex to fit the data well. The model will underfit the data if it lacks the necessary complexity (i.e. it is too simple), as illustrated in the lower left graph in Fig. 1. The dilemma faced in trying to maximize generalizability should be clear: a delicate balance must be struck between sufficient complexity on the one hand and good generalizability on the other. MDL achieves this balance by choosing the model whose complexity is most justified, by considering the complexity of the data relative to the complexity of the model.
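The shape of Fig. 1 is easy to reproduce qualitatively. In the sketch below (our illustration: polynomial degree stands in for complexity, and RMSE is the fit index, so smaller is better, the reverse of the figure's y-axis), fit to the calibration sample improves monotonically with complexity, while fit to a fresh sample from the same process improves only up to the complexity of the data and then worsens.

    import numpy as np

    rng = np.random.default_rng(3)
    t = np.linspace(0.1, 8.1, 10)
    truth = (1 + t) ** -0.4

    def sample():                                  # new data from the same process
        return truth + rng.normal(0, 0.02, t.size)

    y_fit, y_new = sample(), sample()              # calibration sample and replication
    for degree in range(1, 9):
        coeffs = np.polyfit(t, y_fit, degree)
        gof = np.sqrt(np.mean((np.polyval(coeffs, t) - y_fit) ** 2))
        gen = np.sqrt(np.mean((np.polyval(coeffs, t) - y_new) ** 2))
        print(degree, round(gof, 4), round(gen, 4))
    # gof falls steadily; gen bottoms out near the data's own complexity, then rises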

Selection methods at work: the proof is in the pudding

Generalizability measures like BMS and MDL have thus far been developed for testing only those models that can be described in terms of a PROBABILITY DENSITY FUNCTION. Examples of such statistical models are models of categorization and models of information integration [14,15].

The following model-recovery tests demonstrate the relative performance of the three classes of selection methods shown in Table 1. The three models described in the Table footnote were compared. In each simulation, 1000 datasets were generated from each model. Each selection method was then tested on its ability to determine which of the three models did in fact generate each of the 3000 datasets. A good selection method should be able to discern correctly the model that generated the data. The ideal outcome is one in which each model generalizes best only to data generated by itself.


[Fig. 1 plots two curves, goodness of fit and generalizability, against model complexity; the fit axis runs from poor to good, and the region where the curves diverge is labeled overfitting.]

Fig. 1. Relationship between goodness of fit and generalizability as a function of model complexity. The y-axis represents any fit index, where a larger value indicates a better fit (e.g. percent variance accounted for). The three smaller graphs along the x-axis show how fit improves as complexity increases. In the left graph, the model (represented by the line) is not complex enough to match the complexity of the data (dots). The two are well matched in complexity in the middle graph, which is why it occurs at the peak of the generalizability function. In the right graph, the model is more complex than the data, fitting random error. It has better goodness of fit, but is overfitting the data.

Table 1. Model recovery performance (percentage of correct recoveries) for three models using three selection methods

                                Model the data were generated from
Selection method  Model fitted  M1    M2    M3
PVAF              M1            0     0     0
                  M2            38    97    30
                  M3            62    3     70
AIC               M1            79    0     0
                  M2            9     97    30
                  M3            12    3     70
MDL               M1            86    0     0
                  M2            1     92    8
                  M3            13    8     92

Models M1, M2 and M3 were defined as follows: M1: y = (1+t)^-a; M2: y = (b+t)^-a; M3: y = (1+bt)^-a. In the model equations, a and b are parameters that were adjusted to fit each model to the data, which were generated using the same five points, t = 0.1, 2.1, 4.1, 6.1, 8.1. Each sample of five observations was sampled from the binomial probability distribution of size n = 50. One thousand samples were generated from each model and served as the data to fit. Each selection method was then used to determine which model generated each of the samples. The percentage of time each model was chosen for each dataset is shown.
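For readers who want to poke at these numbers, here is a hedged reconstruction of the recovery test for data generated by M1 (the first column of the Table), comparing a pure-fit criterion against AIC. It is our code, under stated assumptions: maximum-likelihood fitting by Nelder-Mead with parameters estimated on a log scale to keep them positive, 200 rather than 1000 samples, and the maximized likelihood itself standing in for PVAF as the pure GOF index. Because M2 and M3 both reduce to M1 when b = 1, raw fit almost never selects M1, while AIC's penalty lets M1 recover its own data most of the time.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import binom

    t = np.array([0.1, 2.1, 4.1, 6.1, 8.1])
    n = 50                                                   # binomial sample size per point
    models = {                                               # log-scale params keep a, b > 0
        "M1": (lambda th: (1 + t) ** -np.exp(th[0]), 1),
        "M2": (lambda th: (np.exp(th[1]) + t) ** -np.exp(th[0]), 2),
        "M3": (lambda th: (1 + np.exp(th[1]) * t) ** -np.exp(th[0]), 2),
    }

    def neg_loglik(th, f, counts):
        p = np.clip(f(th), 1e-6, 1 - 1e-6)                   # predicted success probabilities
        return -binom.logpmf(counts, n, p).sum()

    rng = np.random.default_rng(4)
    wins = {"GOF": {}, "AIC": {}}
    for _ in range(200):
        counts = rng.binomial(n, (1 + t) ** -0.4)            # data generated by M1, a = 0.4
        fits = {}
        for name, (f, k) in models.items():
            res = minimize(neg_loglik, np.zeros(k), args=(f, counts), method="Nelder-Mead")
            fits[name] = (res.fun, 2 * res.fun + 2 * k)      # (lack of fit, AIC)
        for crit, i in (("GOF", 0), ("AIC", 1)):
            best = min(fits, key=lambda m: fits[m][i])
            wins[crit][best] = wins[crit].get(best, 0) + 1
    print(wins)   # GOF should almost never pick M1; AIC should, most of the time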


In the 3×3 matrices in Table 1, this corresponds to perfect recovery along the diagonal going from the upper left to the lower right. Errors (off-diagonal cells) reveal a bias in the selection method toward either the more or the less complex model.

The top matrix shows the results using PVAF, with the percentage of correct recoveries in each cell. The problem with using a GOF measure shows up in the first column of the matrix, where the data were generated by the one-parameter model M1. M1 never recovered its own data; one of the two-parameter models always fitted the data best. Comparison of these data with the middle matrix shows that using AIC rectifies this problem. Because of its sensitivity to the number of parameters in a model, it does a reasonably good job of distinguishing between data generated by M1 or M2. By contrast, note how model recovery performance remains constant across these two matrices in columns 2 and 3. This is not surprising, because AIC, like PVAF, is insensitive to functional form, which is the dimension along which M2 and M3 differ. Only when MDL is used is an improvement in model recovery performance observed in these columns (bottom matrix).

Why did MDL not perform perfectly, and why did it perform slightly worse than AIC and PVAF when the data came from M2 (middle column)? Recall that, like a statistical test, model selection is an inference problem. The quality of the inference depends on the information that the selection method uses. Even though MDL makes use of all of the information available (the data and the model), this does not guarantee success, but it greatly improves the chances of the inference being correct [16]. MDL will outperform selection methods such as AIC and PVAF most of the time, but neither it nor its Bayesian counterpart, BMS, is infallible.

The inferential nature of model selection makes it imperative to interpret model recovery results in the context of the data that were used in the test itself. Poor performance might not indicate a problem with the selection method, but rather a constraint on its resolving power in discriminating which model could have generated the data. Conversely, biases in the selection method itself can masquerade as good model-recovery performance. In a recent comparison of BMS and RMSD, we demonstrated both of these outcomes [17].

Conclusion

Methods like AIC and MDL can give the impression that model selection can be automated and requires minimal thought. However, choosing between competing models is no easier or less subjective than choosing between competing theories. Selection methods should be viewed as tools that researchers can use to gain additional information about the models under consideration. These tools, however, are ignorant of other properties of the models, such as their plausibility or the quality of the data, so it is inappropriate to decide between models solely on what is learned from a test of generalizability.


References
1 Jacobs, A.M. and Grainger, J. (1994) Models of visual word recognition: sampling the state of the art. J. Exp. Psychol. Hum. Percept. Perform. 20, 1311-1334
2 Smith, J.D. and Minda, J.P. (2000) Thirty categorization results in search of a model. J. Exp. Psychol. Learn. Mem. Cogn. 26, 3-27
3 Wichmann, F.A. and Hill, N.J. (2001) The psychometric function: I. Fitting, sampling and goodness of fit. Percept. Psychophys. 63, 1293-1313
4 Roberts, S. and Pashler, H. (2000) How persuasive is a good fit? A comment on theory testing. Psychol. Rev. 107, 358-367
5 Bishop, C.M. (1995) Neural Networks for Pattern Recognition (Chapters 1 and 9), Oxford University Press
6 Myung, I.J. et al. (2000) Special issue on model selection. J. Math. Psychol. 44 (1-2)
7 Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (Petrov, B.N. and Csaki, F., eds), pp. 267-281, Akademiai Kiado, Budapest
8 Burnham, K.P. and Anderson, D.R. (1998) Model Selection and Inference: A Practical Information-Theoretic Approach (Chapter 2), Springer-Verlag
9 Schwarz, G. (1978) Estimating the dimension of a model. Ann. Stat. 6, 461-464
10 Kass, R.E. and Raftery, A.E. (1995) Bayes factors. J. Am. Stat. Assoc. 90, 773-795
11 Rissanen, J. (1996) Fisher information and stochastic complexity. IEEE Trans. Inform. Theor. 42, 40-47
12 Hansen, M.H. and Yu, B. (2001) Model selection and the principle of minimum description length. J. Am. Stat. Assoc. 96, 746-774
13 Myung, I.J. and Pitt, M.A. (in press) Model evaluation, testing and selection. In Handbook of Cognition (Lamberts, K. and Goldstone, R., eds), Sage
14 Nosofsky, R.M. (1986) Attention, similarity and the identification-categorization relationship. J. Exp. Psychol. Gen. 115, 39-57
15 Oden, G.C. and Massaro, D.W. (1978) Integration of featural information in speech perception. Psychol. Rev. 85, 172-191
16 Pitt, M.A. et al. (2002) Toward a method of selecting among computational models of cognition. Psychol. Rev. 109, 472-491
17 Pitt, M.A. et al. Flexibility versus generalizability in model selection. Psychon. Bull. Rev. (in press)

Acknowledgement
Both authors were supported by NIH Grant MH57472.