sticigui [statistics 21]


8/18/2019 SticiGui [Statistics 21]

http://slidepdf.com/reader/full/sticigui-statistics-21 1/45

SticiGui

Statistics 21

University of California at Berkeley

©1997–2013. P.B. Stark. All rights reserved.


Preface

Content, Philosophy and Goals
Overview of the Online Textbook
Prerequisites
Technical Design Criteria and Implementation
Advantages of JavaScript over standard Statistical Packages
Suggestions for Evaluating the Materials
About the Author
Acknowledgments


Content, Philosophy and Goals

This text was written for a terminal introductory class in Statistics suitable for students in Business, Communications, Economics, Psychology, Social Science, or liberal arts; that is, this is the first and last class in Statistics for most students who take it. It also covers logic and reasoning at a level suitable for a general education course. Accordingly, the text is not geared toward theory, numerical analysis, or sophisticated formulae; neither does it contain a bestiary of techniques or named probability distributions. Rather, I hope to help students to think logically about quantitative evidence and to translate real-world situations into mathematical questions; and to expose students to a few important statistical and probabilistic concepts and to some of the difficulties, subjective decisions, and pitfalls in analyzing data and making inferences from numbers. The text develops probability, estimation, and inference using counting arguments: there is no calculus involved.

I hope that students who study from these materials will:

• Read the newspaper with new eyes: become skilled, circumspect consumers of qualitative and quantitative information.
• Know that probability in particular, and numbers in general, can be used to model some features of the physical world and human behavior.
• Improve their skills in critical thinking and logical reasoning.
• Appreciate the role Statistics plays in many fields, from business to economics, law, politics, science and medicine.
• Know that data can be manipulated to tell many inconsistent stories, that data analysis is not clear cut, and that many subjective judgments are involved in analyzing real data.
• Know important questions to ask when faced with a quantitative argument; be able to analyze arguments and find their strengths and weaknesses.
• Understand that untutored intuition tends to produce faulty probability judgments and know how to reason about probability.
• Appreciate some of the philosophical difficulties in ascribing meaning to probability and in inferring causal relationships from data.
• Be prepared for more advanced courses in Statistics, even though they might not take any.

The text starts with reasoning and fallacies, which is perhaps a bit unusual for a Statistics textbook, but logical reasoning is key to both theoretical and empirical work. The text goes further with counting arguments and combinatorics than most elementary textbooks do; it also goes further with logic and with data analysis. The tools incorporated into the materials enable students to analyze real datasets (the largest has 913 observations of 5 variables) without the pedagogical overhead of teaching students to use a proprietary statistics package. Students also reproduce numerical experiments that demonstrate key concepts, such as sampling distributions, confidence intervals, and the Law of Large Numbers. Using JavaScript-based tools also eliminates the need to teach students to read arcane tables associated with different distributions; instead, students type the relevant parameters into textboxes, highlight a range of values, and read off the probability.
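Reading a probability off a tool rather than a table amounts to summing a probability distribution over the highlighted range. The following is a minimal sketch of that idea in Python (not the JavaScript the materials use), assuming a binomial distribution; the function names `binomial_pmf` and `binomial_range_probability` are illustrative and do not come from SticiGui.

```python
from math import comb


def binomial_pmf(n: int, p: float, k: int) -> float:
    """Chance of exactly k successes in n independent trials, each with chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_range_probability(n: int, p: float, lo: int, hi: int) -> float:
    """Chance that the number of successes is between lo and hi, inclusive."""
    lo, hi = max(lo, 0), min(hi, n)  # clip the range to the possible outcomes
    return sum(binomial_pmf(n, p, k) for k in range(lo, hi + 1))


# Chance of 4, 5, or 6 heads in 10 tosses of a fair coin.
print(binomial_range_probability(10, 0.5, 4, 6))
```

For the fair-coin example the sum is (210 + 252 + 210)/1024 = 0.65625, the same number a student would otherwise assemble from three entries of a binomial table.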

I have tried to emphasize topics that can be taught most effectively with this sort of interactive online tool. I have sought to provide enough variety in the material that instructors can pick and choose from among the chapters to find material appropriate to the level at which they desire to teach. The most technical material is in footnotes and sidebars, so that it does not interrupt the flow. Many of the examples and datasets for exercises are real: they arose in my consulting work, in experiments I am familiar with, or they are in the public domain (for example, data on GMAT scores, undergraduate GPA, and MBA GPA).

Many of the inference problems are real, too. For example, the Kassel Dowsing Experiment is a real test of the ability of dowsers to determine whether water is running in a buried pipe; the derivation of Fisher's exact test is in the context of determining whether targeted Web advertising works, a problem I have studied for a consulting client; the case studies about employment discrimination and theft of trade secrets derive from my work as an expert witness.

I have tried to motivate many of the computations by inference problems. Probability, hypothesis testing, randomization, and sampling error are woven into the discussion of experiments and sample surveys. For some introductory courses, the probability in those sections will suffice. For instructors who desire a more quantitative text, there are additional chapters on probability distributions, discrete random variables, and expectation.

The book does not discuss continuous distributions: The normal curve, Student's t-curve, and the chi-square curve appear as approximations to the probability

• It is easy for students to use: The text is accessible through standard web browsers. There is none of the start-up cost associated with learning to use a proprietary statistical software package.
• It is easy for instructors to use: Assignment due dates and enrollment lists are controlled over the Web using a browser. Instructors do not have to write quizzes, collect quizzes, record quiz grades, or return quizzes to students. Moreover, instructors do not have to teach students to use specialized software.

The software empowers students to reproduce numerical experiments themselves, without having to learn a statistical language (using instead a standard Web browser), which encourages exploration and inquiry-based learning. The text uses the power of the Internet in many ways, including the following:

• Links to a glossary of terms.
• Dynamic examples and self-test exercises that change every time a student visits a chapter. Some self-test exercises parse student input to determine whether student formulae are correct, a much more sophisticated notion of correctness than multiple choice or numerical responses.
• Online, machine-graded assignments, constructed so that each student gets a different version of the assignment. Grades are posted automatically to the class website, and solutions are available online after the due date.
• Reference and practice materials to prepare for exams.
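The text does not say how each student's version of an assignment is produced. One common way to get versions that differ between students yet can be regenerated later for grading and for showing solutions is to seed a random number generator from the student and the assignment. The sketch below is only an illustration under that assumption; `make_problem`, the student identifier format, and the sample question are all hypothetical.

```python
import hashlib
import random


def make_problem(student_id: str, assignment: int) -> dict:
    """Build one question whose numbers depend only on (student_id, assignment)."""
    # Hash the pair so the seed is stable across runs and machines.
    digest = hashlib.sha256(f"{student_id}:{assignment}".encode()).hexdigest()
    rng = random.Random(int(digest, 16))
    n = rng.randint(5, 20)
    data = [rng.randint(0, 100) for _ in range(n)]
    return {
        "question": f"Find the mean of {data}.",
        "data": data,
        "answer": sum(data) / n,
    }


# The same student and assignment always regenerate the same version.
first = make_problem("student-001", 3)
again = make_problem("student-001", 3)
print(first["question"])
print(first == again)
```

Because the version is a function of the seed alone, nothing about the individual questions needs to be stored: the grader and the solution page can rebuild them on demand.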

Prerequisites

These materials do not assume that the reader has any previous knowledge of statistics or probability. However, the reader needs to be comfortable with percentages, exponentiation and square roots, and scientific notation (numbers times powers of ten). Assignment 0 is a review and quiz covering the prerequisite material. The ultimate calculations are all simple, but the logical reasoning needed to reduce the problems to those simple calculations is sometimes subtle.

Some of the footnotes and sidebars rely on elementary calculus to find stationary points of convex, continuously differentiable functions. For example, the mean is characterized as the number from which the rms of the residuals is smallest, and the regression line is characterized as the least-squares line. Those derivations can be skipped with impunity.
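The characterization of the mean can be checked numerically without calculus. The sum of squared residuals from a number c has derivative −2 times the sum of the residuals, which is zero exactly when c is the mean. The sketch below, using made-up data and a simple grid search (both are assumptions for illustration, not part of the text), confirms that the grid point with the smallest rms residual is the mean.

```python
from math import sqrt


def rms_of_residuals(data, c):
    """Root-mean-square of the residuals data[i] - c."""
    return sqrt(sum((x - c) ** 2 for x in data) / len(data))


data = [2.0, 3.0, 5.0, 10.0]  # made-up numbers for illustration
mean = sum(data) / len(data)

# Try candidate centers on a grid from 0 to 12 in steps of 0.01.
candidates = [i / 100 for i in range(0, 1201)]
best = min(candidates, key=lambda c: rms_of_residuals(data, c))

print(mean, best)
```

The rms of the residuals from the mean is the standard deviation of the list, so the same calculation also shows why the SD is the natural measure of spread to pair with the mean.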

Technical Design Criteria and Implementation

These materials are comprised of XHTML, CSS, and JavaScript. As of 19 June 2011, they consisted of 217 XHTML files containing over 136,000 lines of XHTML and JavaScript, 63 Java classes containing about 16,000 lines of code, 27 JavaScript libraries containing about 14,900 lines of code, 34 data files containing about 5,600 records, four cascading style sheets with about 2,200 lines, and a handful of .jpg and .gif files. The choice to use XHTML, CSS, Java and JavaScript was motivated by these design criteria:

1. Maximize accessibility and portability. Recent browsers allow this material to be accessed from almost anywhere in the world, without adding plug-ins to the browser, and without buying any proprietary software. If a plug-in were required, downloading the plug-in would itself present a considerable barrier to some students. The software runs under every major operating system (unix, linux, Windows 9x and NT, Mac OS), because Mozilla, Opera and/or Microsoft have versions of their browsers that run on those operating systems. Browsers come installed on all new personal computers, so this material is immediately accessible to new computer owners. I have made a considerable effort to make the materials function well with screen reader software for visually impaired students, but there is more work to be done. Converting the mathematics to MathML using MathJax is underway.

2. Maximize interactivity and minimize technological barriers. Students should be able to explore data and to ask and answer what-if questions, without needing to learn how to use a conventional statistical software package. Tools should have a point-and-click interface whose use is fairly obvious: no hidden menus, consistent GUI, etc.

3. Minimize bandwidth and maximize speed. Using JavaScript allows the figures and plots to be generated on the client side. The code and data download to the client, then the client computes and creates the figures. This is by far the most efficient way to get dynamic interaction with the data. Otherwise, every time the user changed a parameter value, the client would need to send a message to the server, and the server would have to compute the new figure and send the resulting figure over the Internet to the client. Interactive real-time data exploration would not be possible. There are a small number of figures that are stored as GIF or JPG files; almost all the figures are computed by the client. Sending just the data and the rules (programs) for generating figures from the data substantially reduces the time it takes pages to load. Many of the figures are pure CSS, which not only is very lightweight, but allows the figures to re-size elegantly if the user changes the dimensions of the page, and displays well even on mobile devices.

4. Make it easy to use the materials in lectures. Because the software is free-standing (it does not need a server for computations), it is easy to display the content in the classroom without an Internet connection. That allows the instructor to demonstrate concepts and the use of the materials in class.

5. Make it easy for instructors to set due dates for assignments and manage a course, and for students to track their own progress. Due dates are controlled by the instructor over the Internet; similarly, the instructor can modify grades, compute course scores, enter extra credit assignments, etc., using a browser. Perl-cgi routines update the database when a student submits an assignment, and allow students and instructors to query the database for grades over the Internet. As an alternative to perl cgi, assignment due dates can be controlled by modifying a simple ascii text file that is queried by an AJAX call, and assignment grades can be redirected to an email address instead of a database.

6. Maximize portability on the server side. Rather than use proprietary solutions, such as Microsoft active server pages (.asp pages) or server-side JavaScript, the back end is a collection of perl cgi routines that access a standard database file. The server only needs to support perl scripts and serve static XHTML, JavaScript, text, and CSS. This makes the package platform-independent on the server side as well as the client side: there is public-domain software allowing perl cgi to run with every common web server, and there are public-domain perl implementations for all popular operating systems. Installing the materials on a new server is simple.

Using XHTML with JavaScript and CSS allowed me to make the content dynamic: many of the examples and exercises in the text change whenever the page is reloaded, so students can get unlimited practice at certain kinds of problems. Similarly, each student gets a different version of each assignment and exam, but can see the solutions to his/her version after the due date.

Advantages of JavaScript over standard Statistical Packages

There are a number of advantages to using JavaScript rather than an integrated statistical package:

1. The material can be accessed from any computer with an Internet connection and a web browser. The computer does not need to have any proprietary software installed. Students therefore can access the material from university and public libraries, Internet cafes, home, etc. Students have even submitted homework using WebTV.
2. Some of the demonstrations would be extremely difficult, if not impossible, to code in a standard statistical package. For example, see the Venn Diagram tool.
3. If a standard package were used, the figures/demonstrations/calculations could not be embedded in the text and the assignments. The student would have to navigate among programs to see demonstrations or solve problems.
4. The intellectual start-up cost to the student is lower than it would be for a general-purpose package. Each tool illustrates a single concept, all the controls are visible, and the interface is as intuitive as I have been able to make it. The student does not need to learn much to get started.
5. I find it preferable pedagogically to use tools with a single function, with all the controls visible.
6. The monetary cost to the student is minimized.

Suggestions for Evaluating the Materials

I would recommend that instructors who wish to evaluate these materials for possible adoption look first at CHAPTER 5, MULTIVARIATE DATA AND SCATTERPLOTS, CHAPTER 7, CORRELATION AND ASSOCIATION, and CHAPTER 9, REGRESSION. Those chapters illustrate several aspects of the text: dynamic exercises, the use of real data in examples and exercises, the histogram and scatterplot tools, and the gradual introduction of new functionality (buttons and displayed statistics) into the tools as students learn new concepts. For example, when the scatterplot tool arrives in CHAPTER 5, MULTIVARIATE DATA AND SCATTERPLOTS, its only controls change the variables plotted, list the data, show univariate statistics of the variables in the dataset (summary statistics covered in the first two chapters), and display the coordinates of the cursor. (Selecting a row in the data listing highlights the corresponding point in the scatterplot.) In CHAPTER 7, CORRELATION AND ASSOCIATION, the scatterplot tool acquires the correlation coefficient, and a button to show graphically the standard deviations of the two variables plotted; it is also invoked to display randomly generated data that attain a given value of the correlation coefficient. It also starts to allow students to add points by clicking on the plot, to see the effect of additional data on the correlation coefficient. In CHAPTER 9, REGRESSION, the same tool gains buttons to show the graph of averages, the SD line, and the regression line.
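The quantities the scatterplot tool displays can be computed in a few lines. The sketch below is an illustration in Python, not the tool's own code. It assumes the conventions usual in introductory texts of this kind: the SD uses divisor n, the correlation coefficient r is the average product of the two variables in standard units, the SD line passes through the point of averages with slope ±SD(y)/SD(x), and the regression line passes through the same point with slope r·SD(y)/SD(x). The data are made up.

```python
from math import sqrt


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Standard deviation with divisor n (rms of deviations from the mean)."""
    m = mean(v)
    return sqrt(sum((x - m) ** 2 for x in v) / len(v))


def correlation(x, y):
    """Average of the products of x and y, each expressed in standard units."""
    mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
    return mean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])


def sd_line(x, y):
    """(slope, intercept) of the SD line; the sign of the slope follows r."""
    slope = (1 if correlation(x, y) >= 0 else -1) * sd(y) / sd(x)
    return slope, mean(y) - slope * mean(x)


def regression_line(x, y):
    """(slope, intercept) of the regression line of y on x."""
    slope = correlation(x, y) * sd(y) / sd(x)
    return slope, mean(y) - slope * mean(x)


x = [1, 2, 3, 4, 5]  # made-up data for illustration
y = [1, 3, 2, 5, 4]
print(correlation(x, y), sd_line(x, y), regression_line(x, y))
```

Both lines pass through the point of averages; unless r is ±1 the regression line is shallower than the SD line, which is the comparison the tool's buttons are meant to make visible.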

After those chapters, I would recommend looking at the collection of interactive tools to see how various concepts are presented graphically; in particular, be sure to see the tools for Venn diagrams, sampling distributions, confidence intervals, and the Law of Large Numbers. To see how tables of probabilities are eliminated, see the tools for the Normal Distribution, Student's t-Distribution, and the Chi-square Distribution. I would recommend then looking at CHAPTER 18, THE "LET'S MAKE A DEAL" (MONTY HALL) PROBLEM, CHAPTER 19, PROBABILITY MEETS DATA, CHAPTER 28, DOES TREATMENT HAVE AN EFFECT?, and CHAPTER 29, TESTING EQUALITY OF TWO PERCENTAGES. Instructors with an interest in logic or who teach general education courses might enjoy [...]. The first of those has exercises that parse logical expressions the students type in.

About the Author

Philip B. Stark is Professor and Chair of Statistics at the University of California, Berkeley, where he has been on the faculty since 1988. He received his bachelor's degree in Philosophy from Princeton University in 1980, and his PhD in Earth Science from the Scripps Institution of Oceanography in 1986. He received a National Science Foundation Postdoctoral Fellowship in Mathematical Sciences in 1987 and the Presidential Young Investigator Award in 1989. He was elected a Fellow of the Institute of Physics in 1999. Philip dropped out of high school and law school. He has served on the editorial boards of journals in applied mathematics, geophysics, and statistics, and has given over 130 invited lectures at conferences and universities in 17 countries. He is the author or co-author of over 100 publications. Philip has done research in astrophysics, microwave cosmology, earthquake prediction, geomagnetism, geochemistry, seismic tomography, signal recovery, constrained confidence estimation, probability density estimation, spectrum estimation, information retrieval, inverse problems, election auditing, adjusting the U.S. Census, causal inference, and human hearing.

He specializes in problems with very large datasets; software written by him and his students performs part of the routine data reduction for a geomagnetic satellite and a network of solar telescopes. Philip has consulted in IC mask manufacturing, oil exploration, water treatment, predicting e-mail spool fill, electrical activity of the brain, and targeted Internet advertising. He has served as an expert witness in litigation and legislation on topics ranging from natural resources to agricultural import restrictions, disaster relief, fairness in lending, the U.S. Census, the Child Online Protection Act (sampling the Internet and testing content filters, which involved the controversial subpoena of search records and indexed webpages from Google, Yahoo! and MSN), consumer protection, contested elections, employment discrimination, insurance, product liability, property tax assessment, truth in advertising, marketing, equal protection, trade secrets, intellectual property, risk assessment, wage and hour disputes, and anti-trust.

He has testified to the U.S. House of Representatives Subcommittee on the Census, to the California State Senate, the California State Assembly, and the California Department of Fish and Game. He has consulted for the U.S. Department of Justice, the U.S. Department of Agriculture, the U.S. Census Bureau, the U.S. Department of Housing and Urban Development, the U.S. Attorney's Office of the Northern District of California, the U.S. Department of Veterans Affairs, the Federal Trade Commission, the Los Angeles County Superior Court, the National Solar Observatory, the California Secretary of State, the Colorado Secretary of State, public utilities, major corporations, and numerous law firms, including about half of the 25 largest. He won the Chancellor's Award for Public Service for Research in the Public Interest in 2011 for his work on election auditing, and he is currently working with the Secretary of State of California and the Secretary of State of Colorado to implement risk-limiting election audits.

Philip was the Faculty Assistant for Educational Technology at The University of California, Berkeley, from 2001–2003 and chaired the U.C. Berkeley Educational Technology Committee from 2001–2005. He taught UC Berkeley's first official online course, in 2007, developed one of the first courses to be offered through UC Online Education, and co-developed (with Ani Adhikari) the first introductory statistics course offered through edX.

Philip does not like to be called Phil. He likes open-source software, dislikes email attachments, runs 100 mile endurance trail races, roasts his own coffee, and thinks this book is proof that obsessive-compulsive disorder is a job qualification. Philip lives in Berkeley, California, with a laptop, an iPhone, a Bacchi Espresso, Porlex and Zassenhaus hand mills, muddy homemade huaraches, and an embarrassing number of LED flashlights.

Acknowledgments

This project would not have been possible without Ofer Licht, who gave competent, intelligent, and congenial answers to my then-noobie questions about Java and JavaScript, pointed me to lots of useful material, and who wrote the original server-side Perl cgi scripts for grading homework and querying the grade database. Duncan Temple Lang was also a helpful and sympathetic resource regarding the intricacies and nuisances of Java 1.0; in addition, he is primarily responsible for the multi-threaded data server used to load large data sets (home-grown technology that anticipates Ajax). I am grateful to Rudy Guerra, who co-wrote an earlier version of the chapter on counting and the assignment on experiments. My friend and mentor David A. Freedman, and the excellent dead-tree book Statistics by Freedman, Pisani, and Purves, were inspirational. Aviva Shimelman worked all the exercises and was instrumental in streamlining the prose and visual style. Deirdre Lynch made several valuable suggestions regarding the user interface, and Sydney Jones was extremely helpful in identifying problems with flow, organization, consistency, and prose. Faculty who taught from these materials and made valuable suggestions include Ani Adhikari, Jay Citron, Yehuda Klein, Mark Lindeman, Aviva Shimelman and Margaret Smith. I am grateful for their help and their courage to experiment with a new medium and new mode of teaching. Many students have found typos over the years: Thank you!

This book is dedicated to Alessandra and Naomi and to David Freedman.


Introduction

How to use these materials

You are probably using either Firefox, Google Chrome, Internet Explorer, Safari or Opera to view these materials. Those are the most popular web browsers. To use all the features of SticiGui©, you need an up-to-date browser that supports frames, cascading style sheets (CSS), HTML 4.01, and JavaScript 1.8. JavaScript must be enabled in the browser; for assignments to work, the browser must also accept cookies from the originating server. For a variety of reasons, I strongly recommend that you use Firefox and that you not use Internet Explorer. Recent versions of Chrome, Safari and Opera also work, mostly. The materials have been tested most thoroughly with Firefox.

A link is a piece of text you can click to see another document. You place the cursor over the link, then push the mouse button (left mouse button on a PC compatible computer) to follow the link. Links in these materials are generally in blue type.

Clicking a link can either replace the document in one of the frames you are viewing (not always the frame that has the link), or open a new window that displays the new document. If you have not changed the default settings in your browser, links to the glossary will be in green type; those take you to the right place in the glossary in the bottom frame, so that you don't lose your place in the book when you look up a term. Links to other materials are in blue type; those typically replace the contents of the frame you are reading.

You should familiarize yourself with how your browser works to learn to navigate among windows. On the right side of the browser window, you should see a scroll bar. If there is a scroll bar, that means there is more to see: drag the slider down to see more of the text. On some computers, there is a page down button you can click to see the next screenful of text.

In the assignments, the navigation buttons are suppressed to leave more room on the screen for text and graphics. You can still get a pop-up menu that will allow you to go back to the previous document. How you get the menu depends on the browser and the operating system. In Microsoft Windows, you get the pop-up menu by right-clicking in the open browser window. Most browsers have the ability to search within a document to find a word or phrase within a page. You might find that feature useful to search for a word in the glossary or a chapter of the text.

There are graphical data analysis and visualization tools throughout the text and assignments. Every chapter has exercises to check your understanding. I strongly recommend that you do all of them. Most of the exercises call for an answer in a box. After you type in your answer, strike the return or enter key. The symbol next to the question will change from a question mark either to a green check (if your answer is right) or to a red X (if your answer is wrong). Multiple-choice questions automatically show the green check or red X when you select an answer. Multiple-multiple-choice questions (select all that apply) are followed by a button you can click to check your answer. Clicking the mark after the question (the question mark, check, or X) will expand a box containing the correct answer, if you have already attempted the problem. You can answer each question as many times as you like, or see the correct answer. More detailed solutions to some of the exercises are available too; there is a link to the detailed answers after those questions. Your answers to the exercises in the text are not recorded, and the exercises do not contribute to your grade. Many of the problems are generated randomly: reloading or revisiting the page will give you a new set of problems, so you can get unlimited practice.

Most of the chapters have a corresponding assignment covering the material in the chapter. The assignments are graded by a computer, and may contribute to your grade (your instructor will tell you). Be sure to click the button labeled Submit for Grading after you answer the questions in each assignment. After the due date of the problem set, you can see your score by filling out a form. You can also see the solutions after the due date by returning to the assignment. For detailed instructions about the assignments, including browser-related issues, see the homework homepage.


Last generated 2/26/2014 6:28:57 PM. Content last modified 22 June 2013 15:37 PDT.


Chapter 3

Statistics

Statistics is the science of drawing conclusions from data. This chapter introduces a rough taxonomy of data, as well as tools for presenting, summarizing, and displaying data: tables, frequency tables, histograms, and percentiles. The tools are illustrated using datasets from trade secret litigation and geophysics.

Data

In its broadest sense, Statistics is the science of drawing conclusions about the world from data. Data are observations (measurements) of some quantity or quality of something in the world. "Data" is a plural noun; the singular form is "datum." Our lives are filled with data: the weather, weights, prices, our state of health, exam grades, bank balances, election results, and so on. Data come in many forms, most of which are numbers, or can be translated into numbers for analysis. In this chapter, we will see several types of data, and tools for summarizing data.

There are several important questions to keep in mind when you evaluate quantitative evidence:

• Are the data relevant to the question asked?
• Was the data collection fair, or might there have been some conscious or unconscious BIAS that influenced the results or made some cases less likely to be observed?
• Do the data make sense?

The answers to these questions are crucial to drawing conclusions from data.

Example 3-1: Data of little relevance.

Trident™ sugarless gum used to advertise that "4 out of 5 dentists surveyed recommend Trident™ sugarless gum for their patients who chew gum."

Such a survey says little about whether Trident™ gum is better for your teeth than other gum, with or without sugar. It would be more relevant to study the effect on teeth of chewing different kinds of gum, not the opinions of dentists who might not have conducted (or even read) any empirical research on the effects of different kinds of gum.

Example 3-2: Data with inadvertent bias.

Course evaluation forms often ask students questions about the effectiveness of the instructor. At UC Berkeley, many students are absent from class when evaluation forms are passed out and collected. If students who do not find lectures helpful are more likely to skip class, the evaluation form data will tend to be biased: on average, the forms will tend to report that the instructor is more effective than students really think he is.

For more on these topics, see Hooke (1983), Huff (1993), and Taleb (2007).

Variables

A VARIABLE is a value or characteristic that can differ from individual to individual. Data are generally recorded values of variables.

QUANTITATIVE VARIABLES take numerical values whose size is meaningful. QUANTITATIVE variables answer questions such as "how many?" or "how much?" For example, it makes sense to add, to subtract, and to compare two persons' weights, or two families' incomes: These are quantitative variables. Quantitative variables typically have measurement units, such as pounds, dollars, years, volts, gallons, megabytes, inches, degrees, miles per hour, pounds per square inch, BTUs, and so on.

Some VARIABLES, such as social security numbers and zip codes, take numerical values, but are not quantitative: They are QUALITATIVE or CATEGORICAL variables. The sum of two zip codes or social security numbers is not meaningful. The average of a list of zip codes is not meaningful. QUALITATIVE and CATEGORICAL variables typically do not have units. Qualitative or categorical variables (such as gender, hair color, or ethnicity) group individuals. Qualitative and categorical variables have neither a "size" nor, typically, a natural ordering to their values. They answer questions such as "which kind?" The values categorical and qualitative variables take are typically adjectives (for example, green, female, or tall). Arithmetic with QUALITATIVE variables usually does not make sense, even if the variables take numerical values. CATEGORICAL variables divide individuals into categories, such as gender, ethnicity, age group, or whether or not the individual finished high school.

Examples of qualitative, quantitative, and categorical variables

Qualitative
• Hot/Warm/Cold
• Population density: low/medium/high
• Height: short/medium/tall
• Under 5', 5'–6', Over 6'
• Slender/Average/Overweight
• Young/Middle-aged/Old
• Social class: lower/middle/upper
• Family size: fewer than 3, 3–5, 5 or more

Categorical
• Temperature: pleasant/unpleasant
• Rural/Urban area
• endomorph/mesomorph/ectomorph
• Type of climate
• Gender
• Ethnicity
• Zip code
• Hair color
• Country of origin

Quantitative
• Temperature in °C
• Population density: people per square mile
• Height in inches
• Height in centimeters
• Body mass index (BMI)
• Age in seconds
• Income in dollars
• Family size (#people)

The distinction between these types of variables is somewhat blurry. For example, we might group ages into categories such as under 5 years old, between 5 and 15, between 15 and 25, between 25 and 40, and over 40. Similarly, whether gender or climate types are qualitative or categorical variables is not clear-cut. Generally, if there is an implicit ordering of the values the variable can take (hot is warmer than warm, which is warmer than cold), there is a tendency to call a variable qualitative rather than categorical; some people call such variables ORDINAL. It is common to code categorical and qualitative variables using numbers, for example, 1 for male and 0 for female. The fact that a category is labeled with a number does not make the variable quantitative! The real issue is whether arithmetic with the values makes sense.
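The point about numeric labels can be illustrated with a short Python sketch. The values below are hypothetical, invented only for illustration: averaging the weights gives a meaningful summary, while the sensible summary of the zip codes is a tally of how often each code occurs.

```python
from collections import Counter
from statistics import mean

# Hypothetical values, invented for illustration only.
weights_lb = [150, 182, 135, 201]             # quantitative: arithmetic is meaningful
zip_codes = [94720, 94704, 94720, 94110]      # categorical, although written as numbers

average_weight = mean(weights_lb)             # a meaningful summary of the weights
zip_counts = Counter(zip_codes)               # tally the categories instead of averaging

print(average_weight)
print(zip_counts.most_common(1))
```

Nothing stops a program from computing the mean of the zip codes; the result simply would not describe anything in the world.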

Individuals need not be people; for example, we might be comparing microclimates in the San Francisco Bay Area, using variables such as

• annual rainfall in inches (quantitative)
• annual number of sunny days (quantitative, discrete)
• a classification into "very foggy," "somewhat foggy," and "sunny" (qualitative, ordinal)
• annual average temperature in degrees Fahrenheit (quantitative)

Similarly, the individuals could be a single individual at different times: A variable might be the price of a share of Microsoft stock at different times.

It is sometimes useful to divide QUANTITATIVE variables further into DISCRETE and CONTINUOUS variables. (This division is sometimes rather artificial.) The set of possible values of a DISCRETE variable is COUNTABLE.

Examples of discrete variables include ages measured to the nearest year, the number of people in a family, and stock prices on the New York Stock Exchange. In the first two of these examples, the variable can take only some positive integers as values. In all three examples, there is a minimum spacing between the possible values. Most discrete variables are like this: they are "chunky." Variables that count things are always discrete.

Examples of continuous variables include things like the exact ages or heights of individuals, the exact temperature of something, etc. There is no minimum spacing between the possible values of a continuous variable. The possible values of discrete variables don't necessarily have a minimum spacing. (For example, the set of fractions, that is, rational numbers, is COUNTABLE, but there is no minimum spacing between fractions.) One reason the distinction between discrete and continuous variables is somewhat vague is that in practice there is always a limit to the precision with which we can measure any variable. The limit depends on the instrument we use to make the measurement, how much time we take to make the measurement, and so on. For most purposes, the distinction between continuous and discrete variables is not important.

The following exercise checks your understanding of the differences among types of variables. The exercise will tell you immediately whether you are right or wrong: Each question is followed by an image. Initially, the image is a question mark. If you answer the question correctly, the question mark is replaced by a check mark. If you answer the question incorrectly, the question mark is replaced by an X. Once you attempt the exercise, you can see the correct answer by clicking the image. Clicking the image again will hide the answer. Clicking the "[Solution]" link (when there is one) reveals a more detailed answer.

Exercise 3-1. Identify the types of these variables:

[Solution]

Sample Data Sets

Throughout this book, as we learn new techniques we shall apply them to real-world data from business, demography, education, law, medicine, and physics. Applying the techniques to data will help us to understand the techniques and to identify circumstances in which the techniques are appropriate. The following sections introduce data we shall use to illustrate and to practice using tables, frequency tables, histograms, and percentiles.

Trade Secret Data

The first data set is the Trade Secret Data, which arose from a lawsuit alleging the theft of a customer list. The names of the people and firms have been changed, but otherwise, the facts are stated as I understand them.

On 1 May 1995, two former employees of WeeBee Hardware (WBH), a firm that sells computer components to computer assemblers and retailers, opened the doors of a new company, Weasel Drives (WD). One of the former employees had worked at WBH up to the day before WD opened its doors; the other had stopped working for WBH about 18 months previously. Both firms are in the greater San Francisco Bay Area.

From the time WD started business, it sold essentially the same kinds of computer components that WBH did, mostly to former customers of one of the former employees, at essentially the same prices and with essentially the same credit terms. Indeed, in the first two days WD was in business, one of the former employees had called the top dozen of her WBH accounts. In its first month of business, WD sold about $1 million of equipment to former customers of WBH; that amount increased to about $2 million per month in the course of a few months.

The principals of WBH sought an injunction against WD to prevent it from selling to customers of WBH, alleging that their customer list was a trade secret and had been misappropriated by its former employees.

It is well established that a customer list can qualify as a trade secret: It has economic value, and derives its value from not being generally known. Customer lists can be the product of years of soliciting new business by advertising and cold-calling tens of thousands of potential customers and winnowing that list down to a few hundred or a few thousand who actually do buy the kind of equipment the firm sells, who will buy it from that firm, and who pay promptly. With knowledge of a firm's list of customers, a competitor could avoid the time and expense of some advertising, cold-calling, checking credit references, bad debt, and so on.

In response to WBH's request for an injunction, WD asserted:

• They found the names of the customers in public sources, such as CD-ROMs that contain lists of businesses, and from computer magazines in which those customers advertise, not from their knowledge of WBH's customers.
• Such a large overlap with WBH's customer list was inevitable, because WBH had so many customers.

A California Court of Appeals decision (ABBA Rubber Co. v. Seaquist, 286 Cal. Rptr. at 528) establishes that a "readily ascertainable by proper means" affirmative defense to a claim of misappropriation is appropriate under certain circumstances:

[I]f the defendants can convince the finder of fact … (1) that "it is a virtual certainty that anyone who manufactures certain types of products uses rubber rollers," (2) that the manufacturers of those products are easily identifiable, and (3) that the defendants' knowledge of the plaintiff's customers resulted from that identification process and not from the plaintiff's records, then the defendants may establish a defense to the misappropriation claim.

ABBA Rubber Co., 286 Cal. Rptr. at 529, ftnt. 9.

WD would thus be in the clear if they could show that they identified the customers they called from the CD-ROMs and/or magazines without using their knowledge of WBH's customer list. I was retained as an expert witness to calculate the probability that certain subsets of WD customers would overlap with analogous subsets of the active WBH customer list to the extent that they do, and that WD would place as large a number of calls to WBH customers as they did, under various assumptions. The plaintiff's law firm matched the defendants' customer list against the plaintiff's, and against advertisements in the magazines from which WD claimed they obtained most of their customers. The plaintiff agreed (stipulated) that essentially all the names in question were in the CD-ROMs. The plaintiff's law firm also went through the defendants' telephone records and identified calls to WBH customers and others. Only local toll calls and long distance calls result in telephone records, so calls to WBH customers who are close to WD could not be identified.

WBH had 3310 active customers at the time in question; WD had 132. They had 93 customers in common. WD claimed to have found the names of 27 of their customers in local trade magazine advertisements, and to have found the names of 31 of their customers in the CD-ROMs. A total of 469 potential buyers of the kind of equipment WD sells advertised in the magazines in question; 152 of them were WBH customers. Of the 27 customers WD claimed to have found in the magazines, 26 were customers of WBH. Of the 31 customers WD claimed to have found in the CD-ROMs, 22 were customers of WBH. Of the 3310 WBH customers, 1769 were outside the San Francisco Bay Area. Of the 132 WD customers, 8 were outside the San Francisco Bay Area. All of the WD customers outside the Bay Area were also customers of WBH. Other experts estimated that there were more than 90,000 potential buyers of the kinds of equipment WBH and WD sell in the U.S. as a whole, and more than 60,000 outside the San Francisco Bay Area (including Silicon Valley). There were 2906 WBH customers to whom calls by WD would have resulted in phone records, and 68 WD customers for whom there were phone records, of whom 53 were customers of WBH. In the month of May, 1995, WD placed a total of 1050 calls that produced phone records, and 1006 of them were to the 53 customers of WBH.

Data presented in a narrative are extremely hard to follow. It is much easier to understand the data using a table:

Table 3-1: Sizes of groups of purchasers and potential purchasers of computer equipment of the type sold by WBH and WD.

Table 3-2: Phone calls in WD phone records, May 1995.

These data are quantitative and discrete (they count various things). In the chapter on RANDOM VARIABLES AND DISCRETE DISTRIBUTIONS, we shall use these data to test WD's claim that the large overlap of the customer lists was inevitable given the number of customers WBH had.
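As a small illustration of turning counts like these into proportions, here is a Python sketch. It uses only figures quoted in the narrative of this section; the variable and function names are my own, and the choice of which proportions to compute is arbitrary.

```python
from fractions import Fraction

# Counts as quoted in the narrative of this section.
calls_total = 1050        # WD calls in May 1995 that produced phone records
calls_to_wbh = 1006       # of those, calls to customers of WBH
magazine_total = 27       # customers WD claimed to have found in magazines
magazine_wbh = 26         # of those, customers of WBH
cdrom_total = 31          # customers WD claimed to have found in the CD-ROMs
cdrom_wbh = 22            # of those, customers of WBH

def share(part, whole):
    """Return part/whole as an exact fraction."""
    return Fraction(part, whole)

for label, part, whole in [
    ("calls to WBH customers", calls_to_wbh, calls_total),
    ("magazine customers also WBH", magazine_wbh, magazine_total),
    ("CD-ROM customers also WBH", cdrom_wbh, cdrom_total),
]:
    frac = share(part, whole)
    print(f"{label}: {frac} = {float(frac):.3f}")
```

Exact fractions are used so that the reduced form and the decimal approximation can be compared side by side.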

Reading tables is an extremely important skill. The following exercises may give you valuable practice. (If you need a calculator, click the Calculator link in the drop-down menu at the top left of the screen.)

Exercise 3-2. What fraction of WD customers are also customers of WBH?

[Solution]

Exercise 3-3. The decimal fraction of WD customers outside the Bay Area is

[Solution]

Exercise 3-4. What fraction of WD customers outside the Bay Area are customers of WBH?

[Solution]

Exercise 3-5. Fill in the missing numbers in the following table:

[Solution]

Gravity Data

The second set of data is a collection of measurements of g, the acceleration due to gravity, made at Piñon Flat Observatory in 1989 (day 229), between 5:29:52pm and 5:48:08pm. You might remember from a physics class that if you drop an object, it falls faster and faster (it accelerates), until it hits the ground. The rate at which it would accelerate, in the absence of air resistance, is g. At Earth's surface, g is about 9.8 meters per second per second (m/s²). That is, each second an object falls, it gains about 9.8 meters per second of speed. A meter per second (m/s) is about 2.24 miles per hour (mph), so the acceleration due to Earth's gravity is about

(9.8 m/s²) × (2.24 mph/(m/s)) = 22 miles per hour per second.

If you go bungee jumping from high enough that you fall for 2 seconds before the bungee starts to stretch, you will be going about

(22 miles per hour per second) × (2 seconds) = 44 miles per hour.

This calculation neglects air resistance, which would slow you down a bit.
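The unit conversion can be checked with a few lines of Python. This is only a sketch of the arithmetic above, using the same rounded constants; the variable names are my own.

```python
G_MS2 = 9.8          # approximate acceleration due to gravity, m/s^2 (from the text)
MPH_PER_MS = 2.24    # miles per hour in one meter per second (from the text)

g_mph_per_s = G_MS2 * MPH_PER_MS        # acceleration in miles per hour per second
speed_after_2s = g_mph_per_s * 2        # speed after 2 s of free fall, no air resistance

print(round(g_mph_per_s), round(speed_after_2s))
```

The text rounds to 22 before doubling, while the sketch rounds only at the end; both routes give about 44 miles per hour.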

The table lists the DEVIATIONS of the 100 measurements from a base value of 9.792838 m/s², times 10⁸. That is, each entry in the table is

100000000 × (measured value of g in m/s² − 9.792838).

(Note that 10⁸ = 10×10×10×10×10×10×10×10 = 100,000,000. If you need to review exponential notation, see Assignment 1.)
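To make the scaling concrete, the following Python sketch converts between a table entry and the corresponding measured value of g. The function names are hypothetical; the base value and the scale factor are the ones stated above, and −152 is the smallest entry in the sorted gravity data.

```python
BASE_G = 9.792838    # reference value in m/s^2 (from the text)
SCALE = 10**8        # table entries are deviations multiplied by 10^8

def entry_to_g(entry):
    """Convert a table entry back to a measured value of g in m/s^2."""
    return BASE_G + entry / SCALE

def g_to_entry(g):
    """Convert a measured g (m/s^2) to the scaled deviation used in the table."""
    return (g - BASE_G) * SCALE

print(f"{entry_to_g(-152):.8f}")
```

An entry of −152 thus corresponds to a measurement about 0.00000152 m/s² below the base value.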

The experimental apparatus used to collect these data is pretty slick: It uses a laser and an accurate time reference to determine the distance a mirrored corner of a cube falls in a vacuum chamber as a function of time. The cube is dropped in a vacuum to avoid air resistance, which would make the measurements systematically too small. These measurements were made at Piñon Flat Observatory by Glen Sasagawa and Mark Zumberge of the Scripps Institution of Oceanography in La Jolla, California. Tiny fluctuations in gravity, like those this instrument can measure, allow geophysicists to learn about the distribution of mass within the Earth, and about movements of the Earth associated with the tides and with stresses that lead to earthquakes.

Table 3-3: One hundred measurements of g, the acceleration due to gravity, at Piñon Flat Observatory. The entries are 10⁸ times the deviations of g from a reference value of 9.792838 m/s².

Here, the tabular representation does not mean anything special; it is just a way of writing a list.

A reasonable mathematical model for the observations is that

(observed value of g) = (true value of g) + error

where the error tends to be different for each measurement.

Why make so many measurements?

• To increase accuracy: Errors in different measurements tend to average out to some extent, so one can estimate g better using an average of a large number of measurements than one can using a single measurement.
• To assess uncertainty: The variability of the repeated measurements gives an estimate of the uncertainty of a single measurement, or of the average of the measurements.
• To monitor the experiment: If the fluctuations get larger, or a measurement is unusual, something might be going wrong with the experiment.

In later chapters, we will illustrate the first and second points using these data.
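The first point can be previewed with a simulation. The Python sketch below is not based on the gravity data: it assumes a hypothetical true value and independent, normally distributed errors with a hypothetical standard deviation, and it compares the spread of single measurements with the spread of averages of 100 measurements.

```python
import random
from statistics import pstdev

random.seed(0)       # fixed seed so the sketch is repeatable
TRUE_G = 9.8         # hypothetical "true" value, m/s^2
ERROR_SD = 1e-6      # hypothetical SD of the measurement error, m/s^2

def measure():
    """One simulated measurement: true value plus a random error."""
    return TRUE_G + random.gauss(0, ERROR_SD)

def average_of(n):
    """Average of n simulated measurements."""
    return sum(measure() for _ in range(n)) / n

singles = [measure() for _ in range(1000)]
averages = [average_of(100) for _ in range(1000)]

# The averages should cluster much more tightly around TRUE_G than the singles do.
print(pstdev(singles), pstdev(averages))
```

Under these assumptions the averages vary far less than the individual measurements; real measurement errors need not be independent or normally distributed.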

Frequency Tables

It is hard to learn much by looking at this list; it would be helpful to summarize the values in a more transparent way. We shall begin by constructing a FREQUENCY TABLE. A FREQUENCY TABLE lists the frequency (number) or relative frequency (fraction) of observations that fall in various ranges, called CLASS INTERVALS. We also need an ENDPOINT CONVENTION to be able to construct a FREQUENCY TABLE: If an observation falls on the boundary between two CLASS INTERVALS, in which class interval do we count the observation? The two standard choices are always to include the left boundary and exclude the right, except for the rightmost class interval, or always to include the right boundary and exclude the left, except for the leftmost class interval.

Let us construct a relative frequency table for the gravity data. There are no hard-and-fast rules for determining appropriate CLASS INTERVALS, and the impression one gets of how the data are distributed depends on the number and location of the intervals (more on this later in this chapter). We shall use the following nine class intervals:

• −160 (inclusive) to −110 (inclusive)
• −110 (exclusive) to −90 (inclusive)
• −90 (exclusive) to −70 (inclusive)
• −70 (exclusive) to −40 (inclusive)
• −40 (exclusive) to −10 (inclusive)
• −10 (exclusive) to 20 (inclusive)
• 20 (exclusive) to 50 (inclusive)
• 50 (exclusive) to 80 (inclusive)
• 80 (exclusive) to 160 (inclusive)

Note that the endpoint convention here is always to include the right boundary and exclude the left, except for the leftmost class interval.

To construct the frequency table, the next step is to count the number of data that fall in each CLASS INTERVAL. Counting is much easier if we sort the data. TABLE 3-4 lists the gravity data sorted into increasing order:

Table 3-4: Sorted gravity data.

The first class interval contains the 9 observations

{−152, −132, −132, −128, −122, −121, −120, −113, −112}.

Nine is 9% of 100, so the relative frequency of observations in the first class interval is 9%. The second class interval contains the 10 observations

{−108, −107, −107, −106, −106, −106, −105, −101, −101, −99}.

Ten is 10% of 100, so the relative frequency of observations in the second class interval is 10%. The last class interval contains the two observations {150, 155}. Two is 2% of 100, so the relative frequency of observations in the last class interval is 2%.
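The counting step, including the endpoint convention, can be sketched in Python as follows. The function names are my own. Only the 21 observations quoted above are used, so the counts for the middle class intervals come out as zero here; they are not the counts for the full data set.

```python
# Class-interval boundaries from the text. Right endpoints are included and
# left endpoints excluded, except that the leftmost interval includes both.
EDGES = [-160, -110, -90, -70, -40, -10, 20, 50, 80, 160]

def class_index(x, edges=EDGES):
    """Return the index of the class interval that contains x."""
    if x == edges[0]:
        return 0                      # leftmost interval includes its left endpoint
    for i in range(len(edges) - 1):
        if edges[i] < x <= edges[i + 1]:
            return i
    raise ValueError(f"{x} is outside the class intervals")

def frequency_table(data, edges=EDGES):
    """Count the observations that fall in each class interval."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        counts[class_index(x, edges)] += 1
    return counts

# Only the observations quoted in the text (21 of the 100 measurements).
quoted = [-152, -132, -132, -128, -122, -121, -120, -113, -112,
          -108, -107, -107, -106, -106, -106, -105, -101, -101, -99,
          150, 155]
counts = frequency_table(quoted)
print(counts)
```

Dividing each count by the total number of observations (100 for the full data set) gives the relative frequencies.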

The following exercise checks your understanding of frequency tables.

Exercise 3-6. Fill in the missing numbers in the following (relative) frequency table for the gravity data:

[Solution]

Histograms

The frequency table is easier to interpret than the raw data, but it is still hard to get an overall impression of the data from it. The HISTOGRAM is an excellent tool for studying the distribution of a list of quantitative measurements. A HISTOGRAM is a way of visualizing a frequency table graphically, of making a picture from a frequency table. The fraction of data in each class interval is represented by a rectangle (BIN) whose base is the class interval and whose area is the fraction of data (relative frequency of data) that fall in the class interval:

(area of bin) = (relative frequency of data in the class interval) = (#observations in the class interval) / (total number of observations),

so that

(height of bin) = (relative frequency of data in the class interval) / (width of the class interval).

FIGURE 3-1 is a cartoon of a histogram:

Figure 3-1: Elements of a histogram.

The key to a HISTOGRAM is that it is the area of the BIN, not the height of the BIN, that represents the relative frequency of data in the bin. The area of the BIN is proportional to the relative frequency of observations in the CLASS INTERVAL. The horizontal axis of a histogram needs a scale with units. The vertical axis of a histogram always has units of percent per unit of the horizontal axis, so that the areas of bins have units of

(horizontal units) × (percent per horizontal unit) = percent.

The scale of the vertical axis is automatically imposed by the fact that the total area of the HISTOGRAM must be 100% (100% of the data fall somewhere on the plot). The vertical scale is called a DENSITY SCALE. The height of a BIN is the DENSITY of observations in the BIN: the percentage of observations in the bin per unit of the horizontal axis. Typically it is not the percentage of observations in the bin.

A HISTOGRAM is not the same as a bar chart: In a bar chart, the height of a rectangle (bar), rather than the area of the bar, indicates the relative frequency of observations. The width of the bar does not matter; it does not even need to have units. This makes bar charts especially useful for displaying CATEGORICAL and QUALITATIVE data, where the horizontal axis does not have a scale; it is just a way to separate groups. HISTOGRAMS are more appropriate for QUANTITATIVE data.

For the gravity data, the first CLASS INTERVAL is from −160 to −110, and has 9% of the data. The height of the corresponding BIN is thus

9% / (−110 − (−160)) = 9% / 50 = 0.18% per unit.

(The unit is 10⁻⁸ m/s².)

The second class interval has width (−90 − (−110)) = 20 units, and has 10% of the observations, so the height of the corresponding bin is

10% / 20 = 0.5% per unit.

The last class interval has width 160 − 80 = 80 units, and has 2% of the observations, so the height of the last bin is

2% / 80 = 0.025% per unit.

The height of the last bin is one twentieth (.025/.5) that of the second bin.

The relative frequency of observations in the second class interval is five times that of the last class interval (10% versus 2%), so the area of the second bin is five times that of the last bin. The width of the second class interval is 1/4 the width of the last class interval (−90 − (−110) = 20, versus 160 − 80 = 80; 20 is 1/4 of 80). Thus the second bin is 5 × 4 = 20 times taller than the last bin.
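The same bin heights can be computed with a short Python sketch. The function name is my own; the percentages and class-interval boundaries are those given for the gravity data.

```python
def bin_height(percent, left, right):
    """Density of a bin: percent of the data per unit of the horizontal axis."""
    return percent / (right - left)

first = bin_height(9, -160, -110)     # 9% of the data over a width of 50 units
second = bin_height(10, -110, -90)    # 10% of the data over a width of 20 units
last = bin_height(2, 80, 160)         # 2% of the data over a width of 80 units

print(first, second, last, second / last)
```

Multiplying a height by the width of its bin recovers the percentage of data in the bin, which is why areas rather than heights carry the relative frequencies.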

FIGURE 3-2 is a histogram of the g deviations corresponding to these class intervals (multiplied by 10⁸ as before):

Figure 3-2: Histogram of deviations of g measured at Piñon Flat Observatory.

FIGURE 3-2 is the first APPLET in this book; there are many more to come. This applet is a program with controls you can manipulate. For example, try moving the scroll bars near the bottom of the plot, or typing other numbers into the boxes next to the scroll bars and then pressing the "Enter" or "Return" key. If you set the "Area from" text box lower than the "to" text box, part of the histogram will change color from blue to yellow, and the area of the yellow part will be displayed under the histogram, as "Selected area."

Skewness and Modes

The word DISTRIBUTION refers to how numerical data are distributed on the real line. We can discover qualitative features of the DISTRIBUTION of the data from the HISTOGRAM. The center of the data is around −50 to −40. Most of the observations are between −110 and 20. The observations are not distributed SYMMETRICALLY around the center: They continue farther to the right of the center than to the left of the center. The distribution is said to be SKEWED to the right, right-skewed, or to have a long right tail. Conversely, when the data are more spread out to the left of the center than to the right, the distribution is said to be SKEWED to the left, left-skewed, or to have a long left tail.

Distributions of prices and incomes tend to be skewed to the right. For example, consider house prices. Most homes cost under $100,000 to $200,000 (depending on the locality), but a relatively small number of homes sell for tens of millions of dollars. Similarly, most family annual incomes are under $60,000, but a small number of people have annual incomes exceeding tens of millions of dollars. Age distributions also tend to be skewed to the right; for example, there is unlikely to be anyone in this class younger than 14 years old, and most are between 17 and 22, but a few returning students are likely to be in their 30s, 40s, or older.

This histogram of the gravity data consists of only one "bump": it is said to be UNIMODAL. In general, a HISTOGRAM is said to be MULTIMODAL if it has more than one "bump," and in particular BIMODAL if it has two bumps.

The following exercises check your ability to use the histogram applet in FIGURE 3-2.

 

Exercise 3-7. The area under the histogram between −120 and 10 is

[Solution]

Exercise 3-8. The area under the histogram between −160 and −4 is

[Solution]

Percentiles and Quartiles

Another way to characterize a list of numbers is using PERCENTILES. The pth percentile of a list is the smallest number that is at least as large as p% of the numbers in the list. For example, 10% of the gravity data are less than or equal to −108, so −108 is the 10th percentile of the gravity data. The smallest number that is at least as large as 13% of the data is −106, so −106 is the 13th percentile of the data, even though in fact 15% of the observations are less than or equal to −106. The 13th through 15th percentiles of these data are all −106. It is much easier to find percentiles from the sorted list than from the original list. Some percentiles have special names, as shown in Table 3-5.

Table 3-5: Common named percentiles.

The lower quartile is the 25th percentile: the smallest number that is at least as large as 25% of the data. The median is the 50th percentile: the smallest number that is at least as large as half the data. We just saw that the median of the gravity data is −47. The upper quartile is the 75th percentile: the smallest number that is at least as large as 75% of the data. Approximately half the observations are between the lower quartile and the upper quartile.
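The definition of a percentile used in this chapter translates directly into code. The Python sketch below follows that definition; the function name and the short list of values are hypothetical, chosen only to show that several percentiles can share one value when the list contains ties.

```python
def percentile(data, p):
    """Smallest value in data that is at least as large as p% of the data."""
    ordered = sorted(data)
    n = len(ordered)
    for rank, value in enumerate(ordered, start=1):
        # At least `rank` of the n values are <= value; comparing integers
        # (100 * rank against p * n) avoids rounding error.
        if 100 * rank >= p * n:
            return value
    return ordered[-1]

# Hypothetical list, invented for illustration only.
values = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

lower_quartile = percentile(values, 25)
median = percentile(values, 50)
upper_quartile = percentile(values, 75)
print(lower_quartile, median, upper_quartile)
print(percentile(values, 10), percentile(values, 20))   # equal, because of the tie
```

Statistical software often uses other conventions for percentiles (for instance, interpolating between neighboring values), so results from a library function can differ slightly from this definition.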

The following exercises verify that you can calculate percentiles.

Exercise 3-9. Fill in the missing percentiles for the gravity data:

[Solution]

Exercise 3-10. Here is a list of data to practice with. Every time you re-visit or re-load the page, the data will be different.

Practice data
44 34 14 37 26 0 −39 −29 −30 −41 −1 21 31 −24 −19 −15 −14 32 35 −26 −13 41 −5 35 −14

The table sorted into increasing order is

Practice data
−41 −39 −30 −29 −26 −24 −19 −15 −14 −14 −13 −5 −1 0 14 21 26 31 32 34 35 35 37 41 44

Fill in the following table of percentiles:

%sti!ating Percentiles fro! 'istogra!s

'o fin! a P%(C%-T+%  of a set of eas+reents e(actly" one nee!s

te original !ata. -n #lotting a '$TOG("& " te !ata are gro+#e! into

C+"$$ -T%("+$ " &ic ty#ically akes it i#ossi)le to fin! e(act

P%(C%-T+%$  fro a istogra. A istogra tells yo+ te #ercentage

of !ata eac class interval contains" )+t not &ere in te class interval

eac !at+ is. Lo&ever" one can fin! appro*imate P%(C%-T+%$  fro

G1

Page 42: SticiGui [Statistics 21]

8/18/2019 SticiGui [Statistics 21]

http://slidepdf.com/reader/full/sticigui-statistics-21 42/45

a '$TOG("& ; 'e pt P%(C%-T+%  is a##ro(iately te #oint on te

ori:ontal a(is s+c tat te area +n!er te '$TOG("&  to te left of

te #oint is p. :G*(% >?> is anoter istogra of te Pi^on Dlat g  

!ata" &it e*+al8&i!t class intervals;

:igure >?>@ 'istogra! of deviations of g using equal?width binsA,ata; gravity.son Haria)le; !eviation of g

81@081008@00@01001@0

Selecte! area; 0

 Area fro;

to;

Bins;

ist ,ata

n_100 ean_8G1.N70 S,[email protected]@

This histogram has equal-width class intervals. You can change the number of bins by typing a different value into the box labeled Bins and pressing the Return or Enter key, but don't do that yet. If you click the List Data button, a new window will pop up with a listing of the 100 numbers in the gravity data set. This applet also displays two numbers that are defined in the chapter Measures of Location and Spread: the mean (average) and the SD (standard deviation).

The range −152 to −44.2 is highlighted when you first open this page, and the figure shows that the area under the histogram in that range is 50%. Our estimate of the median from the histogram thus would be −44.2. We saw earlier in this chapter that the median of the data is −47. The estimate of the median from the histogram is off by a bit because the data have been grouped into class intervals in the histogram.

Type −47 into the to box, and press Return or Enter. The selected area under the histogram should show 48%. The difference between 48% and 50% is also caused by the grouping of data into class intervals in the histogram.
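The "selected area" calculation can be sketched in a few lines of Python. The gravity data are not reproduced here, so the sketch uses the 25 practice data from the earlier exercise. The class intervals (−45 to −15, −15 to 15, 15 to 45), the endpoint convention, and the function names are assumptions made for illustration, not the applet's actual settings.

```python
# Practice data from the earlier exercise (25 values).
practice_data = [44, 34, 14, 37, 26, 0, -39, -29, -30, -41, -1, 21, 31,
                 -24, -19, -15, -14, 32, 35, -26, -13, 41, -5, 35, -14]

# Hypothetical class intervals, chosen only for this illustration.
edges = [-45, -15, 15, 45]


def bin_fractions(data, edges):
    """Fraction of the data in each class interval.

    Assumed endpoint convention: each interval includes its left
    endpoint; only the last interval also includes its right endpoint.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    return [c / len(data) for c in counts]


def area_between(lo, hi, edges, fractions):
    """Area under the histogram between lo and hi.

    Each bin's area equals its fraction of the data, spread evenly over
    the bin, which is all a histogram can tell us.
    """
    area = 0.0
    for i, frac in enumerate(fractions):
        left, right = edges[i], edges[i + 1]
        overlap = max(0.0, min(hi, right) - max(lo, left))
        area += frac * overlap / (right - left)
    return area


fractions = bin_fractions(practice_data, edges)
# Area to the left of -1, the 13th smallest of the 25 practice data.
print(round(100 * area_between(edges[0], -1, edges, fractions), 1))
# Percentage of the data that are actually at or below -1.
print(100 * sum(x <= -1 for x in practice_data) / len(practice_data))
```

With these bins, the area to the left of −1 is about 43%, while 52% of the data are at or below −1. The gap has the same cause as the 48% versus 50% difference above: the histogram records how many data fall in each class interval, not where they fall within it.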


The following exercise lets you practice estimating percentiles from histograms.

 

Exercise 3-11. Estimate the following percentiles of the gravity data from the histogram:

Now change the number of bins from 9 to 30 by typing 30 into the Bins box and pressing Return or Enter. The histogram is now rougher: it has more bumps or modes. The appearance of a histogram depends crucially on how the class intervals are chosen. If you estimate percentiles from the histogram with 30 bins and with 9 bins, you will get different answers.
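The dependence on the number of bins can be illustrated with a minimal Python sketch. It again uses the 25 practice data rather than the gravity data. It assumes equal-width bins spanning the minimum to the maximum of the data and treats the data in each bin as evenly spread. The function name `histogram_percentile` and the bin counts 3 and 9 are illustrative choices, not the applet's implementation.

```python
# Practice data from the earlier exercise (25 values).
practice_data = [44, 34, 14, 37, 26, 0, -39, -29, -30, -41, -1, 21, 31,
                 -24, -19, -15, -14, 32, 35, -26, -13, 41, -5, 35, -14]


def histogram_percentile(data, p, n_bins):
    """Estimate the p-th percentile from an equal-width histogram.

    Assumes the data are not all equal, so the bin width is positive.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # Assumed endpoint convention: left endpoints are included and
        # the maximum is placed in the last bin.
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    target = p / 100 * len(data)  # count that should lie to the left
    cumulative = 0
    for i, count in enumerate(counts):
        if count > 0 and cumulative + count >= target:
            # Interpolate inside the bin, as if its data were evenly spread.
            return lo + (i + (target - cumulative) / count) * width
        cumulative += count
    return hi


estimate_3_bins = histogram_percentile(practice_data, 50, 3)
estimate_9_bins = histogram_percentile(practice_data, 50, 9)
print(round(estimate_3_bins, 2), round(estimate_9_bins, 2))
```

The two estimates of the median differ from each other (roughly −2.04 with 3 bins and −0.86 with 9 bins), and neither need equal a median computed from the original list.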

Summary

This chapter introduced variables, and distinctions among variables, according to the kinds of values the variables can take: quantitative, qualitative, and categorical. Quantitative variables are classified further as either discrete or continuous. Data (observed values of variables) can be presented in many ways. Tables often are easier to understand than words. When the number of data is large, looking at the data provides little insight, but summaries of the data can help.


Quantitative data can be summarized using frequency tables. Constructing a frequency table requires specifying class intervals and an endpoint convention. Frequency tables can be presented graphically as histograms, which give an impression of the distribution of the data. In a histogram, relative frequency is represented by area. Characteristics of the distribution that can be gleaned from a histogram include symmetry, skewness, and the number and location of modes. However, the appearance of those characteristics in a histogram depends on the number and location of the class intervals. Percentiles are another way to summarize the distribution of a list. Calculating percentiles exactly requires the original data, but percentiles can be estimated approximately from histograms.

Key Terms

applet, bias, bimodal, bin, categorical variable, class interval, continuous, countable, density, density scale, deviation, distribution, discrete, endpoint convention, frequency table, histogram, lower quartile, median, multimodal, ordinal variable, percentile, qualitative variable, quantitative variable, quartile, skewed, symmetrically, unimodal, upper quartile, variable


Last generated 2/26/2014 6:30:35 PM. Content last modified 11 July 2013 07:47 PDT.