EM Algorithm - Stanford University
cs229.stanford.edu/notes2020fall/notes2020fall/draft_em.pdf
Posted on 28-Mar-2021
TRANSCRIPT
EM ALGORITHM
- Derive GMM as an EM algorithm
- Factor analysis
RECALL JENSEN'S INEQUALITY: for convex f, E[f(X)] >= f(E[X]).

Canonical convexity conditions: f''(x) >= 0, or equivalently, for lambda in [0, 1],
    f(lambda a + (1 - lambda) b) <= lambda f(a) + (1 - lambda) f(b).
g is CONCAVE if -g is convex, e.g. g(x) = log(x) on (0, infinity): the CHORD lies BELOW the function.
For concave g the INEQUALITY REVERSES:
    E[g(X)] <= g(E[X]).
If X is constant, then E[g(X)] = g(E[X]); for strictly concave g, equality holds ONLY IF X is constant.
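A quick numerical sanity check of Jensen's inequality for the concave log (a sketch; the uniform interval and sample size below are arbitrary choices):

```python
import math
import random

random.seed(0)
# draw X uniformly on [0.5, 5.0] and compare E[log X] with log E[X]:
# for concave log, E[log X] <= log E[X], strictly when X is not constant
xs = [random.uniform(0.5, 5.0) for _ in range(100_000)]
mean_of_log = sum(math.log(x) for x in xs) / len(xs)   # E[log X]
log_of_mean = math.log(sum(xs) / len(xs))              # log E[X]
assert mean_of_log <= log_of_mean
```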
EM ALGORITHM AS MAX LIKELIHOOD
PARAMETERS: theta.  DATA: x^(1), ..., x^(n).  Objective: l(theta) = sum_i log p(x^(i); theta).
We assume p(x; theta) = sum_z p(x, z; theta), i.e. z is a LATENT VARIABLE (e.g. the cluster in a GMM).

PICTURE OF THE ALGORITHM: at step t, build a lower bound L_t with
    l(theta) >= L_t(theta)            (lower bound)
    l(theta^(t)) = L_t(theta^(t))     (tight at the current iterate)
    theta^(t+1) = argmax_theta L_t(theta)
L_t is easier to optimize than l(theta).
ROUGH SHAPE:
E-STEP: given theta^(t), find L_t such that L_t(theta) <= l(theta) for all theta, with L_t(theta^(t)) = l(theta^(t)).
M-STEP: given L_t, set theta^(t+1) = argmax_theta L_t(theta).
How do we construct L_t? Let's look at a single data point x:
    log p(x; theta) = log sum_z p(x, z; theta) = log sum_z Q(z) [ p(x, z; theta) / Q(z) ]
for any distribution Q with Q(z) > 0 and sum_z Q(z) = 1. Then
    log p(x; theta) >= sum_z Q(z) log [ p(x, z; theta) / Q(z) ]
by JENSEN (log is concave; the sum over z is an expectation under Q).
The key step holds for ANY such Q, so this gives a FAMILY of lower bounds, one for each choice of Q.

How do we make it tight? Select Q to make the inequality tight.
What if p(x, z; theta) / Q(z) = c for some constant c? Then Jensen's is an EQUALITY.
So set Q(z) proportional to p(x, z; theta):
    Q(z) = p(x, z; theta) / sum_{z'} p(x, z'; theta) = p(x, z; theta) / p(x; theta) = p(z | x; theta).
(log p(x; theta) does not depend on z, so the ratio is indeed constant.)
NB: this Q does depend on theta and x. We will select a Q_i for every point INDEPENDENTLY, i.e. for every i.
Since log is strictly concave, this was our only choice.
We define the EVIDENCE LOWER BOUND (ELBO), a sum over z:
    ELBO(x; Q, theta) = sum_z Q(z) log [ p(x, z; theta) / Q(z) ]
We've shown:
    LOWER BOUND: l(theta) >= sum_i ELBO(x^(i); Q_i, theta) for ANY Q_i satisfying the conditions above.
    TIGHT: l(theta) = sum_i ELBO(x^(i); Q_i, theta) for the choice of Q_i above (the posterior).
RESTATED EM:
E-STEP (t): for i = 1, ..., n, set Q_i(z) = p(z | x^(i); theta^(t)).
M-STEP (t): theta^(t+1) = argmax_theta L_t(theta) = argmax_theta sum_i ELBO(x^(i); Q_i, theta).
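The two bound properties can be checked numerically on a toy discrete latent variable (the joint table below is made up for illustration):

```python
import math
import random

# toy joint p(x, z) for one fixed observed x, latent z in {0, 1, 2}
# (values made up for illustration)
p_xz = [0.10, 0.25, 0.05]                    # p(x, z = j)
log_px = math.log(sum(p_xz))                 # log p(x) = log sum_z p(x, z)

def elbo(q):
    """ELBO(x; Q) = sum_z Q(z) log[p(x, z) / Q(z)]."""
    return sum(qz * math.log(pz / qz) for qz, pz in zip(q, p_xz))

random.seed(1)
raw = [random.random() for _ in range(3)]
q_rand = [r / sum(raw) for r in raw]         # an arbitrary distribution Q
q_post = [pz / sum(p_xz) for pz in p_xz]     # the posterior p(z | x)

assert elbo(q_rand) <= log_px + 1e-12        # lower bound for ANY Q
assert abs(elbo(q_post) - log_px) < 1e-9     # tight exactly at the posterior
```

Any other Q gives a strictly looser bound, since ELBO(x; Q) = log p(x) - KL(Q || posterior).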
WARM-UP: MIXTURE OF GAUSSIANS. EM recovers our earlier ad-hoc algorithm.
    p(x^(i), z^(i); theta) = p(x^(i) | z^(i); theta) p(z^(i); theta)
    z ~ Multinomial(phi), with phi_j = p(z = j), the probability of cluster j, j = 1, ..., k
    x | z = j ~ N(mu_j, Sigma_j)   (the source means and covariances)
z is our LATENT VARIABLE: which of the k sources generated x.

WHAT IS EM HERE?
E-STEP: Q_i(z = j) = p(z = j | x^(i); theta). We saw that this could be computed via BAYES RULE
from p(x^(i) | z = j) and p(z = j).
Intuition: a priori, we don't know which cluster x^(i) is more likely to have come from;
but if we knew theta, maybe we'd think it likely came from cluster j. BAYES RULE AUTOMATES this reasoning.
M-STEP: compute derivatives.
    max_theta sum_i sum_j Q_i(z = j) log [ p(x^(i), z = j; theta) / Q_i(z = j) ]
Write w_j^(i) = Q_i(z = j) for the notation above, and p(x^(i), z = j; theta) = p(x^(i) | z = j) p(z = j).
Gaussian case:
    l(theta) = sum_i sum_j w_j^(i) log [ (2 pi)^(-d/2) |Sigma_j|^(-1/2) exp( -(1/2) (x^(i) - mu_j)^T Sigma_j^(-1) (x^(i) - mu_j) ) phi_j / w_j^(i) ]
Gradient with respect to mu_j:
    grad_{mu_j} l = sum_i w_j^(i) Sigma_j^(-1) (x^(i) - mu_j)
Setting this to 0 and using that Sigma_j is full rank gives sum_i w_j^(i) (x^(i) - mu_j) = 0, i.e.
    mu_j = ( sum_i w_j^(i) x^(i) ) / ( sum_i w_j^(i) )
Same as before.
phi is CONSTRAINED: sum_j phi_j = 1, phi_j >= 0, so we need a LAGRANGIAN:
    L(phi, beta) = sum_i sum_j w_j^(i) log phi_j + beta ( sum_j phi_j - 1 )
    dL/dphi_j = sum_i w_j^(i) / phi_j + beta = 0   =>   phi_j proportional to sum_i w_j^(i)
Since sum_j phi_j = 1 and sum_j sum_i w_j^(i) = n:
    phi_j = (1/n) sum_i w_j^(i)
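These updates can be sketched end-to-end in 1-D on synthetic data (the two clusters, initial values, and iteration count below are arbitrary choices); the loop also checks the EM guarantee that the log-likelihood never decreases:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic 1-D data: two well-separated Gaussian clusters
x = np.concatenate([rng.normal(-3.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

phi = np.array([0.5, 0.5])          # mixing weights phi_j
mu = np.array([-1.0, 1.0])          # arbitrary initial means
var = np.array([1.0, 1.0])          # variances sigma_j^2

def joint_dens():
    """n x k matrix of phi_j * N(x_i; mu_j, var_j)."""
    return phi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

prev = -np.inf
for _ in range(30):
    dens = joint_dens()
    # E-step: w[i, j] = Q_i(z = j) = p(z = j | x_i; theta) by Bayes rule
    w = dens / dens.sum(axis=1, keepdims=True)
    # M-step: the closed-form updates derived above
    nj = w.sum(axis=0)
    phi = nj / len(x)                               # phi_j = (1/n) sum_i w_j^(i)
    mu = (w * x[:, None]).sum(axis=0) / nj          # weighted means
    var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / nj
    cur = np.log(joint_dens().sum(axis=1)).sum()    # l(theta)
    assert cur >= prev - 1e-9                       # EM never decreases l(theta)
    prev = cur
```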
MIRACLE: EM recovers the GMM updates automatically.
NB: if z is CONTINUOUS, you can replace the sums over z with integrals throughout.
FACTOR ANALYSIS
Setting: many fewer points than dimensions, n << d.
Contrast with GMMs: there we modeled LOTS of points coming from a FEW sources.
How does this happen?
Example: place sensors all over campus and record 1000s of locations, so d ~ 1000s,
but only record for 30 days, so n << d. We want to fit a density, but this seems hopeless.
KEY IDEA: assume there is some LATENT r.v. that is NOT too complex and EXPLAINS the behavior.
First, let's see the problem with GMMs, even with a single Gaussian:
    mu = (1/n) sum_i x^(i)   (this is OK)
    Sigma = (1/n) sum_i (x^(i) - mu)(x^(i) - mu)^T
    rank(Sigma) <= n < d, so Sigma is NOT full rank.
Problem: the Gaussian likelihood is then NOT DEFINED:
    p(x; mu, Sigma) = (2 pi)^(-d/2) |Sigma|^(-1/2) exp( -(1/2) (x - mu)^T Sigma^(-1) (x - mu) )
is undefined because |Sigma| = 0.
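A tiny numerical illustration of the failure (the sizes are arbitrary, chosen so that n < d):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 10                        # fewer points than dimensions
X = rng.normal(size=(n, d))
mu = X.mean(axis=0)
Sigma = (X - mu).T @ (X - mu) / n   # empirical covariance, d x d

assert np.linalg.matrix_rank(Sigma) <= n    # rank(Sigma) <= n < d
assert abs(np.linalg.det(Sigma)) < 1e-12    # |Sigma| = 0: density undefined
```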
We will fix these issues by examining three models that are simpler special cases; we'll combine them in the end.
RECALL THE MLE FOR A GAUSSIAN:
    max_{mu, Sigma} sum_i log [ (2 pi)^(-d/2) |Sigma|^(-1/2) exp( -(1/2) (x^(i) - mu)^T Sigma^(-1) (x^(i) - mu) ) ]
Equivalently:
    min_{mu, Sigma} sum_i (x^(i) - mu)^T Sigma^(-1) (x^(i) - mu) + n log |Sigma|
If Sigma is full rank, the optimum is
    mu = (1/n) sum_i x^(i),   Sigma = (1/n) sum_i (x^(i) - mu)(x^(i) - mu)^T
We'll use this as a plug-in below.
BUILDING BLOCK 1: suppose the coordinates are INDEPENDENT with IDENTICAL variance,
Sigma = sigma^2 I (the covariance contours are circles). Then log |sigma^2 I| = d log sigma^2
and Sigma^(-1) = sigma^(-2) I, so we solve
    min_{mu, sigma^2} sum_i [ sigma^(-2) ||x^(i) - mu||^2 + d log sigma^2 ]
    =>   sigma^2 = (1/(n d)) sum_i ||x^(i) - mu||^2
i.e. subtract the mean, square all entries, and average.
BUILDING BLOCK 2: Sigma = diag(sigma_1^2, ..., sigma_d^2)
(the covariance contours are axis-aligned ellipses). Same idea as above:
    min sum_i sum_j [ (x_j^(i) - mu_j)^2 / sigma_j^2 + log sigma_j^2 ]
This is d problems, one for each dimension, each solved as in building block 1:
    sigma_j^2 = (1/n) sum_i (x_j^(i) - mu_j)^2
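Both closed forms are easy to verify on synthetic data (dimensions, sample size, and standard deviations below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 3
true_sd = np.array([1.0, 2.0, 0.5])              # per-dimension std devs
X = rng.normal(0.0, true_sd, size=(n, d))

mu = X.mean(axis=0)
# building block 1: one shared sigma^2 = (1/(nd)) sum_i ||x_i - mu||^2
sigma2_shared = ((X - mu) ** 2).sum() / (n * d)
# building block 2: diagonal Sigma, d independent 1-D problems
sigma2_diag = ((X - mu) ** 2).mean(axis=0)       # sigma_j^2 = (1/n) sum_i (x_ij - mu_j)^2

# the shared estimate is just the average of the per-dimension estimates
assert np.isclose(sigma2_shared, sigma2_diag.mean())
```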
FACTOR ANALYSIS MODEL
PARAMETERS: mu in R^d, Lambda in R^(d x s), Psi in R^(d x d) a DIAGONAL matrix.
MODEL: p(x, z) = p(x | z) p(z), with z LATENT:
    z ~ N(0, I), z in R^s, for s << d (small latent dimension)
    x = mu + Lambda z + epsilon with epsilon ~ N(0, Psi); equivalently x | z ~ N(mu + Lambda z, Psi)
Lambda maps from the SMALL latent space to the LARGE observed space; Psi is the noise.

Example (d = 2, s = 1): to sample x = mu + Lambda z + epsilon,
1. generate z^(1), ..., z^(n) from N(0, 1);
2. map z -> Lambda z (points along a line in the big space);
3. add mu;
4. add epsilon.
The data we would observe are the resulting points (the purple dots in the picture).
So a SMALL latent space produces data in a HIGH-DIMENSIONAL space.
TECHNICAL TOOL: BLOCK GAUSSIANS
    x in R^d, partitioned x = [x_1; x_2], x_1 in R^(d_1), x_2 in R^(d_2), d = d_1 + d_2
    mu = [mu_1; mu_2],   Sigma = [ Sigma_11  Sigma_12 ; Sigma_21  Sigma_22 ],   Sigma_ij in R^(d_i x d_j), i, j in {1, 2}
This notation is widely used and helpful.
FACT 1 (MARGINALIZATION): p(x_1) = integral p(x_1, x_2) dx_2.
For Gaussians, p(x_1) = N(mu_1, Sigma_11). Not surprising.
FACT 2 (CONDITIONING): p(x_1 | x_2) = N(mu_{1|2}, Sigma_{1|2}), where
    mu_{1|2} = mu_1 + Sigma_12 Sigma_22^(-1) (x_2 - mu_2)
    Sigma_{1|2} = Sigma_11 - Sigma_12 Sigma_22^(-1) Sigma_21   (matrix inversion lemma)
Proofs are online; happy to add them.
So MARGINALIZATION and CONDITIONING of a Gaussian give ANOTHER GAUSSIAN in CLOSED FORM,
and we have formulas for the parameters.
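The conditioning formulas can be sanity-checked numerically against a known identity: Sigma_{1|2} equals the inverse of the top-left block of the precision matrix Sigma^(-1) (block sizes and the random covariance below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 2, 3
A = rng.normal(size=(d1 + d2, d1 + d2))
Sigma = A @ A.T + (d1 + d2) * np.eye(d1 + d2)   # a positive-definite covariance
mu = rng.normal(size=d1 + d2)

S11, S12 = Sigma[:d1, :d1], Sigma[:d1, d1:]
S21, S22 = Sigma[d1:, :d1], Sigma[d1:, d1:]

x2 = rng.normal(size=d2)                         # the observed block
# p(x1 | x2) = N(mu_cond, Sigma_cond), using the formulas from the notes
mu_cond = mu[:d1] + S12 @ np.linalg.solve(S22, x2 - mu[d1:])
Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S21)

# check: Sigma_{1|2} is the inverse of the (1,1) block of the precision matrix
Prec = np.linalg.inv(Sigma)
assert np.allclose(Sigma_cond, np.linalg.inv(Prec[:d1, :d1]))
```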
BACK TO FACTOR ANALYSIS: x = mu + Lambda z + epsilon,
with epsilon ~ N(0, Psi) and z ~ N(0, I). Since E[z] = 0 and E[epsilon] = 0, E[x] = mu.
What is Cov(x)? Using E[z z^T] = I and E[z epsilon^T] = 0:
    E[(x - mu)(x - mu)^T] = E[(Lambda z + epsilon)(Lambda z + epsilon)^T]
                          = Lambda E[z z^T] Lambda^T + E[epsilon epsilon^T]
                          = Lambda Lambda^T + Psi
So x ~ N(mu, Lambda Lambda^T + Psi).
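A quick simulation check of Cov(x) = Lambda Lambda^T + Psi by sampling from the generative model (all parameter values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, s, n = 5, 2, 200_000
mu = rng.normal(size=d)
Lam = rng.normal(size=(d, s))                     # Lambda, d x s
psi = rng.uniform(0.1, 0.5, size=d)               # diagonal of Psi

Z = rng.normal(size=(n, s))                       # z ~ N(0, I_s)
eps = rng.normal(0.0, np.sqrt(psi), size=(n, d))  # epsilon ~ N(0, Psi)
X = mu + Z @ Lam.T + eps                          # x = mu + Lambda z + epsilon

emp_cov = np.cov(X, rowvar=False)
theory = Lam @ Lam.T + np.diag(psi)
assert np.max(np.abs(emp_cov - theory)) < 0.1     # empirical ~ Lambda Lambda^T + Psi
```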
E-STEP: Q_i(z) = p(z | x^(i); theta). Use the conditional-Gaussian formula.
M-STEP: we have closed forms.
SUMMARY
- We saw that EM captures the GMM algorithm.
- We learned about FACTOR ANALYSIS, which finds low-dimensional structure.
- We saw how to estimate the parameters of FA using EM.