IF4074 Weeks 9-15
• Week 9 (18 October 2021): LSTM + RNN architectures + Major Assignment 2 (Tubes 2)
• Week 10 (25 October 2021): RNN exercises + BPTT
• Week 11 (1 November 2021): Guest lecture (sharing on ML applications at Gojek)
• Week 12 (8 November 2021): RNN lab session
• Week 13 (15 November 2021): Feature Engineering 1 / Assignment: experiment design
• Week 14 (22 November 2021): Quiz 2
• Week 15 (29 November 2021): Feature Engineering 2 lab session
04 LSTM: What & Why
Advanced Machine Learning (Pembelajaran Mesin Lanjut)
Masayu Leylia Khodra ([email protected])
KK IF – Teknik Informatika – STEI ITB
Module 4: Recurrent Neural Network
Long Short-Term Memory (LSTM): Why
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
h_t = f(U·x_t + W·h_{t-1} + b_xh)
y_t = f(V·h_t + b_hy)
RNN: long-term dependency problem
[Diagram: RNN cell with input x_t, previous hidden state h_{t-1}, new hidden state h_t, and weight matrices U, W, V]
• Suffers from short-term memory (forward propagation).
• Suffers from the vanishing gradient problem (backward propagation). RNNs fail to learn dependencies longer than 5-10 time steps. In the worst case, this may completely stop the neural network from further training.
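A tiny numeric sketch (not from the slides) of why the gradient vanishes: during BPTT the gradient is multiplied by roughly (recurrent weight x activation derivative) once per step back through time, so any factor below 1 shrinks it exponentially. The 0.8 and 0.5 below are hypothetical values.

factor = 0.8 * 0.5   # hypothetical recurrent weight (0.8) times a tanh derivative (0.5)
grad = 1.0
for step in range(1, 21):
    grad *= factor           # one multiplication per timestep we backpropagate through
    if step in (5, 10, 20):
        print(step, grad)    # ~1e-2 after 5 steps, ~1e-4 after 10, ~1e-8 after 20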
Long Short-Term Memory (LSTM): What
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTMs are explicitly designed to avoid the long-term dependency problem.
Introduced by Hochreiter & Schmidhuber (1997)
LSTM is a special kind of RNN; the difference lies in the operations within the LSTM's cells. In an RNN, the repeating module has a very simple structure; in an LSTM, the repeating module contains four interacting layers.
LSTM: Cell State & Gates
Cell State
• acts as the "memory" of the network.
• acts as a transport highway that transfers relevant information throughout the processing of the sequence.
Forget Gate
• decides what information should be thrown away or kept.
• Values closer to 0 mean forget; values closer to 1 mean keep.
Input Gate
• decides what information is relevant to add from the current step.
• updates the cell state using the hidden state and the current input.
Output Gate
• decides what the next hidden state should be.
• The hidden state contains information on previous inputs and is also used for predictions.
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
Forget Gate
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
f_t = σ(U_f·x_t + W_f·h_{t-1} + b_f)
A value of 1 represents "completely keep this", while a 0 represents "completely get rid of this."
[Diagram: forget gate f_t computed from x_t and h_{t-1}, applied to C_{t-1}]
Input Gate
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
i_t = σ(U_i·x_t + W_i·h_{t-1} + b_i)
C̃_t = tanh(U_c·x_t + W_c·h_{t-1} + b_c)
[Diagram: input gate i_t and candidate C̃_t computed from x_t and h_{t-1}]
Cell State
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
[Diagram: new cell state C_t computed from C_{t-1}, f_t, i_t, and C̃_t]
C_t = f_t * C_{t-1} + i_t * C̃_t
Output Gate
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
[Diagram: output gate o_t computed from x_t and h_{t-1}, applied to tanh(C_t)]
o_t = σ(U_o·x_t + W_o·h_{t-1} + b_o)
h_t = o_t * tanh(C_t)
LSTM Forward Propagation: Example
https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9
Network: x (2 inputs) → h (1 LSTM unit), with input weights U_f, U_i, U_c, U_o and recurrent weights W_f, W_i, W_c, W_o.

Training data:
A1    A2    Target
1     2     0.5
0.5   3     1.25
…

f_t = σ(U_f·x_t + W_f·h_{t-1} + b_f)
i_t = σ(U_i·x_t + W_i·h_{t-1} + b_i)
C̃_t = tanh(U_c·x_t + W_c·h_{t-1} + b_c)
C_t = f_t * C_{t-1} + i_t * C̃_t
o_t = σ(U_o·x_t + W_o·h_{t-1} + b_o)
h_t = o_t * tanh(C_t)

Initial weights, biases, and state:
U_f = [0.700, 0.450]   W_f = 0.100   b_f = 0.150
U_i = [0.950, 0.800]   W_i = 0.800   b_i = 0.650
U_c = [0.450, 0.250]   W_c = 0.150   b_c = 0.200
U_o = [0.600, 0.400]   W_o = 0.250   b_o = 0.100
h_0 = 0, C_0 = 0
Computing h_t and C_t: Timestep t1
t1: x_1 = <1, 2> (target 0.5); h_0 = 0, C_0 = 0

Gate    U·x_t    W·h_{t-1}+b    net      value
f_1     1.600    0.150          1.750    0.852
i_1     2.550    0.650          3.200    0.961
C̃_1     0.950    0.200          1.150    0.818
o_1     1.400    0.100          1.500    0.818
C_1 = 0.786, h_1 = 0.536

f_t = σ(U_f·x_t + W_f·h_{t-1} + b_f)
i_t = σ(U_i·x_t + W_i·h_{t-1} + b_i)
C̃_t = tanh(U_c·x_t + W_c·h_{t-1} + b_c)
C_t = f_t * C_{t-1} + i_t * C̃_t
o_t = σ(U_o·x_t + W_o·h_{t-1} + b_o)
h_t = o_t * tanh(C_t)
https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9
Computing h_t and C_t: Timestep t2
t2: x_2 = <0.5, 3> (target 1.25); h_1 = 0.536, C_1 = 0.786

f_t = σ(U_f·x_t + W_f·h_{t-1} + b_f)
i_t = σ(U_i·x_t + W_i·h_{t-1} + b_i)
C̃_t = tanh(U_c·x_t + W_c·h_{t-1} + b_c)
C_t = f_t * C_{t-1} + i_t * C̃_t
o_t = σ(U_o·x_t + W_o·h_{t-1} + b_o)
h_t = o_t * tanh(C_t)

Gate    U·x_t    W·h_{t-1}+b    net      value
f_2     1.700    0.204          1.904    0.870
i_2     2.875    1.079          3.954    0.981
C̃_2     0.975    0.280          1.255    0.850
o_2     1.500    0.234          1.734    0.850
C_2 = 1.518, h_2 = 0.772
https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9
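The two timesteps above can be reproduced with a few lines of NumPy. This is a minimal sketch, not part of the slides, assuming the weights, biases, and initial state listed earlier; it prints C_1 = 0.786, h_1 = 0.536 and C_2 = 1.518, h_2 = 0.772.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one row per gate (f, i, c, o): input weights U, recurrent weights W, biases b
U = np.array([[0.70, 0.45],
              [0.95, 0.80],
              [0.45, 0.25],
              [0.60, 0.40]])
W = np.array([0.10, 0.80, 0.15, 0.25])
b = np.array([0.15, 0.65, 0.20, 0.10])

def lstm_step(x, h_prev, C_prev):
    net = U @ x + W * h_prev + b                 # net input of the four gates
    f, i, o = sigmoid(net[0]), sigmoid(net[1]), sigmoid(net[3])
    c_tilde = np.tanh(net[2])
    C = f * C_prev + i * c_tilde                 # C_t = f_t*C_{t-1} + i_t*C~_t
    h = o * np.tanh(C)                           # h_t = o_t*tanh(C_t)
    return h, C

h, C = 0.0, 0.0
for x in [np.array([1.0, 2.0]), np.array([0.5, 3.0])]:
    h, C = lstm_step(x, h, C)
    print(f"C_t = {C:.3f}, h_t = {h:.3f}")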
Implementing LSTM on Keras: Many to One
from keras import Sequential
from keras.layers import LSTM, Dense
model = Sequential()
model.add(LSTM(10, input_shape=(50,1)))
# 10 neurons & process 50x1 sequences
model.add(Dense(1, activation='linear'))
# linear output as this is a regression problem
https://towardsdatascience.com/a-comprehensive-guide-to-working-with-recurrent-neural-networks-in-keras-f3b2d5e2fa7f
[Diagram: x (1) → h (10 LSTM units) → y (1), with weights U, W, V]
# predict Amazon stock closing prices, LSTM over 50 timesteps
Number of Parameters
[Diagram: x (1) → h (10 LSTM units) → y (1), with weights U, W, V]
Total parameters = (1+10+1)*4*10 + (10+1)*1 = 491
Simple RNN with an equivalent network: 131 parameters
U: matrix of hidden neurons x (input dimension + 1)
W: matrix of hidden neurons x hidden neurons
V: matrix of output neurons x (hidden neurons + 1)
Total parameters for an LSTM layer with n units, m-dimensional input, and k-dimensional (Dense) output = (m+n+1)*4*n + (n+1)*k
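As a sanity check (not on the slides), the closed-form count can be compared with what Keras itself reports for the model above; this sketch assumes the same 50x1 input, 10 LSTM units, and one linear output, and the import path may vary with the installed Keras/TensorFlow version.

from keras import Sequential
from keras.layers import LSTM, Dense

def lstm_plus_dense_params(m, n, k):
    # (m+n+1)*4*n for the LSTM layer, (n+1)*k for the Dense output layer
    return (m + n + 1) * 4 * n + (n + 1) * k

model = Sequential()
model.add(LSTM(10, input_shape=(50, 1)))
model.add(Dense(1, activation='linear'))

print(lstm_plus_dense_params(m=1, n=10, k=1))   # 491
print(model.count_params())                     # should also report 491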
RNN → LSTM → GRU → ReGU
1985: Recurrent nets
1997: LSTM, Bi-RNN
2014: GRU
2017: Residual LSTM
2019: Residual Gated Unit (ReGU)
GRU: no cell state, 2 gates
ReGU: shortcut connection
Summary
• LSTMs avoid the long-term dependency problem
• LSTMs have a cell state and 3 gates (forget, input, output)
• Computing h_t and C_t
• Backpropagation Through Time
03 RNN Architecture
Advanced Machine Learning (Pembelajaran Mesin Lanjut)
Masayu Leylia Khodra ([email protected])
KK IF – Teknik Informatika – STEI ITB
Module 4: Recurrent Neural Network
General Architecture
[Diagram: general RNN architecture. Input x (dim i) feeds stacked recurrent layers h_1 (j units) … h_h (k units) through weights U_xh1, U_h1h…, …, with recurrent weights W_h1, …, W_hh and output y (n) via V; the network is unrolled over n timesteps x_1 … x_n]
return_sequences = True/False
Architecture
fixed-sized input vector xt
fixed-sized output vector ot
RNN state st
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
One to many: image captioning
Many to one: text classification
Many to many: machine translation, video frame classification, POS tagging
One to Many: Image Captioning
CNN Encoder (Inception) - RNN Decoder (LSTM) (Vinyals et al., 2014)
Many to One: Text Classification
https://www.oreilly.com/learning/perform-sentiment-analysis-with-lstms-using-tensorflow
Many to Many: Sequence Tagging
https://www.depends-on-the-definition.com/guide-sequence-tagging-neural-networks-python/
The input is a sequence of words, and the output is the sequence of POS tags, one for each word.
Many to Many: Machine Translation
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
• Machine Translation: the input is a sequence of words in a source language (e.g., German); the output is a sequence of words in a target language (e.g., English).
• A key difference is that the output only starts after we have seen the complete input, because the first word of the translated sentence may require information captured from the complete input sequence.
Implementing RNN on Keras: Many to One
from keras import Sequential
from keras.layers import SimpleRNN, Dense
model = Sequential()
model.add(SimpleRNN(10, input_shape=(50,1)))
# simple recurrent layer, 10 neurons & process 50x1 sequences
model.add(Dense(1, activation='linear'))
# linear output because this is a regression problem
https://towardsdatascience.com/a-comprehensive-guide-to-working-with-recurrent-neural-networks-in-keras-f3b2d5e2fa7f
[Diagram: x (1) → h (10 SimpleRNN units) → y (1), with weights U, W, V]
# predict Amazon stock closing prices, RNN over 50 timesteps
Number of Parameters
[Diagram: x (1) → h (10 SimpleRNN units) → y (1), with weights U, W, V]
Total parameters = (1+10+1)*10 + (10+1)*1 = 131
Simple RNN:
U: matrix of hidden neurons x (input dimension + 1)
W: matrix of hidden neurons x hidden neurons
V: matrix of output neurons x (hidden neurons + 1)
Number of Parameters: Example 2
model = Sequential() # initialize model
model.add(SimpleRNN(64, input_shape=(50,1), return_sequences=True))#64 neurons
model.add(SimpleRNN(32, return_sequences=True))#32 neurons
model.add(SimpleRNN(16)) #16 neurons
model.add(Dense(8,activation='tanh'))
model.add(Dense(1,activation='linear'))
Total parameters = 8257
SimpleRNN(64): (1+64+1)*64 = 4224
SimpleRNN(32): (64+32+1)*32 = 3104
SimpleRNN(16): (32+16+1)*16 = 784
Dense(8): (16+1)*8 = 136
Dense(1): (8+1)*1 = 9
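A short sketch (not from the slides) that recomputes the per-layer counts with the same formulas; building the model above and calling model.count_params() should report the same total of 8257.

def simple_rnn_params(input_dim, units):
    # input weights (units x input_dim) + recurrent weights (units x units) + biases (units)
    return (input_dim + units + 1) * units

def dense_params(input_dim, units):
    return (input_dim + 1) * units

per_layer = [simple_rnn_params(1, 64),    # 4224
             simple_rnn_params(64, 32),   # 3104
             simple_rnn_params(32, 16),   #  784
             dense_params(16, 8),         #  136
             dense_params(8, 1)]          #    9
print(per_layer, sum(per_layer))          # ..., 8257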
Bidirectional RNNs
• In many applications, we want to output a prediction of y(t) which may depend on the whole input sequence, e.g., co-articulation in speech recognition, right neighbors in POS tagging, etc.
• Bidirectional RNNs combine an RNN that moves forward through time, beginning from the start of the sequence, with another RNN that moves backward through time, beginning from the end of the sequence.
https://www.cs.toronto.edu/~tingwuwang/rnn_tutorial.pdf
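A minimal Keras sketch of a bidirectional tagger (not from the slides; the vocabulary size 10000, sequence length 50, embedding size 64, and 5 tag classes are illustrative assumptions, and layer arguments may differ slightly across Keras versions):

from keras import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, TimeDistributed, Dense

model = Sequential()
model.add(Embedding(input_dim=10000, output_dim=64, input_length=50))
model.add(Bidirectional(LSTM(32, return_sequences=True)))   # forward + backward RNN over the sequence
model.add(TimeDistributed(Dense(5, activation='softmax')))  # one tag distribution per timestep
model.compile(optimizer='adam', loss='categorical_crossentropy')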
Bidirectional RNNs for Information Extraction
https://www.depends-on-the-definition.com/sequence-tagging-lstm-crf/
Summary
• Architecture: 1-to-n, n-to-1, n-to-n
• Number of parameters
• RNN
• Bidirectional RNN
• LSTM
05 Backpropagation Through Time
Advanced Machine Learning (Pembelajaran Mesin Lanjut)
Masayu Leylia Khodra ([email protected])
KK IF – Teknik Informatika – STEI ITB
Module 4: Recurrent Neural Network
Backpropagation Through Time (BPTT)
Forward Pass: get the current output for the sequence
Backward Pass: compute δgates_t, δx_t, Δout_{t-1}, δU, δW, δb
Update Weights: w_new = w_old - η · δw_old

The BPTT learning algorithm is an extension of standard backpropagation that performs gradient descent on an unfolded network.
Example
Network: x (2 inputs) → h (1 LSTM unit) with weights U_f, U_i, U_c, U_o and W_f, W_i, W_c, W_o, unfolded over 2 timesteps.

x_1 = <1, 2>,   h_1 = [0.536], target 0.5
x_2 = <0.5, 3>, h_2 = [0.772], target 1.25

U (columns f, i, C̃, o):
0.70  0.95  0.45  0.60
0.45  0.80  0.25  0.40
W = [0.100, 0.800, 0.150, 0.250]
LSTM: Backward Propagation Timestep t
Forward:
f_t = σ(U_f·x_t + W_f·h_{t-1} + b_f)
i_t = σ(U_i·x_t + W_i·h_{t-1} + b_i)
C̃_t = tanh(U_c·x_t + W_c·h_{t-1} + b_c)
C_t = f_t * C_{t-1} + i_t * C̃_t
o_t = σ(U_o·x_t + W_o·h_{t-1} + b_o)
h_t = o_t * tanh(C_t)

Backward:
δout_t = Δ_t + Δout_t   (Δ_t = ∂E/∂h_t; Δout_t is passed back from timestep t+1)
δC_t = δout_t * o_t * (1 - tanh^2(C_t)) + δC_{t+1} * f_{t+1}
δC̃_t = δC_t * i_t * (1 - C̃_t^2)
δi_t = δC_t * C̃_t * i_t * (1 - i_t)
δf_t = δC_t * C_{t-1} * f_t * (1 - f_t)
δo_t = δout_t * tanh(C_t) * o_t * (1 - o_t)
δx_t = U^T · δgates_t
Δout_{t-1} = W^T · δgates_t
Computing δgates_t for timestep t=2
Last timestep: Δout_t = 0; f_{t+1} = 0; δC_{t+1} = 0

E = 1/2 * (target - h)^2, so Δ_t = ∂E/∂h = -(target - h) = h - target

t2: ∂E/∂h = 0.772 - 1.25 = -0.478 → δout_2 = -0.478 + 0 = -0.478
δC_2 = -0.478 * 0.850 * (1 - tanh^2(1.518)) + 0 * 0 = -0.071
δf_2 = -0.071 * 0.786 * 0.870 * (1 - 0.870) = -0.006
δi_2 = -0.071 * 0.850 * 0.981 * (1 - 0.981) = -0.001
δC̃_2 = -0.071 * 0.981 * (1 - 0.850^2) = -0.019
δo_2 = -0.478 * tanh(1.518) * 0.850 * (1 - 0.850) = -0.055

(using: δout_t = Δ_t + Δout_t; δC_t = δout_t * o_t * (1 - tanh^2(C_t)) + δC_{t+1} * f_{t+1}; δC̃_t = δC_t * i_t * (1 - C̃_t^2); δi_t = δC_t * C̃_t * i_t * (1 - i_t); δf_t = δC_t * C_{t-1} * f_t * (1 - f_t); δo_t = δout_t * tanh(C_t) * o_t * (1 - o_t))

δgates_2 = [-0.006, -0.001, -0.019, -0.055]   (order: f, i, C̃, o)
Computing δx_2 and Δout_1 for timestep t=2

δx_t = U^T · δgates_t
Δout_{t-1} = W^T · δgates_t

U (columns f, i, C̃, o):
0.70  0.95  0.45  0.60
0.45  0.80  0.25  0.40
W = [0.100, 0.800, 0.150, 0.250]
δgates_2 = [-0.006, -0.001, -0.019, -0.055]

δx_2 = [-0.047, -0.030]
Δout_1 = -0.018
Computing for timestep t=1: Δout_1 = -0.018

δout_1 = 0.036 - 0.018 = 0.018
δC_1 = -0.053
δf_1 = 0
δi_1 = -0.0017
δC̃_1 = -0.017
δo_1 = 0.0018

δx_t = U^T · δgates_t
Δout_{t-1} = W^T · δgates_t

U (columns f, i, C̃, o):
0.70  0.95  0.45  0.60
0.45  0.80  0.25  0.40
W = [0.100, 0.800, 0.150, 0.250]
δgates_1 = [0.0000, -0.0017, -0.0170, 0.0018]

δx_1 = [-0.0082, -0.0049]
Δout_0 = -0.0035
Computing δU, δW, δb

δU = Σ_{t=1..2} δgates_t · x_t (outer product)
   = [0.0, -0.0017, -0.0170, 0.0018]^T · [1, 2] + [-0.006, -0.001, -0.019, -0.055]^T · [0.5, 3]
δW = Σ_{t=1..2} δgates_{t+1} · h_t = [-0.006, -0.001, -0.019, -0.055]^T · [0.536]
δb = Σ_{t=1..2} δgates_t

δU (rows f, i, C̃, o):
-0.0032  -0.0189
-0.0022  -0.0067
-0.0267  -0.0922
-0.0259  -0.1626
δW = [-0.0034, -0.0006, -0.0104, -0.0297]
δb = [-0.00631, -0.00277, -0.03641, -0.05362]
Update Weights (η = 0.1): w_new = w_old - η · δw_old

δU (rows f, i, C̃, o):
-0.0032  -0.0189
-0.0022  -0.0067
-0.0267  -0.0922
-0.0259  -0.1626
δW = [-0.0034, -0.0006, -0.0104, -0.0297]
δb = [-0.00631, -0.00277, -0.03641, -0.05362]

U_old (columns f, i, C̃, o):
0.70  0.95  0.45  0.60
0.45  0.80  0.25  0.40
U_new:
0.7003  0.9502  0.4527  0.6026
0.4519  0.8007  0.2592  0.4163

W_old = [0.100, 0.800, 0.150, 0.250]
W_new = [0.1003, 0.8001, 0.1510, 0.2530]

b_old = [0.1500, 0.6500, 0.2000, 0.1000]
b_new = [0.1506, 0.6503, 0.2036, 0.1054]
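The whole worked example (forward pass, backward pass through time, and weight update) can be reproduced with a short NumPy sketch. This is not the lecture's code; it assumes the per-timestep squared error E = 1/2*(target - h)^2, gate order f, i, C̃, o, and the weights, inputs, and targets used above. With η = 0.1 it prints U_new, W_new, and b_new matching this slide up to rounding.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

U = np.array([[0.70, 0.45], [0.95, 0.80], [0.45, 0.25], [0.60, 0.40]])  # rows: f, i, c, o
W = np.array([0.10, 0.80, 0.15, 0.25])
b = np.array([0.15, 0.65, 0.20, 0.10])
xs = [np.array([1.0, 2.0]), np.array([0.5, 3.0])]
targets = [0.5, 1.25]

# forward pass, caching what the backward pass needs
cache, h, C = [], 0.0, 0.0
for x in xs:
    net = U @ x + W * h + b
    f, i, c_t, o = sigmoid(net[0]), sigmoid(net[1]), np.tanh(net[2]), sigmoid(net[3])
    h_prev, C_prev = h, C
    C = f * C_prev + i * c_t
    h = o * np.tanh(C)
    cache.append((x, h_prev, C_prev, f, i, c_t, o, C, h))

# backward pass through time
dU, dW, db = np.zeros_like(U), np.zeros_like(W), np.zeros_like(b)
d_out_next, dC_next, f_next = 0.0, 0.0, 0.0
for t in reversed(range(len(xs))):
    x, h_prev, C_prev, f, i, c_t, o, C, h = cache[t]
    d_out = (h - targets[t]) + d_out_next                      # delta out_t
    dC = d_out * o * (1 - np.tanh(C) ** 2) + dC_next * f_next  # delta C_t
    d_f = dC * C_prev * f * (1 - f)
    d_i = dC * c_t * i * (1 - i)
    d_ct = dC * i * (1 - c_t ** 2)
    d_o = d_out * np.tanh(C) * o * (1 - o)
    dgates = np.array([d_f, d_i, d_ct, d_o])                   # delta gates_t
    dU += np.outer(dgates, x)
    dW += dgates * h_prev
    db += dgates
    d_out_next = W @ dgates                                    # Delta out_{t-1}
    dC_next, f_next = dC, f

eta = 0.1
print(np.round(U - eta * dU, 4))   # U_new
print(np.round(W - eta * dW, 4))   # W_new
print(np.round(b - eta * db, 4))   # b_new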
Truncated BPTT
https://deeplearning4j.org/docs/latest/deeplearning4j-nn-recurrent
Truncated BPTT was developed in order to reduce the computational complexity of each parameter update in a recurrent neural network.
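The page linked above describes the DL4J configuration. Keras has no single truncated-BPTT switch; one common approximation, sketched below under assumptions (hypothetical truncation length of 25 steps, batch size 1, and random placeholder data), is to split a long sequence into shorter windows and carry the hidden state across windows with a stateful LSTM, so gradients only flow back through one window at a time.

import numpy as np
from keras import Sequential
from keras.layers import LSTM, Dense

tbptt_len, batch_size, seq_len = 25, 1, 200     # assumed truncation and sequence lengths
model = Sequential()
model.add(LSTM(10, batch_input_shape=(batch_size, tbptt_len, 1), stateful=True))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse')

x = np.random.rand(batch_size, seq_len, 1)                # hypothetical long input sequence
y = np.random.rand(batch_size, seq_len // tbptt_len, 1)   # one placeholder target per window
for epoch in range(5):
    for k in range(seq_len // tbptt_len):                 # consecutive windows, state carried over
        window = x[:, k * tbptt_len:(k + 1) * tbptt_len, :]
        model.train_on_batch(window, y[:, k, :])
    model.reset_states()                                   # reset state between passes over the sequence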
Summary
• Backpropagation through time for LSTM
• Truncated BPTT