TRANSCRIPT

Page 1: EDUNEX ITB


Page 2: EDUNEX ITB


IF4074 Weeks 9-15

• Week 9 (18 October 2021): LSTM + RNN architecture + Major Assignment 2 (Tubes 2)

• Week 10 (25 October 2021): RNN exercises + BPTT

• Week 11 (1 November 2021): Guest lecture (sharing on ML applications at Gojek)

• Week 12 (8 November 2021): RNN lab session

• Week 13 (15 November 2021): Feature Engineering 1 / Experimental design assignment

• Week 14 (22 November 2021): Quiz 2

• Week 15 (29 November 2021): Feature Engineering lab session 2

Page 3: EDUNEX ITB


04 LSTM: What & Why

Pembelajaran Mesin Lanjut (Advanced Machine Learning)

Masayu Leylia Khodra ([email protected])

KK IF – Teknik Informatika – STEI ITB

Module 4: Recurrent Neural Network

Page 4: EDUNEX ITB


Long Short-Term Memory (LSTM): Why

https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

โ„Ž๐‘ก = ๐‘“(๐‘ˆ๐‘ฅ๐‘ก +๐‘Šโ„Ž๐‘กโˆ’1 + ๐‘xh)

๐‘ฆ๐‘ก = ๐‘“(๐‘‰โ„Ž๐‘ก + ๐‘hy)

RNN: long-term dependency problem

[Figure: RNN cell with input x_t, previous hidden state h_{t-1}, new hidden state h_t, and weight matrices U, W, V]

RNNs suffer from short-term memory (forward propagation).

RNNs suffer from the vanishing gradient problem (backward propagation): they fail to learn dependencies longer than about 5-10 time steps. In the worst case, this can completely stop the network from training further.
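To make the recurrence h_t = f(U x_t + W h_{t-1} + b) concrete, here is a minimal NumPy sketch of a single vanilla RNN step (shapes, names, and the choice of tanh are illustrative, not from the slides):

import numpy as np

def rnn_step(x_t, h_prev, U, W, V, b_h, b_y):
    # h_t = tanh(U x_t + W h_{t-1} + b_h), i.e. f = tanh for the hidden state
    h_t = np.tanh(U @ x_t + W @ h_prev + b_h)
    # y_t = V h_t + b_y; add an output activation here if the task needs one
    y_t = V @ h_t + b_y
    return h_t, y_t

Iterating this step over a long sequence pushes gradients through W at every timestep during backpropagation, which is exactly the source of the vanishing-gradient behaviour described above.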


Page 5: EDUNEX ITB


Long Short-Term Memory (LSTM): What

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

LSTMs are explicitly designed to avoid the long-term dependency problem.

Introduced by Hochreiter & Schmidhuber (1997)

An LSTM is a special kind of RNN; the difference lies in the operations within the LSTM's cells. In a standard RNN, the repeating module has a very simple structure, while in an LSTM the repeating module contains four interacting layers.


Page 6: EDUNEX ITB


LSTM: Cell State & Gates

Cell State
• acts as the "memory" of the network
• acts as a transport highway that transfers relevant information throughout the processing of the sequence

Forget Gate
• decides what information should be thrown away or kept
• values closer to 0 mean forget, values closer to 1 mean keep

Input Gate
• decides what information is relevant to add from the current step
• updates the cell state from the hidden state and the current input

Output Gate
• decides what the next hidden state should be
• the hidden state contains information on previous inputs and is also used for predictions

https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21


Page 7: EDUNEX ITB


Forget Gate

https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

๐‘“๐‘ก = ๐œŽ(๐‘ˆ๐‘“๐‘ฅ๐‘ก +๐‘Š๐‘“โ„Ž๐‘กโˆ’1 + ๐‘f)

Value 1 represents

โ€œcompletely keep thisโ€

while a 0 represents โ€œcompletely get rid of this.โ€

๐‘ฅ๐‘ก

โ„Ž t-1

๐‘t-1

๐‘“๐‘ก


Page 8: EDUNEX ITB


Input Gate

https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

๐‘–๐‘ก = ๐œŽ(๐‘ˆ๐‘–๐‘ฅ๐‘ก +๐‘Š๐‘–โ„Ž๐‘กโˆ’1 + ๐‘i)

๐‘ฅ๐‘ก

โ„Ž t-1

๐‘t-1

๐‘“๐‘ก ๐‘–๐‘ก ว๐‘๐‘ก

เทฉ๐ถ๐‘ก = tanh(๐‘ˆ๐‘๐‘ฅ๐‘ก +๐‘Š๐‘โ„Ž๐‘กโˆ’1 + ๐‘c)


Page 9: EDUNEX ITB


Cell State

https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

๐‘ฅ๐‘ก

โ„Ž t-1

๐‘t-1

๐‘“๐‘ก ๐‘–๐‘ก ว๐‘๐‘ก

๐ถ๐‘ก = ๐‘“๐‘ก โŠ™๐ถ๐‘กโˆ’1 + ๐‘–๐‘ก โŠ™ เทฉ๐ถ๐‘ก


Page 10: EDUNEX ITB


Output Gate

https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

๐‘ฅ๐‘ก

โ„Ž t-1

๐‘t-1

๐‘“๐‘ก ๐‘–๐‘ก ว๐‘๐‘ก

๐‘œ๐‘ก = ๐œŽ ๐‘ˆ๐‘œ๐‘ฅ๐‘ก +๐‘Š๐‘œโ„Ž๐‘กโˆ’1 + ๐‘o

โ„Ž๐‘ก = ๐‘œ๐‘ก โŠ™ tanh(๐ถt)


Page 11: EDUNEX ITB


LSTM Forward Propagation: Example

https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9

ิฆ๐‘ฅ (2)

โ„Ž (1)

๐‘ˆ๐‘“ , ๐‘ˆ๐‘–,

๐‘ˆ๐‘ , ๐‘ˆ๐‘œ

๐‘Š๐‘“ ,๐‘Š๐‘–,

๐‘Š๐‘ ,๐‘Š๐‘œ

A1 A2 Target

1 2 0.5

0.5 3 1

โ€ฆ

๐‘“๐‘ก = ๐œŽ(๐‘ˆ๐‘“๐‘ฅ๐‘ก +๐‘Š๐‘“โ„Ž๐‘กโˆ’1 + ๐‘f)

๐‘–๐‘ก = ๐œŽ(๐‘ˆ๐‘–๐‘ฅ๐‘ก +๐‘Š๐‘–โ„Ž๐‘กโˆ’1 + ๐‘i)

เทฉ๐ถ๐‘ก = tanh(๐‘ˆ๐‘๐‘ฅ๐‘ก +๐‘Š๐‘โ„Ž๐‘กโˆ’1 + ๐‘c)

๐ถ๐‘ก = ๐‘“๐‘ก โŠ™๐ถ๐‘กโˆ’1 + ๐‘–๐‘ก โŠ™ เทฉ๐ถ๐‘ก

๐‘œ๐‘ก = ๐œŽ ๐‘ˆ๐‘œ๐‘ฅ๐‘ก +๐‘Š๐‘œโ„Ž๐‘กโˆ’1 + ๐‘o

โ„Ž๐‘ก = ๐‘œ๐‘ก โŠ™ tanh(๐ถt)

Initial parameters:

Uf = [0.700  0.450]    Wf = 0.100    bf = 0.150
Ui = [0.950  0.800]    Wi = 0.800    bi = 0.650
Uc = [0.450  0.250]    Wc = 0.150    bc = 0.200
Uo = [0.600  0.400]    Wo = 0.250    bo = 0.100

h_{t-1} = 0, C_{t-1} = 0


Page 12: EDUNEX ITB


Computing h_t and C_t: Timestep t1

t1 = <[1, 2], 0.5>, with h_{t-1} = 0 and C_{t-1} = 0

Gate        U·x_t    W·h_{t-1} + b    net      activation
forget      1.600    0.150            1.750    f_t  = 0.852
input       2.550    0.650            3.200    i_t  = 0.961
candidate   0.950    0.200            1.150    C̃_t = 0.818
output      1.400    0.100            1.500    o_t  = 0.818

C_t = 0.786,  h_t = 0.536

๐‘“๐‘ก = ๐œŽ(๐‘ˆ๐‘“๐‘ฅ๐‘ก +๐‘Š๐‘“โ„Ž๐‘กโˆ’1 + ๐‘f)

๐‘–๐‘ก = ๐œŽ(๐‘ˆ๐‘–๐‘ฅ๐‘ก +๐‘Š๐‘–โ„Ž๐‘กโˆ’1 + ๐‘i)

เทฉ๐ถ๐‘ก = tanh(๐‘ˆ๐‘๐‘ฅ๐‘ก +๐‘Š๐‘โ„Ž๐‘กโˆ’1 + ๐‘c)

๐ถ๐‘ก = ๐‘“๐‘ก โŠ™๐ถ๐‘กโˆ’1 + ๐‘–๐‘ก โŠ™ เทฉ๐ถ๐‘ก

๐‘œ๐‘ก = ๐œŽ ๐‘ˆ๐‘œ๐‘ฅ๐‘ก +๐‘Š๐‘œโ„Ž๐‘กโˆ’1 + ๐‘o

โ„Ž๐‘ก = ๐‘œ๐‘ก โŠ™ tanh(๐ถt)

https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9


Page 13: EDUNEX ITB


Computing h_t and C_t: Timestep t2

t2 = <[0.5, 3], 1.25>, with h_{t-1} = 0.536 and C_{t-1} = 0.786

๐‘“๐‘ก = ๐œŽ(๐‘ˆ๐‘“๐‘ฅ๐‘ก +๐‘Š๐‘“โ„Ž๐‘กโˆ’1 + ๐‘f)

๐‘–๐‘ก = ๐œŽ(๐‘ˆ๐‘–๐‘ฅ๐‘ก +๐‘Š๐‘–โ„Ž๐‘กโˆ’1 + ๐‘i)

เทฉ๐ถ๐‘ก = tanh(๐‘ˆ๐‘๐‘ฅ๐‘ก +๐‘Š๐‘โ„Ž๐‘กโˆ’1 + ๐‘c)

๐ถ๐‘ก = ๐‘“๐‘ก โŠ™๐ถ๐‘กโˆ’1 + ๐‘–๐‘ก โŠ™ เทฉ๐ถ๐‘ก

๐‘œ๐‘ก = ๐œŽ ๐‘ˆ๐‘œ๐‘ฅ๐‘ก +๐‘Š๐‘œโ„Ž๐‘กโˆ’1 + ๐‘o

โ„Ž๐‘ก = ๐‘œ๐‘ก โŠ™ tanh(๐ถt)

Gate        U·x_t    W·h_{t-1} + b    net      activation
forget      1.700    0.204            1.904    f_t  = 0.870
input       2.875    1.079            3.954    i_t  = 0.981
candidate   0.975    0.280            1.255    C̃_t = 0.850
output      1.500    0.234            1.734    o_t  = 0.850

C_t = 1.518,  h_t = 0.772

https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9
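The two timesteps above can be reproduced with a short NumPy sketch of the forward equations (the parameter values are taken from the slides; the function and variable names are mine):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# parameters from the example: per gate, U is 1x2, W and b are scalars
U = {'f': np.array([0.70, 0.45]), 'i': np.array([0.95, 0.80]),
     'c': np.array([0.45, 0.25]), 'o': np.array([0.60, 0.40])}
W = {'f': 0.10, 'i': 0.80, 'c': 0.15, 'o': 0.25}
b = {'f': 0.15, 'i': 0.65, 'c': 0.20, 'o': 0.10}

def lstm_step(x_t, h_prev, c_prev):
    f = sigmoid(U['f'] @ x_t + W['f'] * h_prev + b['f'])         # forget gate
    i = sigmoid(U['i'] @ x_t + W['i'] * h_prev + b['i'])         # input gate
    c_tilde = np.tanh(U['c'] @ x_t + W['c'] * h_prev + b['c'])   # candidate cell state
    c = f * c_prev + i * c_tilde                                  # new cell state
    o = sigmoid(U['o'] @ x_t + W['o'] * h_prev + b['o'])         # output gate
    h = o * np.tanh(c)                                            # new hidden state
    return h, c

h, c = 0.0, 0.0
for x in (np.array([1.0, 2.0]), np.array([0.5, 3.0])):
    h, c = lstm_step(x, h, c)
    print(round(float(c), 3), round(float(h), 3))   # ~0.786 0.536, then ~1.518 0.772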


Page 14: EDUNEX ITB


Implementing LSTM on Keras: Many to One

from keras import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(10, input_shape=(50, 1)))    # 10 units; processes sequences of 50 timesteps x 1 feature
model.add(Dense(1, activation='linear'))    # linear output because this is a regression problem

https://towardsdatascience.com/a-comprehensive-guide-to-working-with-recurrent-neural-networks-in-keras-f3b2d5e2fa7f

ิฆ๐‘ฅ (1)

โ„Ž (10)

๐‘ฆ (1)

U

V

W

# predict amazon stock closing prices, LSTM 50 timestep


Page 15: EDUNEX ITB


Number of Parameters

ิฆ๐‘ฅ (1)

โ„Ž (10)

๐‘ฆ (1)

U

V

W

Total parameters = (1+10+1)*4*10 + (10+1)*1 = 491

A simple RNN of the same shape has only 131 parameters.
U: matrix of hidden neurons x (input dimension + 1)
W: matrix of hidden neurons x hidden neurons
V: matrix of output neurons x (hidden neurons + 1)

In general, the total parameter count for an LSTM with n units, m-dimensional input, and k-dimensional output is (m+n+1)*4*n + (n+1)*k.
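This formula can be checked with a small helper (a sketch; the function names are mine):

def lstm_params(m, n):
    # four gates, each with m input weights, n recurrent weights, and a bias, for each of n units
    return (m + n + 1) * 4 * n

def dense_params(n_in, n_out):
    return (n_in + 1) * n_out

print(lstm_params(1, 10) + dense_params(10, 1))   # 480 + 11 = 491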

Page 16: EDUNEX ITB


RNN → LSTM → GRU → ReGU

1985: Recurrent nets

1997: LSTM, Bi-RNN

2014: GRU

2017: Residual LSTM

2019: Residual Gated Unit (ReGU)

GRU: no cell state, 2 gates

ReGU: shortcut connection


Page 17: EDUNEX ITB


Summary

LSTMs avoid the long-term dependency problem

LSTMs have a cell state and 3 gates (forget, input, output)

Computing h_t and C_t


Backpropagation Through Time

Page 18: EDUNEX ITB


03 RNN Architecture

Pembelajaran Mesin Lanjut (Advanced Machine Learning)

Masayu Leylia Khodra ([email protected])

KK IF – Teknik Informatika – STEI ITB

Module 4: Recurrent Neural Network

Page 19: EDUNEX ITB


General Architecture

ิฆ๐‘ฅ (i)

โ„Ž1 (๐‘—)

โ„Žโ„Ž (k)

๐‘ฆ (๐‘š)

Uxh1

Uh1hโ€ฆ

V

โ€ฆ

Uhโ€ฆhh

Wh1

Whโ€ฆ

Whh

ิฆ๐‘ฅ1

โ„Ž (๐‘—)

ิฆ๐‘ฅ2

โ„Ž (๐‘—)

ิฆ๐‘ฅ๐‘›

โ„Ž (๐‘—)โ€ฆ

n timestep

Return sequence = True/False


Page 20: EDUNEX ITB


Architecture

fixed-size input vector x_t

fixed-size output vector o_t

RNN state s_t

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

One to many: image captioning
Many to one: text classification
Many to many: machine translation, video frame classification, POS tagging


Page 21: EDUNEX ITB


One to Many: Image Captioning

CNN Encoder (Inception) - RNN Decoder (LSTM) (Vinyals et al., 2014)


Page 22: EDUNEX ITB


Many to One: Text Classification

https://www.oreilly.com/learning/perform-sentiment-analysis-with-lstms-using-tensorflow


Page 23: EDUNEX ITB


Many to Many: Sequence Tagging

https://www.depends-on-the-definition.com/guide-sequence-tagging-neural-networks-python/

The input is a sequence of words, and the output is the corresponding sequence of POS tags, one per word.
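A minimal Keras sketch of this many-to-many setup, assuming a vocabulary of 10,000 words, sequences padded to length 50, and 20 POS tags (these numbers are illustrative, not from the slides):

from keras import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense

model = Sequential()
model.add(Embedding(input_dim=10000, output_dim=64, input_length=50))   # word ids -> vectors
model.add(LSTM(32, return_sequences=True))                              # one hidden state per timestep
model.add(TimeDistributed(Dense(20, activation='softmax')))             # one tag distribution per word

The key difference from the many-to-one models later in this module is return_sequences=True, which keeps the output at every timestep instead of only the last one.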


Page 24: EDUNEX ITB


Many to Many: Machine Translation

http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

โ— Machine Translation: input is a sequence of words in source language (e.g. German). Output is a sequence of words in target language (e.g. English).

โ— A key difference is that our output only starts after we have seen the complete input, because the first word of our translated sentences may require information captured from the complete input sequence.


Page 25: EDUNEX ITB


Implementing RNN on Keras: Many to One

from keras import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
model.add(SimpleRNN(10, input_shape=(50, 1)))   # simple recurrent layer: 10 units, processes 50x1 sequences
model.add(Dense(1, activation='linear'))        # linear output because this is a regression problem

https://towardsdatascience.com/a-comprehensive-guide-to-working-with-recurrent-neural-networks-in-keras-f3b2d5e2fa7f

ิฆ๐‘ฅ (1)

โ„Ž (10)

๐‘ฆ (1)

U

V

W

# predict amazon stock closing prices, RNN 50 timestep


Page 26: EDUNEX ITB


Number of Parameters

ิฆ๐‘ฅ (1)

โ„Ž (10)

๐‘ฆ (1)

U

V

W

Total parameters = (1+10+1)*10 + (10+1)*1 = 131

Simple RNN:
U: matrix of hidden neurons x (input dimension + 1)
W: matrix of hidden neurons x hidden neurons
V: matrix of output neurons x (hidden neurons + 1)


Page 27: EDUNEX ITB


Number of Parameters: Example 2

model = Sequential()   # initialize model
model.add(SimpleRNN(64, input_shape=(50, 1), return_sequences=True))   # 64 neurons
model.add(SimpleRNN(32, return_sequences=True))   # 32 neurons
model.add(SimpleRNN(16))   # 16 neurons
model.add(Dense(8, activation='tanh'))
model.add(Dense(1, activation='linear'))

Total parameters = 8257
SimpleRNN(64): (1+64+1)*64 = 4224
SimpleRNN(32): (64+32+1)*32 = 3104
SimpleRNN(16): (32+16+1)*16 = 784
Dense(8): (16+1)*8 = 136
Dense(1): (8+1)*1 = 9


Page 28: EDUNEX ITB


Bidirectional RNNs

• In many applications we want to output a prediction of y(t) that may depend on the whole input sequence, e.g. co-articulation in speech recognition, right neighbors in POS tagging, etc.

• Bidirectional RNNs combine an RNN that moves forward through time, beginning from the start of the sequence, with another RNN that moves backward through time, beginning from the end of the sequence. (https://www.cs.toronto.edu/~tingwuwang/rnn_tutorial.pdf)
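In Keras this corresponds to wrapping a recurrent layer in Bidirectional; a minimal sketch (the layer sizes are illustrative):

from keras import Sequential
from keras.layers import Bidirectional, LSTM, Dense

model = Sequential()
# forward and backward LSTMs read the same 50x1 sequence; their final states are concatenated
model.add(Bidirectional(LSTM(10), input_shape=(50, 1)))
model.add(Dense(1, activation='linear'))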


Page 29: EDUNEX ITB


Bidirectional RNNs for Information Extraction

https://www.depends-on-the-definition.com/sequence-tagging-lstm-crf/


Page 30: EDUNEX ITB


Summary

Architecture: 1-to-n, n-to-1, n-to-n

Number of parameters

RNN

Bidirectional RNN

LSTM

Page 31: EDUNEX ITB


05 Backpropagation Through Time

Pembelajaran Mesin Lanjut (Advanced Machine Learning)

Masayu Leylia Khodra ([email protected])

KK IF – Teknik Informatika – STEI ITB

Module 4: Recurrent Neural Network

Page 32: EDUNEX ITB


Backpropagation Through Time (BPTT)

Forward Pass: get the current sequence output

Backward Pass: compute δgates_t, δx_t, Δout_{t-1}, δU, δW, δb

Update Weights: w_new = w_old − η · δw_old

The BPTT learning algorithm is an extension of standard backpropagation that performs gradient descent on an unfolded network.


Page 33: EDUNEX ITB


Example

ิฆ๐‘ฅ (2)

โ„Ž (1)

๐‘ˆ๐‘“ , ๐‘ˆ๐‘–,

๐‘ˆ๐‘ , ๐‘ˆ๐‘œ

๐‘Š๐‘“ ,๐‘Š๐‘–,

๐‘Š๐‘ ,๐‘Š๐‘œ

unfold

ิฆ๐‘ฅ1 =12

[0.536]

ิฆ๐‘ฅ2 =0.53

[0.772]

0.5 1.25

U0.7 0.95 0.5 0.6

0.45 0.8 0.3 0.4

W0.100 0.800 0.150 0.250

03

0

Page 34: EDUNEX ITB


LSTM: Backward Propagation Timestep t

๐‘“๐‘ก = ๐œŽ(๐‘ˆ๐‘“๐‘ฅ๐‘ก +๐‘Š๐‘“โ„Ž๐‘กโˆ’1 + ๐‘f)

๐‘–๐‘ก = ๐œŽ(๐‘ˆ๐‘–๐‘ฅ๐‘ก +๐‘Š๐‘–โ„Ž๐‘กโˆ’1 + ๐‘i)

เทฉ๐ถ๐‘ก = tanh(๐‘ˆ๐‘๐‘ฅ๐‘ก +๐‘Š๐‘โ„Ž๐‘กโˆ’1 + ๐‘c)

๐ถ๐‘ก = ๐‘“๐‘ก โŠ™๐ถ๐‘กโˆ’1 + ๐‘–๐‘ก โŠ™ เทฉ๐ถ๐‘ก

๐‘œ๐‘ก = ๐œŽ ๐‘ˆ๐‘œ๐‘ฅ๐‘ก +๐‘Š๐‘œโ„Ž๐‘กโˆ’1 + ๐‘o

โ„Ž๐‘ก = ๐‘œ๐‘ก โŠ™ tanh(๐ถt)

๐›ฟ๐‘œ๐‘ข๐‘ก๐‘ก = โˆ†๐‘ก + โˆ†๐‘œ๐‘ข๐‘ก๐‘ก

๐›ฟ๐ถ๐‘ก = ๐›ฟ๐‘œ๐‘ข๐‘ก๐‘ก โŠ™๐‘œ๐‘ก โŠ™ 1โˆ’ ๐‘ก๐‘Ž๐‘›โ„Ž2 ๐ถ๐‘ก + ๐›ฟ๐ถ๐‘ก+1 โŠ™๐‘“๐‘ก+1

๐›ฟ เทฉ๐ถ๐‘ก = ๐›ฟ๐ถ๐‘ก โŠ™ ๐‘–๐‘ก โŠ™ (1 โˆ’ เทฉ๐ถ๐‘ก2)

๐›ฟ๐‘–๐‘ก = ๐›ฟ๐ถ๐‘ก โŠ™ เทฉ๐ถ๐‘ก โŠ™ ๐‘–๐‘ก โŠ™ (1 โˆ’ ๐‘–๐‘ก)

๐›ฟ๐‘“๐‘ก = ๐›ฟ๐ถ๐‘ก โŠ™๐ถ๐‘กโˆ’1 โŠ™๐‘“๐‘ก โŠ™ 1โˆ’ ๐‘“๐‘ก

๐›ฟ๐‘œ๐‘ก = ๐›ฟ๐‘œ๐‘ข๐‘ก๐‘ก โŠ™ tanh ๐ถ๐‘ก โŠ™๐‘œ๐‘ก โŠ™ 1โˆ’ ๐‘œ๐‘ก

๐›ฟ๐‘ฅ๐‘ก = ๐‘ˆ๐‘‡ . ๐›ฟ๐‘”๐‘Ž๐‘ก๐‘’๐‘ ๐‘ก

โˆ†๐‘œ๐‘ข๐‘ก๐‘กโˆ’1= ๐‘Š๐‘‡. ๐›ฟ๐‘”๐‘Ž๐‘ก๐‘’๐‘ ๐‘ก


Page 35: EDUNEX ITB


Computing ๐›ฟ๐‘”๐‘Ž๐‘ก๐‘’๐‘ ๐‘ก for timestep t=2

Last timestep: Δout_t = 0; f_{t+1} = 0; δC_{t+1} = 0

t2: ๐œ•๐ธ

๐œ•๐‘œ= 0.772 โˆ’ 1.25 = โˆ’0.478โ†’ ๐›ฟ๐‘œ๐‘ข๐‘ก2 = โˆ’0.478 + 0 = โˆ’0.478

๐›ฟ๐ถ2 = โˆ’0.478 โˆ— 0.85 โˆ— 1 โˆ’ ๐‘ก๐‘Ž๐‘›โ„Ž2 1.518 + 0 โˆ— 0 = โˆ’0.071

๐›ฟ๐‘“2 = โˆ’0.071 โˆ— 0.786 โˆ— 0.870 โˆ— 1 โˆ’ 0.870 = โˆ’0.006๐›ฟ๐‘–2 = โˆ’0.071 โˆ— 0.850 โˆ— 0.981 โˆ— 1 โˆ’ 0.981 = โˆ’0.001๐›ฟเทช๐ถ2 = โˆ’0.071 โˆ— 0.981 โˆ— 1 โˆ’ 0.8502 = โˆ’0.019๐›ฟ๐‘œ2 = โˆ’0.478 โˆ— tanh 1.518 โˆ— 0.850 โˆ— 1 โˆ’ 0.850 = โˆ’0.055

๐›ฟ๐‘–๐‘ก = ๐›ฟ๐ถ๐‘ก โŠ™ เทฉ๐ถ๐‘ก โŠ™ ๐‘–๐‘ก โŠ™ (1 โˆ’ ๐‘–๐‘ก)๐›ฟ๐‘“๐‘ก = ๐›ฟ๐ถ๐‘ก โŠ™๐ถ๐‘กโˆ’1 โŠ™๐‘“๐‘ก โŠ™ 1โˆ’ ๐‘“๐‘ก๐›ฟ๐‘œ๐‘ก = ๐›ฟ๐‘œ๐‘ข๐‘ก๐‘ก โŠ™ tanh ๐ถ๐‘ก โŠ™๐‘œ๐‘ก โŠ™ 1โˆ’ ๐‘œ๐‘ก

๐›ฟ๐‘œ๐‘ข๐‘ก๐‘ก = โˆ†๐‘ก + โˆ†๐‘œ๐‘ข๐‘ก๐‘ก๐›ฟ๐ถ๐‘ก = ๐›ฟ๐‘œ๐‘ข๐‘ก๐‘ก โŠ™๐‘œ๐‘ก โŠ™ 1โˆ’ ๐‘ก๐‘Ž๐‘›โ„Ž2 ๐ถ๐‘ก + ๐›ฟ๐ถ๐‘ก+1 โŠ™๐‘“๐‘ก+1

๐›ฟ เทฉ๐ถ๐‘ก = ๐›ฟ๐ถ๐‘ก โŠ™ ๐‘–๐‘ก โŠ™ (1 โˆ’ เทฉ๐ถ๐‘ก2)

๐ธ =1

2(๐‘ก๐‘Ž๐‘Ÿ๐‘”๐‘’๐‘ก โˆ’ โ„Ž)2 โˆ†๐‘ก=

๐œ•๐ธ

๐œ•โ„Ž= โˆ’(๐‘ก๐‘Ž๐‘Ÿ๐‘”๐‘’๐‘ก โˆ’ โ„Ž) = โ„Ž โˆ’ ๐‘ก๐‘Ž๐‘Ÿ๐‘”๐‘’๐‘ก

๐›ฟ๐‘”๐‘Ž๐‘ก๐‘’๐‘ 2 =

โˆ’0.006โˆ’0.001โˆ’0.019โˆ’0.055


Page 36: EDUNEX ITB


Computing ๐›ฟ๐‘ฅ2 and โˆ†๐‘œ๐‘ข๐‘ก1 for timestep t=2

๐›ฟ๐‘ฅ๐‘ก = ๐‘ˆ๐‘‡ . ๐›ฟ๐‘”๐‘Ž๐‘ก๐‘’๐‘ ๐‘กโˆ†๐‘œ๐‘ข๐‘ก๐‘กโˆ’1= ๐‘Š๐‘‡ . ๐›ฟ๐‘”๐‘Ž๐‘ก๐‘’๐‘ ๐‘ก

U0.7 0.95 0.5 0.6

0.45 0.8 0.3 0.4

dgates-0.006-0.001-0.019-0.055

dx2-0.047-0.030

W0.100 0.800 0.150 0.250

dout1-0.018


Page 37: EDUNEX ITB


Computing for timestep t=1: Δout_1 = −0.018

๐›ฟ๐‘œ๐‘ข๐‘ก1 = 0.036 โˆ’ 0.018 = 0.018๐›ฟ๐ถ1 = โˆ’0.053๐›ฟ๐‘“1 = 0๐›ฟ๐‘–1 = โˆ’0.0017๐›ฟเทช๐ถ1 = โˆ’0.017๐›ฟ๐‘œ1 = 0.0018

๐›ฟ๐‘ฅ๐‘ก = ๐‘ˆ๐‘‡ . ๐›ฟ๐‘”๐‘Ž๐‘ก๐‘’๐‘ ๐‘กโˆ†๐‘œ๐‘ข๐‘ก๐‘กโˆ’1= ๐‘Š๐‘‡ . ๐›ฟ๐‘”๐‘Ž๐‘ก๐‘’๐‘ ๐‘ก

U0.7 0.95 0.5 0.6

0.45 0.8 0.3 0.4

W0.100 0.800 0.150 0.250

dgates0.0000

-0.0017-0.01700.0018

dx1-0.0082-0.0049

dout0-0.0035


Page 38: EDUNEX ITB


Computing ๐›ฟU, ๐›ฟW, ๐›ฟb

๐›ฟ๐‘ˆ =

๐‘ก=1

2

๐›ฟ๐‘”๐‘Ž๐‘ก๐‘’๐‘ ๐‘ก . ๐‘ฅ๐‘ก =

0.0โˆ’0.0017โˆ’0.01700.0018

1 2 +

โˆ’0.006โˆ’0.001โˆ’0.019โˆ’0.055

0.5 3 =

๐›ฟ๐‘Š = ฯƒ๐‘ก=12 ๐›ฟ๐‘”๐‘Ž๐‘ก๐‘’๐‘ ๐‘ก+1 . โ„Ž๐‘ก=

โˆ’0.006โˆ’0.001โˆ’0.019โˆ’0.055

[0.536]=

๐›ฟ๐‘ =

๐‘ก=1

2

๐›ฟ๐‘”๐‘Ž๐‘ก๐‘’๐‘ ๐‘ก+1 =

dU-0.0032 -0.0189-0.0022 -0.0067-0.0267 -0.0922-0.0259 -0.1626

dW-0.0034-0.0006-0.0104-0.0297

db-0.00631-0.00277-0.03641-0.05362


Page 39: EDUNEX ITB


Update Weights (η = 0.1): w_new = w_old − η · δw_old

δU (rows f, i, c, o):
−0.0032  −0.0189
−0.0022  −0.0067
−0.0267  −0.0922
−0.0259  −0.1626

δW (f, i, c, o): −0.0034  −0.0006  −0.0104  −0.0297
δb (f, i, c, o): −0.00631  −0.00277  −0.03641  −0.05362

U_old (columns f, i, c, o):
0.70  0.95  0.45  0.60
0.45  0.80  0.25  0.40

U_new (columns f, i, c, o):
0.7003  0.9502  0.4527  0.6026
0.4519  0.8007  0.2592  0.4163

W_old (f, i, c, o): 0.100  0.800  0.150  0.250
W_new (f, i, c, o): 0.1003  0.8001  0.1510  0.2530

b_old (f, i, c, o): 0.1500  0.6500  0.2000  0.1000
b_new (f, i, c, o): 0.1506  0.6503  0.2036  0.1054


Page 40: EDUNEX ITB


Truncated BPTT

https://deeplearning4j.org/docs/latest/deeplearning4j-nn-recurrent


Truncated BPTT was developed in order to reduce the computational complexity of each parameter update in a recurrent neural network.
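In Keras, a rough equivalent is to cut each long sequence into shorter chunks and carry the hidden state across chunks with a stateful layer, so each parameter update only backpropagates through one chunk; a sketch under illustrative assumptions (chunk length 20, dummy data):

import numpy as np
from keras import Sequential
from keras.layers import LSTM, Dense

chunk_len, features, batch = 20, 1, 1
model = Sequential()
# stateful=True keeps h and C between batches, so gradients are truncated at chunk boundaries
model.add(LSTM(10, stateful=True, batch_input_shape=(batch, chunk_len, features)))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse')

long_x = np.random.rand(1, 200, 1)             # one long sequence of 200 steps (dummy data)
long_y = np.random.rand(1, 200 // chunk_len)   # one target per chunk (dummy data)
for k in range(200 // chunk_len):
    chunk = long_x[:, k * chunk_len:(k + 1) * chunk_len, :]
    model.train_on_batch(chunk, long_y[:, k:k + 1])
model.reset_states()                            # reset h and C before the next sequence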


Page 41: EDUNEX ITB


Summary

Backpropagation through time for LSTM

Truncated BPTT


Page 42: EDUNEX ITB
