docs/src/models/recurrence.md (9 additions & 9 deletions)
@@ -72,9 +72,9 @@ Equivalent to the `RNN` stateful constructor, `LSTM` and `GRU` are also availabl
 Using these tools, we can now build the model shown in the above diagram with:
 
 ```julia
-m = Chain(RNN(2, 5), Dense(5, 2))
+m = Chain(RNN(2, 5), Dense(5, 1))
 ```
-In this example, each output has to components.
+In this example, each output has only one component.
 
 ## Working with sequences
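As context for this hunk, here is a minimal sketch (not part of the diff; the toy sequence `x` is an assumption) of how the corrected model behaves when applied step by step with the stateful `RNN(in, out)` constructor these docs use:

```julia
using Flux

m = Chain(RNN(2, 5), Dense(5, 1))

x = [rand(Float32, 2) for i = 1:3]  # toy sequence: 3 steps, 2 features each
y = [m(xi) for xi in x]             # each call advances the hidden state
size(y[1])                          # (1,) -- a single output component
```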
@@ -129,15 +129,14 @@ using Flux.Losses: mse
 function loss(x, y)
   m(x[1]) # ignores the output but updates the hidden states
-  l = sum(mse(m(xi), yi) for (xi, yi) in zip(x[2:end], y))
-  return l
+  sum(mse(m(xi), yi) for (xi, yi) in zip(x[2:end], y))
 end
 
-y = [rand(Float32, 2) for i=1:2]
+y = [rand(Float32, 1) for i=1:2]
 loss(x, y)
 ```
 
-In such model, only the last two outputs are used to compute the loss, hence the target `y` being of length 2. This is a strategy that can be used to easily handle a `seq-to-one` kind of structure, compared to the `seq-to-seq` assumed so far.
+In such a model, only the last two outputs are used to compute the loss, hence the target `y` has length 2. This strategy makes it easy to handle a `seq-to-one` kind of structure, compared to the `seq-to-seq` assumed so far.
 
 Alternatively, if one wants to perform some warmup of the sequence, it could be performed once, followed by a regular training where all the steps of the sequence would be considered for the gradient update:
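A rough sketch of that warmup pattern (assumed code, not lines from this PR): the warmup step runs outside the loss so it never enters the gradient, and `Flux.reset!` clears the hidden state between sequences. This version of `loss` would replace the one above:

```julia
function loss(x, y)
  sum(mse(m(xi), yi) for (xi, yi) in zip(x, y))  # every step enters the loss
end

Flux.reset!(m)              # start the fresh sequence from a clean state
[m(xi) for xi in seq_init]  # warmup: updates the state outside any gradient
```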
@@ -150,8 +149,8 @@ seq_init = [rand(Float32, 2)]
 seq_1 = [rand(Float32, 2) for i = 1:3]
 seq_2 = [rand(Float32, 2) for i = 1:3]
 
-y1 = [rand(Float32, 2) for i = 1:3]
-y2 = [rand(Float32, 2) for i = 1:3]
+y1 = [rand(Float32, 1) for i = 1:3]
+y2 = [rand(Float32, 1) for i = 1:3]
 
 X = [seq_1, seq_2]
 Y = [y1, y2]
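To complete the picture, a hedged sketch of the training call that would consume `X` and `Y` (the optimiser choice and learning rate are assumptions, not part of the diff):

```julia
ps = Flux.params(m)
opt = ADAM(1e-3)
Flux.train!(loss, ps, zip(X, Y), opt)  # one (x, y) pair per sequence
```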
@@ -172,7 +171,8 @@ In this scenario, it is important to note that a single continuous sequence is c
 Batch size would be 1 here as there's only a single sequence within each batch. If the model were to be trained on multiple independent sequences, then these sequences could be added to the input data as a second dimension. For example, in a language model, each batch would contain multiple independent sentences. In such a scenario, if we set the batch size to 4, a single batch would be of the shape:
 
 ```julia
-batch = [rand(Float32, 2, 4) for i = 1:3]
+x = [rand(Float32, 2, 4) for i = 1:3]
+y = [rand(Float32, 1, 4) for i = 1:3]
 ```
 
 That would mean that we have 4 sentences (or samples), each with 2 features (let's say a very small embedding!) and each with a length of 3 (3 words per sentence). Computing `m(x[1])` would still represent `x1 -> y1` in our diagram and return the first word output, but now for each of the 4 independent sentences (second dimension of the input matrix).
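Concretely (a sketch assuming the `m = Chain(RNN(2, 5), Dense(5, 1))` model from the first hunk), each batched step maps a 2×4 feature matrix to a 1×4 output matrix:

```julia
x = [rand(Float32, 2, 4) for i = 1:3]  # 3 steps of (2 features × 4 samples)
out = m(x[1])                          # first-step output for all 4 sentences
size(out)                              # (1, 4)
```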