
Commit 300f5c2

output size 1
1 parent cea8f75 commit 300f5c2

1 file changed: +9 -9 lines changed

docs/src/models/recurrence.md

Lines changed: 9 additions & 9 deletions
@@ -72,9 +72,9 @@ Equivalent to the `RNN` stateful constructor, `LSTM` and `GRU` are also availabl
 Using these tools, we can now build the model shown in the above diagram with:
 
 ```julia
-m = Chain(RNN(2, 5), Dense(5, 2))
+m = Chain(RNN(2, 5), Dense(5, 1))
 ```
-In this example, each output has to components.
+In this example, each output has only one component.
 
 ## Working with sequences
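
As a quick sanity check on the new output size, a minimal sketch of calling the rebuilt model on a single timestep (assuming a Flux version contemporary with this commit, where `RNN(in, out)` is the stateful constructor; `x1` is illustrative data):

```julia
using Flux

m = Chain(RNN(2, 5), Dense(5, 1))

x1 = rand(Float32, 2)  # one timestep with 2 input features
y1 = m(x1)             # calling the model also advances the RNN's hidden state
size(y1)               # (1,): each output now has only one component
```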

@@ -129,15 +129,14 @@ using Flux.Losses: mse
 
 function loss(x, y)
   m(x[1]) # ignores the output but updates the hidden states
-  l = sum(mse(m(xi), yi) for (xi, yi) in zip(x[2:end], y))
-  return l
+  sum(mse(m(xi), yi) for (xi, yi) in zip(x[2:end], y))
 end
 
-y = [rand(Float32, 2) for i=1:2]
+y = [rand(Float32, 1) for i=1:2]
 loss(x, y)
 ```
 
-In such model, only the last two outputs are used to compute the loss, hence the target `y` being of length 2. This is a strategy that can be used to easily handle a `seq-to-one` kind of structure, compared to the `seq-to-seq` assumed so far.
+In such a model, only the last two outputs are used to compute the loss, hence the target `y` being of length 2. This is a strategy that can be used to easily handle a `seq-to-one` kind of structure, compared to the `seq-to-seq` assumed so far.
 
 Alternatively, if one wants to perform some warmup of the sequence, it could be performed once, followed by regular training where all the steps of the sequence would be considered for the gradient update:
 
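A hedged usage sketch for the revised loss; the 3-step input `x` is an assumption matching the sequence defined earlier in this file, and `Flux.reset!` restores the initial hidden state before each fresh sequence:

```julia
using Flux
using Flux.Losses: mse

m = Chain(RNN(2, 5), Dense(5, 1))

x = [rand(Float32, 2) for i = 1:3]  # 3 timesteps, 2 features each
y = [rand(Float32, 1) for i = 1:2]  # targets for the last 2 timesteps only

function loss(x, y)
  m(x[1]) # ignores the output but updates the hidden states
  sum(mse(m(xi), yi) for (xi, yi) in zip(x[2:end], y))
end

Flux.reset!(m)  # start the sequence from the initial hidden state
loss(x, y)
```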
@@ -150,8 +149,8 @@ seq_init = [rand(Float32, 2)]
 seq_1 = [rand(Float32, 2) for i = 1:3]
 seq_2 = [rand(Float32, 2) for i = 1:3]
 
-y1 = [rand(Float32, 2) for i = 1:3]
-y2 = [rand(Float32, 2) for i = 1:3]
+y1 = [rand(Float32, 1) for i = 1:3]
+y2 = [rand(Float32, 1) for i = 1:3]
 
 X = [seq_1, seq_2]
 Y = [y1, y2]
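
For context, a hedged sketch of the warmup-then-train step these arrays feed into, building on `m`, `seq_init`, `X`, and `Y` from the hunks above (the all-steps `loss` and the implicit-parameter `ADAM`/`Flux.train!` calls are assumptions in the style of the surrounding docs, not part of this diff):

```julia
using Flux
using Flux.Losses: mse

# assumes m, seq_init, X, Y as defined above
function loss(x, y)
  sum(mse(m(xi), yi) for (xi, yi) in zip(x, y))  # every step counts toward the loss
end

Flux.reset!(m)
[m(x) for x in seq_init]  # warmup: advances the hidden state, outputs discarded

ps = Flux.params(m)       # implicit-parameter style training
opt = ADAM(1e-3)
Flux.train!(loss, ps, zip(X, Y), opt)  # each (x, y) pair is one full sequence
```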
@@ -172,7 +171,8 @@ In this scenario, it is important to note that a single continuous sequence is c
 Batch size would be 1 here as there's only a single sequence within each batch. If the model were to be trained on multiple independent sequences, then these sequences could be added to the input data as a second dimension. For example, in a language model, each batch would contain multiple independent sentences. In such a scenario, if we set the batch size to 4, a single batch would be of the shape:
 
 ```julia
-batch = [rand(Float32, 2, 4) for i = 1:3]
+x = [rand(Float32, 2, 4) for i = 1:3]
+y = [rand(Float32, 1, 4) for i = 1:3]
 ```
 
 That would mean that we have 4 sentences (or samples), each with 2 features (let's say a very small embedding!) and each with a length of 3 (3 words per sentence). Computing `m(x[1])` would still represent `x1 -> y1` in our diagram and return the first word output, but now for each of the 4 independent sentences (second dimension of the input matrix).
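
A minimal sketch of the batched shapes described above (assuming the 2-feature, 1-output `m` from earlier; the `size` comments are the point of the example):

```julia
using Flux

m = Chain(RNN(2, 5), Dense(5, 1))

x = [rand(Float32, 2, 4) for i = 1:3]  # 3 timesteps, each a 2×4 matrix (features × batch)
y = [rand(Float32, 1, 4) for i = 1:3]

out1 = m(x[1])  # first-step output for all 4 sequences at once
size(out1)      # (1, 4): one output component per sequence in the batch
```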
