
Missing a layer normalization after a feedforward network #29

@baehyunsol

Description


femtoGPT is not applying a layer normalization after adding the FFN result and the attention vector.

femtoGPT implementation

link:

curr_inp = g.call(Add::new(), &[add_atten_norm, lin2_bias_result])?;

code:

for l in 0..num_layers {
    // ... multihead attention and some other setup ...

    let bias2_params = g.alloc(
        Tensor::<f32>::zeros(&[embedding_degree]),
        true,
        format!("feedforward2_{}_bias", l),
    )?;
    let lin2_result = g.call(MatMul::new(), &[lin1_act, lin2_params])?;
    let lin2_bias_result = g.call(Add::new(), &[lin2_result, bias2_params])?;

    // Why not normalize this result?
    curr_inp = g.call(Add::new(), &[add_atten_norm, lin2_bias_result])?;
}
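For comparison, a post-norm ("Add & Norm") version of that last line would look roughly like the sketch below. This is only an illustration: I'm assuming the graph API exposes a LayerNorm op analogous to Add and MatMul (presumably whatever produced add_atten_norm), and I'm guessing its argument list.

// Hypothetical sketch, not femtoGPT's actual code: the LayerNorm op and its
// arguments are assumed by analogy with the Add/MatMul calls above.
let add_ffn = g.call(Add::new(), &[add_atten_norm, lin2_bias_result])?;
curr_inp = g.call(LayerNorm::new(), &[add_ffn])?;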

paper

[Image: excerpt of the Transformer block diagram from "Attention Is All You Need"]

In the paper, this part is Add & Norm, not Add.

Is it intentional or is it just a mistake? Or maybe it's my misunderstanding... Please correct me if I'm wrong.

EDIT: It seems like nanoGPT's attention block isn't normalizing the result of the addition either. Maybe I've misread the paper...
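If I understand it correctly now, the discrepancy is the post-norm vs. pre-norm ordering: the original paper normalizes after the residual addition (Add & Norm), while GPT-2-style models such as nanoGPT normalize the input of each sub-layer instead and leave the residual addition itself un-normalized (with a single final LayerNorm after the last block). Here is a minimal, self-contained sketch of the two orderings for the FFN sub-layer; it is plain Rust with hypothetical helper functions (layer_norm, feed_forward, add), not femtoGPT's graph API:

fn layer_norm(x: &[f32], eps: f32) -> Vec<f32> {
    // Normalize to zero mean / unit variance (learnable gain and bias omitted).
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    x.iter().map(|v| (v - mean) / (var + eps).sqrt()).collect()
}

// Stand-in for the FFN sub-layer; the real one is linear -> activation -> linear.
fn feed_forward(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| v.max(0.0)).collect()
}

fn add(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

// Post-norm ("Attention Is All You Need"): Add, then Norm.
fn ffn_block_post_norm(x: &[f32]) -> Vec<f32> {
    layer_norm(&add(x, &feed_forward(x)), 1e-5)
}

// Pre-norm (GPT-2 / nanoGPT): Norm the sub-layer input, then Add with no norm after.
fn ffn_block_pre_norm(x: &[f32]) -> Vec<f32> {
    add(x, &feed_forward(&layer_norm(x, 1e-5)))
}

fn main() {
    let x = vec![0.5_f32, -1.0, 2.0, 0.0];
    println!("post-norm: {:?}", ffn_block_post_norm(&x));
    println!("pre-norm:  {:?}", ffn_block_pre_norm(&x));
}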
