Clarification on the exact meaning of “attention score” in Sec 2.1 of the DeepSeek-V3.2-Exp tech report

I’m digging into Section 2.1 of the V3.2-Exp technical report and noticed the term “attention score” is used in several places.
Could you please confirm its precise definition in this context? Is it:

1. the raw pre-softmax query-key dot product,
2. the post-softmax probability weight, or
3. something else (e.g., a scaled or renormalized value introduced in V3.2)?
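
To make the candidates concrete, here is a minimal single-head sketch in plain Python of the quantities I have in mind. The function names, the toy vectors, and the 1/sqrt(d) scaling are my own illustrative assumptions and are not taken from the report; the scaled variant is only one possible reading of option 3.

```python
import math

def raw_scores(q, keys):
    """Option 1: unscaled pre-softmax dot products q . k_j."""
    return [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]

def scaled_scores(q, keys):
    """One possible reading of option 3: pre-softmax logits q . k_j / sqrt(d)."""
    d = len(q)
    return [s / math.sqrt(d) for s in raw_scores(q, keys)]

def softmax_weights(q, keys):
    """Option 2: post-softmax weights computed from the scaled logits."""
    logits = scaled_scores(q, keys)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy query and keys (hypothetical values, d = 4).
q = [1.0, 0.0, 1.0, 0.0]
keys = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
]

print(raw_scores(q, keys))       # [2.0, 0.0, 1.0]
print(scaled_scores(q, keys))    # [1.0, 0.0, 0.5]
print(softmax_weights(q, keys))  # non-negative weights that sum to 1
```

All three rank the keys identically here, but their magnitudes differ, which matters if the score feeds into a loss or a threshold rather than only a ranking.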

A short clarification would help a lot for reproducing the experiments. Thanks!
