@wangtianxia-sjtu
I am wondering why we need sorted topk in the model. The same code in DeepSeek-V3 explicitly uses unsorted topk when choosing 8 of 256 experts.
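
For context, here is a minimal pure-Python sketch of what the sorted/unsorted distinction amounts to when picking 8 of 256 experts. It is not this model's code or DeepSeek-V3's. The function names, the scalar expert outputs, and the normalization over the selected scores are all hypothetical, chosen only for illustration. Under those assumptions, both variants select the same set of experts and give the same weighted combination. Only the order of the returned indices differs.

```python
import random

def topk_sorted(scores, k):
    """Indices of the k largest scores, ordered from highest to lowest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def topk_unsorted(scores, k):
    """The same k indices, returned in index order instead of score order.

    This stands in for an "unsorted" top-k, whose output order is
    generally unspecified; index order is just one possible order.
    """
    return sorted(topk_sorted(scores, k))

def combine(scores, indices, expert_outputs):
    """Weighted sum of the selected experts' outputs.

    Weights are the selected scores normalized to sum to 1
    (an assumption made for this sketch).
    """
    total = sum(scores[i] for i in indices)
    return sum(scores[i] / total * expert_outputs[i] for i in indices)

rng = random.Random(0)
num_experts, k = 256, 8
scores = [rng.random() for _ in range(num_experts)]               # hypothetical router scores
expert_outputs = [rng.uniform(-1, 1) for _ in range(num_experts)]  # hypothetical scalar outputs

idx_sorted = topk_sorted(scores, k)
idx_unsorted = topk_unsorted(scores, k)

out_sorted = combine(scores, idx_sorted, expert_outputs)
out_unsorted = combine(scores, idx_unsorted, expert_outputs)

# Same experts either way; results agree up to floating-point summation order.
print(set(idx_sorted) == set(idx_unsorted))
print(abs(out_sorted - out_unsorted) < 1e-9)
```

In this sketch the ordering only matters if later code depends on index position, for example by treating the first selected expert as the highest-scoring one.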
