
Any examples for RWKV7 generation usage? #133

Closed · Answered by zhiyuan1i
lidh15 asked this question in Q&A

It's a known issue that the fla implementation is slower than the official RWKV7 implementation. There are two main causes.
The first is the Triton-based group norm and l2norm, which has since been fixed.
The second is `addcmul`: fla fuses the six token-shift mixes into a single tensor op, `xr, xw, xk, xv, xa, xg = hidden_states.addcmul(delta, self.x_x.view(6, 1, 1, -1)).unbind(0)`, and this turns out to be much slower. I'm still trying to figure out how to fix this without introducing breaking changes.
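A minimal sketch comparing the fused call quoted above with a split-up alternative, assuming PyTorch and small random shapes. The fused form mirrors the quoted line; the split form (one `addcmul` per branch, each using a slice of the same mixing parameter) is a hypothetical restructuring for illustration, not the official RWKV7 code or fla's fix. The names `fused_mix` and `split_mix` are made up here. The sketch only checks that both give the same values; it says nothing about speed, which needs profiling on real shapes.

```python
import torch

# Hypothetical small shapes: batch B, sequence length T, hidden size C.
B, T, C = 2, 5, 8
torch.manual_seed(0)

hidden_states = torch.randn(B, T, C)
# In RWKV-style token shift, delta is typically shifted(x) - x;
# here it is just random data for illustration.
delta = torch.randn(B, T, C)
# Stand-in for self.x_x: one mixing vector per branch (r, w, k, v, a, g).
x_x = torch.randn(6, C)


def fused_mix(hidden_states, delta, x_x):
    # One addcmul broadcasts (B, T, C) against (6, 1, 1, C) -> (6, B, T, C),
    # then unbind(0) splits the result into six (B, T, C) views.
    return hidden_states.addcmul(delta, x_x.view(6, 1, 1, -1)).unbind(0)


def split_mix(hidden_states, delta, x_x):
    # Hypothetical alternative: six separate ops, each producing a
    # (B, T, C) tensor directly from one row of the same (6, C) parameter.
    return tuple(hidden_states.addcmul(delta, x_x[i]) for i in range(6))


fused = fused_mix(hidden_states, delta, x_x)
split = split_mix(hidden_states, delta, x_x)
xr, xw, xk, xv, xa, xg = fused

# Both formulations compute hidden_states + delta * x_x[i] for each branch.
for a, b in zip(fused, split):
    assert torch.allclose(a, b)
print(xr.shape)  # torch.Size([2, 5, 8])
```

Slicing `x_x[i]` keeps the stored `(6, C)` parameter layout unchanged, so under these assumptions the computation could be restructured without altering the checkpoint format; whether that actually recovers the speed gap is something only a profiler run on real model shapes can confirm.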

You can find examples here: https://huggingface.co/fla-hub/rwkv7-1.5B-world

Replies: 5 comments, 10 replies

Answer selected by zhiyuan1i

Category: Q&A
Labels: None yet
Participants: 3