I read the paper and have a question about how the reward is assigned to the extractor. The reasoner receives a reward of 1 if it reaches the correct target entity, and every intermediate reward is 0. You mention that the extractor receives a step-wise reward from the reasoner. But how can the reasoner give the extractor a reward at each step, when the reasoner itself only receives a reward at the final step?
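To make the question concrete, here is a minimal sketch of the reward scheme as I understand it. The function names `reasoner_rewards` and `discounted_returns` and the discount factor `gamma` are my own assumptions for illustration, not taken from the paper or the repository. The second function shows one standard way a terminal-only reward could be turned into a per-step signal; I do not know whether this is what the paper actually does.

```python
from typing import List


def reasoner_rewards(path: List[str], target: str) -> List[float]:
    """Sparse reward as described in the question (hypothetical helper):
    0 at every intermediate step, 1 at the last step only if the final
    entity on the path equals the target entity."""
    rewards = [0.0] * len(path)
    if path and path[-1] == target:
        rewards[-1] = 1.0
    return rewards


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """One possible per-step signal (an assumption, not the paper's method):
    the discounted return G_t = r_t + gamma * G_{t+1}, computed backwards
    from the final step."""
    returns: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    rewards = reasoner_rewards(["e0", "e1", "e2"], target="e2")
    print(rewards)
    print(discounted_returns(rewards, gamma=0.5))
```

Is the step-wise reward passed to the extractor something like the discounted return above, or is it computed in a different way?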