Hi. I am trying to extract visual and audio features from raw video clips. For visual features, I run:
python main.py stack_size=24 step_size=8 extraction_fps=25 feature_type=i3d
For example, on a video converted to 25 fps, the above command gives 112x1024-dimensional RGB and flow features.
But for audio features, after converting the video to 25 fps, running
python main.py feature_type=vggish
produces features whose first dimension does not match that of the visual features.
For example, it gives only a 32x128-dimensional feature.
Can you please tell me what needs to be done so that I get a 112x128 audio feature, matching the visual features in the first dimension?
Thank you
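One possible workaround, not confirmed by the maintainers, would be to resample the audio features along the time axis after extraction so that they have the same number of rows as the visual ones. Below is a minimal sketch, assuming linear interpolation between neighbouring audio rows is acceptable for the downstream task. `resample_features` is a hypothetical helper and not part of the extraction tool, and the toy `audio` list only stands in for a real 32x128 output.

```python
def resample_features(features, target_len):
    """Linearly interpolate a (T, D) list of feature rows to (target_len, D).

    Hypothetical helper: it only stretches the time axis, it does not
    re-extract anything from the audio track.
    """
    src_len = len(features)
    if src_len == 0 or target_len <= 0:
        raise ValueError("need at least one source row and a positive target length")
    if src_len == 1:
        # A single row can only be repeated.
        return [list(features[0]) for _ in range(target_len)]
    if target_len == 1:
        return [list(features[0])]

    # Map target index i to a fractional position on the source time axis,
    # so that the first and last rows of both sequences coincide.
    scale = (src_len - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale
        lo = min(int(pos), src_len - 2)  # left neighbour, clamped at the end
        frac = pos - lo                  # weight of the right neighbour
        row = [(1.0 - frac) * a + frac * b
               for a, b in zip(features[lo], features[lo + 1])]
        out.append(row)
    return out


# Toy stand-in for a 32x128 audio feature (assumption: real features
# would be loaded from the extractor's saved output instead).
audio = [[float(t)] * 128 for t in range(32)]
aligned = resample_features(audio, 112)
print(len(aligned), len(aligned[0]))  # prints "112 128"
```

This only matches the shapes. Whether each interpolated audio row lines up in time with the corresponding visual stack depends on how the two extractors window the video, which I have not verified.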