Thoughts on "distilling from teacher's higher levels adversely affects training of the student." #21

@littletomatodonkey

I re-read the paper today. It reports distillation results for feature maps taken from different stages:

image

and it draws the following conclusion:

image

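Personally, I think "higher level" here is defined not by network depth but by feature map resolution (the lower the resolution, the higher the level). Otherwise, since the teacher model is deeper than the student model at the same stage, even same-stage distillation would already count as distilling from a higher level. I would like to ask the authors how they understand this, and everyone is welcome to join the discussion~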
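
To make the depth-versus-resolution distinction concrete, here is a minimal sketch. It assumes a ResNet-style layout that this issue does not specify: a stem that downsamples by 4, four stages where each stage after the first halves the resolution, and block counts of [2, 2, 2, 2] for the student and [3, 4, 6, 3] for the teacher (the ResNet-18 and ResNet-50 layouts, chosen purely for illustration). `stage_summary` is a hypothetical helper, not something taken from the paper or the repository.

```python
def stage_summary(input_size, blocks_per_stage, stem_stride=4):
    """Return a (cumulative_depth_in_blocks, feature_map_size) pair per stage.

    Assumes the stem downsamples by `stem_stride` and every stage after
    the first halves the spatial resolution (ResNet-style assumption).
    """
    size = input_size // stem_stride
    depth = 0
    summary = []
    for i, n_blocks in enumerate(blocks_per_stage):
        if i > 0:
            size //= 2  # later stages halve the feature map resolution
        depth += n_blocks  # depth counted in residual blocks so far
        summary.append((depth, size))
    return summary


# Hypothetical student/teacher pair, used only for illustration.
student = stage_summary(224, [2, 2, 2, 2])  # ResNet-18-style block counts
teacher = stage_summary(224, [3, 4, 6, 3])  # ResNet-50-style block counts

for stage, (s, t) in enumerate(zip(student, teacher), start=1):
    print(f"stage {stage}: student depth={s[0]}, size={s[1]} | "
          f"teacher depth={t[0]}, size={t[1]}")
```

Under these assumptions, both networks produce feature maps of size 56, 28, 14, and 7 at stages 1 to 4, while the teacher is deeper at every stage. A depth-based reading of "level" would therefore treat every teacher stage as higher than the matching student stage, whereas a resolution-based reading puts same-stage features on the same level.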
