I am confused about why GCC auto-vectorization is faster than NEON intrinsics. I tested open-source code such as NCNN.