This repository was archived by the owner on Mar 1, 2022. It is now read-only.

Question about word frequency counting #70

@zjw271208550

Description

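Hello, I have a question about get_ngram_freq_info in ngram_utils:
Why is the check for whether a word's frequency exceeds min_freq performed inside _process_corpus_chunk?
Suppose there are 10 chunks and each one contains exactly one X. Even with min_freq set to 2, X would never be counted, although it occurs 10 times in the corpus overall.
Is min_freq applied only to the frequencies within the current chunk? Shouldn't it be applied to the whole corpus?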
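
To make the concern concrete, here is a minimal sketch of the two filtering strategies. It is not the library's code: `count_chunk`, `filter_per_chunk`, and `filter_globally` are hypothetical names, whitespace tokenization stands in for the real n-gram extraction, and the `>=` comparison is an assumption about how min_freq is applied.

```python
from collections import Counter

def count_chunk(chunk):
    """Count whitespace-separated tokens in one chunk (hypothetical stand-in)."""
    return Counter(token for line in chunk for token in line.split())

def filter_per_chunk(chunks, min_freq):
    """Apply min_freq inside each chunk, then merge the surviving counts."""
    total = Counter()
    for chunk in chunks:
        counts = count_chunk(chunk)
        # Tokens below the threshold in this chunk are dropped before merging.
        total.update({t: c for t, c in counts.items() if c >= min_freq})
    return total

def filter_globally(chunks, min_freq):
    """Merge the counts of all chunks first, then apply min_freq once."""
    total = Counter()
    for chunk in chunks:
        total.update(count_chunk(chunk))
    return Counter({t: c for t, c in total.items() if c >= min_freq})

# 10 chunks, each containing "X" once and "a" twice.
chunks = [["X a a"] for _ in range(10)]
per_chunk = filter_per_chunk(chunks, min_freq=2)
global_ = filter_globally(chunks, min_freq=2)
```

In this sketch, `per_chunk` has no entry for X, while `global_` keeps it with a count of 10. Filtering per chunk bounds the memory used by each worker, but it can only match the corpus-level result if the threshold is applied again, or only, after the merge.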
