generator raised StopIteration when preprocessing #10

@voidism

Hi, thank you for releasing your code!
I ran the preprocessing script preprocess.py and hit a runtime error.

INFO:root:skip this step as /workspace/helo_word/data/conll2014 is NOT empty
INFO:root:STEP 0-8. Download language model
INFO:root:skip this step as /workspace/helo_word/data/language_model/data-bin is NOT empty
INFO:root:STEP 1. Word-tokenize the original files and merge them
INFO:root:STEP 1-1. gutenberg
INFO:root:skip this step as /workspace/helo_word/data/gutenberg/gutenberg.txt already exists
INFO:root:STEP 1-2. tatoeba
INFO:root:skip this step as /workspace/helo_word/data/tatoeba/tatoeba.txt already exists
INFO:root:STEP 1-3. wiki103
INFO:root:skip this step as /workspace/helo_word/data/wiki103/wiki103.txt already exists
INFO:root:STEP 2. Train bpe model
INFO:root:skip this step as /workspace/helo_word/data/bpe-model/gutenberg.model already exists
INFO:root:STEP 3. Split wi.dev into wi.dev.3k and wi.dev.1k
INFO:root:skip this step as /workspace/helo_word/data/bea19/wi+locness/m2/ABCN.dev.gold.bea19.3k.m2 already exists
INFO:root:STEP 4. Perturb and make parallel files
INFO:root:Track 1
INFO:root:STEP 4-1. writing perturbation scenario
INFO:root:STEP 4-2. gutenberg
# multiprocessing settings
# prepare inputs
# work
  0%|                                                                                        | 0/1 [00:00<?, ?it/s]
--- SKIP ---
  0%|                                                                                        | 0/1 [00:08<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/pattern3/text/__init__.py", line 412, in _read
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/workspace/helo_word/gec/perturb.py", line 160, in make_parallel
    perturbation = apply_perturbation(words, word2ptbs, word_change_prob, type_change_prob)
  File "/workspace/helo_word/gec/perturb.py", line 121, in apply_perturbation
    w = change_type(w, t, type_change_prob)
  File "/workspace/helo_word/gec/perturb.py", line 34, in change_type
    word = conjugate(word, verb_type)
  File "/opt/conda/lib/python3.7/site-packages/pattern3/text/__init__.py", line 2123, in conjugate
    b = self.lemma(verb, parse=kwargs.get("parse", True))
  File "/opt/conda/lib/python3.7/site-packages/pattern3/text/__init__.py", line 2088, in lemma
    self.load()
  File "/opt/conda/lib/python3.7/site-packages/pattern3/text/__init__.py", line 2042, in load
    for v in _read(self._path):
RuntimeError: generator raised StopIteration
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "preprocess.py", line 169, in <module>
    args.word_change_prob, args.type_change_prob))
  File "preprocess.py", line 15, in maybe_do
    func(*inputs)
  File "/workspace/helo_word/gec/perturb.py", line 183, in do
    p.map(make_parallel, inputs_li)
  File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
RuntimeError: generator raised StopIteration

I tried skipping the gutenberg corpus, but the same error was raised while processing the next corpus.
How can I fix it?
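
For reference, here is a minimal sketch of what the traceback seems to show. The log says `_read` in pattern3 executes `raise StopIteration` (line 412) while `load` is iterating over it, and since Python 3.7 (PEP 479) a `StopIteration` raised inside a generator body is converted into `RuntimeError: generator raised StopIteration`. The functions `read_old_style`, `read_fixed` and `consume` below are hypothetical stand-ins written for illustration; they are not the actual pattern3 code, whose body I have not confirmed beyond the line shown in the traceback.

```python
def read_old_style(lines):
    """Hypothetical stand-in for a generator that ends with `raise StopIteration`."""
    for line in lines:
        yield line
    # On Python 3.7+ (PEP 479) this is converted into a RuntimeError
    # instead of silently ending the iteration.
    raise StopIteration


def read_fixed(lines):
    """Same generator, but it ends with a plain `return`."""
    for line in lines:
        yield line
    return


def consume(gen_func, lines):
    """Exhaust the generator and report a RuntimeError as a string, if any."""
    try:
        return list(gen_func(lines))
    except RuntimeError as exc:
        return "RuntimeError: {}".format(exc)


old_result = consume(read_old_style, ["be", "am", "is"])
fixed_result = consume(read_fixed, ["be", "am", "is"])
print(old_result)
print(fixed_result)
```

If the installed `_read` does follow the old-style pattern, replacing its final `raise StopIteration` with `return` would be the analogous change, but that is an assumption about code not shown in the log above.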
