Panic (stack overflow) when encoding a certain string #245

Open
@Crazytieguy

Hi, I'm getting a panic when trying to encode the attached file with the gpt-4 tokenizer. The file is from the AMPS dataset, which was published along with the MATH dataset. Backtrace:

```
called `Result::unwrap()` on an `Err` value: RuntimeError(StackOverflow)
stack backtrace:
   0: rust_begin_unwind
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:597:5
   1: core::panicking::panic_fmt
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:72:14
   2: core::result::unwrap_failed
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/result.rs:1652:5
   3: _tiktoken::CoreBPE::_encode_native
   4: _tiktoken::_::<impl _tiktoken::CoreBPE>::__pymethod_encode__
   5: pyo3::impl_::trampoline::fastcall_with_keywords
   6: _PyEval_EvalFrameDefault
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:5258:29
   7: _PyEval_EvalFrame
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_ceval.h:73:16
   8: _PyEval_Vector
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:6439:24
   9: _PyFunction_Vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/call.c:393:16
  10: _PyObject_VectorcallTstate
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_call.h:92:11
  11: method_vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/classobject.c:89:18
  12: do_call_core
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:7357:12
  13: _PyEval_EvalFrameDefault
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:5381:22
  14: _PyEval_EvalFrame
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_ceval.h:73:16
  15: _PyEval_Vector
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:6439:24
  16: _PyFunction_Vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/call.c:393:16
  17: do_call_core
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:7357:12
  18: _PyEval_EvalFrameDefault
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:5381:22
  19: _PyEval_EvalFrame
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_ceval.h:73:16
  20: _PyEval_Vector
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:6439:24
  21: _PyFunction_Vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/call.c:393:16
  22: _PyObject_VectorcallTstate
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_call.h:92:11
  23: method_vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/classobject.c:67:20
  24: thread_run
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Modules/_threadmodule.c:1092
  25: pythread_wrapper
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/thread_pthread.h:241:5
  26: <unknown>
  27: <unknown>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```
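To narrow a report like this down to a smaller input, one option is to feed the file to the encoder line by line and record the first line that fails. The sketch below is a hypothetical helper, not part of tiktoken: `first_failing_line` and the file name in the usage note are invented for illustration. It assumes the failure can be triggered by a single line in isolation, which may not hold if the problem depends on multi-line context. It catches `BaseException` because panics raised through PyO3 surface in Python as an exception type that derives from `BaseException` rather than `Exception`.

```python
from typing import Callable, Optional, Tuple


def first_failing_line(
    text: str, encode: Callable[[str], object]
) -> Optional[Tuple[int, BaseException]]:
    """Return (0-based line index, exception) for the first line on which
    `encode` raises, or None if every line encodes without error.

    Hypothetical debugging helper; `encode` is any callable taking a string.
    """
    for index, line in enumerate(text.splitlines()):
        try:
            encode(line)
        except (KeyboardInterrupt, SystemExit):
            # Never swallow interpreter shutdown or Ctrl-C.
            raise
        except BaseException as exc:
            # Broad on purpose: a Rust panic crossing the PyO3 boundary
            # is not an `Exception` subclass.
            return index, exc
    return None
```

With the third-party `tiktoken` package installed, this could be called as `first_failing_line(open("attached_file.txt", encoding="utf-8").read(), tiktoken.encoding_for_model("gpt-4").encode)`, where `attached_file.txt` stands in for the file from this report. Any error counts as a failure here, including the `ValueError` that `encode` raises by default when the text contains special tokens, so the returned exception should be inspected before drawing conclusions.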
