Hi, I'm getting a panic when trying to encode the attached file with the gpt-4 tokenizer. The file is from the AMPS dataset, which was published along with the MATH dataset. Backtrace:
```
called `Result::unwrap()` on an `Err` value: RuntimeError(StackOverflow)
stack backtrace:
   0: rust_begin_unwind
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:597:5
   1: core::panicking::panic_fmt
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:72:14
   2: core::result::unwrap_failed
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/result.rs:1652:5
   3: _tiktoken::CoreBPE::_encode_native
   4: _tiktoken::_::<impl _tiktoken::CoreBPE>::__pymethod_encode__
   5: pyo3::impl_::trampoline::fastcall_with_keywords
   6: _PyEval_EvalFrameDefault
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:5258:29
   7: _PyEval_EvalFrame
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_ceval.h:73:16
   8: _PyEval_Vector
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:6439:24
   9: _PyFunction_Vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/call.c:393:16
  10: _PyObject_VectorcallTstate
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_call.h:92:11
  11: method_vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/classobject.c:89:18
  12: do_call_core
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:7357:12
  13: _PyEval_EvalFrameDefault
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:5381:22
  14: _PyEval_EvalFrame
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_ceval.h:73:16
  15: _PyEval_Vector
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:6439:24
  16: _PyFunction_Vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/call.c:393:16
  17: do_call_core
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:7357:12
  18: _PyEval_EvalFrameDefault
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:5381:22
  19: _PyEval_EvalFrame
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_ceval.h:73:16
  20: _PyEval_Vector
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:6439:24
  21: _PyFunction_Vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/call.c:393:16
  22: _PyObject_VectorcallTstate
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_call.h:92:11
  23: method_vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/classobject.c:67:20
  24: thread_run
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Modules/_threadmodule.c:1092
  25: pythread_wrapper
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/thread_pthread.h:241:5
  26: <unknown>
  27: <unknown>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```
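In case it helps narrow this down, here is a rough sketch of how the failing input could be shrunk by repeated halving. It is only an illustration, not something taken from tiktoken: `smallest_failing_chunk` and the `fails` predicate are hypothetical names, and `fails` is assumed to return `True` whenever encoding the given string crashes.

```python
from typing import Callable


def smallest_failing_chunk(text: str, fails: Callable[[str], bool]) -> str:
    """Shrink `text` by halving for as long as one half still fails.

    `fails` is a hypothetical, caller-supplied predicate: it should return
    True when encoding the given string triggers the crash.
    """
    if not fails(text):
        raise ValueError("the full input does not trigger the failure")

    chunk = text
    while len(chunk) > 1:
        mid = len(chunk) // 2
        left, right = chunk[:mid], chunk[mid:]
        if fails(left):
            chunk = left
        elif fails(right):
            chunk = right
        else:
            # The failure needs text from both halves, so stop shrinking.
            break
    return chunk


if __name__ == "__main__":
    # Stand-in predicate for demonstration only; a real one would call the tokenizer.
    sample = "abcXdef"
    print(repr(smallest_failing_chunk(sample, lambda s: "X" in s)))
```

With tiktoken, `fails` could wrap `tiktoken.encoding_for_model("gpt-4").encode` in a `try`/`except`. Whether the panic is catchable from Python at all is an assumption here; if it aborts the process, the predicate would need to run the encode call in a subprocess instead.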