OutOfMemory / Misleading Hint #30

@AdamBien

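Describe the bug
The TornadoOutOfMemoryException message suggests raising the device-memory limit with -Dtornado.device.memory=GB. That hint is misleading when running through the llama-tornado launcher, where the limit has to be set with --gpu-memory instead (see Fix).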

To Reproduce
Steps to reproduce the behavior:

  1. Follow the suggested Mac installation
  2. Run llama-tornado with the model Meta-Llama-3-8B-Instruct-Q4_0.gguf

Expected behavior
GPU inference via llama-tornado runs slightly faster than Llama3.java.

Screenshots

TornadoVM GPU execution plan creation: 523,85 ms
Java to GPU JIT compiler warmup: 3019,81 ms
Exception in thread "main" uk.ac.manchester.tornado.api.exceptions.TornadoOutOfMemoryException: Unable to allocate 117440536 bytes of memory.
To increase the maximum device memory, use -Dtornado.device.memory=GB

at tornado.drivers.common@1.1.1-dev/uk.ac.manchester.tornado.drivers.common.TornadoBufferProvider.freeUnusedNativeBufferAndAssignRegion(TornadoBufferProvider.java:184)
at tornado.drivers.common@1.1.1-dev/uk.ac.manchester.tornado.drivers.common.TornadoBufferProvider.getOrAllocateBufferWithSize(TornadoBufferProvider.java:211)
at tornado.drivers.opencl@1.1.1-dev/uk.ac.manchester.tornado.drivers.opencl.mm.OCLMemorySegmentWrapper.allocate(OCLMemorySegmentWrapper.java:184)
at tornado.drivers.opencl@1.1.1-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.newDeviceBufferAllocation(OCLTornadoDevice.java:617)
at tornado.drivers.opencl@1.1.1-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.allocate(OCLTornadoDevice.java:630)
at tornado.drivers.opencl@1.1.1-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.allocateObjects(OCLTornadoDevice.java:593)
at tornado.runtime@1.1.1-dev/uk.ac.manchester.tornado.runtime.interpreter.TornadoVMInterpreter.executeAlloc(TornadoVMInterpreter.java:499)
at tornado.runtime@1.1.1-dev/uk.ac.manchester.tornado.runtime.interpreter.TornadoVMInterpreter.execute(TornadoVMInterpreter.java:296)
at tornado.runtime@1.1.1-dev/uk.ac.manchester.tornado.runtime.interpreter.TornadoVMInterpreter.execute(TornadoVMInterpreter.java:1028)
at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024)
at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
at tornado.runtime@1.1.1-dev/uk.ac.manchester.tornado.runtime.TornadoVM.executeInterpreterSingleThreaded(TornadoVM.java:127)
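The requested size in the exception can be broken down with a little arithmetic. This sketch assumes the published Llama 3 8B dimensions (model dimension 4096, FFN hidden dimension 14336) and the F16 weight format mentioned in the successful run's log ("loading Q4_0 -> F16"). Under those assumptions, 117,440,536 bytes is exactly one 14336 × 4096 F16 matrix plus 24 bytes, which suggests, without proving it, that the default device-memory budget ran out while weight buffers were being allocated. The class and method names are illustrative only.

```java
class AllocationSize {
    static final long MIB = 1024L * 1024L;

    // Bytes needed for a rows x cols matrix of F16 values (2 bytes per element).
    static long f16MatrixBytes(long rows, long cols) {
        return rows * cols * 2L;
    }

    public static void main(String[] args) {
        long requested = 117_440_536L; // size from the TornadoOutOfMemoryException above
        long dim = 4_096L;             // assumption: Llama 3 8B model dimension
        long hiddenDim = 14_336L;      // assumption: Llama 3 8B FFN hidden dimension
        long ffn = f16MatrixBytes(hiddenDim, dim);

        // Prints "112 MiB + 24 bytes"
        System.out.println(requested / MIB + " MiB + " + requested % MIB + " bytes");
        // Prints "FFN matrix (F16): 117440512 bytes, remainder: 24 bytes"
        System.out.println("FFN matrix (F16): " + ffn + " bytes, remainder: " + (requested - ffn) + " bytes");
    }
}
```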

Desktop:

  • OS: macOS Sequoia 15.5
  • Java: OpenJDK Runtime Environment Corretto-21.0.0.35.1 (build 21+35-LTS)

Fix

./llama-tornado --gpu-memory 96GB --gpu --verbose-init --opencl --model Meta-Llama-3-8B-Instruct-Q4_0.gguf --prompt "tell me a joke"

(i.e., passing --gpu-memory 96GB to llama-tornado instead of the -Dtornado.device.memory=96GB suggested by the exception message)
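The current hint names only the JVM system property and leaves the size as a bare "=GB" placeholder. As a purely hypothetical illustration of a less misleading message (this is not TornadoVM's code; the class and method names are made up), the hint could include an example size and mention the launcher option that actually worked here:

```java
import java.util.Locale;

class OomHintSketch {
    // Hypothetical message builder; TornadoVM's real exception text is produced elsewhere.
    static String hint(long requestedBytes, String exampleSize) {
        return String.format(Locale.ROOT,
                "Unable to allocate %d bytes of memory.%n"
                        + "To increase the maximum device memory, use -Dtornado.device.memory=%s "
                        + "(when running through llama-tornado, pass --gpu-memory %s instead).",
                requestedBytes, exampleSize, exampleSize);
    }

    public static void main(String[] args) {
        System.out.println(hint(117_440_536L, "96GB"));
    }
}
```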

LLama3.java is still faster (7.35 vs. 5.10 tokens/s in the two runs below):

java --enable-preview --source 21 --add-modules jdk.incubator.vector LLama3.java -i --model Meta-Llama-3-8B-Instruct-Q4_0.gguf

Note: LLama3.java uses preview features of Java SE 21.
Note: Recompile with -Xlint:preview for details.
Parse Meta-Llama-3-8B-Instruct-Q4_0.gguf: 417 millis
Load LlaMa model: 576 millis

tell me a joke
Here's one:

Why couldn't the bicycle stand up by itself?

(Wait for it...)

Because it was two-tired!

Hope that made you smile!
7,35 tokens/s (47)

vs.

./llama-tornado --gpu-memory 96GB --gpu --verbose-init --opencl --model Meta-Llama-3-8B-Instruct-Q4_0.gguf --prompt "tell me a joke"
WARNING: Using incubator modules: jdk.incubator.vector
Parse Meta-Llama-3-8B-Instruct-Q4_0.gguf: 403 millis
Loading model weights in TornadoVM format (loading Q4_0 -> F16)
Load LlaMa model: 19720 millis

Starting TornadoVM initialization...
TornadoVM GPU execution plan creation: 370,35 ms
Java to GPU JIT compiler warmup: 1105,70 ms
Transfer read-only weights to GPU: 2001,91 ms
Finished TornadoVM initialization...

Here's one:

Why couldn't the bicycle stand up by itself?

(wait for it...)

Because it was two-tired!

Hope that made you smile!

achieved tok/s: 5,10. Tokens: 46, seconds: 9,02
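Taking the two reported throughput figures at face value, the gap can be quantified. This small sketch recomputes the llama-tornado rate from its token count and elapsed time and compares it with the 7.35 tokens/s reported by LLama3.java; the class and method names are illustrative only. Locale.ROOT is used so the output uses a decimal point regardless of the system locale (the logs above print decimal commas).

```java
import java.util.Locale;

class Throughput {
    // Tokens generated per second of wall-clock time.
    static double tokensPerSecond(int tokens, double seconds) {
        return tokens / seconds;
    }

    public static void main(String[] args) {
        double tornado = tokensPerSecond(46, 9.02); // llama-tornado run above
        double llama3Java = 7.35;                   // rate reported by LLama3.java (47 tokens)
        System.out.printf(Locale.ROOT, "llama-tornado: %.2f tok/s%n", tornado);
        System.out.printf(Locale.ROOT, "LLama3.java / llama-tornado: %.2fx%n", llama3Java / tornado);
    }
}
```

Running it prints 5.10 tok/s for llama-tornado and a ratio of 1.44, i.e. LLama3.java produced roughly 44% more tokens per second in these two runs.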

Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)