Describe the bug
Following the suggested Mac installation and running with Meta-Llama-3-8B-Instruct-Q4_0.gguf, GPU inference aborts with uk.ac.manchester.tornado.api.exceptions.TornadoOutOfMemoryException (unable to allocate 117440536 bytes). After working around the error with --gpu-memory 96GB, inference still ends up slower than plain Llama3.java (5,10 vs. 7,35 tokens/s).
To Reproduce
Steps to reproduce the behavior:
- Follow the suggested Mac installation
- Use: Meta-Llama-3-8B-Instruct-Q4_0.gguf (the full invocation is reconstructed below)
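The failing invocation is presumably the same command shown under Fix below, just without the --gpu-memory override (a reconstruction, not a line copied from the original run):
./llama-tornado --gpu --verbose-init --opencl --model Meta-Llama-3-8B-Instruct-Q4_0.gguf --prompt "tell me a joke"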
Expected behavior
Model inference runs without errors and is slightly faster than Llama3.java.
Screenshots / console output
TornadoVM GPU execution plan creation: 523,85 ms
Java to GPU JIT compiler warmup: 3019,81 ms
Exception in thread "main" uk.ac.manchester.tornado.api.exceptions.TornadoOutOfMemoryException: Unable to allocate 117440536 bytes of memory.
To increase the maximum device memory, use -Dtornado.device.memory=<SIZE>GB
at tornado.drivers.common@1.1.1-dev/uk.ac.manchester.tornado.drivers.common.TornadoBufferProvider.freeUnusedNativeBufferAndAssignRegion(TornadoBufferProvider.java:184)
at tornado.drivers.common@1.1.1-dev/uk.ac.manchester.tornado.drivers.common.TornadoBufferProvider.getOrAllocateBufferWithSize(TornadoBufferProvider.java:211)
at tornado.drivers.opencl@1.1.1-dev/uk.ac.manchester.tornado.drivers.opencl.mm.OCLMemorySegmentWrapper.allocate(OCLMemorySegmentWrapper.java:184)
at tornado.drivers.opencl@1.1.1-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.newDeviceBufferAllocation(OCLTornadoDevice.java:617)
at tornado.drivers.opencl@1.1.1-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.allocate(OCLTornadoDevice.java:630)
at tornado.drivers.opencl@1.1.1-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.allocateObjects(OCLTornadoDevice.java:593)
at tornado.runtime@1.1.1-dev/uk.ac.manchester.tornado.runtime.interpreter.TornadoVMInterpreter.executeAlloc(TornadoVMInterpreter.java:499)
at tornado.runtime@1.1.1-dev/uk.ac.manchester.tornado.runtime.interpreter.TornadoVMInterpreter.execute(TornadoVMInterpreter.java:296)
at tornado.runtime@1.1.1-dev/uk.ac.manchester.tornado.runtime.interpreter.TornadoVMInterpreter.execute(TornadoVMInterpreter.java:1028)
at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024)
at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
at tornado.runtime@1.1.1-dev/uk.ac.manchester.tornado.runtime.TornadoVM.executeInterpreterSingleThreaded(TornadoVM.java:127)
Desktop (please complete the following information):
- OS: macOS Sequoia 15.5
- Java: OpenJDK Runtime Environment Corretto-21.0.0.35.1 (build 21+35-LTS)
Fix
./llama-tornado --gpu-memory 96GB --gpu --verbose-init --opencl --model Meta-Llama-3-8B-Instruct-Q4_0.gguf --prompt "tell me a joke"
(using --gpu-memory 96GB rather than -Dtornado.device.memory=96GB, which is what the error message suggests)
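To double-check that the property named in the exception is actually set on a given JVM, here is a minimal, hypothetical sanity check (PrintTornadoMemory is not part of this repo; only the tornado.device.memory property name comes from the error message above):
// PrintTornadoMemory.java - hypothetical helper, not part of this project.
// Prints the tornado.device.memory system property as seen by the current JVM,
// so you can confirm the value you passed actually arrived.
public class PrintTornadoMemory {
    public static void main(String[] args) {
        System.out.println("tornado.device.memory = "
                + System.getProperty("tornado.device.memory", "<not set>"));
    }
}
Run it as, for example: java -Dtornado.device.memory=96GB PrintTornadoMemory.java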
For comparison, Llama3.java is slightly faster:
java --enable-preview --source 21 --add-modules jdk.incubator.vector LLama3.java -i --model Meta-Llama-3-8B-Instruct-Q4_0.gguf
Note: LLama3.java uses preview features of Java SE 21.
Note: Recompile with -Xlint:preview for details.
Parse Meta-Llama-3-8B-Instruct-Q4_0.gguf: 417 millis
Load LlaMa model: 576 millis
tell me a joke
Here's one:
Why couldn't the bicycle stand up by itself?
(Wait for it...)
Because it was two-tired!
Hope that made you smile!
7,35 tokens/s (47)
vs.
./llama-tornado --gpu-memory 96GB --gpu --verbose-init --opencl --model Meta-Llama-3-8B-Instruct-Q4_0.gguf --prompt "tell me a joke"
WARNING: Using incubator modules: jdk.incubator.vector
Parse Meta-Llama-3-8B-Instruct-Q4_0.gguf: 403 millis
Loading model weights in TornadoVM format (loading Q4_0 -> F16)
Load LlaMa model: 19720 millis
Starting TornadoVM initialization...
TornadoVM GPU execution plan creation: 370,35 ms
Java to GPU JIT compiler warmup: 1105,70 ms
Transfer read-only weights to GPU: 2001,91 ms
Finished TornadoVM initialization...
Here's one:
Why couldn't the bicycle stand up by itself?
(wait for it...)
Because it was two-tired!
Hope that made you smile!
achieved tok/s: 5,10. Tokens: 46, seconds: 9,02
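Putting the two runs side by side, the gap is roughly 7,35 vs. 5,10 tokens/s; a minimal sketch of the arithmetic, with every figure copied from the logs above (nothing measured independently):
// ThroughputCompare.java - all numbers are taken from the two runs above.
public class ThroughputCompare {
    public static void main(String[] args) {
        double llama3JavaTokPerSec = 7.35;     // Llama3.java run (47 tokens)
        int tornadoTokens = 46;                // TornadoVM run
        double tornadoSeconds = 9.02;
        double tornadoTokPerSec = tornadoTokens / tornadoSeconds;  // ~5.10 tok/s
        System.out.printf("TornadoVM: %.2f tok/s vs Llama3.java: %.2f tok/s%n",
                tornadoTokPerSec, llama3JavaTokPerSec);
    }
}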