Building on Jetson AGX Xavier Development Kit fails #221
Same issue here. Apparently nvcc does not support NEON.
OK, it works now: I included the arm_neon intrinsics and moved the parts using those instructions to a separate .cpp file. The compiled library loads correctly in Python and does not complain, but produces wrong results (I think). Here's what happens when quantizing a very small model (code from https://huggingface.co/blog/hf-bitsandbytes-integration):
Output (before quantization):
Now calling
It seems like all the negative values have been set to zero. Running the same thing in Google Colab produces negative integers, no zeros. Any ideas why that is? (Note: I did not modify any of the .cpp/.cu functions, I just moved stuff around)
Can you share the modified code? We are facing the same problem and can debug together.
@g588928812 If you create a fork in your GitHub account you can make the changes there. That way it is possible to collaborate and also to send a pull request.
Is the issue of no negative numbers related to this: pytorch/pytorch#52146?
Sure, here's the fork: https://github.com/g588928812/bitsandbytes_jetsonX
I'm using pytorch 1.14 and this should have been fixed with 1.8.1 already, if I understand correctly
You were right! I systematically replaced all chars with int8_t and it works now; it was somewhere in kernels.cu. Will find out which change exactly did it and update the repository later
Thanks for your repo. The test with torch 2.0 has a similar effect. Now that you have located the kernels.cu, I will also try to modify it
Would -fsigned-char suffice?
It's done, repository updated. But I don't know yet if the rest of the library works fine; I will check the pytests and see what they produce
Support for Apple silicon #252 shows another AArch64 approach. It would be a good idea to merge these efforts.
Another thing that needs looking into is building proper platform-specific wheels. I've started investigating this in #257, but it is not 100% working yet. I don't know if someone else has started looking into it. As I don't have access to CUDA on Linux currently (my NVIDIA PC is hijacked by my kids for gaming and it runs Win11 right now :) ) I've also tried setting it up on GitHub pipelines.
I think I might have solved this in #257, but I have no hardware to test it on so I can't verify. The wheel is built by https://github.com/rickardp/bitsandbytes/actions/runs/4653237487/jobs/8233937589 (go to Summary, then Artifacts, then download the artifact). I had to move some #ifdefs and includes around. Then it seems nvcc doesn't like Neon intrinsics, so I had to compile the CUDA version without Neon support. If anyone has the hardware to try, please check out this build
Tried it, and it seems like it doesn't do anything (?). Changing Some of the unit tests (test_autograd) still fail, however; I'm not sure why. Apart from the tests, inference works now and I've got Open Assistant running. There still seems to be an issue though, it only runs with an
Yes, sorry, this is what I meant. No code that runs through nvcc can have Neon; the g++-compiled code can.
I tried compiling on a Jetson Orin (JetPack 5.0.2) from your fork with CUDA_VERSION=114 make and CUDA_VERSION=114 make cuda11x. It still said the bitsandbytes lib was compiled without GPU support. Do I need to use CUDA_VERSION=114 make cuda11x_nomatmul instead?
make cuda11x should be fine; the compute capability of the Orin is >7.5. Could you post the output of make cuda11x, please? Also, you're using CUDA 11.4, right? Just making sure
Ok, I started over from scratch and documented what I am doing in a gist here: https://gist.github.com/androiddrew/9470fc5cfde190a71a5971abc7c2aa9f It appears that I do have the correct binary built now
ops.cu
Seems to not like this line: CUDA_CHECK_RETURN(cudaPeekAtLastError()); Could I be missing some apt packages that were required for your fork to work?
Same here, Jetson Orin, JetPack 5.1
No additional packages needed. The problem is I'm working on the Xavier, building and using the cuda11x_nomatmul version, and it works for me. Could you please try
I think I have it fixed using the https://github.com/g588928812/bitsandbytes_jetsonX fork! I added sm_87 in the Makefile.
Following the same workflow in https://gist.github.com/androiddrew/9470fc5cfde190a71a5971abc7c2aa9f I was able to use
The JetPack wheel for pytorch is from Jetson Zoo and was compiled for my version of JetPack (5.0.2)
True! The Jetson Orin has CC 8.7. Thanks, I've updated the repository
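For reference, the sm_87 addition might look like this in a Makefile. This is a sketch using nvcc's standard -gencode convention; the variable name is an assumption, not necessarily what the fork's Makefile actually uses:

```make
# Hypothetical Makefile fragment: target both Jetson compute capabilities.
# Xavier is CC 7.2, Orin is CC 8.7.
COMPUTE_CAPABILITY += -gencode arch=compute_72,code=sm_72   # Xavier
COMPUTE_CAPABILITY += -gencode arch=compute_87,code=sm_87   # Orin
```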
Now that the library loads successfully on the Orin, could you guys please check if the unit tests work? |
I started the tests a while ago; there is one error in tests/test_cuda_setup_evaluator.py and 4 more in tests/test_functional.py. Now it seems to be stuck at 85%. I will report back tomorrow.
I too experienced the same problem compiling the main branch on the AGX Orin 32GB. Trace below.
I'll be testing @g588928812's branch later this evening and will update with results.
UPDATE: Using the branch that would appear to support compute capability 87, the following errors are returned:
So, the bitsandbytes_jetsonX fork builds without complaints but the error is still the same? Did you install the built library using
The error message is misleading. From what I understand, this error is thrown when 1) the file does not exist, OR 2) the file exists but is incompatible
What does ldd /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda114.so say?
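A quick way to distinguish the two failure modes (wrong architecture vs. missing dependencies) is to inspect the built library directly. The path below is just the example from this thread; substitute wherever your build actually put the .so:

```shell
# Inspect the built shared library: architecture first, then linkage.
SO=/usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda114.so

file "$SO"   # on a Jetson this should report "ARM aarch64", not "x86-64"
ldd "$SO"    # "not found" lines mean missing deps; an exec format error
             # means the library was built for the wrong architecture
```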
As you suspected, it's not being built correctly.
I noticed in the build that it compiles for all available/supported CUDA 11x compute capabilities, but I didn't see in the Makefile an existing option to specify only a single compute capability value. In my case above, I used (in a Dockerfile build):
The LDD error seems to suggest a larger issue though. At second glance, it looks like the SM_87 args aren't being passed in the case of the
UPDATE:
Strangely, while
So it's compiling for x86_64, not ARM64 (aarch64) as it should be...
UPDATE: Sanity check: I'm using this build container: NGC PyTorch
Are you sure this is the file you've built rather than a file that was there before?
You're right. Silly me for thinking an installation script might actually install the artifacts from a build/compile operation...
When the newly compiled library is manually copied into the proper production location and
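The manual copy step mentioned above can be sketched like this. Paths and filenames are examples for CUDA 11.4; the key idea is to ask Python where the installed package actually lives rather than guessing:

```shell
# Find where pip installed bitsandbytes, then drop the freshly built
# library on top of the stale one.
PKG_DIR=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
echo "site-packages: $PKG_DIR"

# Guard keeps this sketch runnable on machines without the package.
if [ -d "$PKG_DIR/bitsandbytes" ]; then
    cp bitsandbytes/libbitsandbytes_cuda114.so "$PKG_DIR/bitsandbytes/"
else
    echo "bitsandbytes not installed under $PKG_DIR"
fi
```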
I'm not familiar enough with the bitsandbytes codebase to say what's going on here, but I found that others on Windows are seeing the same issue, which is apparently resolved in version 0.37.2: oobabooga/text-generation-webui#1193. It'll take some branch merging to test that, but it's also worth noting that when making the target
@hillct did you manage to fix this?
I haven't moved past the
It turns out I may have had a cache issue or an otherwise unclean environment. I'm now able to build using:
F................................................................................................FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFssssssss
Finally, as for interpreting the test_functional.py results: there are 344 tests but only 298 characters there, so one can't map them to test success/failure. In case anyone wants it, a trivial Dockerfile for testing in a clean environment: https://gist.github.com/hillct/42045e9664834f8c666017d82fa276dd
Thanks! That doesn't look too bad
Try running pytest with the
I've merged bitsandbytes 0.38 into my fork; if you have some time, please check if this resolved the
These are resolved; they were a symptom of my unclean build environment (which is what prompted creation of the Docker image for cleaner testing)
Hi - this issue is referenced from another issue relating to Apple Silicon, presumably because if an ARM architecture CPU-only version of bitsandbytes could work on the Jetson then it might be able to compile and work on a Mac.. however, clearly the Jetson has on-board CUDA.. so.. Can I ask (please) anyone who has this working on a Jetson - is it running with CPU only or is it running using on-board CUDA? Many thanks in advance. |
I am using CUDA, hosting local Llama models.
You should be able to compile a CPU-only version though using
The CPU-only version is not feature complete
@g588928812 is it working on Jetson with CUDA now? With the bitsandbytes_jetsonX fork?
It (my fork) compiles but some tests fail; you would simply have to try and see. I'm not working on it anymore though, sorry
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
Hi,
I am trying to build bitsandbytes on an Nvidia Jetson AGX Xavier Kit, but it fails, not finding emmintrin.h:
/home/g/bitsandbytes# CUDA_VERSION=114 make cuda11x_nomatmul
Did a bit of research and, not knowing what I am doing, I changed SIMD.h to include sse2neon.h instead of emmintrin.h. Now it fails again, catastrophically, not finding builtin functions:
SETUP:
Flashed using JetPack 5.1 (Ubuntu 20.04)
full_output_nvcc-verbose.txt
Any help would be greatly appreciated, thank you!