OCP 4.14 + QAT Operator 0.28 ; setting enable services is not working #370


Open
lazharh opened this issue Feb 28, 2025 · 16 comments

Comments

@lazharh

lazharh commented Feb 28, 2025

I have also been testing this and found the bug that is causing it:
$ oc logs -c intel-qat-initcontainer intel-qat-plugin-6p2pf -n openshift-operators | grep -B5 -A5 sym
/usr/local/bin/qat-init.sh: line 44: echo: write error: Invalid argument <---------------------------------------------
Device f7:00.0 configured with services: dc
/usr/local/bin/qat-init.sh: line 44: echo: write error: Invalid argument
Device f3:00.0 configured with services: sym;asym


When checking the script:
line 44: echo "$SERVICES_ENABLED" > "$DEVPATH"/qat/cfg_services

The script tries to write the user-supplied configuration to the device, but the write fails.

I tried to use a different kernel module, as specified in Intel's docs, but could not find a suitable one: only 4xxxvf is accepted.

oc apply -f 004-qat-device-plugin-cr.yaml
The QatDevicePlugin "qatdeviceplugin-sample" is invalid: spec.kernelVfDrivers[0]: Unsupported value: "402xx": supported values: "4xxxvf"

oc apply -f 004-qat_device_plugin-orig.yaml
The QatDevicePlugin "qatdeviceplugin-sample" is invalid: spec.kernelVfDrivers[0]: Unsupported value: "401xx": supported values: "4xxxvf"


Even doing it manually does not work:

bash-5.1# echo sym > /sys/bus/pci/devices/0000:f3:00.0/qat/cfg_services
bash: echo: write error: Invalid argument


Full test:

[root@master-0 ~]# echo -n down > /sys/bus/pci/devices/0000:f3:00.0/qat/state
[root@master-0 ~]# echo -n dc > /sys/bus/pci/devices/0000:f3:00.0/qat/cfg_services
[root@master-0 ~]# echo -n asym > /sys/bus/pci/devices/0000:f3:00.0/qat/cfg_services
-bash: echo: write error: Invalid argument
[root@master-0 ~]# echo -n asym;sym > /sys/bus/pci/devices/0000:f3:00.0/qat/cfg_services
asym-bash: sym: command not found
[root@master-0 ~]# echo -n 'asym;sym' > /sys/bus/pci/devices/0000:f3:00.0/qat/cfg_services
-bash: echo: write error: Invalid argument
[root@master-0 ~]# echo -n 'sym;asym' > /sys/bus/pci/devices/0000:f3:00.0/qat/cfg_services
[root@master-0 ~]# echo -n 'sym;dc' > /sys/bus/pci/devices/0000:f3:00.0/qat/cfg_services
-bash: echo: write error: Invalid argument
[root@master-0 ~]# echo -n 'asym;dc' > /sys/bus/pci/devices/0000:f3:00.0/qat/cfg_services
-bash: echo: write error: Invalid argument
[root@master-0 ~]# echo -n 'dc;asym' > /sys/bus/pci/devices/0000:f3:00.0/qat/cfg_services
-bash: echo: write error: Invalid argument
[root@master-0 ~]# echo -n 'dc;sym' > /sys/bus/pci/devices/0000:f3:00.0/qat/cfg_services
-bash: echo: write error: Invalid argument
[root@master-0 ~]#
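
The manual test above can be repeated for any list of candidate values with a small loop. This is only a sketch: `probe_cfg_services` is a hypothetical helper name, not part of the QAT plugin or its init script, and it assumes (as in the transcript) that the device has already been brought down before `cfg_services` is written.

```shell
#!/usr/bin/env bash
# Sketch only: probe_cfg_services is a hypothetical helper, not part of the
# QAT operator. It writes each candidate value to the given cfg_services
# file and reports whether the write succeeded.
#
# Possible use on a node, as root (device path taken from the test above):
#   dev=/sys/bus/pci/devices/0000:f3:00.0/qat
#   printf 'down' > "$dev/state"
#   probe_cfg_services "$dev/cfg_services" dc sym asym 'sym;asym' 'asym;sym'
# The device is left down and configured with the last accepted value, so
# write the wanted value and bring the device up again afterwards.
probe_cfg_services() {
    local cfg_file="$1" value
    shift
    for value in "$@"; do
        # The value stays quoted so ';' is never seen as a command separator.
        if { printf '%s' "$value" > "$cfg_file"; } 2>/dev/null; then
            printf 'accepted: %s\n' "$value"
        else
            printf 'rejected: %s\n' "$value"
        fi
    done
}
```

Passing the values as quoted arguments avoids the `asym;sym` quoting mistake seen in the transcript, where the shell split the command at the semicolon.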


So in OCP 4.14 with QAT operator 0.28, it is only possible to set dc or sym;asym.

Default, when no ServicesEnabled is set:
"qat.intel.com/cy": "16",
"qat.intel.com/dc": "16"

When set to dc:
"qat.intel.com/cy": "0",
"qat.intel.com/dc": "32"

When set to sym;asym:
"qat.intel.com/cy": "32",
"qat.intel.com/dc": "0"
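
One way to read these counts back from a node is to filter the node's JSON for the `qat.intel.com/` resource names. The sketch below assumes the pretty-printed output of `oc get node <node> -o json` is piped in; `qat_resources` is a hypothetical helper name, and `<node>` is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: qat_resources is a hypothetical helper. It reads node JSON
# on stdin and prints each distinct "qat.intel.com/<name>": "<count>" pair.
#
# Possible use:
#   oc get node <node> -o json | qat_resources
qat_resources() {
    # Capacity and allocatable normally repeat the same pairs, so sort -u
    # collapses them; differing counts would show up as separate lines.
    grep -o '"qat\.intel\.com/[a-z-]*": *"[0-9]*"' | sort -u
}
```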

@uMartinXu
Contributor

uMartinXu commented Mar 3, 2025

Thanks for testing. I think your observation is right: it exactly matches the QAT Resource Configuration readme in the 1.3.0 release. We keep document history from 1.3.0 onward. The 1.2.1 release is the one you are working with, and it should be the same as 1.3.0 in this respect.
Please let me know whether the above configurations meet your requirements. If you have other requirements for QAT configuration, feel free to let us know. :-)
Thanks again for your contribution.

@lazharh
Author

lazharh commented Mar 3, 2025

@uMartinXu: thanks, I had a hard time finding the right version of the docs.
But as I asked: is it possible to have more flexibility on OCP 4.14, i.e. to be able to set the other values such as sym, etc.?

@uMartinXu
Contributor

Fully understandable, OCP releases frequently.
You can see the details for each release.

And I am afraid that on OCP 4.14 the flexibility you want, like that in iOCP-1.5.2 for OCP-4.17, cannot be supported.

So I suggest you try Intel® Technology Enabling for OpenShift* version 1.5.2, which targets OCP-4.17.

If you want to use an EUS OCP release, OCP-4.16 is supported by Intel® Technology Enabling for OpenShift* version 1.4.0. It might support the feature you want, but we did not fully test this feature on that release.

Another possible EUS OCP release is OCP-4.18, which will be supported by Intel Technology Enabling for OpenShift version 1.6.0. We are working on this release now, and if there is no blocking issue, we might release it quite soon.

I hope the above information helps. If there is anything we can help with, please let us know. :-)

@lazharh
Author

lazharh commented Mar 4, 2025

OK, thank you so much for the clarifications. I work for RH on OCP and I am aware of all the releases (4.18 has just gone GA).
We have some partners and customers in Telco who are still on OCP 4.14 and maybe even earlier ...
Thanks

@mythi

mythi commented Mar 11, 2025

And I am afraid that on OCP 4.14 the flexibility you want, like that in iOCP-1.5.2 for OCP-4.17, cannot be supported.

@lazharh to explain this a bit further: the limitation comes from the QAT driver features available in the host OS used by 4.14. The QAT plugin/operator is able to support this, but it gives the errors because the kernel does not recognize the services you wish to configure.

I have asked @uMartinXu to document the driver capabilities for each OCP version better because it's not clear at all.

@uMartinXu
Contributor

And I am afraid that on OCP 4.14 the flexibility you want, like that in iOCP-1.5.2 for OCP-4.17, cannot be supported.

@lazharh to explain this a bit further: the limitation comes from the QAT driver features available in the host OS used by 4.14. The QAT plugin/operator is able to support this, but it gives the errors because the kernel does not recognize the services you wish to configure.

I have asked @uMartinXu to document the driver capabilities for each OCP version better because it's not clear at all.

Thanks @mythi for the further explanation. Actually, the supported configuration is already documented in each release and its corresponding documentation. But judging from @lazharh's feedback, it took him quite some effort to find the documents and the release. I think we can enhance the readme page https://github.com/intel/intel-technology-enabling-for-openshift/blob/main/docs/releases.rst and also make it easier for users to find. @mythi @lazharh once we have a PR, please help us review it. What do you think of this solution?

@mythi

mythi commented Mar 12, 2025

once we have a PR, please help us review it. What do you think of this solution?

Sure, I can help with the review.

@lazharh
Author

lazharh commented Mar 12, 2025

Sure, I will have a look

@lazharh
Author

lazharh commented Mar 31, 2025

@uMartinXu, @mythi: can you tell me which kernel version has the driver fix that makes it possible to set sym and all the other combinations of parameters?
And is it based on RHEL 9.4 or 9.6, please?

@uMartinXu
Contributor

@lazharh,
As I mentioned, we tested and supported the related QAT options starting with the Intel Technology Enabling for OpenShift 1.5.2 release. In that release the OCP version is 4.17.1 and the kernel version is 5.14.0-427.40.1.el9_4.x86_64 (as you know, OCP uses the RHCOS kernel). BTW, since iOCP-1.6.0 we maintain the Component Matrix table, where users can check the kernel version.

As for OCP-4.14.11 (Intel Technology Enabling for OpenShift 1.2.0), the full set of QAT options is not supported yet. The kernel version there is 5.14.0-284.50.1.el9_2.x86_64.
I hope the above information is helpful.
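
To compare a node's kernel against the two versions quoted above, a version-aware sort is enough. The following is a sketch under assumptions: `kernel_at_least` is a hypothetical helper, it relies on GNU `sort -V`, and the 5.14.0-427.40.1.el9_4 kernel is only the one tested with release 1.5.2, not necessarily the first kernel that accepts every `cfg_services` value.

```shell
#!/usr/bin/env bash
# Sketch only: kernel_at_least is a hypothetical helper. It succeeds when
# kernel release $1 sorts at or after release $2 under GNU "sort -V".
#
# Possible use on a node:
#   kernel_at_least "$(uname -r)" 5.14.0-427.40.1.el9_4.x86_64 && echo "tested kernel or newer"
kernel_at_least() {
    local have="$1" want="$2" lowest
    # The lower of the two versions comes out first; if that is the wanted
    # version, the running kernel is the same or newer.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}
```

The kernel release of every node can also be listed from the cluster side with `oc get nodes -o wide`, which includes a KERNEL-VERSION column.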

@lazharh
Author

lazharh commented Apr 1, 2025

@uMartinXu thank you for the update. The Component Matrix link points to OCP 4.18, but in your comment above you say 4.17.1?
So even on 4.16 it is not supported?

@mythi

mythi commented Apr 2, 2025

I hope the above information is helpful.

Not really. The question is what is the first kernel version where the full cfg_services support is available and in which OCP version that kernel is included.

@lazharh
Author

lazharh commented Apr 2, 2025

Thank you team @uMartinXu @mythi

@uMartinXu
Contributor

uMartinXu commented Apr 2, 2025

@mythi @lazharh We did not test every OCP/RHCOS kernel.
Maybe OCP 4.16 is the first OCP version that can support that, but we did not test it.
I think an accurate answer can only be obtained by checking the RHCOS kernel changelog for each OCP release. :-)

@lazharh
Author

lazharh commented Apr 2, 2025

@uMartinXu all right thanks!

@lazharh
Author

lazharh commented Apr 16, 2025

@uMartinXu Hope you are doing well.
When will the latest version of the QAT plugin Operator be ready on OCP 4.18?
I still see Version: 0.29.1; I think the latest is 0.32?


3 participants