Skip to content

Commit 2d96d3e

Browse files
HumairAKmprahldroctothorpe
authored
chore: (cherry-pick) Add SDK support for setting resource limits on older KFP versions (#11853)
* fix(sdk): Add SDK support for setting resource limits on older KFP versions (#11839) For context, the commit 70aaf8a removed support for the old fields (without a resource_ prefix). This was added back in commit 6ebf4aa but done in a way that broke any usage of pipeline input parameters but was to support the current KFP backend which did not yet support the new fields. In commit 7c931ae, the old fields were removed again but added support for the new field in KFP backend. This commit addresses the case where a user is using a new SDK but with a KFP backend prior to 2.4. (cherry picked from commit f9d487c) Signed-off-by: mprahl <mprahl@users.noreply.github.com> Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com> * chore(sdk): release kfp sdk 2.12.2 (#11843) (cherry picked from commit e21bbba) Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com> * fix(tests): free up space in some test runners (#11818) * fix(tests): free up space in some test runners Signed-off-by: droctothorpe <mythicalsunlight@gmail.com> * Add comments and logs Signed-off-by: droctothorpe <mythicalsunlight@gmail.com> --------- (cherry picked from commit 478ca08) Signed-off-by: droctothorpe <mythicalsunlight@gmail.com> Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com> --------- Signed-off-by: mprahl <mprahl@users.noreply.github.com> Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com> Signed-off-by: droctothorpe <mythicalsunlight@gmail.com> Co-authored-by: Matt Prahl <mprahl@users.noreply.github.com> Co-authored-by: Alex <mythicalsunlight@gmail.com>
1 parent 1956d69 commit 2d96d3e

File tree

9 files changed

+204
-12
lines changed

9 files changed

+204
-12
lines changed

.github/workflows/kfp-kubernetes-execution-tests.yml

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,18 @@ jobs:
2828
with:
2929
python-version: '3.9'
3030

31+
# This is intended to address disk space issues that have surfaced
32+
# intermittently during CI -
33+
# https://github.com/actions/runner-images/issues/2840#issuecomment-1284059930
34+
- name: Free up space in /dev/root
35+
run: |
36+
echo "Disk usage before clean up:"
37+
df -h
38+
sudo rm -rf /usr/share/dotnet
39+
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
40+
echo "Disk usage after clean up:"
41+
df -h
42+
3143
- name: Create KFP cluster
3244
uses: ./.github/actions/kfp-cluster
3345
with:

.github/workflows/sdk-execution.yml

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,18 @@ jobs:
2828
with:
2929
python-version: 3.9
3030

31+
# This is intended to address disk space issues that have surfaced
32+
# intermittently during CI -
33+
# https://github.com/actions/runner-images/issues/2840#issuecomment-1284059930
34+
- name: Free up space in /dev/root
35+
run: |
36+
echo "Disk usage before clean up:"
37+
df -h
38+
sudo rm -rf /usr/share/dotnet
39+
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
40+
echo "Disk usage after clean up:"
41+
df -h
42+
3143
- name: Create KFP cluster
3244
uses: ./.github/actions/kfp-cluster
3345
with:

docs/versions.json

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,17 @@
11
[
22
{
3-
"version": "https://kubeflow-pipelines.readthedocs.io/en/sdk-2.12.1/",
4-
"title": "2.12.1",
3+
"version": "https://kubeflow-pipelines.readthedocs.io/en/sdk-2.12.2/",
4+
"title": "2.12.2",
55
"aliases": [
66
"stable",
77
"latest"
88
]
99
},
10+
{
11+
"version": "https://kubeflow-pipelines.readthedocs.io/en/sdk-2.12.1/",
12+
"title": "2.12.1",
13+
"aliases": []
14+
},
1015
{
1116
"version": "https://kubeflow-pipelines.readthedocs.io/en/sdk-2.12.0/",
1217
"title": "2.12.0",

sdk/RELEASE.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,18 @@
1010

1111
## Documentation updates
1212

13+
# 2.12.2
14+
15+
## Features
16+
17+
## Breaking changes
18+
19+
## Deprecations
20+
21+
## Bug fixes and other changes
22+
23+
* fix(sdk): Add SDK support for setting resource limits on older KFP versions (#11839)
24+
1325
# 2.12.1
1426

1527
## Features

sdk/python/kfp/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@
1616
# https://packaging.python.org/guides/packaging-namespace-packages/#pkgutil-style-namespace-packages
1717
__path__ = __import__('pkgutil').extend_path(__path__, __name__)
1818

19-
__version__ = '2.12.1'
19+
__version__ = '2.12.2'
2020

2121
import sys
2222
import warnings

sdk/python/kfp/compiler/compiler_test.py

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3527,29 +3527,82 @@ def simple_pipeline():
35273527
self.assertEqual(
35283528
'5', dict_format['deploymentSpec']['executors']['exec-return-1-2']
35293529
['container']['resources']['resourceCpuLimit'])
3530+
self.assertEqual(
3531+
5.0, dict_format['deploymentSpec']['executors']['exec-return-1-2']
3532+
['container']['resources']['cpuLimit'])
35303533
self.assertNotIn(
35313534
'memoryLimit', dict_format['deploymentSpec']['executors']
35323535
['exec-return-1-2']['container']['resources'])
35333536

35343537
self.assertEqual(
35353538
'50G', dict_format['deploymentSpec']['executors']['exec-return-1-3']
35363539
['container']['resources']['resourceMemoryLimit'])
3540+
self.assertEqual(
3541+
50.0, dict_format['deploymentSpec']['executors']['exec-return-1-3']
3542+
['container']['resources']['memoryLimit'])
35373543
self.assertNotIn(
35383544
'cpuLimit', dict_format['deploymentSpec']['executors']
35393545
['exec-return-1-3']['container']['resources'])
35403546

35413547
self.assertEqual(
35423548
'2', dict_format['deploymentSpec']['executors']['exec-return-1-4']
35433549
['container']['resources']['resourceCpuRequest'])
3550+
self.assertEqual(
3551+
2.0, dict_format['deploymentSpec']['executors']['exec-return-1-4']
3552+
['container']['resources']['cpuRequest'])
35443553
self.assertEqual(
35453554
'5', dict_format['deploymentSpec']['executors']['exec-return-1-4']
35463555
['container']['resources']['resourceCpuLimit'])
3556+
self.assertEqual(
3557+
5.0, dict_format['deploymentSpec']['executors']['exec-return-1-4']
3558+
['container']['resources']['cpuLimit'])
35473559
self.assertEqual(
35483560
'4G', dict_format['deploymentSpec']['executors']['exec-return-1-4']
35493561
['container']['resources']['resourceMemoryRequest'])
3562+
self.assertEqual(
3563+
4.0, dict_format['deploymentSpec']['executors']['exec-return-1-4']
3564+
['container']['resources']['memoryRequest'])
35503565
self.assertEqual(
35513566
'50G', dict_format['deploymentSpec']['executors']['exec-return-1-4']
35523567
['container']['resources']['resourceMemoryLimit'])
3568+
self.assertEqual(
3569+
50.0, dict_format['deploymentSpec']['executors']['exec-return-1-4']
3570+
['container']['resources']['memoryLimit'])
3571+
3572+
def test_cpu_memory_input_parameter(self):
3573+
3574+
@dsl.pipeline
3575+
def simple_pipeline(
3576+
cpu_request: str,
3577+
cpu_limt: str,
3578+
memory_request: str,
3579+
memory_limit: str,
3580+
ac_type: str,
3581+
ac_count: int,
3582+
):
3583+
return_1().set_cpu_request(cpu_request)\
3584+
.set_cpu_limit(cpu_limt)\
3585+
.set_memory_request(memory_request)\
3586+
.set_memory_limit(memory_limit)\
3587+
.set_accelerator_limit(ac_count)\
3588+
.set_accelerator_type(ac_type)
3589+
3590+
dict_format = json_format.MessageToDict(simple_pipeline.pipeline_spec)
3591+
resources = dict_format['deploymentSpec']['executors']['exec-return-1'][
3592+
'container']['resources']
3593+
3594+
self.assertIn('resourceCpuRequest', resources)
3595+
self.assertNotIn('cpuRequest', resources)
3596+
self.assertIn('resourceCpuLimit', resources)
3597+
self.assertNotIn('cpuLimit', resources)
3598+
self.assertIn('resourceMemoryRequest', resources)
3599+
self.assertNotIn('memoryRequest', resources)
3600+
self.assertIn('resourceMemoryLimit', resources)
3601+
self.assertNotIn('memoryLimit', resources)
3602+
self.assertIn('resourceType', resources['accelerator'])
3603+
self.assertNotIn('type', resources['accelerator'])
3604+
self.assertIn('resourceCount', resources['accelerator'])
3605+
self.assertNotIn('count', resources['accelerator'])
35533606

35543607

35553608
class TestPlatformConfig(unittest.TestCase):

sdk/python/kfp/compiler/compiler_utils.py

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -804,3 +804,60 @@ def recursive_replace_placeholders(data: Union[Dict, List], old_value: str,
804804
if isinstance(data, pipeline_channel.PipelineChannel):
805805
data = str(data)
806806
return new_value if data == old_value else data
807+
808+
809+
# Note that cpu_to_float assumes the string has already been validated by the _validate_cpu_request_limit method.
810+
def _cpu_to_float(cpu: str) -> float:
811+
"""Converts the validated CPU request/limit string and to its numeric float
812+
value.
813+
814+
Args:
815+
cpu: CPU requests or limits. This string should be a number or a
816+
number followed by an "m" to indicate millicores (1/1000). For
817+
more information, see `Specify a CPU Request and a CPU Limit
818+
Returns:
819+
The numeric value (float) of the cpu request/limit.
820+
"""
821+
return float(cpu[:-1]) / 1000 if cpu.endswith('m') else float(cpu)
822+
823+
824+
# Note that memory_to_float assumes the string has already been validated by the _validate_memory_request_limit method.
825+
def _memory_to_float(memory: str) -> float:
826+
"""Converts the validated memory request/limit string to its numeric value.
827+
828+
Args:
829+
memory: Memory requests or limits. This string should be a number or
830+
a number followed by one of "E", "Ei", "P", "Pi", "T", "Ti", "G",
831+
"Gi", "M", "Mi", "K", or "Ki".
832+
Returns:
833+
The numeric value (float) of the memory request/limit.
834+
"""
835+
if memory.endswith('E'):
836+
memory = float(memory[:-1]) * constants._E / constants._G
837+
elif memory.endswith('Ei'):
838+
memory = float(memory[:-2]) * constants._EI / constants._G
839+
elif memory.endswith('P'):
840+
memory = float(memory[:-1]) * constants._P / constants._G
841+
elif memory.endswith('Pi'):
842+
memory = float(memory[:-2]) * constants._PI / constants._G
843+
elif memory.endswith('T'):
844+
memory = float(memory[:-1]) * constants._T / constants._G
845+
elif memory.endswith('Ti'):
846+
memory = float(memory[:-2]) * constants._TI / constants._G
847+
elif memory.endswith('G'):
848+
memory = float(memory[:-1])
849+
elif memory.endswith('Gi'):
850+
memory = float(memory[:-2]) * constants._GI / constants._G
851+
elif memory.endswith('M'):
852+
memory = float(memory[:-1]) * constants._M / constants._G
853+
elif memory.endswith('Mi'):
854+
memory = float(memory[:-2]) * constants._MI / constants._G
855+
elif memory.endswith('K'):
856+
memory = float(memory[:-1]) * constants._K / constants._G
857+
elif memory.endswith('Ki'):
858+
memory = float(memory[:-2]) * constants._KI / constants._G
859+
else:
860+
# By default interpret as a plain integer, in the unit of Bytes.
861+
memory = float(memory) / constants._G
862+
863+
return memory

sdk/python/kfp/compiler/pipeline_spec_builder.py

Lines changed: 43 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -651,27 +651,62 @@ def convert_to_placeholder(input_value: str) -> str:
651651
for name, value in (task.container_spec.env or {}).items()
652652
]))
653653

654+
# All the fields with the resource_ prefix are newer fields to support pipeline input parameters. The below code
655+
# will check if the value is a placeholder and if not, it will also set the value on the old deprecated fields
656+
# without the resource_ prefix to work on older KFP installations.
654657
if task.container_spec.resources is not None:
655658
if task.container_spec.resources.cpu_request is not None:
656-
container_spec.resources.resource_cpu_request = convert_to_placeholder(
659+
placeholder = convert_to_placeholder(
657660
task.container_spec.resources.cpu_request)
661+
container_spec.resources.resource_cpu_request = placeholder
662+
663+
if task.container_spec.resources.cpu_request == placeholder:
664+
container_spec.resources.cpu_request = compiler_utils._cpu_to_float(
665+
task.container_spec.resources.cpu_request)
658666
if task.container_spec.resources.cpu_limit is not None:
659-
container_spec.resources.resource_cpu_limit = convert_to_placeholder(
667+
placeholder = convert_to_placeholder(
660668
task.container_spec.resources.cpu_limit)
669+
container_spec.resources.resource_cpu_limit = placeholder
670+
671+
if task.container_spec.resources.cpu_limit == placeholder:
672+
container_spec.resources.cpu_limit = compiler_utils._cpu_to_float(
673+
task.container_spec.resources.cpu_limit)
661674
if task.container_spec.resources.memory_request is not None:
662-
container_spec.resources.resource_memory_request = convert_to_placeholder(
675+
placeholder = convert_to_placeholder(
663676
task.container_spec.resources.memory_request)
677+
container_spec.resources.resource_memory_request = placeholder
678+
679+
if task.container_spec.resources.memory_request == placeholder:
680+
container_spec.resources.memory_request = compiler_utils._memory_to_float(
681+
task.container_spec.resources.memory_request)
664682
if task.container_spec.resources.memory_limit is not None:
665-
container_spec.resources.resource_memory_limit = convert_to_placeholder(
683+
placeholder = convert_to_placeholder(
666684
task.container_spec.resources.memory_limit)
685+
container_spec.resources.resource_memory_limit = placeholder
686+
687+
if task.container_spec.resources.memory_limit == placeholder:
688+
container_spec.resources.memory_limit = compiler_utils._memory_to_float(
689+
task.container_spec.resources.memory_limit)
667690
if task.container_spec.resources.accelerator_count is not None:
691+
ac_type = None
692+
ac_type_placholder = convert_to_placeholder(
693+
task.container_spec.resources.accelerator_type)
694+
if task.container_spec.resources.accelerator_type == ac_type_placholder:
695+
ac_type = task.container_spec.resources.accelerator_type
696+
697+
ac_count = None
698+
ac_count_placeholder = convert_to_placeholder(
699+
task.container_spec.resources.accelerator_count)
700+
if task.container_spec.resources.accelerator_count == ac_count_placeholder:
701+
ac_count = int(task.container_spec.resources.accelerator_count)
702+
668703
container_spec.resources.accelerator.CopyFrom(
669704
pipeline_spec_pb2.PipelineDeploymentConfig.PipelineContainerSpec
670705
.ResourceSpec.AcceleratorConfig(
671-
resource_type=convert_to_placeholder(
672-
task.container_spec.resources.accelerator_type),
673-
resource_count=convert_to_placeholder(
674-
task.container_spec.resources.accelerator_count),
706+
resource_type=ac_type_placholder,
707+
resource_count=ac_count_placeholder,
708+
type=ac_type,
709+
count=ac_count,
675710
))
676711

677712
return container_spec

sdk/python/test_data/pipelines/pipeline_with_resource_spec.yaml

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -61,8 +61,14 @@ deploymentSpec:
6161
image: gcr.io/my-project/my-fancy-trainer
6262
resources:
6363
accelerator:
64+
count: '1'
6465
resourceCount: '1'
6566
resourceType: tpu-v3
67+
type: tpu-v3
68+
cpuLimit: 4.0
69+
cpuRequest: 2.0
70+
memoryLimit: 15.032385536
71+
memoryRequest: 4.294967296
6672
resourceCpuLimit: '4'
6773
resourceCpuRequest: '2'
6874
resourceMemoryLimit: 14Gi
@@ -119,4 +125,4 @@ root:
119125
isOptional: true
120126
parameterType: STRING
121127
schemaVersion: 2.1.0
122-
sdkVersion: kfp-2.11.0
128+
sdkVersion: kfp-2.12.1

0 commit comments

Comments
 (0)