
Commit 2c4d375

[LoginNodes] Add DCV support for login nodes (#6363)
* Add CustomActions configuration for login nodes
  * Added schema for login node custom actions
  * Added CustomAction to the LoginNodesPool resource in cluster_config.py
* Update unit tests to include login node custom action changes
  * Modified the login node dna_json unit test to include pool_name
  * Ran the tox autoformatter
* Update the test_dcv integration test to include login nodes
* Merge the head and login node custom action schema unit tests
  * Update changelog
  * Fix DescribeStacks resource error in unit test
* Add DCV params to login node config and schema
  * Modify `cluster_config.py` to support login node DCV
    * Added `has_dcv_enabled()` to the `LoginNodesPool` resource: checks whether DCV is enabled in a login node pool.
    * Added `has_dcv_configured()` to the `LoginNodes` resource: checks for a DCV configuration in any login node pool, for use in validating DCV without validating the region multiple times.
    * Added `architecture` and `instance_type_info` properties to the `LoginNodesPool` resource to check DCV eligibility.
    * Added `DcvValidator` for login node pools in `SlurmClusterConfig`.
  * Modify `cluster_schema.py` to support login node DCV
    * Add a Dcv field to `LoginNodesPoolSchema`
* Add login node DCV param to config unit tests
* Modify dcv-connect and the login node security group to support DCV on login nodes
  * Modify `dcv_connect.py`
    * Added the `--login-node-ip` parameter
    * Added `_validate_login_node_ip_arg()`: checks that the `--login-node-ip` argument is a valid IPv4 address and a public or private IP address of a login node.
  * Modify `cluster.py`
    * Added a `login_node_instances` property to `Cluster`, used in `dcv_connect.py` to get the public and private IPs of the login nodes in the cluster.
  * Add DCV access to the login node security group
    * Updated `add_login_nodes_security_group` in `cluster_stack.py` to include an ingress property for DCV.
* Update unit tests to support login node DCV configuration
* Add an S3 access policy to login node IAM resources, needed so DCV can determine whether a valid license is available.
* Add login node support to the test_cloudwatch_logging integration test
  * Modified `test_cloudwatch_logging` to support login nodes. The current changes do not support testing multiple pools with different configurations (e.g. DCV enabled on one pool but not another).
  * Added a `login_node_ip` param to `RemoteCommandExecutor`, used in `test_cloudwatch_logging` to run commands on all login nodes in a cluster.
* Add S3 bucket access to the login node IAM unit test
* Update CHANGELOG
* Fix the dcv-connect help message

Signed-off-by: Chris Makin <cmakin@amazon.com>
1 parent b07ae79 commit 2c4d375
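With this change in place, enabling DCV on a login node pool should look much like the existing HeadNode Dcv section. A hypothetical cluster-config fragment (the `Enabled`/`Port`/`AllowedIps` field names are inferred from the reuse of `DcvSchema` in this commit, not quoted from the docs):

```yaml
# Hypothetical fragment: a LoginNodes pool with DCV enabled.
# Field names under Dcv mirror the HeadNode DcvSchema that this
# commit nests under LoginNodesPoolSchema.
LoginNodes:
  Pools:
    - Name: login
      InstanceType: g4dn.xlarge
      Count: 2
      Networking:
        SubnetIds:
          - subnet-12345678
      Ssh:
        KeyName: my-key
      Dcv:
        Enabled: true
        Port: 8443              # default DCV server port
        AllowedIps: 203.0.113.0/24
```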

File tree

26 files changed: +463 additions, -139 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ CHANGELOG
 **ENHANCEMENTS**

 - Add support for custom actions on login nodes.
+- Allow DCV connection on login nodes.

 **BUG FIXES**
 - Fix validator `EfaPlacementGroupValidator` so that it does not suggest to configure a Placement Group when Capacity Blocks are used.

cli/src/pcluster/cli/commands/dcv_connect.py

Lines changed: 42 additions & 13 deletions
@@ -27,6 +27,7 @@

 DCV_CONNECT_SCRIPT = "/opt/parallelcluster/scripts/pcluster_dcv_connect.sh"
 LOGGER = logging.getLogger(__name__)
+IPV4_REGEX = r"^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$"


 class DCVConnectionError(Exception):
@@ -42,28 +43,55 @@ def _check_command_output(cmd):
     return sub.check_output(cmd, shell=True, universal_newlines=True, stderr=sub.STDOUT).strip()  # nosec B602 nosemgrep


+def _validate_login_node_ip_arg(ip, login_nodes):
+    """Check if ip is a valid IPv4 address and belongs to a login node."""
+    if not re.match(IPV4_REGEX, ip):
+        error(f"{ip} is not a valid IPv4 address.")
+
+    login_node_ips = set()
+
+    for node in login_nodes:
+        login_node_ips.update([node.public_ip, node.private_ip])
+
+    if ip not in login_node_ips:
+        error(f"{ip} is not an IP address of a login node.")
+
+
 def _dcv_connect(args):
     """
     Execute pcluster dcv connect command.

     :param args: pcluster cli arguments.
     """
     try:
-        head_node = Cluster(args.cluster_name).head_node_instance
+        cluster = Cluster(args.cluster_name)
+
+        if args.login_node_ip:
+            login_nodes = cluster.login_node_instances
+            _validate_login_node_ip_arg(args.login_node_ip, login_nodes)
+
+            node_ip = args.login_node_ip
+            default_user = login_nodes[0].default_user
+        else:
+            head_node = cluster.head_node_instance
+
+            node_ip = head_node.public_ip or head_node.private_ip
+            default_user = head_node.default_user
+
     except Exception as e:
         error(f"Unable to connect to the cluster.\n{e}")
     else:
-        head_node_ip = head_node.public_ip or head_node.private_ip
-        # Prepare ssh command to execute in the head node instance
-        cmd = 'ssh {CFN_USER}@{HEAD_NODE_IP} {KEY} "{REMOTE_COMMAND} /home/{CFN_USER}"'.format(
-            CFN_USER=head_node.default_user,
-            HEAD_NODE_IP=head_node_ip,
+        # Prepare ssh command to execute in the node instance
+        cmd = 'ssh {CFN_USER}@{NODE_IP} {KEY} "{REMOTE_COMMAND} /home/{CFN_USER}"'.format(
+            CFN_USER=default_user,
+            NODE_IP=node_ip,
             KEY="-i {0}".format(args.key_path) if args.key_path else "",
             REMOTE_COMMAND=DCV_CONNECT_SCRIPT,
         )

         try:
-            url = _retry(_retrieve_dcv_session_url, func_args=[cmd, args.cluster_name, head_node_ip], attempts=4)
+            url = _retry(_retrieve_dcv_session_url, func_args=[cmd, args.cluster_name, node_ip], attempts=4)
             url_message = f"Please use the following one-time URL in your browser within 30 seconds:\n{url}"

             if args.show_url:
@@ -80,12 +108,12 @@ def _dcv_connect(args):
         error(
             "Something went wrong during DCV connection.\n{0}"
             "Please check the logs in the /var/log/parallelcluster/ folder "
-            "of the head node and submit an issue {1}\n".format(e, PCLUSTER_ISSUES_LINK)
+            "of the node and submit an issue {1}\n".format(e, PCLUSTER_ISSUES_LINK)
         )


-def _retrieve_dcv_session_url(ssh_cmd, cluster_name, head_node_ip):
-    """Connect by ssh to the head node instance, prepare DCV session and return the DCV session URL."""
+def _retrieve_dcv_session_url(ssh_cmd, cluster_name, node_ip):
+    """Connect by ssh to the head or login node instance, prepare DCV session and return the DCV session URL."""
     try:
         LOGGER.debug("SSH command: %s", ssh_cmd)
         output = _check_command_output(ssh_cmd)
@@ -104,7 +132,7 @@ def _retrieve_dcv_session_url(ssh_cmd, cluster_name, head_node_ip):
         error(
             "Something went wrong during DCV connection. Please manually execute the command:\n{0}\n"
             "If the problem persists, please check the logs in the /var/log/parallelcluster/ folder "
-            "of the head node and submit an issue {1}".format(ssh_cmd, PCLUSTER_ISSUES_LINK)
+            "of the node and submit an issue {1}".format(ssh_cmd, PCLUSTER_ISSUES_LINK)
         )

     except sub.CalledProcessError as e:
@@ -117,7 +145,7 @@ def _retrieve_dcv_session_url(ssh_cmd, cluster_name, head_node_ip):
         raise DCVConnectionError(e.output)

     return "https://{IP}:{PORT}?authToken={TOKEN}#{SESSION_ID}".format(
-        IP=head_node_ip,
+        IP=node_ip,
         PORT=dcv_server_port,  # pylint: disable=E0606
         TOKEN=dcv_session_token,  # pylint: disable=E0606
         SESSION_ID=dcv_session_id,  # pylint: disable=E0606
@@ -152,7 +180,7 @@ class DcvConnectCommand(CliCommand):

     # CLI
     name = "dcv-connect"
-    help = "Permits to connect to the head node through an interactive session by using NICE DCV."
+    help = "Permits connection to the head or login nodes through an interactive session by using NICE DCV."
     description = help

     def __init__(self, subparsers):
@@ -162,6 +190,7 @@ def register_command_args(self, parser: ArgumentParser) -> None:  # noqa: D102
         parser.add_argument("-n", "--cluster-name", help="Name of the cluster to connect to", required=True)
         parser.add_argument("--key-path", dest="key_path", help="Key path of the SSH key to use for the connection")
         parser.add_argument("--show-url", action="store_true", default=False, help="Print URL and exit")
+        parser.add_argument("--login-node-ip", dest="login_node_ip", help="IP address of a login node to connect to")

     def execute(self, args: Namespace, extra_args: List[str]) -> None:  # noqa: D102 #pylint: disable=unused-argument
         _dcv_connect(args)
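The new `--login-node-ip` validation above is easy to exercise in isolation. A minimal sketch that copies `IPV4_REGEX` from the diff but returns the error message instead of calling pcluster's `error()` helper (the `Node` namedtuple is a hypothetical stand-in for `ClusterInstance`, of which only the two IP attributes matter here):

```python
import re
from collections import namedtuple

# IPv4 pattern copied from the diff above: four octets, each 0-255, no extras.
IPV4_REGEX = r"^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$"

# Hypothetical stand-in for ClusterInstance; only the IP attributes are used.
Node = namedtuple("Node", ["public_ip", "private_ip"])


def validate_login_node_ip(ip, login_nodes):
    """Return None if ip is a valid IPv4 address of one of login_nodes, else an error message.

    Mirrors _validate_login_node_ip_arg, but returns the message rather than
    exiting, so the two checks can be tested standalone.
    """
    if not re.match(IPV4_REGEX, ip):
        return f"{ip} is not a valid IPv4 address."
    login_node_ips = set()
    for node in login_nodes:
        # A login node is reachable by either its public or its private IP.
        login_node_ips.update([node.public_ip, node.private_ip])
    if ip not in login_node_ips:
        return f"{ip} is not an IP address of a login node."
    return None
```

Note the two failure modes are distinct: a syntactically bad address is rejected before the cluster's instances are ever consulted.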

cli/src/pcluster/config/cluster_config.py

Lines changed: 69 additions & 26 deletions
@@ -1346,6 +1346,7 @@ def __init__(
         networking: LoginNodesNetworking = None,
         count: int = None,
         ssh: LoginNodesSsh = None,
+        dcv: Dcv = None,
         custom_actions: CustomActions = None,
         iam: LoginNodesIam = None,
         gracetime_period: int = None,
@@ -1358,9 +1359,11 @@
         self.networking = networking
         self.count = Resource.init_param(count, default=1)
         self.ssh = ssh
+        self.dcv = dcv
         self.custom_actions = custom_actions
         self.iam = iam or LoginNodesIam(implied=True)
         self.gracetime_period = Resource.init_param(gracetime_period, default=10)
+        self.__instance_type_info = None

     @property
     def instance_profile(self):
@@ -1372,6 +1375,23 @@ def instance_role(self):
         """Return the IAM role for login nodes, if set."""
         return self.iam.instance_role if self.iam else None

+    @property
+    def architecture(self):
+        """Return login node pool architecture based on instance type."""
+        return self.instance_type_info.supported_architecture()[0]
+
+    @property
+    def instance_type_info(self) -> InstanceTypeInfo:
+        """Return login node pool instance type information as returned from aws ec2 describe-instance-types."""
+        if not self.__instance_type_info:
+            self.__instance_type_info = AWSApi.instance().ec2.get_instance_type_info(self.instance_type)
+        return self.__instance_type_info
+
+    @property
+    def has_dcv_enabled(self):
+        """Return True if DCV is enabled."""
+        return self.dcv and self.dcv.enabled
+
     def _register_validators(self, context: ValidatorContext = None):  # noqa: D102 #pylint: disable=unused-argument
         self._register_validator(InstanceTypeValidator, instance_type=self.instance_type)
         self._register_validator(NameValidator, name=self.name)
@@ -1388,6 +1408,14 @@ def __init__(
         super().__init__(**kwargs)
         self.pools = pools

+    @property
+    def has_dcv_enabled(self):
+        """Return True if there is a pool in the cluster with DCV enabled."""
+        for pool in self.pools:
+            if pool.has_dcv_enabled:
+                return True
+        return False
+

 class HeadNode(Resource):
     """Represent the Head Node resource."""
@@ -2938,6 +2966,46 @@ def login_nodes_subnet_ids(self):
                 subnet_ids_set.add(subnet_id)
         return list(subnet_ids_set)

+    def _register_login_node_validators(self):
+        """Register all login node validators to ensure that the resource parameters are valid."""
+        has_dcv_configured = False
+        # Check that all subnets (head node, login nodes, compute nodes) are in the same VPC and support DNS.
+        self._register_validator(
+            SubnetsValidator,
+            subnet_ids=self.login_nodes_subnet_ids + self.compute_subnet_ids + [self.head_node.networking.subnet_id],
+        )
+        self._register_validator(LoginNodesSchedulerValidator, scheduler=self.scheduling.scheduler)
+
+        for pool in self.login_nodes.pools:
+            # Check that the LoginNodes CustomAmi is an AMI of the same OS family and architecture.
+            if pool.image and pool.image.custom_ami:
+                self._register_validator(AmiOsCompatibleValidator, os=self.image.os, image_id=pool.image.custom_ami)
+            self._register_validator(
+                InstanceTypeBaseAMICompatibleValidator,
+                instance_type=pool.instance_type,
+                image=self.login_nodes_ami[pool.name],
+            )
+            self._register_validator(
+                InstanceTypeOSCompatibleValidator,
+                instance_type=pool.instance_type,
+                os=self.image.os,
+            )
+            # Check pool instance compatibility with NICE DCV.
+            if pool.dcv:
+                self._register_validator(
+                    DcvValidator,
+                    instance_type=pool.instance_type,
+                    dcv_enabled=pool.dcv.enabled,
+                    allowed_ips=pool.dcv.allowed_ips,
+                    port=pool.dcv.port,
+                    os=self.image.os,
+                    architecture=pool.architecture,
+                )
+                has_dcv_configured = True
+
+        if has_dcv_configured:
+            self._register_validator(FeatureRegionValidator, feature=Feature.DCV, region=self.region)
+
     def _register_validators(self, context: ValidatorContext = None):  # noqa: C901
         super()._register_validators(context)
         self._register_validator(
@@ -2946,33 +3014,8 @@ def _register_validators(self, context: ValidatorContext = None):  # noqa: C901
             queues=self.scheduling.queues,
         )

-        # Check if all subnets(head node, Login nodes, compute nodes) are in the same VPC and support DNS.
         if self.login_nodes:
-            self._register_validator(
-                SubnetsValidator,
-                subnet_ids=self.login_nodes_subnet_ids
-                + self.compute_subnet_ids
-                + [self.head_node.networking.subnet_id],
-            )
-
-        if self.login_nodes:
-            self._register_validator(LoginNodesSchedulerValidator, scheduler=self.scheduling.scheduler)
-
-        # Check the LoginNodes CustomAMI must be an ami of the same os family and the same arch.
-        if self.login_nodes:
-            for pool in self.login_nodes.pools:
-                if pool.image and pool.image.custom_ami:
-                    self._register_validator(AmiOsCompatibleValidator, os=self.image.os, image_id=pool.image.custom_ami)
-                self._register_validator(
-                    InstanceTypeBaseAMICompatibleValidator,
-                    instance_type=pool.instance_type,
-                    image=self.login_nodes_ami[pool.name],
-                )
-                self._register_validator(
-                    InstanceTypeOSCompatibleValidator,
-                    instance_type=pool.instance_type,
-                    os=self.image.os,
-                )
+            self._register_login_node_validators()

         if self.scheduling.settings and self.scheduling.settings.dns and self.scheduling.settings.dns.hosted_zone_id:
             self._register_validator(
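The `instance_type_info` property above fetches instance metadata from EC2 once and then serves the cached value, so `architecture` and the DCV validator can both consult it without extra API calls. A minimal sketch of that caching pattern, with a hypothetical `LoginNodesPoolSketch` class and an injected `describe` callable standing in for `AWSApi.instance().ec2.get_instance_type_info`:

```python
class LoginNodesPoolSketch:
    """Stand-in illustrating the memoized instance_type_info property above."""

    def __init__(self, instance_type, dcv=None, describe=None):
        self.instance_type = instance_type
        self.dcv = dcv                # dict like {"enabled": True}, or None
        self._describe = describe     # stand-in for the EC2 describe call
        self.__info = None            # cache, as in the real resource
        self.calls = 0                # instrumentation for this sketch only

    @property
    def instance_type_info(self):
        # Fetch once; every later access reuses the cached result.
        if self.__info is None:
            self.calls += 1
            self.__info = self._describe(self.instance_type)
        return self.__info

    @property
    def has_dcv_enabled(self):
        # Same truth table as the real property: needs a Dcv section AND Enabled.
        return bool(self.dcv and self.dcv.get("enabled"))
```

The design choice matters because validator registration walks every pool; without memoization each pool property access would re-issue a `describe-instance-types` call.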

cli/src/pcluster/models/cluster.py

Lines changed: 9 additions & 0 deletions
@@ -713,6 +713,15 @@ def head_node_instance(self) -> ClusterInstance:
         else:
             raise ClusterActionError("Unable to retrieve head node information.")

+    @property
+    def login_node_instances(self) -> List[ClusterInstance]:
+        """Get login node instances."""
+        instances, _ = self.describe_instances(node_type=NodeType.LOGIN_NODE)
+        if instances:
+            return instances
+        else:
+            raise ClusterActionError("Unable to retrieve login node information.")
+
     def _get_instance_filters(self, node_type: NodeType, queue_name: str = None):
         filters = [
             {"Name": f"tag:{PCLUSTER_CLUSTER_NAME_TAG}", "Values": [self.stack_name]},

cli/src/pcluster/schemas/cluster_schema.py

Lines changed: 1 addition & 0 deletions
@@ -1434,6 +1434,7 @@ class LoginNodesPoolSchema(BaseSchema):
         ),
         metadata={"update_policy": UpdatePolicy.SUPPORTED},
     )
+    dcv = fields.Nested(DcvSchema, metadata={"update_policy": UpdatePolicy.UNSUPPORTED})
     ssh = fields.Nested(LoginNodesSshSchema, metadata={"update_policy": UpdatePolicy.LOGIN_NODES_STOP})
     custom_actions = fields.Nested(LoginNodesCustomActionsSchema, metadata={"update_policy": UpdatePolicy.IGNORED})
     iam = fields.Nested(LoginNodesIamSchema, metadata={"update_policy": UpdatePolicy.LOGIN_NODES_STOP})

cli/src/pcluster/templates/cdk_builder_utils.py

Lines changed: 15 additions & 0 deletions
@@ -958,6 +958,21 @@ def _build_policy(self) -> List[iam.PolicyStatement]:
                     )
                 ],
             ),
+            iam.PolicyStatement(
+                sid="DcvLicense",
+                actions=[
+                    "s3:GetObject",
+                ],
+                effect=iam.Effect.ALLOW,
+                resources=[
+                    self._format_arn(
+                        service="s3",
+                        resource="dcv-license.{0}/*".format(Stack.of(self).region),
+                        region="",
+                        account="",
+                    )
+                ],
+            ),
         ]

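The policy statement above grants read access to the regional DCV license bucket. Since S3 bucket ARNs carry empty region and account fields, the `_format_arn` call reduces to a plain string; a hypothetical helper showing the shape of the resulting ARN (the `dcv-license.<region>` bucket name is taken from the diff above):

```python
def dcv_license_bucket_arn(region, partition="aws"):
    """Build the object ARN for the regional DCV license S3 bucket.

    Mirrors the _format_arn call above: S3 ARNs have no region or
    account component, only the bucket name and key pattern.
    """
    return f"arn:{partition}:s3:::dcv-license.{region}/*"
```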

cli/src/pcluster/templates/cluster_stack.py

Lines changed: 13 additions & 1 deletion
@@ -887,11 +887,23 @@ def _get_source_ingress_rule(self, setting):
         return ec2.CfnSecurityGroup.IngressProperty(ip_protocol="tcp", from_port=22, to_port=22, cidr_ip=setting)

     def _add_login_nodes_security_group(self):
+        # TODO review this once we allow more pools to be defined in the LoginNodes section
         login_nodes_security_group_ingress = [
             # SSH access
-            # TODO review this once we allow more pools to be defined in the LoginNodes section
             self._get_source_ingress_rule(self.config.login_nodes.pools[0].ssh.allowed_ips)
         ]
+
+        if self.config.login_nodes.has_dcv_enabled:
+            login_nodes_security_group_ingress.append(
+                # DCV access
+                ec2.CfnSecurityGroup.IngressProperty(
+                    ip_protocol="tcp",
+                    from_port=self.config.login_nodes.pools[0].dcv.port,
+                    to_port=self.config.login_nodes.pools[0].dcv.port,
+                    cidr_ip=self.config.login_nodes.pools[0].dcv.allowed_ips,
+                )
+            )
+
         return ec2.CfnSecurityGroup(
             self.stack,
             "LoginNodesSecurityGroup",

cli/src/pcluster/templates/login_nodes_stack.py

Lines changed: 2 additions & 0 deletions
@@ -256,6 +256,8 @@ def _add_login_nodes_pool_launch_template(self):
             "head_node_private_ip": self._head_eni.attr_primary_private_ip_address,
             "dns_domain": (str(self._cluster_hosted_zone.name) if self._cluster_hosted_zone else ""),
             "hosted_zone": (str(self._cluster_hosted_zone.ref) if self._cluster_hosted_zone else ""),
+            "dcv_enabled": "login_node" if self._pool.has_dcv_enabled else "false",
+            "dcv_port": self._pool.dcv.port if self._pool.dcv else "NONE",
             "log_group_name": self._log_group.log_group_name,
             "log_rotation_enabled": "true" if self._config.is_log_rotation_enabled else "false",
             "pool_name": self._pool.name,

cli/tests/pcluster/cli/test_dcv_connect/TestDcvConnectCommand/test_helper/pcluster-help.txt

Lines changed: 5 additions & 2 deletions
@@ -1,8 +1,9 @@
 usage: pcluster dcv-connect [-h] [--debug] [-r REGION] -n CLUSTER_NAME
                             [--key-path KEY_PATH] [--show-url]
+                            [--login-node-ip LOGIN_NODE_IP]

-Permits to connect to the head node through an interactive session by using
-NICE DCV.
+Permits connection to the head or login nodes through an interactive session
+by using NICE DCV.

 options:
   -h, --help            show this help message and exit
@@ -13,3 +14,5 @@ options:
                         Name of the cluster to connect to
   --key-path KEY_PATH   Key path of the SSH key to use for the connection
   --show-url            Print URL and exit
+  --login-node-ip LOGIN_NODE_IP
+                        IP address of a login node to connect to

cli/tests/pcluster/cli/test_entrypoint/TestParallelClusterCli/test_helper/pcluster-help.txt

Lines changed: 2 additions & 2 deletions
@@ -49,8 +49,8 @@ COMMANDS:
   list-official-images
                         List Official ParallelCluster AMIs.
   configure             Start the AWS ParallelCluster configuration.
-  dcv-connect           Permits to connect to the head node through an
-                        interactive session by using NICE DCV.
+  dcv-connect           Permits connection to the head or login nodes through
+                        an interactive session by using NICE DCV.
   export-cluster-logs
                         Export the logs of the cluster to a local tar.gz
                         archive by passing through an Amazon S3 Bucket.
