This repository was archived by the owner on Apr 18, 2024. It is now read-only.

Commit a7abcd7

Updated to support CDH v5
1 parent e00260d commit a7abcd7

3 files changed

+57
-90
lines changed

README.md

Lines changed: 12 additions & 54 deletions
@@ -1,8 +1,8 @@
11
# oci-quickstart-cloudera
22
This is a Terraform module that deploys [Cloudera Enterprise Data Hub](https://www.cloudera.com/products/enterprise-data-hub.html) on [Oracle Cloud Infrastructure (OCI)](https://cloud.oracle.com/en_US/cloud-infrastructure). It is developed jointly by Oracle and Cloudera.
33

4-
## Alternate Versions
5-
Future development will include support for EDH v5 clusters. In the meantime, use the [1.0.0 release](https://github.com/oci-quickstart/oci-cloudera/releases/tag/1.0.0) for v5 deployments.
4+
## Deployment Information
5+
The following table shows Recommended and Minimum supported OCI shapes for each cluster role:
66

77
| | Worker Nodes | Bastion Instance | Utility and Master Instances |
88
|-------------|----------------|------------------|------------------------------|
@@ -12,6 +12,15 @@ Future development will include support for EDH v5 clusters. In the meantime, u
1212
## Resource Manager Deployment
1313
Using [OCI Resource Manager](https://docs.cloud.oracle.com/iaas/Content/ResourceManager/Concepts/resourcemanager.htm) makes deployment quite easy. Simply [download the .zip](https://github.com/oracle/oci-quickstart-cloudera/zipball/resource-manager) and follow the [Resource Manager instructions](https://docs.cloud.oracle.com/iaas/Content/ResourceManager/Tasks/usingconsole.htm) for how to build a stack. Prior to building the Stack, you may want to modify some parts of the deployment detailed in the sections below.
1414

15+
Alternatively, you can use a schema file to make setting deployment variables much easier. To use this feature, the GitHub zipball must be repackaged so that its contents are top-level before creating the ORM Stack. This is a straightforward process:
16+
```
17+
unzip oci-quickstart-cloudera*.zip
18+
cd oci-quickstart-cloudera-<TAB_COMPLETE>
19+
zip -r oci-quickstart-cloudera.zip *
20+
```
21+
22+
Use the oci-quickstart-cloudera.zip file created in the last step to create the ORM Stack.
23+
1524
## Python Deployment using cm_client
1625
The deployment script "deploy_on_oci.py" uses cm_client against Cloudera Manager API v31. As such, it requires some customization before execution. Refer to the header section in the script; you are strongly encouraged to modify the following variables before deployment:
1726

@@ -42,58 +51,7 @@ High Availability is also offered as part of the deployment process. When secur
4251
You can customize the default root password for MySQL by editing the source script [cms_mysql.sh](https://github.com/oracle/oci-quickstart-cloudera/blob/master/scripts/cms_mysql.sh#L188). For the various Cloudera databases, random passwords are generated and used. These are stored in a flat file on the Utility host for use at deployment time.
4352

4453
## Object Storage Integration
45-
As of the 2.1.0 release, included with this template is a means to deploy clusters with configuration to allow use of OCI Object Storage using S3 Compatibility. In order to implement, an S3 Access and Secret key must be set up in the OCI Tenancy first. This process is detailed [here](https://docs.cloud.oracle.com/iaas/Content/Identity/Tasks/managingcredentials.htm#Working2). Once that is in place, modify the [deploy_on_oci.py](https://github.com/oracle/oci-quickstart-cloudera/blob/master/scripts/deploy_on_oci.py#L101-L108) script, and set the following values:
46-
47-
s3_compat_enable = 'False'
48-
s3a_secret_key = 'None'
49-
s3a_access_key = 'None'
50-
s3a_endpoint = 'None'
51-
52-
The first should be set to 'True', then replace 'None' with each of the required values. This configuration will then be pushed as part of the cluster deployment.
53-
54-
## Resource Manager Variables
55-
Step 2 for setting up a stack is Configure Variables. By default all variables are filled in, with the exception of the SSH Public and Private keypair used for host access. If you don't have a keypair for use with this deployment, generating one on Linux/Mac is simply:
56-
57-
ssh-keygen -t rsa
58-
59-
Follow the prompts to generate the key, do not associate a password with it. Copy the contents of each file and paste into the appropriate variable fields as shown here:
60-
61-
![Resource Manager Variables](https://github.com/oracle/oci-quickstart-cloudera/blob/resource-manager/images/RM_variables.png)
62-
63-
This list also can be modified to suit your specific deployment requirements. You should review the settings for the following and ensure you have the capacity in your Tenancy prior to deployment:
64-
65-
worker_instance_shape
66-
worker_node_count
67-
block_volumes_per_worker
68-
utility_instance_shape
69-
master_instance_shape
70-
bastion_instance_shape
71-
72-
Note that it is not suggested to modify the data_blocksize_in_gbs to lower than the default value of 700GB. This is because 700GB is the minimum value to achieve maximum throughput per block volume. Lowering this has a negative impact on HDFS performance. If you need more HDFS capacity, best practice is to increase the block_volumes_per_worker which adds more DFS volumes for capacity and aggregate throughput. For even higher density, the data_blocksize_in_gbs can be increased in tandem.
73-
74-
When using DenseIO shapes, it's also possible to set the block_volumes_per_worker to "0" to leverage only local NVME disk for HDFS. In the case that you have both local NVME and block, data tiering will automatically be enabled as part of the deployment process.
75-
76-
## Resource Manager Stack Steps
77-
After building the stack, it only takes 2 actions to deploy:
78-
79-
Terraform Actions -> Plan
80-
Terraform Actions -> Apply
81-
82-
This will create all the required elements in a compartment in the target OCI tenancy. This includes VCN and Security List parameters. Security audit of these in the [network module](https://github.com/oracle/oci-quickstart-cloudera/blob/master/terraform/modules/network/main.tf) is suggested.
83-
84-
The output of the Apply command will contain a URL to access Cloudera Manager. This is the public IP of the Utility Host, which runs the deployment.
85-
86-
## Monitoring Cluster Build
87-
Because all tasks are done in CloudInit, there are two ways to monitor the deployment. Firstly you can login to the Cloudera Manager URL once it is up and running a few minutes after the Apply command finishes. Alternatively you can SSH into the Utility node, and monitor the log file "/var/log/cloudera-OCI-initialize.log" which contains detailed output from the deployment.
88-
89-
## Destroy the Deployment
90-
91-
When you no longer need the deployment, you can destroy it:
92-
93-
Terraform Actions -> Destroy
94-
95-
## Deployment Architecture
96-
54+
???LINES MISSING
9755
Here is a diagram showing what is deployed using this template. Note that resources are automatically distributed among Fault Domains in an Availability Domain to ensure fault tolerance. Additional workers deployed will stripe between the 3 fault domains in sequence starting with the Fault Domain 1 and incrementing sequentially.
9856

9957
![Deployment Architecture Diagram](https://github.com/oracle/oci-quickstart-cloudera/blob/master/images/deployment_architecture.png)
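
The fault domain striping described in the README text above can be sketched in a few lines of Python. This is only an illustration of the round-robin rule as the README states it (three fault domains, starting at Fault Domain 1), not the Terraform logic used by the module. The function names and the zero-based worker index are assumptions made for the example, as is the `FAULT-DOMAIN-n` label format.

```python
from collections import Counter

# The README states that workers stripe across 3 fault domains.
FAULT_DOMAIN_COUNT = 3


def fault_domain_for_worker(worker_index):
    """Return a fault domain label for a zero-based worker index.

    Hypothetical helper: worker 0 lands in fault domain 1, worker 1 in
    fault domain 2, worker 2 in fault domain 3, worker 3 wraps back to 1.
    """
    if worker_index < 0:
        raise ValueError("worker_index must be non-negative")
    return "FAULT-DOMAIN-%d" % (worker_index % FAULT_DOMAIN_COUNT + 1)


def placement_summary(worker_count):
    """Count how many workers end up in each fault domain."""
    return Counter(fault_domain_for_worker(i) for i in range(worker_count))


if __name__ == "__main__":
    for index in range(5):
        print("worker-%d -> %s" % (index + 1, fault_domain_for_worker(index)))
```

With a worker count that is not a multiple of three, the lower-numbered fault domains receive the extra workers, which may be worth keeping in mind when sizing a cluster.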

schema.yaml

Lines changed: 7 additions & 0 deletions
@@ -67,6 +67,13 @@ variables:
6767
cdh_version:
6868
type: enum
6969
enum:
70+
- "5.10.2.5"
71+
- "5.11.2.4"
72+
- "5.12.2.4"
73+
- "5.13.3.2"
74+
- "5.14.4.3"
75+
- "5.15.2.3"
76+
- "5.16.2.8"
7077
- "6.0.0"
7178
- "6.0.1"
7279
- "6.1.0"

scripts/deploy_on_oci.py

Lines changed: 38 additions & 36 deletions
@@ -20,11 +20,7 @@
2020
import subprocess
2121

2222
start_time = time.time()
23-
24-
#
2523
# Global Parameter Defaults - These are passed to the script, do not modify
26-
#
27-
2824
disk_count = 'None'
2925
worker_shape = 'None'
3026
cm_server = 'None'
@@ -33,53 +29,36 @@
3329
host_fqdn_list = []
3430
data_tiering = 'False'
3531
nvme_disks = 0
36-
cluster_version = '6.2.0' # type: str
3732
deployment_type = 'simple' # type: str
38-
39-
#
4033
# Custom Global Parameters - Customize below here
41-
#
42-
43-
# Enable Debug Output set this to 'True' for detailed API output during execution
4434
debug = 'False' # type: str
45-
4635
# Define new admin username and password for Cloudera Manager
4736
# This replaces the default (insecure) admin/admin account
4837
admin_user_name = 'cdhadmin' # type: str
4938
admin_password = 'somepassword' # type: str
50-
51-
# Define cluster name
39+
# Defaults
5240
cluster_name = 'TestCluster' # type: str
53-
54-
# Set this to 'True' (default) to enable secure cluster (Kerberos) functionality
55-
# Set this to 'False' to deploy an insecure cluster - This is automatically off for simple deployment
41+
cdh_version = ' ' # type: str
42+
cluster_primary_version = ' '
43+
kafka_parcel_url = ' '
5644
secure_cluster = 'True' # type: bool
57-
58-
# Set this to 'False' if you do not want HDFS HA - useful for Development or if you want to save some setup time
59-
# This is automatically off for simple deployment
6045
hdfs_ha = 'True' # type: bool
61-
62-
# They should match what is in the Cloudera Manager CloudInit bootstrap file and instance boot files
46+
# These should match what is in the Cloudera Manager CloudInit bootstrap file and instance boot files
6347
realm = 'HADOOP.COM'
6448
kdc_admin = 'cloudera-scm@HADOOP.COM'
6549
kdc_password = 'somepassword'
66-
6750
# Set port number for Cloudera Manager - used to build API endpoints and check if CM is up/listening
6851
# Only modify if you customized the Cloudera Manager deployment to use a non-standard port
6952
cm_port = '7180'
7053
# Set API version to use with Cloudera Manager
7154
api_version = 'v31'
72-
7355
# Define DB port number for Cluster Metadata
7456
# This should match the CloudInit boot file you chose in Terraform
7557
# For MySQL this is 3306
7658
# For Postgres this is 5432
7759
meta_db_port = '3306'
78-
79-
# Define Remote Parcel URL & Distribution Rate if desired
80-
remote_parcel_url = 'https://archive.cloudera.com/cdh6/' + cluster_version + '/parcels' # type: str
60+
# Distribution rate for remote repo
8161
parcel_distribution_rate = "1024000" # type: int
82-
8362
# Cluster Services List
8463
cluster_service_list = ['SOLR', 'HBASE', 'HIVE', 'SPARK_ON_YARN', 'HDFS', 'OOZIE', 'SQOOP_CLIENT', 'ZOOKEEPER',
8564
'YARN', 'KAFKA', 'IMPALA']
@@ -366,12 +345,12 @@ def delete_default_admin_user():
366345
print('Exception when calling UsersResourceApi->delete_user2: {}\n'.format(e))
367346

368347

369-
def init_cluster():
348+
def init_cluster(cdh_version):
370349
"""
371350
Initialize Cluster
372351
:return:
373352
"""
374-
cluster = [cm_client.ApiCluster(name=cluster_name, display_name=cluster_name, full_version=cluster_version)]
353+
cluster = [cm_client.ApiCluster(name=cluster_name, display_name=cluster_name, full_version=cdh_version)]
375354
body = cm_client.ApiClusterList(cluster)
376355

377356
try:
@@ -574,7 +553,8 @@ def dda_parcel(parcel_product):
574553
:param parcel_product: Parcel Product Name - e.g. CDH, SPARK_ON_YARN
575554
:return:
576555
"""
577-
556+
parcel_version = ' '
557+
target_stage = ' '
578558
def monitor_parcel(parcel_product, parcel_version, target_stage):
579559
while True:
580560
parcel = parcel_api.read_parcel(cluster_name, parcel_product, parcel_version)
@@ -2081,9 +2061,11 @@ def options_parser(args=None):
20812061
:return:
20822062
"""
20832063
global objects
2064+
global remote_parcel_url
2065+
global cdh_version
20842066
parser = argparse.ArgumentParser(prog='python deploy_on_oci.py', description='Deploy a Cloudera EDH %s Cluster on '
20852067
'OCI using cm_client with Cloudera '
2086-
'Manager API %s' % (cluster_version,
2068+
'Manager API %s' % (cdh_version,
20872069
api_version))
20882070
parser.add_argument('-D', '--deployment_type', metavar='deployment_type', help='simple, no HA or Kerberos at deployment, or secure to enable both')
20892071
parser.add_argument('-m', '--cm_server', metavar='cm_server', required='True',
@@ -2101,6 +2083,19 @@ def options_parser(args=None):
21012083
parser.add_argument('-ad', '--availability_domain', metavar='availability_domain', help='OCI Availability Domain')
21022084
parser.add_argument('-N', '--cluster_name', metavar='cluster_name', help='CDH Cluster Name')
21032085
options = parser.parse_args(args)
2086+
cluster_primary_version = options.cdh_version.split('.')
2087+
cluster_primary_version = cluster_primary_version[0]
2088+
cdh_version = options.cdh_version
2089+
if cluster_primary_version == '6':
2090+
remote_parcel_url = 'https://archive.cloudera.com/cdh6/' + options.cdh_version + '/parcels' # type: str
2091+
else:
2092+
remote_parcel_url = 'https://archive.cloudera.com/cdh5/parcels/' + options.cdh_version #type: str
2093+
if int(options.cdh_version.split('.')[1]) >= 13:
2094+
kafka_version = '4.1.0.4'
2095+
else:
2096+
kafka_version = '2.2.0.68'
2097+
2098+
kafka_parcel_url = 'https://archive.cloudera.com/kafka/parcels/' + kafka_version # type: str
21042099
if not options.cm_server:
21052100
print('Cloudera Manager Server IP required.')
21062101
parser.print_help()
@@ -2134,13 +2129,14 @@ def options_parser(args=None):
21342129
sys.exit()
21352130

21362131
return (options.cm_server, options.input_host_list, options.disk_count, options.license_file, options.worker_shape,
2137-
options.num_workers, options.deployment_type, options.cdh_version, options.availability_domain, options.cluster_name)
2132+
options.num_workers, options.deployment_type, options.cdh_version, options.availability_domain, options.cluster_name,
2133+
cluster_primary_version, kafka_parcel_url)
21382134

21392135
#
21402136
# MAIN FUNCTION FOR CLUSTER DEPLOYMENT
21412137
#
21422138

2143-
def build_cloudera_cluster():
2139+
def build_cloudera_cluster(cluster_primary_version):
21442140
"""
21452141
Deploy and Configure a Cloudera EDH Cluster
21462142
:return:
@@ -2166,7 +2162,7 @@ def build_cloudera_cluster():
21662162
build_success = 'False'
21672163
while build_success == 'False':
21682164
print('->Initializing Cluster %s' % cluster_name)
2169-
init_cluster()
2165+
init_cluster(cdh_version)
21702166
build_host_list(input_host_list)
21712167
if len(host_fqdn_list) < 6:
21722168
print('Error - %d hosts found, Minimum 6 required to build %s!' % (len(host_fqdn_list), cluster_name))
@@ -2191,6 +2187,12 @@ def build_cloudera_cluster():
21912187
update_parcel_repo(remote_parcel_url, parcel_distribution_rate)
21922188
print('->Parcel Setup Running')
21932189
dda_parcel('CDH')
2190+
if cluster_primary_version == '5':
2191+
update_parcel_repo(kafka_parcel_url, parcel_distribution_rate)
2192+
dda_parcel('KAFKA')
2193+
else:
2194+
pass
2195+
time.sleep(10)
21942196
print('->Mapping Cluster Hostnames and Host IDs')
21952197
cluster_host_id_map()
21962198
print('->Reading DB Passwords')
@@ -2320,7 +2322,7 @@ def enable_kerberos():
23202322
#
23212323

23222324
if __name__ == '__main__':
2323-
cm_server, input_host_list, disk_count, license_file, worker_shape, num_workers, deployment_type, cdh_version, cms_version, cluster_name =\
2325+
cm_server, input_host_list, disk_count, license_file, worker_shape, num_workers, deployment_type, cdh_version, cms_version, cluster_name, cluster_primary_version, kafka_parcel_url =\
23242326
options_parser(sys.argv[1:])
23252327
if debug == 'True':
23262328
print('cm_server = %s' % cm_server)
@@ -2366,7 +2368,7 @@ def enable_kerberos():
23662368
else:
23672369
print('Cluster Deployment options - HA: %s - Kerberos: %s' % (hdfs_ha, secure_cluster))
23682370

2369-
build_cloudera_cluster()
2371+
build_cloudera_cluster(cluster_primary_version)
23702372
if deployment_type == 'simple':
23712373
exit(0)
23722374
else:
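
The version handling this commit adds to `options_parser` could be isolated into a small, testable helper along the following lines. This is a sketch, not the script's actual code. The function name `parcel_urls` is hypothetical, and it assumes the Kafka parcel choice is meant to depend on the CDH 5 minor release (5.13 and later versus earlier). The archive URLs and Kafka parcel versions are the ones that appear in the diff. Unlike the script, the sketch rejects major versions other than 5 and 6 instead of treating them as CDH 5.

```python
def parcel_urls(cdh_version):
    """Return (cdh_parcel_url, kafka_parcel_url) for a CDH version string.

    kafka_parcel_url is None for CDH 6, where this sketch assumes no
    separate Kafka parcel repository is needed.
    """
    parts = cdh_version.split('.')
    if len(parts) < 2 or not all(part.isdigit() for part in parts):
        raise ValueError('Unrecognized CDH version: %r' % cdh_version)

    # Compare version components as integers; comparing them as strings
    # would order '2' after '13'.
    major, minor = int(parts[0]), int(parts[1])

    if major == 6:
        return 'https://archive.cloudera.com/cdh6/' + cdh_version + '/parcels', None
    if major == 5:
        kafka_version = '4.1.0.4' if minor >= 13 else '2.2.0.68'
        return ('https://archive.cloudera.com/cdh5/parcels/' + cdh_version,
                'https://archive.cloudera.com/kafka/parcels/' + kafka_version)
    raise ValueError('Unsupported CDH major version: %d' % major)


if __name__ == '__main__':
    for version in ('5.10.2.5', '5.16.2.8', '6.1.0'):
        print(version, parcel_urls(version))
```

Keeping the URL derivation in a pure function like this would let the version list in schema.yaml be checked against it without running a deployment.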
