This repository was archived by the owner on Apr 18, 2024. It is now read-only.
This is a major change to the deployment model. All deployment is now done using CloudInit, with user_data plus extended_metadata enabling "hands-off" cluster deployment. This simplifies deployment of Cloudera Enterprise Data Hub on OCI: the end user no longer needs to install Python and dependent libraries, and the Cloudera Manager host no longer needs to be publicly accessible.
- Host types can be customized in this template. Also included with this template is an easy method to customize block volume quantity and size as pertains to HDFS capacity. See "variables.tf" for more information in-line.
+ Host types can be customized in this template. Also included with this template is an easy method to customize block volume quantity and size as pertains to HDFS capacity. See [variables.tf](https://github.com/oracle/oci-quickstart-cloudera/blob/master/terraform/variables.tf#L48-L62) for more information in-line.

## Prerequisites
First off you'll need to do some pre deploy setup. That's all detailed [here](https://github.com/oracle/oci-quickstart-prerequisites).

- ## Additional Python Dependencies
- This module depends on Python, Paramiko, PIP, and cm_client. These should be installed on the host you are using to deploy the Terraform module.
-
- On EL7 hosts, installation can be performed using the following commands:
- The deployment script "deploy_on_oci.py" uses cm_client against Cloudera Manager API v31. As such it does require some customization before execution. Reference the header section in the script; it is highly encouraged that you modify the following variables before deployment. ssh_keyfile is required or deployment will fail:
+ The deployment script "deploy_on_oci.py" uses cm_client against Cloudera Manager API v31. As such it does require some customization before execution. Reference the header section in the script; it is highly encouraged that you modify the following variables before deployment:

admin_user_name
admin_password
cluster_name
- ssh_keyfile (REQUIRED)
- cluster_service_list

- Also if you modify the compute.tf in any way to change hostname parameters, you will need to update these variables for pattern matching, otherwise host detection and cluster layout will fail:
+ Also if you modify the compute.tf in any way to change hostname parameters, you will need to update these variables for pattern matching, otherwise cluster deployment will fail:

- worker_hosts_contain
- master_hosts_contain
- namenode_host_contains
- secondary_namenode_host_contains
- cloudera_manager_host_contains
+ worker_hosts_prefix = 'cdh-worker'
+ namenode_host = 'cdh-master-1'
+ secondary_namenode_host = 'cdh-master-2'
+ cloudera_manager_host = 'cdh-utility-1'
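
To illustrate why these values must track the hostnames set in compute.tf, here is a minimal sketch of how such pattern matching could work. The four variable names and values come from the lines above; the functions `classify_host` and `layout`, the role labels, and the example domain are hypothetical and are not taken from deploy_on_oci.py, whose actual matching logic may differ.

```python
# Values as shown in the added README lines above.
worker_hosts_prefix = 'cdh-worker'
namenode_host = 'cdh-master-1'
secondary_namenode_host = 'cdh-master-2'
cloudera_manager_host = 'cdh-utility-1'


def classify_host(fqdn):
    """Return a role label for a host (hypothetical helper).

    Only the short hostname (the part before the first dot) is compared,
    so the DNS domain of the subnet does not affect matching.
    """
    short_name = fqdn.split('.')[0]
    if short_name == namenode_host:
        return 'namenode'
    if short_name == secondary_namenode_host:
        return 'secondary_namenode'
    if short_name == cloudera_manager_host:
        return 'cloudera_manager'
    if short_name.startswith(worker_hosts_prefix):
        return 'worker'
    return 'unmatched'


def layout(fqdns):
    """Group hostnames by role so unmatched hosts are easy to spot."""
    roles = {}
    for fqdn in fqdns:
        roles.setdefault(classify_host(fqdn), []).append(fqdn)
    return roles
```

If the hostnames in compute.tf were renamed without updating the variables, every host in this sketch would land in the 'unmatched' group, which is the kind of failure the paragraph above warns about.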

In addition, further customization of the cluster deployment can be done by modification of the following functions:

setup_mgmt_rcg
update_cluster_rcg_configuration

- This does require some knowledge of Python - modify at your own risk. These functions contain Cloudera-specific tuning parameters as well as host mapping for roles.
+ This does require some knowledge of Python and Cloudera - modify at your own risk. These functions contain Cloudera-specific tuning parameters as well as host mapping for roles.

- ## Kerberos Secure Cluster by Default
+ ## Kerberos Secure Cluster option

- This automation now defaults to using a local KDC deployed on the Cloudera Manager instance for secure cluster operation. Please read the scripts [README](https://github.com/oci-quickstart/oci-cloudera/blob/master/scripts/README.md) for information regarding how to set these parameters prior to deployment.
+ This automation supports using a local KDC deployed on the Cloudera Manager instance for secure cluster operation. Please read the scripts [README](https://github.com/oracle/oci-quickstart-cloudera/blob/master/scripts/README.md) for information regarding how to set these parameters prior to deployment.

Also - for cluster management, you will need to manually create at a minimum the HDFS Superuser Principal as [detailed here](https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_using_cm_sec_config.html#create-hdfs-superuser) after deployment.

- ## Cloudera Manager and Cluster Metadata Database
- You are able to customize which database you want to use for Cloudera Manager and Cluster Metadata. In compute.tf you will see a "user_data" field for the Utility instance:
+ Enabling Kerberos is managed using a terraform metadata tag "deployment_type" which is set in [variables.tf](https://github.com/oracle/oci-quickstart-cloudera/blob/master/terraform/variables.tf#L32). Setting this value to "secure" will enable cluster security as part of the setup process. Changing this to "simple" will deploy an unsecured cluster.

- This is set to use MySQL for the database. If you want to use Postgres, you would change it:
+ High Availability is also offered as part of the deployment process. When secure cluster operation is chosen this is enabled by default. It can be disabled by either changing the deployment_type to "simple", or modifying the [deploy_on_oci.py](https://github.com/oracle/oci-quickstart-cloudera/blob/master/scripts/deploy_on_oci.py#L60) script and changing the value for "hdfs_ha" to False.
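
The interaction between "deployment_type" and "hdfs_ha" described above can be summarized as a small decision function. This is a sketch of one reading of the text, assuming High Availability is only active when the cluster is secure and "hdfs_ha" is left at True; the function name `resolve_cluster_options` and the returned dictionary keys are hypothetical and do not appear in deploy_on_oci.py.

```python
def resolve_cluster_options(deployment_type, hdfs_ha=True):
    """Derive effective security and HA settings (hypothetical helper).

    deployment_type: "secure" or "simple", as set in variables.tf.
    hdfs_ha: mirrors the "hdfs_ha" flag in deploy_on_oci.py.
    """
    if deployment_type not in ('secure', 'simple'):
        raise ValueError('deployment_type must be "secure" or "simple"')
    kerberos = deployment_type == 'secure'
    # Assumption: HA is on by default for secure clusters and is turned off
    # either by deployment_type "simple" or by setting hdfs_ha to False.
    return {'kerberos': kerberos, 'hdfs_ha': kerberos and hdfs_ha}
```

For example, `resolve_cluster_options('secure', hdfs_ha=False)` describes a Kerberized cluster without HDFS High Availability.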

- You can customize the default root password for MySQL by editing the source script. For the various Cloudera databases, random passwords are generated and used. The same is true when using Postgres.
+ ## Metadata and MySQL

- Note that you will also need to change "meta_db_port" in deploy_on_oci.py if you choose to run Postgres.
+ You can customize the default root password for MySQL by editing the source script [cms_mysql.sh](https://github.com/oracle/oci-quickstart-cloudera/blob/master/scripts/cms_mysql.sh#L188). For the various Cloudera databases, random passwords are generated and used. These are stored in a flat file on the Utility host for use at deployment time.

## Object Storage Integration
- As of the 2.1.0 release, included with this template is a means to deploy clusters with configuration to allow use of OCI Object Storage using S3 Compatibility. In order to implement, an S3 Access and Secret key must be set up in the OCI Tenancy first. This process is detailed [here](https://docs.cloud.oracle.com/iaas/Content/Identity/Tasks/managingcredentials.htm#Working2). Once that is in place, modify the [deploy_on_oci.py](https://github.com/oci-quickstart/oci-cloudera/blob/master/scripts/deploy_on_oci.py#L133-L141) script, and set the following values:
+ As of the 2.1.0 release, included with this template is a means to deploy clusters with configuration to allow use of OCI Object Storage using S3 Compatibility. In order to implement, an S3 Access and Secret key must be set up in the OCI Tenancy first. This process is detailed [here](https://docs.cloud.oracle.com/iaas/Content/Identity/Tasks/managingcredentials.htm#Working2). Once that is in place, modify the [deploy_on_oci.py](https://github.com/oracle/oci-quickstart-cloudera/blob/master/scripts/deploy_on_oci.py#L101-L108) script, and set the following values:

s3_compat_enable = 'False'
s3a_secret_key = 'None'
@@ -97,23 +74,17 @@ Deployment of the module is straight forward using the following Terraform comma
terraform plan
terraform apply

- This will create all the required elements in a compartment in the target OCI tenancy. This includes VCN and Security List parameters. Security audit of these in the network.tf is suggested.
-
- After Terraform is finished deploying, the output will show the Python syntax to trigger cluster deployment. This command can be run immediately following deployment, as it has built-in checks to wait until Cloudera Manager API is up and responding before it executes deployment. The syntax is as follows:
- It is also possible to destroy an existing cluster with this script using Cloudera Manager
-
- python scripts/deploy_on_oci.py -D -m <master_ip>
+ This will create all the required elements in a compartment in the target OCI tenancy. This includes VCN and Security List parameters. Security audit of these in the [network module](https://github.com/oracle/oci-quickstart-cloudera/blob/master/terraform/modules/network/main.tf) is suggested.

## Destroy the Deployment

When you no longer need the deployment, you can run this command to destroy it:

terraform destroy

- ## Deployment Caveats
- Currently this module requires Cloudera Manager API to be on an edge host with a Public IP address. This is used to trigger cluster deployment, as well as SSH into the Cloudera Manager host to perform dynamic host discovery to map for Cluster topology.
+ ## Deployment Architecture
+
+ Here is a diagram showing what is deployed using this template. Note that resources are automatically distributed among Fault Domains in an Availability Domain to ensure fault tolerance. Additional workers deployed will stripe across the 3 Fault Domains in sequence, starting with Fault Domain 1 and incrementing sequentially.
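
The striping behavior described above amounts to a round-robin assignment. The following is a minimal sketch of that idea, not the template's actual Terraform logic; the function name `fault_domain_for_worker` and the zero-based worker index are assumptions made for illustration.

```python
# OCI names the three Fault Domains in each Availability Domain like this.
FAULT_DOMAINS = ('FAULT-DOMAIN-1', 'FAULT-DOMAIN-2', 'FAULT-DOMAIN-3')


def fault_domain_for_worker(index):
    """Return the Fault Domain for the worker at a zero-based index.

    Workers cycle through the Fault Domains in order, so worker 0 lands in
    Fault Domain 1, worker 1 in Fault Domain 2, worker 3 back in Fault Domain 1.
    """
    if index < 0:
        raise ValueError('index must be non-negative')
    return FAULT_DOMAINS[index % len(FAULT_DOMAINS)]


# Example: placement for a five-worker cluster.
placement = [fault_domain_for_worker(i) for i in range(5)]
```

With this scheme, the number of workers in any two Fault Domains never differs by more than one.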
All scripts in this location are referenced for deployment automation

- * boot.sh is invoked by cloudinit on each instance creation via Terraform. It contains steps which perform initial bootstrapping of the instance prior to provisioning.
- * boot_plus_tmp.sh is an alternate version of boot.sh which demonstrates configuring a RAID0 Block Volume array for use as /tmp. This is useful for caching data when using Object Storage directly.
- * cm_boot_mysql.sh is invoked by cloudinit on the Utility node to stand up Cloudera Manager and prerequisites using MySQL for Metadata.
- * cm_boot_postgres.sh can be used instead of cm_boot_mysql.sh if you want to use Postgres for Cloudera Manager and Cluster Metadata.
- * deploy_on_oci.py is the primary Python script invoked to deploy Cloudera EDH v6 using cm_client python libraries
+ * boot.sh is invoked by CloudInit on each instance creation via Terraform. It contains steps which perform initial bootstrapping of the instance prior to provisioning.
+ * boot_plus_tmp.sh is an alternate version of boot.sh which demonstrates configuring a RAID0 Block Volume array for use as /tmp. This is useful for caching data when using Object Storage.
+ * cloudera_manager_boot.sh is a top-level boot script for the Cloudera Manager (Utility) instance. This is required because subsequent scripts are too large to fit in metadata without compression.
+ * cms_mysql.sh is invoked by CloudInit on the Utility node to stand up Cloudera Manager and prerequisites using MySQL for Metadata. It is compressed and loaded into extended metadata.
+ * cms_postgres.sh is an older installation method using Postgres instead of MySQL for cluster metadata. This is deprecated.
+ * deploy_on_oci.py is the primary Python script invoked to deploy Cloudera EDH v6 using cm_client python libraries. It is compressed and loaded into extended metadata.

# CloudInit boot scripts

@@ -15,13 +16,13 @@ With the introduction of local KDC for secure cluster, this requires some setup
* kdc_server - This is the hostname where KDC is deployed (defaults to Cloudera Manager host)
* realm - This is set to hadoop.com by default.
* REALM - This is set to HADOOP.COM by default.
- * cm_boot_mysql.sh
+ * cms_mysql.sh
* KERBEROS_PASSWORD - This is used for the root/admin account.
* SCM_USER_PASSWORD - By default the cloudera-scm user is given admin control of the KDC. This is required for Cloudera Manager to set up and manage principals, and the password here is used by that account.
* kdc_server - Defaults to local hostname.
* realm - This is set to hadoop.com by default.
* REALM - This is set to HADOOP.COM by default.
- * cm_boot_postgres.sh - Same items as cm_boot_mysql.sh
+ * cms_postgres.sh - Same items as cms_mysql.sh
* deploy_on_oci.py
* realm - This is HADOOP.COM by default.
* kdc_admin - Set to cloudera-scm@HADOOP.COM by default.
@@ -30,4 +31,4 @@ With the introduction of local KDC for secure cluster, this requires some setup
It is highly suggested you modify at a minimum the default passwords prior to deployment.

## CAUTION WHEN MODIFYING BOOT SCRIPTS
- Because boot.sh and cm_boot_mysql.sh/cm_boot_postgres.sh are invoked as part of user_data in Terraform, if you modify these files and re-run a deployment, the default behavior is that existing instances will be destroyed and re-deployed because of this change.
+ Because boot.sh and cms_mysql.sh/cms_postgres.sh are invoked as part of user_data and extended_metadata in Terraform, if you modify these files and re-run a deployment, the default behavior is that existing instances will be destroyed and re-deployed because of this change.