This repository was archived by the owner on Apr 18, 2024. It is now read-only.

Commit 865654a

Merge branch 'simple'
This is a major change to the deployment model. All deployment now uses CloudInit, with user_data plus extended_metadata enabling "hands-off" cluster deployment. This simplifies deployment of Cloudera Enterprise Data Hub on OCI: the end user no longer needs to install Python and dependent libraries, and the Cloudera Manager host no longer needs to be publicly accessible.
2 parents b5bf5ac + 0acb4c7 commit 865654a

20 files changed (+591, -1009 lines changed)

README.md

Lines changed: 22 additions & 51 deletions
@@ -9,79 +9,56 @@ Future development will include support for EDH v5 clusters. In the meantime, u
99
| Recommended | BM.DenseIO2.52 | VM.Standard2.4 | VM.Standard2.16 |
1010
| Minimum | VM.Standard2.8 | VM.Standard2.1 | VM.Standard2.8 |
1111

12-
Host types can be customized in this template. Also included with this template is an easy method to customize block volume quantity and size as pertains to HDFS capacity. See "variables.tf" for more information in-line.
12+
Host types can be customized in this template. Also included with this template is an easy method to customize block volume quantity and size as pertains to HDFS capacity. See [variables.tf](https://github.com/oracle/oci-quickstart-cloudera/blob/master/terraform/variables.tf#L48-L62) for more information in-line.
1313

1414
## Prerequisites
1515
First off you'll need to do some pre deploy setup. That's all detailed [here](https://github.com/oracle/oci-quickstart-prerequisites).
1616

17-
## Additional Python Dependencies
18-
This module depends on Python, Paramiko, PIP, and cm_client. These should be installed on the host you are using to deploy the Terraform module.
19-
20-
On EL7 hosts, installation can be performed using the following commands:
21-
22-
sudo yum install python python-pip python-paramiko.noarch -y
23-
sudo pip install --upgrade pip
24-
sudo pip install cm_client
25-
26-
On Mac, installation can be performed using the following commands:
27-
28-
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
29-
sudo python get-pip.py
30-
sudo pip install --upgrade pip
31-
sudo pip install cm_client paramiko
32-
3317
### Clone the Module
3418
Now, you'll want a local copy of this repo. You can make that with the commands:
3519

3620
git clone https://github.com/oracle/oci-quickstart-cloudera.git
3721
cd oci-quickstart-cloudera
38-
ls
3922

4023
## Python Deployment using cm_client
41-
The deployment script "deploy_on_oci.py" uses cm_client against Cloudera Manager API v31. As such it does require some customization before execution. Reference the header section in the script, it is highly encouraged you modify the following variables before deployment, ssh_keyfile is required or deployment will fail:
24+
The deployment script "deploy_on_oci.py" uses cm_client against Cloudera Manager API v31. As such, it requires some customization before execution. See the header section of the script; you are strongly encouraged to modify the following variables before deployment:
4225

4326
admin_user_name
4427
admin_password
4528
cluster_name
46-
ssh_keyfile (REQUIRED)
47-
cluster_service_list
4829

49-
Also if you modify the compute.tf in any way to change hostname parameters, you will need to update these variables for pattern matching, otherwise host detection and cluster layout will fail:
30+
Also if you modify the compute.tf in any way to change hostname parameters, you will need to update these variables for pattern matching, otherwise cluster deployment will fail:
5031

51-
worker_hosts_contain
52-
master_hosts_contain
53-
namenode_host_contains
54-
secondary_namenode_host_contains
55-
cloudera_manager_host_contains
32+
worker_hosts_prefix = 'cdh-worker'
33+
namenode_host = 'cdh-master-1'
34+
secondary_namenode_host = 'cdh-master-2'
35+
cloudera_manager_host = 'cdh-utility-1'
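
As a rough illustration of why these values must track the hostnames set in compute.tf, the sketch below maps hostnames to roles using the four variables listed above. The `classify_host` helper and the sample FQDNs are hypothetical and are not taken from deploy_on_oci.py; the real script may match hosts differently.

```python
# Values copied from the README lines above; everything else is illustrative.
worker_hosts_prefix = 'cdh-worker'
namenode_host = 'cdh-master-1'
secondary_namenode_host = 'cdh-master-2'
cloudera_manager_host = 'cdh-utility-1'


def classify_host(fqdn):
    """Return a role label for a host (hypothetical helper, not from the script)."""
    # Compare only the short hostname, ignoring any domain suffix.
    short_name = fqdn.split('.')[0]
    if short_name == namenode_host:
        return 'namenode'
    if short_name == secondary_namenode_host:
        return 'secondary_namenode'
    if short_name == cloudera_manager_host:
        return 'cloudera_manager'
    if short_name.startswith(worker_hosts_prefix):
        return 'worker'
    # A host that matches nothing is the failure case the README warns about.
    return 'unmatched'


# Hypothetical FQDNs for demonstration only.
sample_hosts = [
    'cdh-utility-1.public.example.com',
    'cdh-master-1.private.example.com',
    'cdh-worker-3.private.example.com',
    'renamed-worker-1.private.example.com',
]
for host in sample_hosts:
    print(host, '->', classify_host(host))
```

If a hostname in compute.tf is renamed without updating the matching variable, the host falls through to `unmatched` in this sketch, which corresponds to the deployment failure described above.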
5636

5737
In addition, further customization of the cluster deployment can be done by modification of the following functions:
5838

5939
setup_mgmt_rcg
6040
update_cluster_rcg_configuration
6141

62-
This does require some knowledge of Python - modify at your own risk. These functions contain Cloudera specific tuning parameters as well as host mapping for roles.
42+
This does require some knowledge of Python and Cloudera - modify at your own risk. These functions contain Cloudera specific tuning parameters as well as host mapping for roles.
6343

64-
## Kerberos Secure Cluster by Default
44+
## Kerberos Secure Cluster option
6545

66-
This automation now defaults to using a local KDC deployed on the Cloudera Manager instance for secure cluster operation. Please read the scripts [README](https://github.com/oci-quickstart/oci-cloudera/blob/master/scripts/README.md) for information regarding how to set these parameters prior to deployment.
46+
This automation supports using a local KDC deployed on the Cloudera Manager instance for secure cluster operation. Please read the scripts [README](https://github.com/oracle/oci-quickstart-cloudera/blob/master/scripts/README.md) for information regarding how to set these parameters prior to deployment.
6747

6848
Also - for cluster management, you will need to manually create at a minimum the HDFS Superuser Principal as [detailed here](https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_using_cm_sec_config.html#create-hdfs-superuser) after deployment.
6949

70-
## Cloudera Manager and Cluster Metadata Database
71-
You are able to customize which database you want to use for Cloudera Manager and Cluster Metadata. In compute.tf you will see a "user_data" field for the Utility instance:
72-
73-
user_data = "${base64encode(file("scripts/cm_boot_mysql.sh"))}"
50+
Enabling Kerberos is managed using a Terraform metadata tag "deployment_type" which is set in [variables.tf](https://github.com/oracle/oci-quickstart-cloudera/blob/master/terraform/variables.tf#L32). Setting this value to "secure" will enable cluster security as part of the setup process. Changing this to "simple" will deploy an unsecured cluster.
7451

75-
This is set to use MySQL for the database. If you want to use Postgres, you would change it:
52+
## High Availability
7653

77-
user_data = "${base64encode(file("scripts/cm_boot_postgres.sh"))}"
54+
High Availability is also offered as part of the deployment process. When secure cluster operation is chosen this is enabled by default. It can be disabled by either changing the deployment_type to "simple", or modifying the [deploy_on_oci.py](https://github.com/oracle/oci-quickstart-cloudera/blob/master/scripts/deploy_on_oci.py#L60) script and changing the value for "hdfs_ha" to False.
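
The interaction between "deployment_type" and "hdfs_ha" described above can be summarized in a small sketch. The `resolve_cluster_options` function and its return keys are hypothetical, written only to restate the README's rules; how deploy_on_oci.py actually reads the metadata tag is not shown in this commit.

```python
def resolve_cluster_options(deployment_type, hdfs_ha=True):
    """Restate the README rules as code (hypothetical helper).

    deployment_type: 'secure' or 'simple', as set in variables.tf.
    hdfs_ha: mirrors the "hdfs_ha" value in deploy_on_oci.py.
    """
    if deployment_type not in ('secure', 'simple'):
        raise ValueError('deployment_type must be "secure" or "simple"')
    kerberos_enabled = deployment_type == 'secure'
    # Per the README, HA is on by default for secure clusters and is off
    # when deployment_type is "simple" or hdfs_ha is set to False.
    ha_enabled = kerberos_enabled and bool(hdfs_ha)
    return {'kerberos': kerberos_enabled, 'hdfs_ha': ha_enabled}


print(resolve_cluster_options('secure'))
print(resolve_cluster_options('secure', hdfs_ha=False))
print(resolve_cluster_options('simple'))
```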
7855

79-
You can customize the default root password for MySQL by editing the source script. For the various Cloudera databases, random passwords are generated and used. The same is true when using Postgres.
56+
## Metadata and MySQL
8057

81-
Note that you will also need to change "meta_db_port" in deploy_on_oci.py if you choose to run Postgres.
58+
You can customize the default root password for MySQL by editing the source script [cms_mysql.sh](https://github.com/oracle/oci-quickstart-cloudera/blob/master/scripts/cms_mysql.sh#L188). For the various Cloudera databases, random passwords are generated and used. These are stored in a flat file on the Utility host for use at deployment time.
8259

8360
## Object Storage Integration
84-
As of the 2.1.0 release, included with this template is a means to deploy clusters with configuration to allow use of OCI Object Storage using S3 Compatibility. In order to implement, an S3 Access and Secret key must be set up in the OCI Tenancy first. This process is detailed [here](https://docs.cloud.oracle.com/iaas/Content/Identity/Tasks/managingcredentials.htm#Working2). Once that is in place, modify the [deploy_on_oci.py](https://github.com/oci-quickstart/oci-cloudera/blob/master/scripts/deploy_on_oci.py#L133-L141) script, and set the following values:
61+
As of the 2.1.0 release, included with this template is a means to deploy clusters with configuration to allow use of OCI Object Storage using S3 Compatibility. In order to implement, an S3 Access and Secret key must be set up in the OCI Tenancy first. This process is detailed [here](https://docs.cloud.oracle.com/iaas/Content/Identity/Tasks/managingcredentials.htm#Working2). Once that is in place, modify the [deploy_on_oci.py](https://github.com/oracle/oci-quickstart-cloudera/blob/master/scripts/deploy_on_oci.py#L101-L108) script, and set the following values:
8562

8663
s3_compat_enable = 'False'
8764
s3a_secret_key = 'None'
@@ -97,23 +74,17 @@ Deployment of the module is straight forward using the following Terraform comma
9774
terraform plan
9875
terraform apply
9976

100-
This will create all the required elements in a compartment in the target OCI tenancy. This includes VCN and Security List parameters. Security audit of these in the network.tf is suggested.
101-
102-
After Terraform is finished deploying, the output will show the Python syntax to trigger cluster deployment. This command can be run immediately following deployment, as it has built-in checks to wait until Cloudera Manager API is up and responding before it executes deployment. The syntax is as follows:
103-
104-
python scripts/deploy_on_oci.py -B -m <master_ip> -d <disk_count> -w <worker_shape>
105-
106-
It is also possible to destroy an existing cluster with this script using Cloudera Manager
107-
108-
python scripts/deploy_on_oci.py -D -m <master_ip>
77+
This will create all the required elements in a compartment in the target OCI tenancy. This includes VCN and Security List parameters. Security audit of these in the [network module](https://github.com/oracle/oci-quickstart-cloudera/blob/master/terraform/modules/network/main.tf) is suggested.
10978

11079
## Destroy the Deployment
11180

11281
When you no longer need the deployment, you can run this command to destroy it:
11382

11483
terraform destroy
11584

116-
## Deployment Caveats
117-
Currently this module requires Cloudera Manager API to be on an edge host with a Public IP address. This is used to trigger cluster deployment, as well as SSH into the Cloudera Manager host to perform dynamic host discovery to map for Cluster topology.
85+
## Deployment Architecture
86+
87+
Here is a diagram showing what is deployed using this template. Note that resources are automatically distributed among Fault Domains in an Availability Domain to ensure fault tolerance. Additional workers are striped across the 3 Fault Domains in sequence, starting with Fault Domain 1.
88+
89+
![Deployment Architecture Diagram](https://github.com/oracle/oci-quickstart-cloudera/blob/master/images/deployment_architecture.png)
11890

119-
Future enhancements to this module are planned to support a completely Private (non-Internet exposed) cluster deployment.
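
The fault domain striping described in the new Deployment Architecture section amounts to a round-robin assignment. The sketch below is a minimal illustration, assuming zero-based worker indexes and fault domain names of the form `FAULT-DOMAIN-N`; the function name is hypothetical, and the Terraform in this repository may express the same idea differently.

```python
def fault_domain_for_worker(worker_index, fault_domain_count=3):
    """Return the fault domain for a zero-based worker index (hypothetical helper).

    Workers cycle through the fault domains in order, starting with
    Fault Domain 1, as described in the README text.
    """
    return 'FAULT-DOMAIN-{}'.format(worker_index % fault_domain_count + 1)


# Show the placement of the first five workers.
for index in range(5):
    print('worker', index + 1, '->', fault_domain_for_worker(index))
```

With this scheme, any group of three consecutive workers lands in three different fault domains.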

images/deployment_architecture.png

150 KB

scripts/README.md

Lines changed: 9 additions & 8 deletions
@@ -1,11 +1,12 @@
11
# scripts
22
All scripts in this location are referenced for deployment automation
33

4-
* boot.sh is invoked by cloudinit on each instance creation via Terraform. It contains steps which perform initial bootstrapping of the instance prior to provisioning.
5-
* boot_plus_tmp.sh is an alternate version of boot.sh which demonstrates configuring a RAID0 Block Volume array for use as /tmp. This is useful for caching data when using Object Storage directly.
6-
* cm_boot_mysql.sh is invoked by cloudinit on the Utility node to stand up Cloudera Manager and Pre-requisites using MySQL for Metadata.
7-
* cm_boot_postgres.sh can be used instead of cm_boot_mysql.sh if you want to use Postgres for Cloudera Manager and Cluster Metadata.
8-
* deploy_on_oci.py is the primary Python script invoked to deploy Cloudera EDH v6 using cm_client python libraries
4+
* boot.sh is invoked by CloudInit on each instance creation via Terraform. It contains steps which perform initial bootstrapping of the instance prior to provisioning.
5+
* boot_plus_tmp.sh is an alternate version of boot.sh which demonstrates configuring a RAID0 Block Volume array for use as /tmp. This is useful for caching data when using Object Storage.
6+
* cloudera_manager_boot.sh is a top-level boot script for the Cloudera Manager (Utility) instance. This is required because subsequent scripts are too large to fit in metadata without compression.
7+
* cms_mysql.sh is invoked by CloudInit on the Utility node to stand up Cloudera Manager and prerequisites using MySQL for Metadata. It is compressed and loaded into extended metadata.
8+
* cms_postgres.sh is an older installation method using Postgres instead of MySQL for cluster metadata. This is deprecated.
9+
* deploy_on_oci.py is the primary Python script invoked to deploy Cloudera EDH v6 using cm_client python libraries. It is compressed and loaded into extended metadata.
910

1011
# CloudInit boot scripts
1112

@@ -15,13 +16,13 @@ With the introduction of local KDC for secure cluster, this requires some setup
1516
* kdc_server - This is the hostname where KDC is deployed (defaults to Cloudera Manager host)
1617
* realm - This is set to hadoop.com by default.
1718
* REALM - This is set to HADOOP.COM by default.
18-
* cm_boot_mysql.sh
19+
* cms_mysql.sh
1920
* KERBEROS_PASSWORD - This is used for the root/admin account.
2021
* SCM_USER_PASSWORD - By default the cloudera-scm user is given admin control of the KDC. This is required for Cloudera Manager to setup and manage principals, and the password here is used by that account.
2122
* kdc_server - Defaults to local hostname.
2223
* realm - This is set to hadoop.com by default.
2324
* REALM - This is set to HADOOP.COM by default.
24-
* cm_boot_postgres.sh - Same items as cm_boot_mysql.sh
25+
* cms_postgres.sh - Same items as cms_mysql.sh
2526
* deploy_on_oci.py
2627
* realm - This is HADOOP.COM by default.
2728
* kdc_admin - Set to cloudera-scm@HADOOP.COM by default.
@@ -30,4 +31,4 @@ With the introduction of local KDC for secure cluster, this requires some setup
3031
It is highly suggested you modify at a minimum the default passwords prior to deployment.
3132

3233
## CAUTION WHEN MODIFYING BOOT SCRIPTS
33-
Because boot.sh and cm_boot_mysql.sh/cm_boot_postgres.sh are invoked as part of user_data in Terraform, if you modify these files and re-run a deployment, default behavior is existing instances will be destroyed and re-deployed because of this change.
34+
Because boot.sh and cms_mysql.sh/cms_postgres.sh are invoked as part of user_data and extended_metadata in Terraform, modifying these files and re-running a deployment will, by default, cause existing instances to be destroyed and re-deployed.
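
The scripts list above notes that cms_mysql.sh and deploy_on_oci.py are compressed before being loaded into extended metadata. The commit does not show the exact encoding, so the following is only a sketch of one common approach, gzip followed by base64, with hypothetical helper names and a made-up sample script.

```python
import base64
import gzip


def pack_script(script_text):
    """Gzip and base64-encode a script so it takes less space in metadata.

    Assumption: gzip plus base64 is used here for illustration; the
    repository may use a different mechanism.
    """
    compressed = gzip.compress(script_text.encode('utf-8'))
    return base64.b64encode(compressed).decode('ascii')


def unpack_script(packed_text):
    """Reverse pack_script, as a boot script on the instance would need to."""
    return gzip.decompress(base64.b64decode(packed_text)).decode('utf-8')


# A made-up, repetitive script standing in for a large boot script.
sample_script = '#!/bin/bash\n' + 'echo "bootstrapping"\n' * 200
packed = pack_script(sample_script)

print('original bytes:', len(sample_script.encode('utf-8')))
print('packed bytes:', len(packed))
print('round trip ok:', unpack_script(packed) == sample_script)
```

Because the packed value is derived from the file contents, any edit to the script changes the metadata value, which is consistent with the caution above about instances being re-deployed.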
