|
1 | 1 | # oci-cloudera
|
2 |
| -These are Terraform modules for deploying Cloudera Enterprise Data Hub (EDH) on Oracle Cloud Infrastructure (OCI). This consists of two sub-modules, one for Cloudera EDH v5, and one for Cloudera EDH v6: |
| 2 | +This module deploys a cluster of arbitrary size using Cloudera Enterprise Data Hub v6 and Cloudera Manager v6.1. |
| 3 | + |
| 4 | +Future development will include support for EDH v5 clusters. In the meantime, use the [1.0.0 release](https://github.com/oci-quickstart/oci-cloudera/releases/tag/1.0.0) for v5 deployments. |
| 5 | + |
| 6 | +| | Worker Nodes | Bastion Instance | Utility and Master Instances | |
| 7 | +|-------------|----------------|------------------|------------------------------| |
| 8 | +| Recommended | BM.DenseIO2.52 | VM.Standard2.4 | VM.Standard2.16 | |
| 9 | + |
| 10 | +Host types can be customized in the env-vars file referenced below. The template also lets you customize block volume quantity and size, which determine HDFS capacity. See the inline comments in "variables.tf" for details. |
| 11 | + |
| 12 | +## Prerequisites |
| 13 | +First, complete the pre-deployment setup detailed [here](https://github.com/oci-quickstart/oci-prerequisites). |
| 14 | + |
| 15 | +### Additional Python Dependencies |
| 16 | +This module depends on Python, Paramiko, PIP, and cm_client. These should be installed on the host you are using to deploy the Terraform module. |
| 17 | + |
| 18 | +On EL7 hosts, installation can be performed using the following commands: |
| 19 | + |
| 20 | + sudo yum install python python-pip python-paramiko.noarch -y |
| 21 | + sudo pip install --upgrade pip |
| 22 | + sudo pip install cm_client |
| 23 | + |
| 24 | +On Mac, installation can be performed using the following commands: |
| 25 | + |
| 26 | + curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py |
| 27 | + sudo python get-pip.py |
| 28 | + sudo pip install --upgrade pip |
| 29 | + sudo pip install cm_client paramiko |
| 30 | + |
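Before deploying, it can help to confirm that the Python dependencies listed above are importable. The following is a minimal sketch, not part of the module; the helper name `missing_modules` is hypothetical, and it assumes the import names `paramiko` and `cm_client` match the installed packages.

```python
# Hypothetical helper (not part of this module): report which of the
# required Python modules cannot be imported by the current interpreter.
def missing_modules(names):
    """Return the subset of module names that fail to import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the dependencies named in this README.
    absent = missing_modules(["paramiko", "cm_client"])
    if absent:
        print("Missing Python modules: " + ", ".join(absent))
    else:
        print("All required Python modules are importable.")
```

Run it with the same `python` interpreter you plan to use for the deployment script, since packages installed for one interpreter are not visible to another.
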
| 31 | +### Clone the Module |
| 32 | +Now, you'll want a local copy of this repo. You can make that with the commands: |
| 33 | + |
| 34 | + git clone https://github.com/oci-quickstart/oci-cloudera.git |
| 35 | + cd oci-cloudera/v6 |
| 36 | + ls |
| 37 | + |
| 38 | +## Python Deployment using cm_client |
| 39 | +The deployment script "deploy_on_oci.py" uses cm_client against Cloudera Manager API v31 and requires some customization before execution. See the header section of the script. You are strongly encouraged to modify the following variables before deployment; ssh_keyfile is required, and deployment will fail without it: |
| 40 | + |
| 41 | + admin_user_name |
| 42 | + admin_password |
| 43 | + cluster_name |
| 44 | + ssh_keyfile (REQUIRED) |
| 45 | + cluster_service_list |
| 46 | + |
| 47 | +If you modify compute.tf in any way that changes hostname parameters, you will also need to update these variables used for pattern matching; otherwise host detection and cluster layout will fail: |
| 48 | + |
| 49 | + worker_hosts_contain |
| 50 | + master_hosts_contain |
| 51 | + namenode_host_contains |
| 52 | + secondary_namenode_host_contains |
| 53 | + cloudera_manager_host_contains |
| 54 | + |
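To illustrate why these variables must track the hostnames set in compute.tf, here is a minimal sketch of substring-based host classification. It is not the logic in deploy_on_oci.py; the function name, the pattern values, and the example hostnames are all assumptions made for illustration.

```python
# Hypothetical sketch of substring-based host detection. The pattern values
# are illustrative assumptions, not the defaults used by deploy_on_oci.py.
ROLE_PATTERNS = {
    "worker": "worker",
    "master": "master",
    "cloudera_manager": "utility",
}


def classify_hosts(hostnames, role_patterns):
    """Group hostnames by the first role whose pattern occurs in the name."""
    layout = {role: [] for role in role_patterns}
    layout["unmatched"] = []
    for name in hostnames:
        for role, pattern in role_patterns.items():
            if pattern in name:
                layout[role].append(name)
                break
        else:
            # No pattern matched: this host would be left out of the layout.
            layout["unmatched"].append(name)
    return layout
```

In this sketch, a host renamed in compute.tf without a matching pattern update ends up under `unmatched`, which mirrors the failure mode described above.
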
| 55 | +In addition, further customization of the cluster deployment can be done by modification of the following functions: |
| 56 | + |
| 57 | + setup_mgmt_rcg |
| 58 | + update_cluster_rcg_configuration |
| 59 | + |
| 60 | +This requires some knowledge of Python; modify at your own risk. These functions contain Cloudera-specific tuning parameters as well as the host mapping for roles. |
| 61 | + |
| 62 | +## Kerberos Secure Cluster by Default |
| 63 | + |
| 64 | +This automation now defaults to using a local KDC deployed on the Cloudera Manager instance for secure cluster operation. Please read the scripts [README](../v6/scripts/README.md) for information regarding how to set these parameters prior to deployment. |
| 65 | + |
| 66 | +For cluster management, you will also need to manually create, at a minimum, the HDFS Superuser Principal as [detailed here](https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_using_cm_sec_config.html#create-hdfs-superuser) after deployment. |
| 67 | + |
| 68 | +## Cloudera Manager and Cluster Metadata Database |
| 69 | +You are able to customize which database you want to use for Cloudera Manager and Cluster Metadata. In compute.tf you will see a "user_data" field for the Utility instance: |
| 70 | + |
| 71 | + user_data = "${base64encode(file("scripts/cm_boot_mysql.sh"))}" |
| 72 | + |
| 73 | +This is set to use MySQL for the database. To use Postgres instead, change it to: |
| 74 | + |
| 75 | + user_data = "${base64encode(file("scripts/cm_boot_postgres.sh"))}" |
| 76 | + |
| 77 | +You can customize the default root password for MySQL by editing the source script. For the various Cloudera databases, random passwords are generated and used. The same is true when using Postgres. |
| 78 | + |
| 79 | +Note that you will also need to change "meta_db_port" in deploy_on_oci.py if you choose to run Postgres. |
| 80 | + |
| 81 | +## Deployment Syntax |
| 82 | +Deployment of the module is straightforward using the following Terraform commands: |
| 83 | + |
| 84 | + terraform init |
| 85 | + terraform plan |
| 86 | + terraform apply |
| 87 | + |
| 88 | +This will create all the required elements in a compartment in the target OCI tenancy. This includes VCN and Security List parameters. A security audit of these settings in network.tf is suggested. |
| 89 | + |
| 90 | +After Terraform is finished deploying, the output will show the Python syntax to trigger cluster deployment. This command can be run immediately following deployment, as it has built-in checks to wait until Cloudera Manager API is up and responding before it executes deployment. The syntax is as follows: |
| 91 | + |
| 92 | + python scripts/deploy_on_oci.py -B -m <master_ip> -d <disk_count> -w <worker_shape> |
| 93 | + |
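The wait-until-ready behavior described above can be sketched as a generic polling loop. This is an illustration only, not the script's actual implementation; the function name `wait_until_ready`, its parameters, and the default timeout and interval are assumptions.

```python
import time


# Hypothetical sketch of a readiness wait; not the code in deploy_on_oci.py.
def wait_until_ready(check, timeout_s=600.0, interval_s=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or the timeout expires.

    check is any zero-argument callable, for example one that issues a
    request to the Cloudera Manager API and returns True on success.
    Exceptions raised by check() are treated as "not ready yet".
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if check():
                return True
        except Exception:
            # Connection errors are expected while the service is starting.
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays; in normal use the defaults apply.
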
| 94 | +It is also possible to destroy an existing cluster through Cloudera Manager with this script: |
| 95 | + |
| 96 | + python scripts/deploy_on_oci.py -D -m <master_ip> |
| 97 | + |
| 98 | +## Destroy the Deployment |
| 99 | + |
| 100 | +When you no longer need the deployment, you can run this command to destroy it: |
| 101 | + |
| 102 | + terraform destroy |
| 103 | + |
| 104 | +## Deployment Caveats |
| 105 | +Currently this module requires the Cloudera Manager API to be on an edge host with a public IP address. This address is used to trigger cluster deployment, as well as to SSH into the Cloudera Manager host to perform dynamic host discovery for mapping the cluster topology. |
| 106 | + |
| 107 | +Future enhancements to this module are planned to support a completely Private (non-Internet exposed) cluster deployment. |
| 108 | + |
| 109 | + |
| 110 | + |
| 111 | + |
3 | 112 |
|
4 |
| -* Cloudera EDH v5 uses cm_api python based deployment, which is currently deprecated (v19). |
5 |
| -* Cloudera EDH v6 uses cm_client python based deployment, which is current (v31). |
|