This repository was archived by the owner on Apr 18, 2024. It is now read-only.
README.md: 12 additions & 54 deletions
@@ -1,8 +1,8 @@
1
1
# oci-quickstart-cloudera
2
2
This is a Terraform module that deploys [Cloudera Enterprise Data Hub](https://www.cloudera.com/products/enterprise-data-hub.html) on [Oracle Cloud Infrastructure (OCI)](https://cloud.oracle.com/en_US/cloud-infrastructure). It is developed jointly by Oracle and Cloudera.
3
3
4
-
## Alternate Versions
5
-
Future development will include support for EDH v5 clusters. In the meantime, use the [1.0.0 release](https://github.com/oci-quickstart/oci-cloudera/releases/tag/1.0.0) for v5 deployments.
4
+
## Deployment Information
5
+
The following table shows Recommended and Minimum supported OCI shapes for each cluster role:
@@ -12,6 +12,15 @@ Future development will include support for EDH v5 clusters. In the meantime, u
12
12
## Resource Manager Deployment
13
13
Using [OCI Resource Manager](https://docs.cloud.oracle.com/iaas/Content/ResourceManager/Concepts/resourcemanager.htm) makes deployment quite easy. Simply [download the .zip](https://github.com/oracle/oci-quickstart-cloudera/zipball/resource-manager) and follow the [Resource Manager instructions](https://docs.cloud.oracle.com/iaas/Content/ResourceManager/Tasks/usingconsole.htm) for how to build a stack. Prior to building the Stack, you may want to modify some parts of the deployment detailed in the sections below.
14
14
15
+
Alternatively, you can use a schema file to make setting deployment variables much easier. To use this feature, the GitHub zipball must be repackaged so that its contents are top-level before creating the ORM Stack. This is a straightforward process:
16
+
```
17
+
unzip oci-quickstart-cloudera*.zip
18
+
cd oci-quickstart-cloudera-<TAB_COMPLETE>
19
+
zip -r oci-quickstart-cloudera.zip *
20
+
```
21
+
22
+
Use the oci-quickstart-cloudera.zip file created in the last step to create the ORM Stack.
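Where `unzip` and `zip` are not available, the same repackaging can be sketched in Python. This is only an illustration of the steps above, not part of the module: the function name `repackage_top_level` is hypothetical, and it assumes the zipball wraps everything in exactly one leading directory.

```python
import zipfile
from pathlib import PurePosixPath


def repackage_top_level(src_zip, dst_zip):
    """Copy src_zip to dst_zip, dropping the single leading directory.

    Hypothetical helper; assumes every entry sits under one wrapper directory.
    """
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            parts = PurePosixPath(info.filename).parts
            if len(parts) < 2:
                continue  # skip the wrapper directory entry itself
            new_name = "/".join(parts[1:])
            if info.is_dir():
                new_name += "/"  # keep directory entries as directories
            dst.writestr(new_name, src.read(info.filename))
```

Calling `repackage_top_level("downloaded.zip", "oci-quickstart-cloudera.zip")` (file names are placeholders) would produce an archive whose contents are top-level.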
23
+
15
24
## Python Deployment using cm_client
16
25
The deployment script "deploy_on_oci.py" uses cm_client against Cloudera Manager API v31. As such, it requires some customization before execution. Refer to the header section of the script; you are strongly encouraged to modify the following variables before deployment:
17
26
@@ -42,58 +51,7 @@ High Availability is also offered as part of the deployment process. When secur
42
51
You can customize the default root password for MySQL by editing the source script [cms_mysql.sh](https://github.com/oracle/oci-quickstart-cloudera/blob/master/scripts/cms_mysql.sh#L188). For the various Cloudera databases, random passwords are generated and used. These are stored in a flat file on the Utility host for use at deployment time.
43
52
44
53
## Object Storage Integration
45
-
As of the 2.1.0 release, this template can deploy clusters configured to use OCI Object Storage through S3 Compatibility. To enable this, an S3 Access and Secret key must first be set up in the OCI Tenancy. This process is detailed [here](https://docs.cloud.oracle.com/iaas/Content/Identity/Tasks/managingcredentials.htm#Working2). Once that is in place, modify the [deploy_on_oci.py](https://github.com/oracle/oci-quickstart-cloudera/blob/master/scripts/deploy_on_oci.py#L101-L108) script, and set the following values:
46
-
47
-
s3_compat_enable = 'False'
48
-
s3a_secret_key = 'None'
49
-
s3a_access_key = 'None'
50
-
s3a_endpoint = 'None'
51
-
52
-
The first should be set to 'True'; then replace 'None' with each of the required values. This configuration will then be pushed as part of the cluster deployment.
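As a rough sketch, the edited values might look like the following. The variable names come from the list above; every value is a placeholder, and the `s3_settings_complete` check is a hypothetical helper that is not part of deploy_on_oci.py.

```python
# Placeholders only: substitute the keys and endpoint from your own tenancy.
s3_compat_enable = 'True'
s3a_secret_key = '<YOUR_SECRET_KEY>'
s3a_access_key = '<YOUR_ACCESS_KEY>'
s3a_endpoint = '<YOUR_S3_COMPATIBLE_ENDPOINT>'


def s3_settings_complete(enable, secret_key, access_key, endpoint):
    """Hypothetical check: True if S3 compatibility is off, or on with all values set."""
    if enable != 'True':
        return True
    values = (secret_key, access_key, endpoint)
    return all(v not in ('None', '', None) for v in values)
```

Running a check like this before deployment would catch a configuration where the feature is enabled but a value was left as 'None'.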
53
-
54
-
## Resource Manager Variables
55
-
Step 2 for setting up a stack is Configure Variables. By default all variables are filled in, with the exception of the SSH Public and Private keypair used for host access. If you don't have a keypair for use with this deployment, generating one on Linux/Mac is simply:
56
-
57
-
ssh-keygen -t rsa
58
-
59
-
Follow the prompts to generate the key; do not associate a password with it. Copy the contents of each file and paste into the appropriate variable fields as shown here:
This list also can be modified to suit your specific deployment requirements. You should review the settings for the following and ensure you have the capacity in your Tenancy prior to deployment:
64
-
65
-
worker_instance_shape
66
-
worker_node_count
67
-
block_volumes_per_worker
68
-
utility_instance_shape
69
-
master_instance_shape
70
-
bastion_instance_shape
71
-
72
-
Note that lowering data_blocksize_in_gbs below the default value of 700GB is not recommended, because 700GB is the minimum size that achieves maximum throughput per block volume. Lowering it has a negative impact on HDFS performance. If you need more HDFS capacity, best practice is to increase block_volumes_per_worker, which adds more DFS volumes for capacity and aggregate throughput. For even higher density, data_blocksize_in_gbs can be increased in tandem.
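To get a feel for how these variables combine, here is a back-of-the-envelope sketch. The function name is hypothetical; it assumes the HDFS default replication factor of 3 and ignores reserved space and any local NVMe disks, so treat the result as an estimate only.

```python
def hdfs_block_capacity_gb(worker_node_count, block_volumes_per_worker,
                           data_blocksize_in_gbs=700, replication=3):
    """Return (raw_gb, approx_usable_gb) for block-volume-backed HDFS.

    Assumption: usable capacity is raw capacity divided by the replication factor.
    """
    raw = worker_node_count * block_volumes_per_worker * data_blocksize_in_gbs
    return raw, raw / replication
```

For example, 3 workers with 4 volumes each at the default size give 8400GB raw, or roughly 2800GB after three-way replication.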
73
-
74
-
When using DenseIO shapes, it's also possible to set block_volumes_per_worker to "0" to use only local NVMe disk for HDFS. If you have both local NVMe and block volumes, data tiering will automatically be enabled as part of the deployment process.
75
-
76
-
## Resource Manager Stack Steps
77
-
After building the stack, it only takes 2 actions to deploy:
78
-
79
-
Terraform Actions -> Plan
80
-
Terraform Actions -> Apply
81
-
82
-
This will create all the required elements in a compartment in the target OCI tenancy. This includes VCN and Security List parameters. Security audit of these in the [network module](https://github.com/oracle/oci-quickstart-cloudera/blob/master/terraform/modules/network/main.tf) is suggested.
83
-
84
-
The output of the Apply command will contain a URL to access Cloudera Manager. This is the public IP of the Utility Host, which runs the deployment.
85
-
86
-
## Monitoring Cluster Build
87
-
Because all tasks are done in CloudInit, there are two ways to monitor the deployment. First, you can log in to the Cloudera Manager URL once it is up and running, a few minutes after the Apply command finishes. Alternatively, you can SSH into the Utility node and monitor the log file "/var/log/cloudera-OCI-initialize.log", which contains detailed output from the deployment.
88
-
89
-
## Destroy the Deployment
90
-
91
-
When you no longer need the deployment, you can destroy it:
92
-
93
-
Terraform Actions -> Destroy
94
-
95
-
## Deployment Architecture
96
-
54
+
???LINES MISSING
97
55
Here is a diagram showing what is deployed using this template. Note that resources are automatically distributed among Fault Domains in an Availability Domain to ensure fault tolerance. Additional workers are striped across the 3 Fault Domains in sequence, starting with Fault Domain 1.
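The striping described above amounts to a round-robin assignment. The sketch below only mirrors that description and is not taken from the Terraform module; the function name and the zero-based worker index are assumptions.

```python
def fault_domain_for_worker(worker_index, fault_domains=3):
    """Map a zero-based worker index to a Fault Domain number (1..fault_domains)."""
    return (worker_index % fault_domains) + 1
```

Under these assumptions, the first five workers land in Fault Domains 1, 2, 3, 1, 2.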