Cloudera delivers the modern platform for machine learning and analytics optimized for the cloud. Cloudera is the original developer of Impala and Kudu, a major contributor to Hadoop (including HDFS and YARN) and Solr (Cloudera Search), and hosts many more open source projects.
Cloudera and Hortonworks recently merged to create the largest independent open source company, building the world’s first Enterprise Data Cloud, from the Edge to AI.
Cloudera runs on Azure IaaS, using either local storage or ADLS. Cloudera also offers Altus, a PaaS-based service that runs on Azure and uses ADLS.
Cloudera has 4 editions:
- Data Engineering / Data Science
- Operational Database
- Cloudera Data Warehouse
- Enterprise Data Hub
Cloudera also offers Cloudera Data Science Workbench, a developer tool for data engineering and data science at scale, which runs on edge / gateway nodes. It gives developers and customers enterprise security, lineage, a catalog, team-based development, developer isolation using Docker and Kubernetes, and a simple way to deploy models into production at scale as real-time REST APIs. Cloudera’s largest customer has 64,000 nodes, and over 400 customers run Cloudera at petabyte scale.
Cloudera runs anywhere, both on-premises and across multiple clouds. It gives customers the same enterprise experience everywhere, with security, governance, catalog, control plane and life cycle management provided by Cloudera’s Shared Data eXperience (SDX).
This is a lab guide for Big Data / Cloudera on Azure using the Azure Marketplace. The labs start with provisioning a Cloudera cluster on Azure, then cover ingesting structured and semi-structured data, performing analytics with Impala, parsing log files using Hive and a SerDe, creating cross-table joins with semi-structured data, and developing and running PySpark programs.
- Lab 1 - User Registration for Cloudera on Azure Hands-On Lab
- Lab 2 - Create Cloudera Big Data Cluster on Azure using Azure Marketplace
  - 1 - Basics
  - 2 - Infrastructure Information
  - 3 - Cloudera Setup Information
  - 4 - User Information
  - 5 - Summary
  - 6 - Post completion steps
- Lab 3 - Cloudera Manager and Hue
  - Cloudera Manager
  - HUE - Hadoop User Experience
- Lab 4 - Ingestion and Analytics
  - Using Impala
- Lab 5 - Hive and Impala Analytics for Web Logs
  - Select Files
  - Hive Editor
  - Impala Editor
- Lab 6 - Running Spark on Cloudera
  - pySpark Program
  - Cloudera on Azure Reference Architecture
- Cloudera Documentation