Elasticsearch

Elasticsearch Intro

Elasticsearch is a distributed NoSQL JSON document database built on top of Apache Lucene. It provides a full-text search service and is used extensively by websites such as Quora, GitHub, Stack Exchange and many more. Its RESTful API gives a simple interface to the distributed database, which makes it easy to integrate with websites. In this dev-op we will deploy Elasticsearch on an AWS cluster and perform a simple query.
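Because the API is plain HTTP with JSON bodies, any HTTP client can talk to it. The sketch below uses only the Python standard library to build (but not send) a request that indexes one document. It assumes the 1.x-era document path /{index}/{type}/{id} used by the version installed in this dev-op; the index name `movies`, type `movie`, and document fields are hypothetical.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build an HTTP PUT request that stores `document` at /index/type/id.

    The /{index}/{type}/{id} path follows the Elasticsearch 1.x document API.
    The request is only constructed here, not sent.
    """
    url = "{}/{}/{}/{}".format(base_url.rstrip("/"), index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical index, type, and document, for illustration only.
    req = build_index_request(
        "http://localhost:9200", "movies", "movie", "1",
        {"title": "Example", "description": "A spy story"},
    )
    print(req.get_method(), req.full_url)
    # Against a running node you could send it with:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read().decode("utf-8")))
```

The same pattern (an HTTP verb, a path naming the index, and a JSON body) is what client libraries wrap for you.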

## Spin up AWS instances

We recommend using t2.micro instances with Ubuntu Server 14.04 LTS (HVM), SSD Volume Type, to take advantage of Amazon's Free Tier program. Be sure to terminate the instances when you are finished to avoid AWS charges: the Free Tier covers 750 t2.micro hours per month, and those hours are shared across all of your running instances. For practice, try spinning up 3 nodes for Elasticsearch.
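Because the 750 hours are pooled across every t2.micro you run, a multi-node cluster uses them up faster than a single instance. A quick back-of-the-envelope check in Python; the 750-hour figure is the Free Tier allowance mentioned above, while the node count and the 30-day month are illustrative parameters:

```python
# t2.micro hours covered by the AWS Free Tier each month
FREE_TIER_HOURS_PER_MONTH = 750


def instance_hours(nodes, hours_running):
    """Instance-hours consumed by `nodes` instances running for `hours_running` hours."""
    return nodes * hours_running


def days_until_free_tier_exhausted(nodes, limit=FREE_TIER_HOURS_PER_MONTH):
    """Days of round-the-clock running before `nodes` instances use up the allowance."""
    return limit / (nodes * 24)


if __name__ == "__main__":
    nodes = 3
    print("Instance-hours in a 30-day month:", instance_hours(nodes, 24 * 30))
    print("Days before the allowance runs out: {:.1f}".format(
        days_until_free_tier_exhausted(nodes)))
```

With three nodes running around the clock, a 30-day month comes to 2,160 instance-hours, and the allowance is gone after roughly 10.4 days.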

## Setup Elasticsearch

Elasticsearch will be installed on all nodes with the same configuration.

SSH into each node and run the following:

```
node$ sudo apt-get update
```

Install the Java Development Kit (JDK)

```
node$ sudo apt-get install openjdk-7-jdk
```

Install Elasticsearch

```
node$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.5.2.tar.gz -P ~/Downloads
node$ sudo tar -xvf ~/Downloads/elasticsearch-1.5.2.tar.gz -C /usr/local
node$ sudo mv /usr/local/elasticsearch-1.5.2 /usr/local/elasticsearch
```

Set the ELASTICSEARCH_HOME environment variable and add its bin directory to PATH in ~/.profile:

```
node$ nano ~/.profile
```

```
# Add the following
export ELASTICSEARCH_HOME=/usr/local/elasticsearch
export PATH=$PATH:$ELASTICSEARCH_HOME/bin
```

```
node$ source ~/.profile
```

Install the AWS Cloud plugin for Elasticsearch

```
node$ sudo $ELASTICSEARCH_HOME/bin/plugin install elasticsearch/elasticsearch-cloud-aws/2.5.0
```

Configure Elasticsearch for node discovery

```
node$ sudo nano $ELASTICSEARCH_HOME/config/elasticsearch.yml
```

Set access_key (your AWS access key ID), secret_key (your AWS secret access key), region (the region your cluster runs in), and groups (the name of your security group) to your own AWS settings. Also give your cluster a name that is specific to you (the default is elasticsearch); otherwise other Elasticsearch nodes discovered on EC2 with the same cluster name may join your cluster. Warning: BE CAREFUL NOT TO COMMIT THIS FILE TO GITHUB, SINCE IT CONTAINS YOUR AWS CREDENTIALS.

```
cloud.aws.access_key: your-aws-access-key-id
cloud.aws.secret_key: your-aws-secret-access-key
cloud.aws.region: us-west-2

discovery.type: ec2
discovery.ec2.groups: your-security-group
################### Elasticsearch Configuration Example ###################
…
…
cluster.name: my-cluster-name
```
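One way to heed the warning above is to keep credentials out of any script you might commit and generate the discovery settings on each node from environment variables. The sketch below is a hypothetical helper, not part of Elasticsearch or the cloud-aws plugin. It assumes the keys are exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (the variable names the AWS CLI uses), and the region, security group, and cluster name are placeholders. The generated elasticsearch.yml still contains the keys, so that file should stay on the node.

```python
import os
import sys


def render_discovery_config(access_key, secret_key, region, group, cluster_name):
    """Return elasticsearch.yml lines for EC2 discovery via the cloud-aws plugin."""
    settings = [
        ("cloud.aws.access_key", access_key),
        ("cloud.aws.secret_key", secret_key),
        ("cloud.aws.region", region),
        ("discovery.type", "ec2"),
        ("discovery.ec2.groups", group),
        ("cluster.name", cluster_name),
    ]
    return "".join("{}: {}\n".format(key, value) for key, value in settings)


def main():
    # Credentials come from the environment, so this script holds no secrets.
    access_key = os.environ.get("AWS_ACCESS_KEY_ID")
    secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if not access_key or not secret_key:
        print("Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first.", file=sys.stderr)
        return
    # Placeholder region, security group, and cluster name: replace with your own.
    sys.stdout.write(render_discovery_config(
        access_key, secret_key, "us-west-2", "your-security-group", "my-cluster-name"))


if __name__ == "__main__":
    main()
```

After reviewing the printed lines, you could append them to $ELASTICSEARCH_HOME/config/elasticsearch.yml, for example by piping the output through sudo tee -a.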

Start Elasticsearch

```
node$ sudo $ELASTICSEARCH_HOME/bin/elasticsearch &
```
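Since the command above runs Elasticsearch in the background, the node may need several seconds before it answers HTTP requests. A small polling helper, sketched with the Python standard library, waits until http://localhost:9200/ responds; the function name and the default timeout and interval values are illustrative choices.

```python
import time
import urllib.request


def wait_for_elasticsearch(url="http://localhost:9200/", timeout=60.0, interval=2.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse.

    Returns True if the node answered in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError and HTTPError are OSError subclasses: the node is
            # not listening yet, refused the connection, or timed out.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_for_elasticsearch() returns True once the local node is up, or False after a minute without a response.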

## Check status of Elasticsearch cluster

You can check whether all nodes are up and running by executing the following on any of the nodes:

```
node$ curl 'localhost:9200/_cat/health?v'
```

With a 3-node cluster, the output should show 3 in the node.total column.
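For scripting, the _cluster/health endpoint returns the same cluster health as JSON, including the status and number_of_nodes fields. The sketch below fetches it with the Python standard library and checks that the expected number of nodes has joined. The helper names and the default of three nodes are assumptions matching the cluster suggested earlier; green or yellow status means all primary shards are allocated.

```python
import json
import urllib.request


def fetch_cluster_health(base_url="http://localhost:9200"):
    """GET /_cluster/health and return the decoded JSON response."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/_cluster/health") as resp:
        return json.loads(resp.read().decode("utf-8"))


def cluster_is_ready(health, expected_nodes=3):
    """True when at least `expected_nodes` nodes joined and status is green or yellow."""
    enough_nodes = health.get("number_of_nodes", 0) >= expected_nodes
    return enough_nodes and health.get("status") in ("green", "yellow")
```

On any node, cluster_is_ready(fetch_cluster_health()) returns True once all three nodes have joined and no primary shard is unassigned.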

## Example Search with Python Elasticsearch Client

### Install Python Elasticsearch Client

Install git and clone the repository

```
node$ sudo apt-get install git
node$ git clone https://github.com/andrewvc/ee-datasets
```

### Put the movie_db data onto the Elasticsearch cluster

```
node$ cd ee-datasets
node$ java -jar elastic-loader.jar http://localhost:9200 datasets/movie_db.eloader
```

### Perform a simple search in Python

Here we will look for movies that contain the word CIA in their description field. When we loaded the movie_db data in the previous step, we created an index called movie_db in Elasticsearch, and that is the index we will search.
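A minimal sketch of that search with the Python Elasticsearch client. It assumes a client whose major version matches the server (for Elasticsearch 1.5.2 that means the 1.x series, installed for example with pip install "elasticsearch>=1.0.0,<2.0.0"), in which Elasticsearch() connects to localhost:9200 by default and search() accepts a body= argument. The match query on the description field is one reasonable choice, not necessarily the original author's, and the helper names are hypothetical.

```python
def build_match_query(field, text, size=10):
    """Full-text `match` query for `text` in `field`, returning at most `size` hits."""
    return {"query": {"match": {field: text}}, "size": size}


def hit_sources(response):
    """Extract the stored documents (`_source`) from a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


def search_movies(text, field="description", index="movie_db"):
    """Run the match query against `index` on the local node."""
    # Imported here so the helpers above work without the client installed.
    from elasticsearch import Elasticsearch

    es = Elasticsearch()  # assumption: defaults to localhost:9200 (1.x-era clients)
    response = es.search(index=index, body=build_match_query(field, text))
    print("Total hits:", response["hits"]["total"])
    return hit_sources(response)


if __name__ == "__main__":
    # Requires a running node with the movie_db index loaded.
    print("Example: for movie in search_movies('CIA'): print(movie)")
```

Calling search_movies("CIA") on any node returns the _source documents of the matching movies; the fields inside each document depend on the movie_db dataset.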

Interested in transitioning to a career in data engineering?

Find out more about the Insight Data Engineering Fellows Program in New York and Silicon Valley, apply today, or sign up for program updates.
