This repository was archived by the owner on Jul 3, 2024. It is now read-only.
Neha-Setia edited this page Sep 14, 2017 · 26 revisions

Welcome to the graph-db-insights wiki!

Umbrella SI Journey

Engineering Insights: Leverage IBM Watson and IBM Enterprise Cloud to deliver breakthrough insights with a cognitive approach across the Application Engineering Lifecycle

Short Name

OrientDB CRUD & Insights on DSX.

Short Description

Get insights from an OrientDB graph database in IBM Data Science Experience.

Offering Type

Cognitive

Introduction

Graph databases are well suited to analyzing interconnections, which is why there has been so much interest in using them to mine data from social media. This tutorial gives you a head start on working with the OrientDB graph database on IBM Data Science Experience (DSX) using pyorient. This journey helps developers get started with OrientDB operations such as CRUD, basic traversal, and extracting insights using both SQL and Gremlin, a specialized query language for property graphs from Apache TinkerPop that is supported by many major graph databases. Both the console and OrientDB Studio are covered.

Author

By Neha

Code

Demo

N/A

Video

Overview

The GraphDB Insights journey gives developers a head start on working with the OrientDB database on IBM Data Science Experience (DSX). It helps developers get started with OrientDB operations such as CRUD, basic traversal, and extracting insights using pyorient, the Python driver for OrientDB.

In this journey we will demonstrate:

  • Setting up an IPython notebook on DSX and connecting it to OrientDB using pyorient.
  • Hands-on CRUD operations and extracting insights from the graph database.

Flow

  1. The developer sets up a Kubernetes cluster using the Kubernetes service on IBM Bluemix.
  2. An OrientDB instance with a persistent volume is deployed on that cluster, exposing the ports OrientDB uses (2424 and 2480) on Bluemix.
  3. The developer creates a Jupyter notebook on IBM Data Science Experience, powered by Spark. When the notebook is created, an Object Storage instance is attached to it to store the data the notebook uses.
  4. The developer uploads the configuration file (config.json) and the Kaggle IMDb movie data (graph-insights.csv) to Object Storage.
  5. The Object Storage credentials for the files are updated in the notebook, and the files are loaded to create the graph database.
  6. The notebook communicates with OrientDB through the pyorient driver, and functions written in the notebook perform various operations on the graph database.
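Steps 4 to 6 can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the key names in config.json (host, port, user, password, db_name) are assumptions, as is the use of the same credentials for the server and the database. The port is expected to be OrientDB's binary port (2424), which is the one pyorient talks to.

```python
import json


def load_config(path="config.json"):
    """Read connection settings from a JSON file.

    The key names below are hypothetical; adjust them to match the
    real config.json used by the notebook.
    """
    with open(path) as handle:
        config = json.load(handle)
    required = ("host", "port", "user", "password", "db_name")
    missing = [key for key in required if key not in config]
    if missing:
        raise KeyError("config.json is missing: " + ", ".join(missing))
    return config


def open_graph_db(config):
    """Connect to OrientDB and open the graph database, creating it if needed.

    Assumes the same user and password are valid for both the server
    connection and the database, which may not hold in every deployment.
    """
    import pyorient  # imported lazily so load_config works without the driver

    client = pyorient.OrientDB(config["host"], int(config["port"]))
    client.connect(config["user"], config["password"])
    if not client.db_exists(config["db_name"], pyorient.STORAGE_TYPE_PLOCAL):
        client.db_create(
            config["db_name"],
            pyorient.DB_TYPE_GRAPH,
            pyorient.STORAGE_TYPE_PLOCAL,
        )
    client.db_open(config["db_name"], config["user"], config["password"])
    return client
```

In a notebook, `open_graph_db(load_config())` would return a client whose `command` and `query` methods run OrientDB SQL against the deployed instance.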

Included components

  • OrientDB: A multi-model open source NoSQL DBMS.

  • IBM Data Science Experience: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.

  • Bluemix Object Storage: A Bluemix service that provides an unstructured cloud data store to build and deliver cost-effective apps and services with high reliability and fast speed to market.

  • Jupyter Notebooks: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.

  • Kubernetes Clusters

  • Bluemix container service

Featured technologies

  • Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.

Blog

Graphs are already prevalent in the real world and in software development. For example, if you are a Twitter user, you are one node of the Twitter graph. Your attributes include the number of tweets you have written and the number of people you follow, and your relations to other Twitter users are whom you follow and who follows you. In other words, you are already dealing with a graph.
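To make the Twitter example concrete, here is a small sketch of a property graph held in plain Python structures. The user names and tweet counts are made up for illustration, and a real graph database stores and indexes this very differently.

```python
# Nodes: each user carries a dictionary of attributes.
# The names and values are hypothetical sample data.
nodes = {
    "alice": {"tweets": 120},
    "bob": {"tweets": 45},
    "carol": {"tweets": 300},
}

# Directed "follows" edges, stored as (follower, followed) pairs.
follows = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]


def followers_of(user):
    """Return the users who follow `user` (incoming edges)."""
    return sorted(src for src, dst in follows if dst == user)


def following_of(user):
    """Return the users that `user` follows (outgoing edges)."""
    return sorted(dst for src, dst in follows if src == user)
```

Here `followers_of("bob")` walks the incoming edges of one node, which is the kind of relationship lookup a graph database is built to answer directly.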

Graph databases are currently gaining a lot of interest because they offer powerful data modeling tools that fit more closely how your data works in the real world. This gives you a great deal of flexibility to represent your data in the way that makes the most sense to everyone involved, while still capturing the complex interactions within it. Graph databases combine strengths of traditional relational databases such as MySQL and document-based databases such as MongoDB. They are also useful for working with data in business disciplines that involve complex relationships and dynamic schemas, such as supply chain management, identifying the source of an IP telephony issue, and creating "customers who bought this also looked at..." recommendations.

The Graph Database Insights journey demonstrates various operations on OrientDB running on Kubernetes, driven from Data Science Experience using pyorient. It provides a guide to setting up an IPython notebook on DSX, connecting it to OrientDB, and performing CRUD operations on the database with pyorient. The journey uses the Kaggle IMDb movie data set to create a graph in which persons and movies are node classes with a few attributes, such as name and Facebook likes. Node classes are related to each other by edge classes (relations): worked_with, which represents co-workers, and acted_in, which connects an actor to a movie he or she acted in. The tutorial also covers how to get some basic insights from OrientDB, such as finding the most mentioned nodes and finding clusters based on a condition.
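The graph model described above could be expressed in OrientDB SQL along the following lines. This is a sketch under assumptions, not the journey's actual notebook code: the class names Person and Movie, the property name facebook_likes, and all helper function names are hypothetical. The helpers only build statement strings, and the degree-based query is one possible reading of "most mentioned nodes".

```python
def quote(value):
    """Render a Python value as an OrientDB SQL literal."""
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def _assignments(props):
    """Build the 'key = value, ...' part of a SET clause."""
    return ", ".join(f"{key} = {quote(val)}" for key, val in props.items())


# Vertex classes extend V and edge classes extend E (assumed class names).
SCHEMA = [
    "CREATE CLASS Person EXTENDS V",
    "CREATE CLASS Movie EXTENDS V",
    "CREATE CLASS acted_in EXTENDS E",
    "CREATE CLASS worked_with EXTENDS E",
]


def create_vertex(class_name, **props):
    return f"CREATE VERTEX {class_name} SET {_assignments(props)}"


def create_edge(edge_class, from_class, from_name, to_class, to_name):
    return (
        f"CREATE EDGE {edge_class} "
        f"FROM (SELECT FROM {from_class} WHERE name = {quote(from_name)}) "
        f"TO (SELECT FROM {to_class} WHERE name = {quote(to_name)})"
    )


def update_vertex(class_name, name, **props):
    return f"UPDATE {class_name} SET {_assignments(props)} WHERE name = {quote(name)}"


def delete_vertex(class_name, name):
    return f"DELETE VERTEX {class_name} WHERE name = {quote(name)}"


def most_connected(class_name, limit=5):
    """Rank vertices by the number of edges touching them."""
    return (
        f"SELECT name, both().size() AS degree FROM {class_name} "
        f"ORDER BY degree DESC LIMIT {int(limit)}"
    )
```

With a connected pyorient client, each write statement would be passed to `client.command(...)` and the read query to `client.query(...)`. Building statements by string formatting is only acceptable here because the inputs are trusted sample data.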

By the end of this tutorial, users will have a good understanding of OrientDB, which they can extend to create their own domain-specific knowledge graph for their business requirements and extract interesting information from it. View the entire [OrientDB operations on IBM Data Science Experience](https://github.com/IBM/graph-db-insights/) journey, including demos, code, and more!

Links
