This repository was archived by the owner on Jul 3, 2024. It is now read-only.
Neha-Setia edited this page Sep 14, 2017 · 26 revisions

Welcome to the OrientDB-Insights wiki!

Umbrella SI Journey

Engineering Insights: Leverage IBM Watson and IBM Enterprise Cloud to deliver breakthrough insights with a cognitive approach across the Application Engineering Lifecycle

Short Name

OrientDB CRUD & Insights on DSX.

Short Description

Get Insights from OrientDB in IBM Data Science Experience.

Offering Type

Cognitive

Introduction

This journey gives you a head start on working with graphs in OrientDB from IBM Data Science Experience (DSX) using PyOrient, an OrientDB driver for Python, to operate on data and extract insights from OrientDB. IBM Data Science Experience can be used to analyze data using Jupyter notebooks. Graph databases are well suited for analyzing interconnections, which is why there has been a lot of interest in using them to mine data from social media. They are also useful for working with data in business disciplines that involve complex relationships and dynamic schemas, and for creating recommendations like "customers who bought this also looked at...". This journey will help you understand the end-to-end flow: downloading the data set, cleansing the data, extracting entities and relations from the data set, connecting to OrientDB, creating a new OrientDB database, populating the database with node classes, edge classes, vertices, and relations, and then executing queries to get more insights from the OrientDB database.

Author

By Neha

Code

Demo

N/A

Video

Overview

This journey gives you a head start on working with graphs in OrientDB from IBM Data Science Experience (DSX) using PyOrient, an OrientDB driver for Python, to operate on data and extract insights from OrientDB. IBM Data Science Experience can be used to analyze data using Jupyter notebooks.

Graph databases are well suited for analyzing interconnections, for example when mining data from social media. They are also useful for working with data in business disciplines that involve complex relationships and dynamic schemas, and for creating recommendations like "customers who bought this also looked at...". This gives you a great deal of flexibility to represent your data in the way that makes the most sense to everyone involved, while still making the most of the complex interactions within it. OrientDB is a multi-model database that combines features of traditional relational databases like MySQL with those of document-based databases like MongoDB. This journey will help you understand the end-to-end flow: downloading the data set, cleansing the data, extracting entities and relations from the data set, connecting to OrientDB, creating a new OrientDB database, populating the database with node classes, edge classes, vertices, and relations, and then executing queries to get more insights from the OrientDB database.

In this journey we will demonstrate:

  • Setting up an IPython notebook on DSX and connecting it to OrientDB using pyorient.
  • Hands-on CRUD operations and extracting insights from the graph database.
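
The cleansing and entity/relation extraction steps mentioned above can be sketched in plain Python. This is a minimal illustration, not the journey's notebook code: the column names (`source`, `relation`, `target`) and the sample rows are assumptions, since the layout of the real data set is not described here.

```python
import csv
import io

# Hypothetical sample rows; the real data set's columns and values
# are assumptions made for this sketch.
SAMPLE_CSV = """source,relation,target
 Alice ,follows,Bob
Bob,follows,Carol
,follows,Dave
Alice,follows,Bob
"""

FIELDS = ("source", "relation", "target")


def extract_graph(csv_text):
    """Cleanse the rows, then return (entities, relations)."""
    entities = set()
    relations = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Cleansing: trim whitespace around every field.
        triple = tuple((row.get(field) or "").strip() for field in FIELDS)
        # Skip incomplete rows and exact duplicates.
        if not all(triple) or triple in seen:
            continue
        seen.add(triple)
        source, _, target = triple
        entities.update((source, target))  # entities become vertices
        relations.append(triple)           # relations become edges
    return sorted(entities), relations


entities, relations = extract_graph(SAMPLE_CSV)
print(entities)
print(relations)
```

Each entity would later become a vertex and each relation an edge when the database is populated.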

Flow

  1. The developer sets up a Kubernetes cluster using the Kubernetes service on IBM Bluemix.
  2. An OrientDB instance is deployed, with a persistent volume, on the Kubernetes cluster created in the first step, exposing the ports used by OrientDB (2424, 2480) on Bluemix.
  3. The developer creates a Jupyter notebook on DSX, powered by Spark. When the notebook is created, an instance of Object Storage is attached to it for storing the data used by the notebook.
  4. The developer uploads the configuration file (config.json) and the dataset (graph-insights.csv) to Object Storage.
  5. The Object Storage credentials for the files are updated in the notebook, and the files are loaded to create the graph database from them.
  6. The notebook communicates with OrientDB through the pyorient driver, and various operations are performed on the graph database using functions written in the Jupyter notebook.
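
The last two steps of the flow might look roughly like the sketch below. It assumes a config.json with `host`, `port`, `user`, and `password` keys (the real file's layout is not shown here) and hypothetical class names. The pyorient import is placed inside `setup_database` so that the pure helpers can be tried without a running OrientDB server.

```python
import json


def load_config(text):
    """Parse config.json text; the key names are assumptions for this sketch."""
    cfg = json.loads(text)
    # 2424 is the OrientDB binary port exposed in step 2 of the flow.
    return cfg["host"], int(cfg.get("port", 2424)), cfg["user"], cfg["password"]


def schema_commands(node_classes, edge_classes):
    """Build OrientDB SQL statements that create vertex and edge classes."""
    return ([f"CREATE CLASS {name} EXTENDS V" for name in node_classes]
            + [f"CREATE CLASS {name} EXTENDS E" for name in edge_classes])


def setup_database(config_text, db_name, node_classes, edge_classes):
    """Connect to OrientDB, create the graph database if needed, and open it."""
    import pyorient  # needs the pyorient package and a reachable server

    host, port, user, password = load_config(config_text)
    client = pyorient.OrientDB(host, port)
    client.connect(user, password)
    is_new = not client.db_exists(db_name, pyorient.STORAGE_TYPE_PLOCAL)
    if is_new:
        client.db_create(db_name, pyorient.DB_TYPE_GRAPH,
                         pyorient.STORAGE_TYPE_PLOCAL)
    client.db_open(db_name, user, password)
    if is_new:
        # Only create classes in a fresh database; re-creating them would fail.
        for statement in schema_commands(node_classes, edge_classes):
            client.command(statement)
    return client


# The statements that would be sent for two hypothetical classes.
for statement in schema_commands(["Person"], ["Follows"]):
    print(statement)
```

With a server available, `setup_database(config_text, "mydb", ["Person"], ["Follows"])` would return a client on which `client.command(...)` and `client.query(...)` can be called; the database and class names are placeholders.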

Included components

  • OrientDB: A Multi-Model Open Source NoSQL DBMS.

  • IBM Data Science Experience: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.

  • Bluemix Object Storage: A Bluemix service that provides an unstructured cloud data store to build and deliver cost effective apps and services with high reliability and fast speed to market.

  • Jupyter Notebooks: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.

  • Kubernetes Clusters: An open-source system for automating deployment, scaling, and management of containerized applications.

Featured technologies

  • Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.

  • Graph Database: A graph database is a database that uses graph structures for semantic queries, with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship), which directly relates data items in the store. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation.
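
To make the node, edge, and property vocabulary concrete, here is a small in-memory sketch in plain Python. It is only an illustration of the data model, not how OrientDB stores data internally, and the supply-chain names are invented for the example.

```python
from collections import defaultdict


class PropertyGraph:
    """A toy property graph: nodes and edges both carry properties."""

    def __init__(self):
        self.nodes = {}                     # node id -> properties
        self.out_edges = defaultdict(list)  # node id -> [(label, target, props)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        self.out_edges[source].append((label, target, props))

    def neighbors(self, node_id, label):
        # The edges are stored with the node, so following a relationship
        # is a direct lookup rather than a join across tables.
        return [target
                for edge_label, target, _ in self.out_edges.get(node_id, [])
                if edge_label == label]


graph = PropertyGraph()
graph.add_node("acme", kind="Supplier")
graph.add_node("plant-1", kind="Factory")
graph.add_node("widget", kind="Product")
graph.add_edge("acme", "supplies", "plant-1", since=2016)
graph.add_edge("plant-1", "produces", "widget")

print(graph.neighbors("acme", "supplies"))
print(graph.neighbors("plant-1", "produces"))
```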

Blog

Graphs are already prevalent in the real world and in software development. For example, if you are a Twitter user, you are one node of the Twitter graph. Your attributes include the number of tweets you have written and the number of people you follow, while whom you follow and who follows you are the relationships between you and other Twitter users. In other words, you are already dealing with a graph. Graph databases are currently gaining a lot of interest because they offer powerful data modeling tools that provide a closer fit to how your data works in the real world. Graph databases are also useful for working with data in business disciplines that involve complex relationships and dynamic schemas, such as supply chain management, identifying the source of an IP telephony issue, and creating "customers who bought this also looked at..." recommendations.
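
As a rough illustration of the Twitter example, the helpers below build OrientDB SQL statements for users and follow relationships. The class names `User` and `Follows` and the property names are assumptions for this sketch; both classes would need to exist already (for instance, created as subclasses of `V` and `E`). Quoting through `json.dumps` is a convenience that should be adequate for simple names, not a general escaping strategy.

```python
import json


def create_user(name, tweets):
    """CREATE VERTEX statement carrying the user's attributes as JSON."""
    return "CREATE VERTEX User CONTENT " + json.dumps(
        {"name": name, "tweets": tweets})


def follow(follower, followed):
    """CREATE EDGE statement linking two existing User vertices by name."""
    return ("CREATE EDGE Follows "
            f"FROM (SELECT FROM User WHERE name = {json.dumps(follower)}) "
            f"TO (SELECT FROM User WHERE name = {json.dumps(followed)})")


def following_query(name):
    """Query returning the users that the named user follows."""
    return ("SELECT expand(out('Follows')) FROM User "
            f"WHERE name = {json.dumps(name)}")


statements = [
    create_user("alice", 120),
    create_user("bob", 15),
    follow("alice", "bob"),
    following_query("alice"),
]
for statement in statements:
    print(statement)
```

Each string could then be passed to a pyorient client, for example `client.command(statement)` for the create statements and `client.query(...)` for the select.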

This journey gives you a head start on working with graphs in OrientDB from IBM Data Science Experience (DSX) using PyOrient, an OrientDB driver for Python, to operate on data and extract insights from OrientDB. IBM Data Science Experience can be used to analyze data using Jupyter notebooks. The Insights journey provides a guide to setting up an IPython notebook on DSX, connecting it to OrientDB, and performing CRUD operations on the database using pyorient. This developer journey will help you understand the end-to-end flow: downloading the data set, cleansing the data, extracting entities and relations from the data set, connecting to OrientDB, creating a new OrientDB database, populating the database with node classes, edge classes, vertices, and relations, and then executing queries to get more insights from the OrientDB database.

By the end of this journey, users will have a good understanding of OrientDB, which they can extend to create their own domain-specific knowledge graph according to their business requirements and extract interesting information from it.

View the entire [Orient DB operations on IBM DATA SCIENCE EXPERIENCE](https://github.com/IBM/graph-db-insights/) Journey, including demos, code, and more!

Links
