Close Pair Algorithm Practice using Databricks

This repository contains an implementation of the Close Pair Algorithm using Databricks, a cloud-based data engineering platform that provides a collaborative workspace for working with big data and machine learning.

Introduction

The Close Pair Algorithm is a computational geometry algorithm that finds the two points with the smallest distance between them in a set of two-dimensional points. It is used in applications such as image processing, robotics, and geographic information systems.
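To make the problem statement concrete, here is a minimal brute-force sketch that compares every pair of points and keeps the closest one. The function name closest_pair_brute_force and the sample points are illustrative assumptions, not taken from the notebook. It runs in quadratic time, so it is only a reference baseline for small inputs.

```python
import math
from itertools import combinations


def closest_pair_brute_force(points):
    """Return (distance, p, q) for the closest pair among 2-D points.

    Hypothetical helper for illustration; checks all n*(n-1)/2 pairs.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Pick the pair whose Euclidean distance is smallest.
    p, q = min(combinations(points, 2), key=lambda pair: math.dist(pair[0], pair[1]))
    return math.dist(p, q), p, q


# Made-up sample data, not from the repository's CSV files.
sample_points = [(0, 0), (5, 5), (1, 1), (10, 0)]
distance, p, q = closest_pair_brute_force(sample_points)
print(distance, p, q)
```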

Databricks provides a scalable, distributed computing environment for processing large datasets, which makes it a practical platform for running the Close Pair Algorithm on inputs that are too large to process comfortably on a single machine.

Getting Started

To use this implementation, you need a Databricks account and access to a Databricks workspace. Once your workspace is set up, clone this repository and import the notebook into the workspace.

The implementation is provided as a Jupyter notebook that can be run on Databricks. The notebook contains the Python implementation of the Close Pair Algorithm, along with instructions on how to use it.

Usage

To use the implementation in the notebook, you need to provide a set of points in two-dimensional space. The points can be supplied as a CSV file and loaded into a DataFrame in Databricks.
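As a rough illustration of the input format, the sketch below parses a small CSV of points using only the Python standard library. The column layout (a header row followed by x and y in the first two columns) and the helper name load_points are assumptions, since the README does not describe the files. In a Databricks notebook, the equivalent step would typically be something like spark.read.csv(path, header=True, inferSchema=True), which produces a Spark DataFrame instead of a list.

```python
import csv
import io


def load_points(csv_file, has_header=True):
    """Read (x, y) float pairs from an open CSV file object.

    Hypothetical helper: assumes x and y are the first two columns.
    """
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row if present
    points = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete lines
        points.append((float(row[0]), float(row[1])))
    return points


# In-memory stand-in for a CSV file; the values are made up.
sample_csv = io.StringIO("x,y\n0.0,0.0\n3.0,4.0\n1.0,1.0\n")
points = load_points(sample_csv)
print(points)
```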

Once the points have been loaded into the DataFrame, you can call the close_pair function, which will find the closest pair of points in the set. The function returns the distance between the closest pair of points and the coordinates of the two points.

Conclusion

This implementation uses the distributed computing capabilities of Databricks to find the closest pair of points in a large dataset in a reasonable amount of time.

If you have any questions or feedback about this implementation, please open an issue or pull request in the repository.

