Visualizing Data with Seaborn

Experimenting with Numpy and Pandas

This repository contains exercises from Codeup's bootcamp.

Elaborating on the Subject Matter

Numpy

Numpy is a library for representing and working with large and multi-dimensional arrays. Most other libraries in the data-science ecosystem depend on numpy, making it one of the fundamental data science libraries.
Numpy provides a number of useful tools for scientific programming. Convention is to import it like so:

import numpy as np

Numpy provides capabilities pertaining to:

Indexing
- An Array type that goes beyond built-in lists.
  - Create a numpy array by passing a list to the np.array function.
  - Make it multi-dimensional by passing a list of lists to np.array
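
A minimal sketch of array creation and indexing as described above. The values and the names a and m are made up for illustration.

```python
import numpy as np

# A one-dimensional array built from a plain Python list.
a = np.array([1, 2, 3, 4])

# A two-dimensional array built from a list of lists (2 rows, 3 columns).
m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a[0])      # first element of the 1-D array
print(m[1, 2])   # row 1, column 2 (indices start at 0)
print(m[:, 0])   # every row, first column
print(m.shape)   # (rows, columns)
```
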
Vectorized Operations
- Vectorizing operations means that operations are automatically applied to every element in a vector.
  - Not only are the arithmetic operators vectorized; the same applies to comparison operators.
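
As a small illustration of vectorized arithmetic and comparison (the sample array is an assumption, not taken from the exercises):

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Arithmetic is applied to every element; no explicit loop is needed.
doubled = a * 2

# Comparisons are vectorized too and produce a boolean array.
is_even = a % 2 == 0

# A boolean array can be used as a mask to select matching elements.
evens = a[is_even]

print(doubled)   # [2 4 6 8]
print(is_even)   # [False  True False  True]
print(evens)     # [2 4]
```
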
Array Creation (several methods)
- np.random.randn; np.zeros; np.ones; np.full; np.arange; np.linspace
Array Methods
- .min; .max; .mean; .sum; .std (standard deviation)
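
The creation functions and array methods listed above might be used roughly like this. The shapes and arguments are arbitrary choices for the sake of the sketch.

```python
import numpy as np

zeros = np.zeros(3)            # three 0.0 values
ones = np.ones((2, 2))         # 2x2 array of 1.0 values
sevens = np.full(3, 7)         # three copies of the value 7
steps = np.arange(0, 10, 2)    # 0, 2, 4, 6, 8 (stop value is excluded)
grid = np.linspace(0, 1, 5)    # 5 evenly spaced values from 0 to 1, inclusive
noise = np.random.randn(4)     # 4 random draws from a standard normal distribution

# Array methods summarize the whole array.
print(steps.min(), steps.max())  # 0 8
print(steps.mean())              # 4.0
print(steps.sum())               # 20
print(steps.std())               # population standard deviation by default (ddof=0)
```
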

Pandas

Series:

The Pandas Series object is similar to a numpy array, with added functionality and features.

A pandas Series object is a one-dimensional, labeled array made up of an index (autogenerated and starting at 0 by default) and data of a single data type.

A couple of important things to note about a Series:

- When a pandas Series is created from multiple data types (e.g., int + string), the data is converted to a single object data type; the int values lose the numeric behavior they would have in an integer Series.
- A pandas Series can be created in several ways; we will look at a few of them below. Most often, however, it is created by selecting a single column from a pandas DataFrame, in which case the Series retains the same index as the DataFrame. We will dive into this in the next two lessons: DataFrames and Advanced DataFrames.
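
A brief sketch of both notes. The values, the column name price, and the index labels are hypothetical.

```python
import pandas as pd

# A Series of ints gets an integer dtype.
numbers = pd.Series([1, 2, 3])

# Mixing ints and strings falls back to the generic object dtype.
mixed = pd.Series([1, "two", 3])

print(numbers.dtype)
print(mixed.dtype)    # object

# Selecting a single column from a DataFrame yields a Series
# that keeps the DataFrame's index.
df = pd.DataFrame({"price": [1.5, 2.0]}, index=["apple", "pear"])
prices = df["price"]

print(type(prices).__name__)   # Series
print(prices.index.tolist())   # ['apple', 'pear']
```
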

Convention is to import pandas like this:

import pandas as pd

Pandas Series are vectorized by default.
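
For instance, arithmetic and comparisons on a Series apply to every element, much as they do for a numpy array. The sample values here are assumptions.

```python
import pandas as pd

s = pd.Series([10, 20, 30])

plus_one = s + 1   # adds 1 to every element
big = s > 15       # boolean Series, one result per element

print(plus_one.tolist())   # [11, 21, 31]
print(big.tolist())        # [False, True, True]
print(s[big].tolist())     # [20, 30]
```
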

Series Attributes
Attributes return useful information about a Series' properties; they don't perform operations or calculations with the Series. A Series is made up of several components, each of which can be accessed individually through an attribute using dot notation.
Examples:
- .index: The index allows us to reference items in the series.
- .values: The values are the data itself
- .dtype: The dtype is the data type of the elements in the Series.
  - int, float, bool, object, category
- .name: The name is an optional human-friendly name for the Series.
- .size: The .size attribute returns an int representing the number of rows in the Series.
  - NULL values are included.
- .shape: The .shape attribute returns a tuple representing the rows and columns when used on a two-dimensional structure like a DataFrame; on a Series it returns a one-element tuple containing the number of rows.
  - NULL values are included.
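
The attributes above can be inspected on a small Series like this one. The data and the name "scores" are invented for the example; note that the missing value still counts toward .size and .shape.

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0], name="scores")

print(s.index)    # the default index: 0, 1, 2
print(s.values)   # the underlying data as a numpy array
print(s.dtype)    # float64
print(s.name)     # scores
print(s.size)     # 3, the missing value is included
print(s.shape)    # (3,)
```
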
Series Methods
- .head: The .head(n) method returns the first n rows in the Series
- .tail: The .tail(n) method returns the last n rows in the Series
- .sample: The .sample(n) method returns a random sample of rows in the Series
- .astype: used to convert the data types of the values in the series
- .value_counts: returns a new Series consisting of a labeled index representing the unique values from the original Series and values representing the frequency of each unique value that appears in the original Series.
  - It's like performing a SQL GROUP BY with a COUNT.
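- .nlargest: The .nlargest(n) method returns the n largest values in the Series
- .nsmallest: The .nsmallest(n) method returns the n smallest values in the Series
  - The keep parameter can be set to first, last, or all to control how duplicate largest or smallest values are handled; this is quite handy.
- .sort_values: sorts the Series by its values, in ascending or descending order
- .sort_index: sorts the Series by its index labels, in ascending or descending order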
- .describe: returns a Series of descriptive statistics on a pandas Series.
  - The information it returns depends on the data type of the elements in the Series.
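
A rough tour of these methods on a made-up Series. The numbers, the random_state seed, and the variable names are assumptions chosen so the results are easy to check by hand.

```python
import pandas as pd

s = pd.Series([3, 1, 3, 2, 5, 5])

print(s.head(2).tolist())   # [3, 1]
print(s.tail(2).tolist())   # [5, 5]

# random_state makes the random sample repeatable.
picked = s.sample(2, random_state=0)

as_float = s.astype(float)  # same values, float dtype

# Frequency of each unique value, similar to GROUP BY with COUNT in SQL.
counts = s.value_counts()
print(counts.loc[5], counts.loc[1])   # 2 1

# keep="all" returns every tied value, so n=1 yields both 5s here.
top = s.nlargest(1, keep="all")
print(top.tolist())                   # [5, 5]

by_value = s.sort_values(ascending=False)
by_label = s.sort_index(ascending=False)

# describe() adapts to the dtype: numeric summaries for numbers,
# count/unique/top/freq for strings.
numeric_summary = s.describe()
text_summary = pd.Series(["a", "b", "a"]).describe()
```
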
Other descriptive statistics methods:
- count: number of non-na observations
- sum: sum of values
- mean: mean of values
- median: arithmetic median of values
- min: minimum value
- max: maximum value
- mode: most frequently occurring value(s)
- abs: absolute value
- std: Bessel-corrected sample standard deviation
- quantile: sample quantile (value at %)
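
These statistics can be tried on a small Series with one missing value. The data is invented for illustration; by default the missing value is skipped.

```python
import pandas as pd

s = pd.Series([2.0, 4.0, 4.0, None, -6.0])

print(s.count())          # 4, the missing value is not counted
print(s.sum())            # 4.0
print(s.mean())           # 1.0
print(s.median())         # 3.0
print(s.min(), s.max())   # -6.0 4.0
print(s.mode().tolist())  # [4.0]; mode returns a Series because ties are possible
print(s.abs().max())      # 6.0
print(s.std())            # sample standard deviation, divides by n - 1
print(s.quantile(0.5))    # 3.0, the 50% quantile equals the median
```
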


nicholas-dougherty/numpy-pandas-visualization-exercises
