<h2 id="Anatomy-of-a-DataFrame">Anatomy of a DataFrame<a class="anchor-link" href="#Anatomy-of-a-DataFrame">¶</a></h2><p>A <strong>DataFrame</strong> is composed of one or more <strong>Series</strong>. The names of the <strong>Series</strong> form the column names, and the row labels form the <strong>Index</strong>.</p>
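<p>As a minimal sketch of that relationship, the snippet below builds a small DataFrame out of two Series. The values and the names <code>names</code>, <code>masses</code>, and <code>df</code> are hypothetical and chosen only for illustration.</p>

```python
import pandas as pd

# Hypothetical toy data, not taken from any real dataset.
names = pd.Series(["alpha", "beta", "gamma"], name="name")
masses = pd.Series([1.5, 20.0, 300.0], name="mass")

# Concatenating along the column axis: each Series becomes one column,
# and the Series names become the column names.
df = pd.concat([names, masses], axis=1)

print(df.columns.tolist())  # ['name', 'mass']
print(df.index.tolist())    # [0, 1, 2]
print(type(df["mass"]))     # selecting one column gives back a Series
```

<p>Because no row labels were supplied, pandas falls back to a default integer Index here.</p>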
</div>
@@ -14887,7 +14887,7 @@ <h2 id="Anatomy-of-a-DataFrame">Anatomy of a DataFrame<a class="anchor-link" hre
<h2 id="Creating-DataFrames">Creating DataFrames<a class="anchor-link" href="#Creating-DataFrames">¶</a></h2><p>We can create DataFrames from a variety of sources, such as other Python objects, flat files, web scraping, and API requests. Here, we will see just a couple of examples, but be sure to check out <a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html">this page</a> in the documentation for a complete list.</p>
<h3 id="Using-data-from-an-API">Using data from an API<a class="anchor-link" href="#Using-data-from-an-API">¶</a></h3><p>Collect the data from <a href="https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh">NASA's Open Data Portal</a> using the Socrata Open Data API (SODA) with the <code>requests</code> library:</p>
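<p>One way this request could look is sketched below. The helper <code>fetch_records</code>, the endpoint URL in the comment, and the row limit are assumptions rather than details confirmed by the text above. The sample records are made up so that the example runs offline.</p>

```python
import pandas as pd


def fetch_records(url, limit=50_000, timeout=30):
    """Hypothetical helper: request up to `limit` rows of JSON from a SODA endpoint."""
    import requests  # imported lazily so the offline demo below runs without it

    # `$limit` is the SODA query parameter that caps the number of rows returned.
    response = requests.get(url, params={"$limit": limit}, timeout=timeout)
    response.raise_for_status()  # raise an exception on HTTP errors
    return response.json()       # a list of dicts, one per row


# Assumed endpoint for the dataset linked above (not confirmed by the text):
# payload = fetch_records("https://data.nasa.gov/resource/gh4g-9sfh.json")

# Offline stand-in: made-up records in the same list-of-dicts shape.
payload = [
    {"name": "Example A", "fall": "Fell", "mass": "21"},
    {"name": "Example B", "fall": "Found", "mass": "720"},
]

# A list of dicts maps directly onto rows; the keys become the column names.
df = pd.DataFrame(payload)
print(df.shape)
```

<p>Values in a JSON payload like this one arrive as strings, so numeric columns usually need converting before analysis.</p>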
</div>
@@ -15148,7 +15148,7 @@ <h3 id="Using-data-from-an-API">Using data from an API<a class="anchor-link" hre
<h2 id="Inspecting-the-data">Inspecting the data<a class="anchor-link" href="#Inspecting-the-data">¶</a></h2><p>Now that we have some data, we need to perform an initial inspection of it. This gives us information on what the data looks like, how many rows/columns there are, and how much data we have.</p>
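<p>A first inspection along these lines might look like the sketch below. The DataFrame is a hypothetical stand-in with made-up values. The calls themselves (<code>head()</code>, <code>shape</code>, <code>columns</code>, <code>dtypes</code>, <code>info()</code>) are standard pandas.</p>

```python
import pandas as pd

# Hypothetical stand-in data; one mass is deliberately missing.
df = pd.DataFrame({
    "name": ["Example A", "Example B", "Example C"],
    "mass": [21.0, 720.0, None],
    "year": [1880, 1951, 2003],
})

print(df.head(2))            # what the first rows look like

n_rows, n_cols = df.shape    # (number of rows, number of columns)
print(n_rows, n_cols)

print(df.columns.tolist())   # column names
print(df.dtypes)             # data type held by each column

df.info()                    # non-null counts, dtypes, and memory usage
```

<p>Comparing the non-null counts reported by <code>info()</code> against the row count is a quick way to spot columns with missing values.</p>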
</div>
@@ -15337,7 +15337,7 @@ <h2 id="Inspecting-the-data">Inspecting the data<a class="anchor-link" href="#In
<h4 id="How-many-rows-and-columns-are-there?">How many rows and columns are there?<a class="anchor-link" href="#How-many-rows-and-columns-are-there?">¶</a></h4>
</div>
</div>
@@ -15443,7 +15443,7 @@ <h4 id="What-are-the-column-names?">What are the column names?<a class="anchor-l
<h4 id="What-type-of-data-does-each-column-currently-hold?">What type of data does each column currently hold?<a class="anchor-link" href="#What-type-of-data-does-each-column-currently-hold?">¶</a></h4>
</div>
</div>
@@ -15505,7 +15505,7 @@ <h4 id="What-type-of-data-does-each-column-currently-hold?">What type of data do
<h4 id="Get-some-information-about-the-DataFrame">Get some information about the DataFrame<a class="anchor-link" href="#Get-some-information-about-the-DataFrame">¶</a></h4>
<h2 id="Extracting-subsets">Extracting subsets<a class="anchor-link" href="#Extracting-subsets">¶</a></h2><p>A crucial part of working with DataFrames is extracting subsets of the data: finding rows that meet a certain set of criteria, isolating columns/rows of interest, etc. After narrowing down our data, we are closer to discovering insights. This section will be the backbone of many analysis tasks.</p>
<h4 id="Selecting-columns">Selecting columns<a class="anchor-link" href="#Selecting-columns">¶</a></h4><p>We can select columns as attributes if their names would be valid Python variables:</p>
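<p>For instance, in the sketch below (hypothetical column names and made-up values), <code>fall</code> can be accessed as an attribute, while <code>mass (g)</code> contains a space and parentheses and so requires bracket notation.</p>

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({
    "name": ["Example A", "Example B", "Example C"],
    "mass (g)": [21.0, 720.0, 50.0],
    "fall": ["Fell", "Found", "Found"],
})

falls = df.fall                     # attribute access: valid Python identifier
masses = df["mass (g)"]             # bracket access: works for any column name
subset = df[["name", "mass (g)"]]   # a list of names selects several columns

print(type(falls).__name__)
print(type(subset).__name__)
```

<p>Selecting a single column returns a Series, whereas passing a list of names returns a DataFrame.</p>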
<h4 id="Indexing">Indexing<a class="anchor-link" href="#Indexing">¶</a></h4><p>We use <code>iloc[]</code> to select rows and columns by their position:</p>
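<p>A short sketch of position-based selection, using a hypothetical DataFrame with made-up values:</p>

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({
    "name": ["Example A", "Example B", "Example C", "Example D"],
    "mass": [21.0, 720.0, 50.0, 1000.0],
    "fall": ["Fell", "Found", "Found", "Fell"],
})

first_row = df.iloc[0]          # a single row, returned as a Series
block = df.iloc[1:3, [0, 2]]    # rows 1 and 2 (end is exclusive), columns 0 and 2
cell = df.iloc[-1, 1]           # last row, second column

print(first_row["name"])
print(block)
print(cell)
```

<p>As with Python lists, positions start at zero, the end of a slice is excluded, and negative positions count from the end.</p>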
<h4 id="Filtering-with-Boolean-masks">Filtering with Boolean masks<a class="anchor-link" href="#Filtering-with-Boolean-masks">¶</a></h4><p>A <strong>Boolean mask</strong> is an array-like structure of Boolean values – it's a way to specify which rows/columns we want to select (<code>True</code>) and which we don't (<code>False</code>).</p>
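<p>As an illustration, the sketch below builds a mask from two conditions and uses it to filter rows. The data and the 100 g threshold are hypothetical.</p>

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({
    "name": ["Example A", "Example B", "Example C", "Example D"],
    "mass": [21.0, 720.0, 50.0, 1000.0],
    "fall": ["Fell", "Found", "Found", "Fell"],
})

# Each comparison yields a Boolean Series; combine them with & (and) or | (or).
# The parentheses are required because & binds more tightly than > and ==.
mask = (df["mass"] > 100) & (df["fall"] == "Found")

# Indexing with the mask keeps only the rows where it is True.
heavy_found = df[mask]

print(mask.tolist())
print(heavy_found)
```

<p>Python's <code>and</code> and <code>or</code> keywords do not work element-wise on Series, which is why the bitwise operators are used here.</p>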
</div>
@@ -16986,7 +16986,7 @@ <h4 id="Filtering-with-Boolean-masks">Filtering with Boolean masks<a class="anch
<h2 id="Calculating-summary-statistics">Calculating summary statistics<a class="anchor-link" href="#Calculating-summary-statistics">¶</a></h2><p>In the next section of this workshop, we will discuss data cleaning for a more meaningful analysis of our datasets; however, we can already extract some interesting insights from the <code>meteorites</code> data by calculating summary statistics.</p>
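<p>The questions in this section map onto a handful of Series methods. The sketch below applies them to a hypothetical stand-in DataFrame. The column names <code>fall</code>, <code>mass</code>, and <code>recclass</code> are assumptions about the dataset's schema, and the values are made up.</p>

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({
    "name": ["Example A", "Example B", "Example C", "Example D"],
    "fall": ["Fell", "Found", "Found", "Found"],
    "mass": [10.0, 20.0, 30.0, 940.0],
    "recclass": ["L5", "H6", "L5", "Iron"],
})

counts = df["fall"].value_counts()       # rows per category
mean_mass = df["mass"].mean()            # pulled upward by the large value
median_mass = df["mass"].median()        # less sensitive to extreme values
heaviest = df.loc[df["mass"].idxmax()]   # the full row holding the maximum mass
n_classes = df["recclass"].nunique()     # number of distinct classes

print(counts)
print(mean_mass, median_mass)
print(heaviest["name"], heaviest["mass"])
print(n_classes)
```

<p>When a few values are far larger than the rest, as in this toy data, the median is often a better description of a typical row than the mean.</p>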
<h4 id="How-many-of-the-meteorites-were-found-versus-observed-falling?">How many of the meteorites were found versus observed falling?<a class="anchor-link" href="#How-many-of-the-meteorites-were-found-versus-observed-falling?">¶</a></h4>
</div>
</div>
@@ -17244,7 +17244,7 @@ <h4 id="How-many-of-the-meteorites-were-found-versus-observed-falling?">How many
<h4 id="What-was-the-mass-of-the-average-meterorite?">What was the mass of the average meteorite?<a class="anchor-link" href="#What-was-the-mass-of-the-average-meterorite?">¶</a></h4>
</div>
</div>
@@ -17353,7 +17353,7 @@ <h4 id="What-was-the-mass-of-the-average-meterorite?">What was the mass of the a
<h4 id="What-was-the-mass-of-the-heaviest-meteorite?">What was the mass of the heaviest meteorite?<a class="anchor-link" href="#What-was-the-mass-of-the-heaviest-meteorite?">¶</a></h4>
</div>
</div>
@@ -17483,7 +17483,7 @@ <h4 id="What-was-the-mass-of-the-heaviest-meteorite?">What was the mass of the h
<h4 id="How-many-different-types-of-meteorite-classes-are-represented-in-this-dataset?">How many different types of meteorite classes are represented in this dataset?<a class="anchor-link" href="#How-many-different-types-of-meteorite-classes-are-represented-in-this-dataset?">¶</a></h4>
<h4 id="Get-some-summary-statistics-on-the-data-itself">Get some summary statistics on the data itself<a class="anchor-link" href="#Get-some-summary-statistics-on-the-data-itself">¶</a></h4><p>We can get common summary statistics for every column at once. By default, only numeric columns are summarized, but here, we will summarize everything together:</p>
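<p>A minimal sketch of this, assuming the summary is produced with <code>describe()</code> and its <code>include</code> parameter (the text above does not name the method). The data is a hypothetical stand-in.</p>

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({
    "name": ["Example A", "Example B", "Example C", "Example D"],
    "fall": ["Fell", "Found", "Found", "Found"],
    "mass": [10.0, 20.0, 30.0, 940.0],
})

summary_numeric = df.describe()            # default: numeric columns only
summary_all = df.describe(include="all")   # numeric and non-numeric together

print(summary_numeric)
print(summary_all)
```

<p>In the combined summary, statistics that do not apply to a column (such as <code>mean</code> for text or <code>unique</code> for numbers) appear as <code>NaN</code>.</p>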