There are many occasions when we have related data spread across multiple files.

The data can be related to each other in different ways. How they are related, and how completely we can join the data
from the datasets, will vary.

In this episode we will consider different scenarios and show how we might join the data. We will use csv files, and in
all cases the first step will be to read the datasets into a pandas Dataframe, from which we will do the joining. The
csv files we are using are cut-down versions of the SN7577 dataset, to make the displays more manageable.

First, let's download the data files. They are listed in the [setup page][setup-page] for the lesson. Alternatively,
you can download the [GitHub repository for this lesson][gh-repo]. The data files are in the *data* directory. If
you're using Jupyter, make sure to place these files in the same directory as your notebook file.

### Scenario 1 - Two data sets containing the same columns but different rows of data

Here we want to append the rows of one Dataframe to the rows of the other Dataframe.
To do this we can use the `pd.concat()` function.
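As a minimal, self-contained sketch of the idea, the example below builds two small Dataframes in memory instead of
reading the lesson's csv files. The column names `Q1` and `Q2` and all the values are hypothetical, chosen only for
illustration.

~~~
import pandas as pd

# Two small hypothetical Dataframes with the same columns but different rows
df_a = pd.DataFrame({"Q1": [1, 2], "Q2": [3, 4]})
df_b = pd.DataFrame({"Q1": [5, 6], "Q2": [7, 8]})

# Stack the rows of df_b below the rows of df_a.
# ignore_index=True renumbers the rows 0..n-1 instead of keeping
# the original index labels (which would give 0, 1, 0, 1).
combined = pd.concat([df_a, df_b], ignore_index=True)

# combined now has 4 rows and the same two columns
print(combined)
~~~

Without `ignore_index=True`, the result keeps each input's original index labels, so the label `0` would appear twice.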
~~~
import pandas as pd
~~~

We can join columns from two Dataframes using the `merge()` function. This is si…

A detailed discussion of different join types is given in the [SQL lesson](./episodes/sql...).

You specify the type of join you want using the `how` parameter. The default is the `inner` join, which keeps only the
rows whose `key` (common column) values appear in both Dataframes, and returns the columns from both.
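A minimal sketch of the default behaviour, again using small hypothetical Dataframes: the shared column name `Id` and
the columns `Q1` and `Q2` are made up for illustration. With no `how` argument, `pd.merge()` performs an inner join,
so only the ids present in both Dataframes survive; passing `how="outer"` instead keeps every id and fills the gaps
with `NaN`.

~~~
import pandas as pd

# Hypothetical Dataframes that share a common "Id" column
left = pd.DataFrame({"Id": [1, 2, 3], "Q1": [10, 20, 30]})
right = pd.DataFrame({"Id": [2, 3, 4], "Q2": ["a", "b", "c"]})

# Default how="inner": keep only rows whose Id appears in both Dataframes
inner = pd.merge(left, right, on="Id")

# how="outer": keep every Id from either side; missing values become NaN
outer = pd.merge(left, right, on="Id", how="outer")

print(inner)
print(outer)
~~~

If `on` is omitted, pandas joins on all the columns the two Dataframes have in common, which here is just `Id`; naming
the key explicitly makes the intent clearer.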
The possible values of the `how` parameter are shown in the picture below (taken from the Pandas documentation):