Commit d6dfd90

Merge pull request #157 from vinisalazar/patch-1
Episode 11: Adding links and improving data download instructions.
2 parents a084459 + 5d5706a commit d6dfd90

1 file changed: +24 -11 lines changed

_episodes/11-joins.md

Lines changed: 24 additions & 11 deletions
@@ -23,13 +23,22 @@ keypoints:
 
 There are many occasions when we have related data spread across multiple files.
 
-The data can be related to each other in different ways. How they are related and how completely we can join the data from the datasets will vary.
+The data can be related to each other in different ways. How they are related and how completely we can join the data
+from the datasets will vary.
 
-In this episode we will consider different scenarios and show we might join the data. We will use csv files and in all cases the first step will be to read the datasets into a pandas Dataframe from where we will do the joining. The csv files we are using are cut down versions of the SN7577 dataset to make the displays more manageable.
+In this episode we will consider different scenarios and show we might join the data. We will use csv files and in all
+cases the first step will be to read the datasets into a pandas Dataframe from where we will do the joining. The csv
+files we are using are cut down versions of the SN7577 dataset to make the displays more manageable.
+
+First, let's download the datafiles. They are listed in the [setup page][setup-page] for the lesson. Alternatively,
+you can download the [GitHub repository for this lesson][gh-repo]. The data files are in the
+*data* directory. If you're using Jupyter, make sure to place these files in the same directory where your notebook
+file is.
 
 ### Scenario 1 - Two data sets containing the same columns but different rows of data
 
-Here we want to add the rows from one Dataframe to the rows of the other Dataframe. In order to do this we can use the `concat()` function.
+Here we want to add the rows from one Dataframe to the rows of the other Dataframe.
+In order to do this we can use the `pd.concat()` function.
 
 ~~~
 import pandas as pd
@@ -114,6 +123,15 @@ We can join columns from two Dataframes using the `merge()` function. This is si
 
 A detailed discussion of different join types is given in the [SQL lesson](./episodes/sql...).
 
+You specify the type of join you want using the `how` parameter. The default is the `inner` join which returns the columns from both tables where the `key` or common column values match in both Dataframes.
+
+The possible values of the `how` parameter are shown in the picture below (taken from the Pandas documentation)
+
+![pandas_join_types](../fig/pandas_join_types.png)
+
+The different join types behave in the same way as they do in SQL. In Python/pandas, any missing values are shown as `NaN`
+
+
 In order to `merge` the Dataframes we need to identify a column common to both of them.
 
 ~~~
@@ -138,14 +156,6 @@ df_cd = pd.merge(df_SN7577i_c, df_SN7577i_d, how='inner', left_on = 'Id', right_
 ~~~
 {: .language-python}
 
-You specify the type of join you want using the `how` parameter. The default is the `inner` join which returns the columns from both tables where the `key` or common column values match in both Dataframes.
-
-The possible values of the `how` parameter are shown in the picture below (taken from the Pandas documentation)
-
-![pandas_join_types](../fig/pandas_join_types.png)
-
-The different join types behave in the same way as they do in SQL. In Python/pandas, any missing values are shown as `NaN`
-
 
 > ## Exercises
 >
@@ -175,3 +185,6 @@ The different join types behave in the same way as they do in SQL. In Python/pan
 > > {: .language-python}
 > {: .solution}
 {: .challenge}
+
+[gh-repo]: https://github.com/datacarpentry/python-socialsci/archive/gh-pages.zip
+[setup-page]: https://datacarpentry.org/python-socialsci/setup.html
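
The paragraphs this commit moves ahead of the `merge` example describe the `how` parameter and the `NaN` values that appear where a row has no match. A minimal sketch of that behaviour, together with the row-wise `pd.concat()` from Scenario 1, could look like the following. The two small Dataframes and their `Q1`/`Q2` columns are made up for illustration and are not the lesson's SN7577 files; only the `Id` key name is borrowed from the diff.

```python
import pandas as pd

# Hypothetical stand-ins for the lesson's cut-down csv files (assumed data).
left = pd.DataFrame({"Id": [1, 2, 3], "Q1": [10, 20, 30]})
right = pd.DataFrame({"Id": [2, 3, 4], "Q2": [200, 300, 400]})

# Scenario 1 style: stack rows of Dataframes that share the same columns.
more_left = pd.DataFrame({"Id": [5, 6, 7], "Q1": [50, 60, 70]})
stacked = pd.concat([left, more_left], ignore_index=True)

# Column-wise joins on the common 'Id' column, one per value of `how`.
inner = pd.merge(left, right, how="inner", on="Id")       # only Ids in both
left_join = pd.merge(left, right, how="left", on="Id")    # every Id from left
right_join = pd.merge(left, right, how="right", on="Id")  # every Id from right
outer = pd.merge(left, right, how="outer", on="Id")       # Ids from either side

for name, df in [("inner", inner), ("left", left_join),
                 ("right", right_join), ("outer", outer)]:
    # Rows without a partner on the other side carry NaN in that side's column.
    print(name, len(df), "rows;", int(df.isna().sum().sum()), "NaN values")
```

With these toy inputs, `inner` keeps only the `Id` values present in both Dataframes, while `left`, `right` and `outer` keep unmatched rows and fill the missing side with `NaN`.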
