Commit 73a64d4

Episode 11: adding links and improving instructions for data download. Code review requested in #127.
1 parent a084459 commit 73a64d4

1 file changed: +20 -3 lines changed
_episodes/11-joins.md

Lines changed: 20 additions & 3 deletions
@@ -23,13 +23,27 @@ keypoints:
 
 There are many occasions when we have related data spread across multiple files.
 
-The data can be related to each other in different ways. How they are related and how completely we can join the data from the datasets will vary.
+The data can be related to each other in different ways. How they are related and how completely we can join the data
+from the datasets will vary.
 
-In this episode we will consider different scenarios and show we might join the data. We will use csv files and in all cases the first step will be to read the datasets into a pandas Dataframe from where we will do the joining. The csv files we are using are cut down versions of the SN7577 dataset to make the displays more manageable.
+In this episode we will consider different scenarios and show we might join the data. We will use csv files and in all
+cases the first step will be to read the datasets into a pandas Dataframe from where we will do the joining. The csv
+files we are using are cut down versions of the SN7577 dataset to make the displays more manageable.
+
+There are a few ways to merge files. In database lingo, a merge operation is called a `JOIN`. Some of these are
+shown in the table below.
+
+![pandas_join_types](../fig/pandas_join_types.png)
+
+First, let's download the datafiles. They are listed in the [setup page][setup-page] for the lesson. Alternatively,
+you can download the [GitHub repository for this lesson][gh-repo]. The data files are in the
+*data* directory. If you're using Jupyter, make sure to place these files in the same directory where your notebook
+file is.
 
 ### Scenario 1 - Two data sets containing the same columns but different rows of data
 
-Here we want to add the rows from one Dataframe to the rows of the other Dataframe. In order to do this we can use the `concat()` function.
+Here we want to add the rows from one Dataframe to the rows of the other Dataframe.
+In order to do this we can use the `pd.concat()` function.
 
 ~~~
 import pandas as pd
@@ -175,3 +189,6 @@ The different join types behave in the same way as they do in SQL. In Python/pan
 > > {: .language-python}
 > {: .solution}
 {: .challenge}
+
+[gh-repo]: https://github.com/datacarpentry/python-socialsci/archive/gh-pages.zip
+[setup-page]: https://datacarpentry.org/python-socialsci/setup.html
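
Note on the changed text: the new paragraphs mention `pd.concat()` for stacking rows and the SQL-style `JOIN` types. The sketch below is one way to see the difference between the two. It is not taken from the lesson: the toy frames `df_a`, `df_b`, `df_c` and the columns `Id`, `Q1`, `Q2` are made up for illustration and stand in for the SN7577 csv files.

```python
import pandas as pd

# Hypothetical toy data (assumption): these are NOT the lesson's SN7577 files.
df_a = pd.DataFrame({"Id": [1, 2, 3], "Q1": [1, 3, 2]})
df_b = pd.DataFrame({"Id": [4, 5], "Q1": [2, 1]})

# Same columns, different rows: stack the rows with pd.concat().
# ignore_index=True renumbers the row index of the result from 0.
stacked = pd.concat([df_a, df_b], ignore_index=True)

# Related data in different columns: join on a shared key with pd.merge().
df_c = pd.DataFrame({"Id": [2, 3, 9], "Q2": [10, 20, 30]})
joined = {
    how: pd.merge(df_a, df_c, on="Id", how=how)
    for how in ("inner", "left", "right", "outer")
}

print("concat rows:", len(stacked))
for how, frame in joined.items():
    # The row count differs by join type because Ids 1 and 9 each
    # appear in only one of the two frames.
    print(how, "join rows:", len(frame))
```

With these toy frames, only `Id` values 2 and 3 occur on both sides, so the inner join keeps two rows, while the outer join keeps every `Id` from either frame and fills the missing answers with `NaN`.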
