- "Graphs can be drawn directly from Pandas, but it still uses Matplotlib"
12
13
- "Different graph types have different data requirements"
13
14
- "Graphs are created from a variety of discrete components placed on a 'canvas', you don't have to use them all"
14
-
- "Plotting multiple graphs on a single 'canvas' is possible"
15
15
---
16
16
17
17
## Plotting in Python
18
18
19
-
There are a wide variety of ways to plot in Python, like many programming languages. Some do more of the design work for you and others let you customize the look of the plots and all of the little details yourself. `Pandas` has basic plots built into it that reduce the amount of syntax, if your data is already in a DataFrame.
20
-
Matplotlib is a Python graphical library that can be used to produce a variety of different graph types, it is fully controllable down to basic elements and includes a module `pylab` that is somewhere in between (designed to feel like matlab plotting, if you happen to have done that before).
21
-
22
-
23
-
The [Pandas][pandas-web] library contains very tight integration with [Matplotlib][matplotlib-web].
24
-
There are functions in Pandas that automatically call Matplotlib functions to produce graphs.
19
+
As in many programming languages, there is a wide variety of ways to plot in Python.
20
+
Some do more of the design work for you and others let you customize the look of the plots and all of the little details yourself.
21
+
[Pandas][pandas-web] has basic plots built into it that reduce the amount of syntax, if your data is already in a DataFrame.
22
+
[Matplotlib][matplotlib-web] is a Python graphical library that can be used to produce a variety of different graph types.
23
+
It is fully controllable down to basic elements and includes a module `pylab` that is somewhere in between
24
+
(designed to feel like MATLAB plotting, if you happen to have done that before).
25
25
26
26
The Matplotlib library can be imported using any of the import techniques we have seen. As Pandas is generally imported
27
27
with `import pandas as pd`, you will find that `matplotlib` is most commonly imported with `import matplotlib.pyplot as plt` where `plt` is the alias.
@@ -35,7 +35,18 @@ and advanced plot types. One of its most useful features is formatting.
35
35
36
36
## Plotting with Pandas
37
37
38
-
To plot with Pandas we have to import it as we have done in past episodes. We can also use the `%matplotlib inline` notebook magic to reduce syntax. Without that we need to call `show()`.
38
+
The `pandas` library contains very tight integration with `matplotlib`. There are functions in `pandas` that
39
+
automatically call `matplotlib` functions to produce graphs.
40
+
41
+
Other graphical libraries available from within Python include, for example, `plotnine` (a ggplot2 implementation for Python)
42
+
and `seaborn`. [Seaborn](https://seaborn.pydata.org) has some very powerful features and advanced plot types.
43
+
One of its most useful features is formatting.
44
+
45
+
## Plotting with Pandas
46
+
47
+
To plot with `pandas` we have to import it as we have done in past episodes.
48
+
To tell Jupyter that when we produce a graph we want it to be displayed in a cell in the notebook just like any other results,
49
+
we use the `%matplotlib inline` directive. Without that we need to call the `show()` function.
39
50
40
51
~~~
41
52
import pandas as pd
@@ -58,7 +69,7 @@ df['years_liv'].hist()
58
69
~~~
59
70
{: .language-python}
60
71
61
-

72
+

62
73
63
74
64
75
We can change the number of bins to make it look how we would like, for example
@@ -68,32 +79,34 @@ df['years_liv'].hist(bins=20)
68
79
~~~
69
80
{: .language-python}
70
81
71
-
We can also specify the column as a parameter and a groupby column with the `by` keyword. There are a lot of keywords available to make it look better; we can see some of the most likely ones (as decided by Pandas developers) by using <kbd>shift</kbd> + <kbd>tab</kbd>. Let's try `layout`, `figsize`, and `sharex`.
82
+
We can also specify the column as a parameter and a groupby column with the `by` keyword.
83
+
There are a lot of keywords available to make it look better; we can see some of the most likely ones
84
+
(as decided by Pandas developers) by using <kbd>shift</kbd> + <kbd>tab</kbd>.
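As a rough sketch of how these keywords can be combined, the example below draws one histogram per group. It uses a small made-up DataFrame in place of the lesson's data: the values are invented for illustration, and only the column names `years_liv` and `village` are taken from the lesson. It also selects the non-interactive `Agg` backend so that it can run outside a notebook, which is not needed in Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import pandas as pd

# Made-up stand-in for the lesson's DataFrame (values are illustrative only).
df = pd.DataFrame({
    "village": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "years_liv": [2, 10, 25, 5, 12, 40, 8, 15, 30],
})

# One histogram of 'years_liv' per village, placed in a single row of
# three panels that share the x axis so they are directly comparable.
axes = df.hist(column="years_liv", by="village",
               layout=(1, 3), figsize=(12, 3), sharex=True)
```

Here `layout` sets the grid of panels as (rows, columns), `figsize` sets the overall width and height in inches, and `sharex` makes all panels use the same x-axis range.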
The scatter plot requires the x and y coordinates of each of the points being plotted.
81
-
To provide this we will generate two series of random data one for the x coordinates and the other for the y coordinates
82
95
83
-
We will generate two sets of points and plot them on the same graph.
96
+
## Scatter plot
84
97
85
-
We will also add other common features like a title, a legend and labels on the x and y axis.
98
+
The scatter plot requires the x and y coordinates of each of the points being plotted. We can add a third dimension as different colors with the `c` argument.
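A minimal sketch of a scatter plot coloured by a third column might look like the following. The DataFrame is made up for illustration (only the column names are borrowed from the lesson), the choice of the `viridis` colormap is an assumption, and the `Agg` backend is selected only so that the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import pandas as pd

# Made-up stand-in for the lesson's DataFrame (values are illustrative only).
df = pd.DataFrame({
    "years_farm": [1, 5, 8, 12, 20, 25],
    "years_liv": [2, 6, 10, 15, 22, 30],
    "buildings_in_compound": [1, 2, 1, 3, 2, 4],
})

# x and y give the position of each point; `c` names the column whose
# values are mapped to colours through the chosen colormap.
ax = df.plot.scatter(x="years_farm", y="years_liv",
                     c="buildings_in_compound", colormap="viridis")
```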
We can make it look prettier with `seaborn`, much more easily than fixing components manually with `matplotlib`. [`Seaborn`](https://seaborn.pydata.org) is a Python data visualization library based on `matplotlib`. It provides a high-level interface for drawing attractive and informative statistical graphics. `Seaborn` comes with Anaconda; to make it available in our Python session we need to import it.
170
+
171
+
~~~
172
+
import seaborn as sns
173
+
sns.boxplot(data = df, x = 'village', y = 'buildings_in_compound')
174
+
~~~
135
175
{: .language-python}
136
176
177
+

178
+
179
+
We can also draw linear models in a plot using `lmplot()` from `seaborn`, e.g. for `years_farm` vs `years_liv` per `village`.
In general, most graphs can be broken down into a series of elements which, although typically related in some way, can all exist independently of each other. This allows us to create the graph in a rather piecemeal fashion.
187
+
In general, most graphs can be broken down into a series of elements which, although typically related in some way,
188
+
can all exist independently of each other. This allows us to create the graph in a rather piecemeal fashion.
144
189
145
-
The labels (if any) on the x and y axis are independent of the data values being represented. The title and the legend are also independent objects within the overall graph.
190
+
The labels (if any) on the x and y axis are independent of the data values being represented. The title and the legend
191
+
are also independent objects within the overall graph.
146
192
147
-
In Matplotlib you create the graph by providing values for all of the individual components you choose to include. When you are ready, you call the `show` function.
193
+
In Matplotlib you create the graph by providing values for all of the individual components you choose to include.
194
+
When you are ready, you call the `show` function.
148
195
149
196
Using this same approach, we can plot two sets of data on the same graph.
150
197
151
198
We will use a scatter plot to demonstrate some of the available features.
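The piecemeal approach described above can be sketched as follows. This is only an illustration under stated assumptions: the two sets of points are random numbers generated with NumPy, the labels and title are invented, and `matplotlib.pyplot` is imported as `plt`. The `Agg` backend is selected so the code also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Two made-up sets of random points (seeded so the result is repeatable).
rng = np.random.default_rng(seed=0)
x1, y1 = rng.random(20), rng.random(20)
x2, y2 = rng.random(20), rng.random(20)

plt.figure()                                 # start a fresh 'canvas'
plt.scatter(x1, y1, label="set 1")           # first set of points
plt.scatter(x2, y2, marker="^", label="set 2")  # second set, same graph
plt.title("Two sets of random points")       # each component is optional
plt.xlabel("x values")
plt.ylabel("y values")
plt.legend()                                 # uses the labels given above
```

Each call adds one independent component to the current figure, so any of the title, label, or legend lines can be left out without affecting the others.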
152
199
153
200
## Fine-tuning figures with Matplotlib
154
201
155
-
If we want to do more advanced or lower level things with our plots, we need to use Matplotlib directly, not through Pandas. First we need to import it.
156
-
202
+
If we want to do more advanced or lower level things with our plots, we need to use Matplotlib directly,
203
+
not through Pandas. First we need to import it.
157
204
158
-
The Matplotlib library can be imported using any of the import techniques we have seen. As `pandas` is generally imported with `import pandas as pd`, you will find that `matplotlib` is most commonly imported with `import matplotlib.pylab as plt` where 'plt' is the alias.
159
205
160
-
In addition to importing the library, in a Jupyter notebook environment we need to tell Jupyter that when we produce a graph we want it to be displayed in a cell in the notebook just like any other results. To do this we use the `%matplotlib inline` directive.
206
+
## Customising our plots with Matplotlib
161
207
162
-
If you forget to do this, you will have to add `plt.show()` to see the graphs.
208
+
We can further customise our plots with `matplotlib` directly. First we need to import it.
209
+
The `matplotlib` library can be imported using any of the import techniques we have seen. As `pandas` is generally imported with `import pandas as pd`, you will find that `matplotlib` is most commonly imported with `import matplotlib.pyplot as plt` where `plt` is the alias.
163
210
164
211
~~~
165
212
# Generate some data for 2 sets of points.
@@ -231,32 +278,49 @@ Internally the Pandas 'plot' method has called the 'bar' method of Matplotlib an
231
278
232
279
We can use Matplotlib directly to produce a similar graph. In this case we need to pass two parameters, the number of bars we need and the Pandas Series holding the values.
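One way this might look is sketched below. The Series is made up for illustration, and passing `range(len(s))` as the bar positions is an assumption about how "the number of bars" is supplied; the tick labels are added as an optional extra. The `Agg` backend is selected so the code also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up Series of counts per category (values are illustrative only).
s = pd.Series([3, 7, 5], index=["A", "B", "C"])

plt.figure()
# First argument: one x position per bar; second: the bar heights.
plt.bar(range(len(s)), s)
# Optional: label each bar position with the Series index.
plt.xticks(range(len(s)), s.index)
```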
233
280
234
-
We also have to explicitly call the `show()` function to produce the graph.
281
+
Let's redo the boxplot we did above:
235
282
236
-
## Saving Plots
283
+
~~~
284
+
df.boxplot(column = 'buildings_in_compound', by = 'village')
285
+
~~~
286
+
{: .language-python}
287
+
288
+

289
+
290
+
The automatic title of the plot does not look good, we are missing a title for the y-axis, and we do not need the extra x-axis title. We can also remove the gridlines. Let's fix these things using functions from `plt`. Note: all the adjustments for the plot have to go into the same notebook cell together with the plot statement itself.
df.boxplot(column = 'buildings_in_compound', by = 'village')
294
+
plt.suptitle('') # remove the automatic title
295
+
plt.title('Buildings in compounds per village') # add a title
296
+
plt.ylabel('Number of buildings') # add a y-axis title
297
+
plt.xlabel('') # remove the x-axis title
298
+
plt.grid(None) # remove the grid lines
242
299
~~~
243
300
{: .language-python}
244
301
302
+

245
303
246
-
For the histogram, each data point is allocated to 1 of 10 (by default) 'bins' of equal size (range of numbers), which are indicated along the x axis, and the number of points (frequency) in each bin is shown on the y axis.
304
+
In general most graphs can be broken down into a series of elements which, although typically related in some way, can all exist independently of each other. This allows us to create the graph in a rather piecemeal fashion.
305
+
The labels (if any) on the x and y axis are independent of the data values being represented. The title and the legend are also independent objects within the overall graph.
306
+
In `matplotlib` you create the graph by providing values for all of the individual components you choose to include.
247
307
248
-
In this case the graphs are almost identical. The only difference being in the first graph the y axis has a label 'Frequency' associated with it.
308
+
## Saving a graph
249
309
250
-
We can fix this with a call to the `ylabel` function
310
+
If you wish to save your graph as an image you can do so using the `plt.savefig()` function. The image can be saved as a pdf, jpg or png file by changing the file extension. `plt.savefig()` needs to be called at the end of all your plot statements in the same notebook cell.
251
311
252
312
~~~
253
-
plt.ylabel('Frequency')
254
-
plt.hist(s)
255
-
plt.show()
313
+
df.boxplot(column = 'buildings_in_compound', by = 'village')
314
+
plt.suptitle('') # remove the automatic title
315
+
plt.title('Buildings in compounds per village') # add a title
316
+
plt.ylabel('Number of buildings') # add a y-axis title
317
+
plt.xlabel('') # remove the x-axis title
318
+
plt.grid(None) # remove the grid lines
319
+
plt.savefig('safi_boxplot_buildings.pdf') # save as pdf file
320
+
plt.savefig('safi_boxplot_buildings.png', dpi = 150) # save as png file, some extra arguments are provided
256
321
~~~
257
322
{: .language-python}
258
323
259
-
260
324
In general most graphs can be broken down into a series of elements which, although typically related in some way, can
261
325
all exist independently of each other. This allows us to create the graph in a rather piecemeal fashion.
262
326
@@ -278,11 +342,10 @@ demonstrate some of the available features.
278
342
> 3. add a legend
279
343
> 4. save it in two different formats
280
344
>
281
-
> extension: try plotting by by wall and roof type?
345
+
> extension: try plotting by wall and roof type!
282
346
>
283
347
{: .challenge}
284
348
285
-
286
349
## Saving a graph
287
350
288
351
If you wish to save your graph as an image you can do so using the `savefig()` function. The image can be saved as a pdf, jpg or png file by changing the file extension.
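As a small self-contained sketch, the example below saves the same figure in two formats. The plotted numbers and file names are invented for illustration, and the files are written to a temporary directory so that nothing in the working directory is overwritten. The `Agg` backend is selected so the code also runs outside a notebook.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt

# A made-up line plot, just to have something to save.
plt.figure()
plt.plot([1, 2, 3], [2, 4, 8])
plt.title("Example plot")

# Hypothetical output paths inside a temporary directory.
out_dir = tempfile.mkdtemp()
pdf_path = os.path.join(out_dir, "example_plot.pdf")
png_path = os.path.join(out_dir, "example_plot.png")

# The file extension selects the output format.
plt.savefig(pdf_path)
plt.savefig(png_path, dpi=150)  # dpi sets the resolution of the png
```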