`_episodes/13-matplotlib.md`: 28 additions, 30 deletions
@@ -24,9 +24,9 @@ Although we are using Matplotlib in this episode, pandas can make use of several
## Importing matplotlib

The matplotlib library can be imported using any of the import techniques we have seen. Just as `pandas` is generally imported with `import pandas as pd`, you will find that matplotlib's plotting module, `pyplot`, is most commonly imported with `import matplotlib.pyplot as plt`, where `plt` is the alias.

In addition to importing the library, in a Jupyter notebook environment we need to tell Jupyter that when we produce a graph, we want it displayed in a cell in the notebook just like any other result. To do this we use the `%matplotlib inline` magic command.

If you forget to do this, you will have to add `plt.show()` to see the graphs.
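A minimal sketch of these imports as they might appear in the first cell of a notebook. `%matplotlib inline` is an IPython magic command rather than Python syntax, so it is shown commented out here. The three-point Series is placeholder data, only there so there is something to draw.

~~~
# In a Jupyter notebook, uncomment the next line (an IPython magic, not plain Python):
# %matplotlib inline
import matplotlib.pyplot as plt  # the plotting interface, conventionally aliased as plt
import pandas as pd

# Placeholder data, just to have something to draw
s = pd.Series([1, 3, 2])
lines = plt.plot(s.index, s.values)  # plt.plot returns a list of Line2D objects

# Without inline plotting, this call is what displays the figure
plt.show()
~~~
{: .language-python}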
@@ -64,7 +64,7 @@ import numpy as np
import pandas as pd

np.random.seed(12345)  # set a seed value to ensure reproducibility of the plots
s = pd.Series(np.random.rand(20))
#s
# plot the bar chart
s.plot(kind='bar')
@@ -94,7 +94,7 @@ plt.show()
> > The width of the bars can be changed with the `width` parameter of the `bar` function:
> >
> > ~~~
> > plt.bar(range(len(s)), s, width=0.5)  # the default width is 0.8
> > ~~~
> > {: .language-python}
> {: .solution}
@@ -134,13 +134,13 @@ plt.show()
~~~
{: .language-python}

In general, most graphs can be broken down into a series of elements which, although typically related in some way, can all exist independently of each other. This allows us to create the graph in a piecemeal fashion.

The labels (if any) on the x and y axes are independent of the data values being represented. The title and the legend are also independent objects within the overall graph.

In matplotlib, you create the graph by providing values for each of the individual components you choose to include. When you are ready, you call the `show` function.
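The following is a minimal sketch of this piecemeal approach. It sets the title and axis labels before any data has been plotted, then adds two lines and a legend. Each call adds or changes one independent element of the current figure, and each element can be read back on its own afterwards. The data values and label text are made up for illustration.

~~~
import matplotlib.pyplot as plt

plt.figure()  # start a new, empty figure

# Elements that do not depend on the data can be added first...
plt.title('Piecemeal plot')
plt.xlabel('x')
plt.ylabel('y')

# ...then the data itself, one series at a time
plt.plot([0, 1, 2], [0, 1, 4], label='squares')
plt.plot([0, 1, 2], [0, 1, 2], label='identity')

# The legend is built from the 'label' given to each plotted series
plt.legend()

# Each element is a separate object that can be inspected independently
ax = plt.gca()
print(ax.get_title(), ax.get_xlabel(), ax.get_ylabel())
print([text.get_text() for text in ax.get_legend().get_texts()])

plt.show()
~~~
{: .language-python}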

Using this same approach, we can plot two sets of data on the same graph.

We will use a scatter plot to demonstrate some of the available features.
@@ -158,11 +158,11 @@ We will also add other common features like a title, a legend and labels on the
~~~
# Generate some data for 2 sets of points.
x1 = pd.Series(np.random.rand(20) - 0.5)
y1 = pd.Series(np.random.rand(20) - 0.5)

x2 = pd.Series(np.random.rand(20) + 0.5)
y2 = pd.Series(np.random.rand(20) + 0.5)

# Add some features
@@ -171,10 +171,10 @@ plt.ylabel('Range of y values')
plt.xlabel('Range of x values')

# plot the points in a scatter plot
plt.scatter(x1, y1, c='red', label='Red Range')  # 'c' parameter is the colour and 'label' is the text for the legend
plt.scatter(x2, y2, c='blue', label='Blue Range')

plt.legend(loc=4)  # the locations 1, 2, 3 and 4 are top-right, top-left, bottom-left and bottom-right
# Show the graph with the two sets of points
plt.show()
~~~
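matplotlib also accepts descriptive strings for the legend position, which can make the code easier to read than the numeric codes. `loc=4` is equivalent to `loc='lower right'`, and 1, 2 and 3 correspond to `'upper right'`, `'upper left'` and `'lower left'`. Here is a minimal sketch using one made-up set of points:

~~~
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(12345)  # reproducible example data
x1 = pd.Series(np.random.rand(20) - 0.5)
y1 = pd.Series(np.random.rand(20) - 0.5)

plt.figure()  # start a new, empty figure
plt.scatter(x1, y1, c='red', label='Red Range')

# Same placement as plt.legend(loc=4)
legend = plt.legend(loc='lower right')
plt.show()
~~~
{: .language-python}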
@@ -185,36 +185,36 @@ The `c` or `color` parameter can be set to any color matplotlib recognises. Full
> ## Exercise
>
> In a scatter plot, the `s` parameter determines the size of the dots. `s` can be a single numeric value, say `s=100`, which produces dots that are all the same size. However, you can pass a list of values (or a pandas Series) to provide sizes for the individual dots. This approach is very common, as it lets us show an extra variable's worth of information on the graph.
>
> 1. Modify the code we used for the scatter plot to include a size value for each of the points in the series being plotted.
> (The downside is that some of the smaller dots may be completely covered by the larger dots. To try and highlight when this has happened, we can change the opacity of the dots.)
>
> 2. Find out which parameter controls the opacity of the dots (clue: it is not called opacity), add it to your code and set it to a reasonable value.
>
> > ## Solution
> >
> > ~~~
> > # Generate some data for 2 sets of points
> > # and additional data for the sizes - suitably scaled
> > x1 = pd.Series(np.random.rand(20) - 0.5)
> > y1 = pd.Series(np.random.rand(20) - 0.5)
> > z1 = pd.Series(np.random.rand(20) * 200)
> >
> > x2 = pd.Series(np.random.rand(20) + 0.5)
> > y2 = pd.Series(np.random.rand(20) + 0.5)
> > z2 = pd.Series(np.random.rand(20) * 200)
> >
> > # Add some features
> > plt.title('Scatter Plot')
> > plt.ylabel('Range of y values')
> > plt.xlabel('Range of x values')
> >
> > # plot the points in a scatter plot
> > plt.scatter(x1, y1, c='red', label='Red Range', s=z1, alpha=0.5)  # 's' parameter is the dot size
> > plt.scatter(x2, y2, c='blue', label='Blue Range', s=z2, alpha=0.5)  # 'alpha' is the opacity
> >
> > plt.legend(loc=4)
> > plt.show()
> > ~~~
> > {: .language-python}
@@ -271,18 +271,16 @@ plt.show()
~~~
{: .language-python}

will fail. However, we can use the pandas plot method.
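Here is a minimal sketch of the pandas plot method. It assumes a small DataFrame with two hypothetical numeric columns, `x` and `y`. `DataFrame.plot` draws with matplotlib behind the scenes and returns the matplotlib `Axes` it drew on, so pyplot functions such as `plt.title` and `plt.show` still apply to the result.

~~~
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(12345)  # reproducible example data
# Hypothetical DataFrame with two numeric columns
df = pd.DataFrame({'x': np.random.rand(20), 'y': np.random.rand(20)})

# pandas creates a new figure, draws the scatter plot with matplotlib
# and returns the Axes it drew on
ax = df.plot(kind='scatter', x='x', y='y')

# The usual pyplot functions still act on the current figure
plt.title('Scatter Plot via pandas')
plt.show()
~~~
{: .language-python}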