Commit 85334c5

Merge pull request #150 from datacarpentry/annajiat-patch-1
Update 13-matplotlib.md
2 parents c327d51 + f750532 commit 85334c5

1 file changed (+28, -30 lines changed)

_episodes/13-matplotlib.md

Lines changed: 28 additions & 30 deletions

```diff
@@ -24,9 +24,9 @@ Although we are using Matplotlib in this episode, pandas can make use of several
 
 ## Importing matplotlib
 
-The matplotlib library can be imported using any of the import techniques we have seen. As `pandas` is generally imported with `import panas as pd`, you will find that `matplotlib` is most commonly imported with `import matplotlib as plt` where 'plt' is the alias.
+The matplotlib library can be imported using any of the import techniques we have seen. As `pandas` is generally imported with `import pandas as pd`, you will find that `matplotlib.pyplot` is most commonly imported with `import matplotlib.pyplot as plt`, where 'plt' is the alias.
 
-In addition to importing the library, in a Jupyter notebook environment we need to tell Jupyter that when we produce a graph we want it to be display the graph in a cell in the notebook just like any other results. To do this we use the `%matplotlib inline` directive.
+In addition to importing the library, in a Jupyter notebook environment we need to tell Jupyter that when we produce a graph, we want it to display the graph in a cell in the notebook just like any other result. To do this, we use the `%matplotlib inline` directive.
 
 If you forget to do this, you will have to add `plt.show()` to see the graphs.
 
```
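As a standalone sketch of the import convention in this hunk: the `plt.` functions used throughout the episode (`plt.bar`, `plt.show`) live in the `matplotlib.pyplot` submodule, so this assumes `import matplotlib.pyplot as plt`. The `%matplotlib inline` directive is an IPython/Jupyter magic rather than Python syntax, so a plain script has to either call `plt.show()` or save the figure. The `Agg` backend and the in-memory `io.BytesIO` buffer are assumptions chosen so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for a script or headless server
import matplotlib.pyplot as plt  # 'plt' conventionally aliases pyplot, not the top-level package
import pandas as pd

s = pd.Series([1, 3, 2])

fig, ax = plt.subplots()
s.plot(kind="bar", ax=ax)  # pandas draws onto the matplotlib Axes passed in

# Outside Jupyter there is no inline display: either call plt.show()
# (which opens a window with an interactive backend) or save the figure.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```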


```diff
@@ -64,7 +64,7 @@ import numpy as np
 import pandas as pd
 
 np.random.seed(12345) # set a seed value to ensure reproducibility of the plots
-s = pd.Series(np.random.rand(20) )
+s = pd.Series(np.random.rand(20))
 #s
 # plot the bar chart
 s.plot(kind='bar')
```
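The seed line in this hunk is what makes the bar chart reproducible. A minimal sketch, assuming the same seed value of 12345 and a headless `Agg` backend: re-seeding NumPy's global generator replays the same sequence, so the two Series below are identical, and `Series.plot(kind='bar')` draws one bar per value. Passing an explicit `ax` from `plt.subplots()` is an added assumption so the bars land on a fresh Axes rather than on whatever figure happens to be current.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Re-seeding with the same value replays the same "random" sequence.
np.random.seed(12345)
first = pd.Series(np.random.rand(20))
np.random.seed(12345)
second = pd.Series(np.random.rand(20))

fig, ax = plt.subplots()
first.plot(kind="bar", ax=ax)  # one bar per value, labelled by the index 0..19
heights = [bar.get_height() for bar in ax.patches]
```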

```diff
@@ -94,7 +94,7 @@ plt.show()
 > > The width of the bars can be changed with a parameter in the 'bar' function
 > >
 > > ~~~
-> > plt.bar(range ( len ( s )), s, width = 0.5) # the default width is 0.8
+> > plt.bar(range(len(s)), s, width = 0.5) # the default width is 0.8
 > > ~~~
 > > {: .language-python}
 > {: .solution}
```
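To compare the `width` parameter from this solution, a sketch that draws the same Series twice side by side: once with `plt.bar`'s default width of 0.8 and once with `width=0.5`. It uses `ax.bar`, the object-oriented `Axes` counterpart of `plt.bar`; the two-panel layout, the seed, and the `Agg` backend are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(12345)
s = pd.Series(np.random.rand(20))

fig, (ax_default, ax_narrow) = plt.subplots(1, 2, figsize=(10, 4))
default_bars = ax_default.bar(range(len(s)), s)           # width defaults to 0.8
narrow_bars = ax_narrow.bar(range(len(s)), s, width=0.5)  # narrower bars, wider gaps
ax_default.set_title("width=0.8 (default)")
ax_narrow.set_title("width=0.5")
```

`ax.bar` returns a `BarContainer`, so each bar's width can be read back with `get_width()`.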

```diff
@@ -134,13 +134,13 @@ plt.show()
 ~~~
 {: .language-python}
 
-In general most graphs can be broken down into a series of elements which, although typically related in some way, can all exist independently of each other. This allows us to create the graph in a rather piecemeal fashion.
+In general, most graphs can be broken down into a series of elements which, although typically related in some way, can all exist independently of each other. This allows us to create the graph in a rather piecemeal fashion.
 
 The labels (if any) on the x and y axis are independent of the data values being represented. The title and the legend are also independent objects within the overall graph.
 
 In matplotlib you create the graph by providing values for all of the individual components you choose to include. When you are ready, you call the `show` function.
 
-Using this same approach we can plot two sets of data on the same graph
+Using this same approach, we can plot two sets of data on the same graph.
 
 We will use a scatter plot to demonstrate some of the available features.
 
```
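The piecemeal construction described in this hunk can be sketched with two line plots on one set of axes. This uses the object-oriented `Axes` methods (`ax.set_title`, `ax.set_xlabel`, `ax.legend`) in place of the episode's `plt.title`/`plt.xlabel`/`plt.legend` calls; the sine and cosine data are made up for illustration. The title and axis labels can be set before or after the data, but `legend()` collects the labelled artists that exist when it is called, so it goes after the plotting calls.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

fig, ax = plt.subplots()
# Two independent data sets drawn on the same Axes
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
# Title and axis labels are separate elements, independent of the data
ax.set_title("Two data sets on one graph")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
# legend() gathers the labelled lines that already exist on the Axes
legend = ax.legend()
```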

```diff
@@ -158,11 +158,11 @@ We will also add other common features like a title, a legend and labels on the
 
 ~~~
 # Generate some date for 2 sets of points.
-x1 = pd.Series(np.random.rand(20) - 0.5 )
-y1 = pd.Series(np.random.rand(20) - 0.5 )
+x1 = pd.Series(np.random.rand(20) - 0.5)
+y1 = pd.Series(np.random.rand(20) - 0.5)
 
-x2 = pd.Series(np.random.rand(20) + 0.5 )
-y2 = pd.Series(np.random.rand(20) + 0.5 )
+x2 = pd.Series(np.random.rand(20) + 0.5)
+y2 = pd.Series(np.random.rand(20) + 0.5)
 
 
 # Add some features
```

```diff
@@ -171,10 +171,10 @@ plt.ylabel('Range of y values')
 plt.xlabel('Range of x values')
 
 # plot the points in a scatter plot
-plt.scatter(x1,y1, c='red', label='Red Range' ) # 'c' parameter is the colour and 'label' is the text for the legend
-plt.scatter(x2,y2, c='blue', label='Blue Range')
+plt.scatter(x1, y1, c='red', label='Red Range') # 'c' parameter is the colour and 'label' is the text for the legend
+plt.scatter(x2, y2, c='blue', label='Blue Range')
 
-plt.legend( loc=4 ) # the locations 1,2,3 and 4 are top-right, top-left, bottom-left and bottom-right
+plt.legend(loc=4) # the locations 1,2,3 and 4 are top-right, top-left, bottom-left and bottom-right
 # Show the graph with the two sets of points
 plt.show()
 ~~~
```
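A sketch of the scatter plot in these two hunks, written in the object-oriented style and assuming the episode's seed of 12345 for repeatable points. The `- 0.5` and `+ 0.5` offsets shift `np.random.rand`'s [0, 1) output, so the red points fall in [-0.5, 0.5) and the blue points in [0.5, 1.5). Matplotlib also accepts location names for the legend, so `loc="lower right"` gives the same placement as the `loc=4` used in the episode.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(12345)  # assumption: reuse the episode's seed so the points repeat
x1 = pd.Series(np.random.rand(20) - 0.5)  # red cluster: values in [-0.5, 0.5)
y1 = pd.Series(np.random.rand(20) - 0.5)
x2 = pd.Series(np.random.rand(20) + 0.5)  # blue cluster: values in [0.5, 1.5)
y2 = pd.Series(np.random.rand(20) + 0.5)

fig, ax = plt.subplots()
ax.set_title("Scatter Plot")
ax.set_xlabel("Range of x values")
ax.set_ylabel("Range of y values")
ax.scatter(x1, y1, c="red", label="Red Range")    # each call adds one collection of points
ax.scatter(x2, y2, c="blue", label="Blue Range")
legend = ax.legend(loc="lower right")  # same placement as loc=4
```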

```diff
@@ -185,36 +185,36 @@ The `c` or `color` parameter can be set to any color matplotlib recognises. Full
 
 > ## Exercise
 >
-> In the scatterplot the s parameter determines the size of the dots. s can be a simple numeric value, say s=100, which will produce dots all of the same size. However you can pass a list of values (or a pandas Series) to provide sizes for the individual dots. This approach is very common as it allows us to provide an extra variable worth of information on the graph.
+> In the scatterplot the s parameter determines the size of the dots. s can be a simple numeric value, say s=100, which will produce dots all of the same size. However, you can pass a list of values (or a pandas Series) to provide sizes for the individual dots. This approach is very common as it allows us to provide an extra variable worth of information on the graph.
 >
 > 1. Modify the code we used for the scatter plot to include a size value for each of the points in the series being plotted.
-> (The downside is that some of the smaller dots may be completely covered by the larger dots. To try and highlight when this has happened we can change the opacity of the dots.)
+> (The downside is that some of the smaller dots may be completely covered by the larger dots. To try and highlight when this has happened, we can change the opacity of the dots.)
 >
-> 2. Find out which parameter controls the opacity of the dots ( clue - it is not called opacity), add it to you code and set it > to a reasonable value .
+> 2. Find out which parameter controls the opacity of the dots (clue - it is not called opacity), add it to your code and set it to a reasonable value.
 >
 > > ## Solution
 > >
 > > ~~~
 > > # Generate some data for 2 sets of points.
 > > # and additional data for the sizes - suitably scaled
-> > x1 = pd.Series(np.random.rand(20) - 0.5 )
-> > y1 = pd.Series(np.random.rand(20) - 0.5 )
-> > z1 = pd.Series(np.random.rand(20)*200 )
+> > x1 = pd.Series(np.random.rand(20) - 0.5)
+> > y1 = pd.Series(np.random.rand(20) - 0.5)
+> > z1 = pd.Series(np.random.rand(20) * 200)
 > >
-> > x2 = pd.Series(np.random.rand(20) + 0.5 )
-> > y2 = pd.Series(np.random.rand(20) + 0.5 )
-> > z2 = pd.Series(np.random.rand(20)*200 )
+> > x2 = pd.Series(np.random.rand(20) + 0.5)
+> > y2 = pd.Series(np.random.rand(20) + 0.5)
+> > z2 = pd.Series(np.random.rand(20) * 200)
 > >
 > > # Add some features
 > > plt.title('Scatter Plot')
 > > plt.ylabel('Range of y values')
 > > plt.xlabel('Range of x values')
 > >
 > > # plot the points in a scatter plot
-> > plt.scatter(x1,y1, c='red', label='Red Range', s=z1, alpha=0.5 ) # 's' parameter is the dot size
-> > plt.scatter(x2,y2, c='blue', label='Blue Range', s=z2, alpha=0.5) # 'alpha' is the opacity
+> > plt.scatter(x1, y1, c='red', label='Red Range', s=z1, alpha=0.5) # 's' parameter is the dot size
+> > plt.scatter(x2, y2, c='blue', label='Blue Range', s=z2, alpha=0.5) # 'alpha' is the opacity
 > >
-> > plt.legend( loc=4 )
+> > plt.legend(loc=4)
 > > plt.show()
 > > ~~~
 > > {: .language-python}
```
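For the exercise on dot size and opacity, a sketch that derives sizes from a third variable rather than using raw `rand() * 200` values. `scale_sizes` is a hypothetical helper (not part of pandas or matplotlib) that linearly maps a variable onto marker areas between a floor and a ceiling; the floor of 20 is an assumption meant to keep the smallest dots visible. In `scatter`, `s` is the marker area in points squared, and `alpha` runs from 0 (fully transparent) to 1 (opaque).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scale_sizes(values, min_size=20, max_size=200):
    """Hypothetical helper: linearly map values onto marker areas (points**2)."""
    values = pd.Series(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values equal: give every dot the minimum size
        return pd.Series(min_size, index=values.index, dtype=float)
    return min_size + (values - values.min()) / span * (max_size - min_size)


np.random.seed(12345)
x = pd.Series(np.random.rand(20))
y = pd.Series(np.random.rand(20))
z = pd.Series(np.random.rand(20))  # third variable shown through dot size

sizes = scale_sizes(z)
fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes, c="red", alpha=0.5)  # half-transparent so overlaps show
```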

```diff
@@ -271,18 +271,16 @@ plt.show()
 ~~~
 {: .language-python}
 
-will fail.
-
-However we can use the pandas plot method
+will fail. However, we can use the pandas plot method.
 
 ~~~
 df = pd.DataFrame(np.random.normal(size=(100,5)), columns=list('ABCDE'))
 df.plot(kind = 'box', return_type='axes') # the return_type='axes' is only needed for forward compatibility
 ~~~
 {: .language-python}
 
 We can add a title to the above by adding the `title` parameter. However there are no parameters for adding the axis labels.
-To add labels we can use matplotlib directly.
+To add labels, we can use matplotlib directly.
 
 ~~~
 df = pd.DataFrame(np.random.normal(size=(100,5)), columns=list('ABCDE'))
```
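The approach in this hunk (pandas draws the box plot, matplotlib adds the labels) can be sketched by passing an explicit `Axes` to `DataFrame.plot` and then labelling that same `Axes`. The title text, axis label text, seed, and `Agg` backend are assumptions for illustration; `title` is the pandas parameter mentioned in the episode text.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(12345)
df = pd.DataFrame(np.random.normal(size=(100, 5)), columns=list("ABCDE"))

fig, ax = plt.subplots()
# pandas draws one box per column and sets the title on the given Axes
df.plot(kind="box", ax=ax, title="Box plot of five random columns")
# matplotlib supplies the axis labels on the same Axes
ax.set_xlabel("Column")
ax.set_ylabel("Value")
```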
