How To Create Boxplots, Scatterplots, and Histograms in Python Using Matplotlib

By Nick McCullum
Total time: 20 minutes

Python is a popular programming language for data visualization. This is largely because of the matplotlib library, which offers a wide range of built-in tools for presenting data visually.

This tutorial will teach you how to create boxplots, scatterplots, and histograms in Python using matplotlib.

Nick McCullum is a Python and JavaScript developer from New Brunswick, Canada. Nick teaches Python, SQL, and JavaScript courses on his website.


The data we will need for this tutorial

In this article, we will be working with a data set from the UCI Machine Learning Repository, a free catalogue of data sets that you can use to practice your machine learning or data visualization skills.

Specifically, we will be using the Iris data set, which is commonly used to practice predicting a flower's species from its measurements. The Iris data set dates back to the 1930s and is one of the most common examples used in data science education.

We'll load this data set into a pandas DataFrame, so we'll start by importing pandas under the alias `pd` like this:

``import pandas as pd``

Once this is done, you can import the data required for this tutorial with the following statement:

``data = pd.read_json('https://raw.githubusercontent.com/nicholasmccullum/python-visualization/master/iris/iris.json')``

This imports a nicely formatted version of the Iris data set into our program from a GitHub file that I have uploaded for the public to use.

The Iris data set is a collection of observations from flowers with five features:

• sepalLength
• sepalWidth
• petalLength
• petalWidth
• species
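Before plotting, it can help to confirm which columns are numerical. The following is a minimal sketch, assuming the column names listed above; the six rows and the species strings are hypothetical stand-in values so the example runs offline, and in practice you would use the `data` DataFrame loaded with `pd.read_json`.

```python
import pandas as pd

# Hypothetical stand-in rows (an assumption, not the real file contents);
# in the tutorial, `data` comes from the pd.read_json call shown earlier.
data = pd.DataFrame({
    "sepalLength": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepalWidth": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petalLength": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petalWidth": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

print(data.head())   # first rows of the DataFrame
print(data.dtypes)   # four numeric columns plus one text column

# Keep only the names of columns with a numeric dtype.
numeric_columns = data.select_dtypes(include="number").columns.tolist()
print(numeric_columns)
```

The `species` column should be absent from `numeric_columns`, which is why it is handled separately in the plots that follow.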

With our data import out of the way, let's move on to learning how to create boxplots in Python using matplotlib!

Boxplots

The first type of chart that we will create is a boxplot. Before doing this, we need to import the matplotlib data visualization library. Specifically, we will be importing the `pyplot` interface from matplotlib under the alias `plt`.

Here's the code to do this:

``import matplotlib.pyplot as plt``

Now that this is done, we need to make a slight change to our data.

Boxplots can only be drawn from numerical data, and the `species` column is categorical, not numerical. To fix this, we'll create a new variable called `boxplot_data` that excludes the `species` column:

``boxplot_data = data.drop('species', axis=1)``
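Here is a small sketch of passing `boxplot_data` straight to `plt.boxplot` and counting how many boxes come back. The stand-in rows and the `Agg` backend line are assumptions added so the example runs offline and without a display. The count tells you how your matplotlib version read the DataFrame: one box per column or one box per row.

```python
import matplotlib
# Assumption: non-interactive backend so the sketch runs without a display;
# remove this line and call plt.show() to open a window.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in rows; replace with the `data` DataFrame loaded earlier.
data = pd.DataFrame({
    "sepalLength": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepalWidth": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petalLength": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petalWidth": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Drop the categorical column so only numerical data remains.
boxplot_data = data.drop("species", axis=1)

# plt.boxplot returns a dict of the artists it drew; "boxes" holds one per box.
result = plt.boxplot(boxplot_data)
n_boxes = len(result["boxes"])
n_rows, n_columns = boxplot_data.shape
print(f"{n_boxes} boxes drawn from {n_rows} rows and {n_columns} columns")
```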

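We create the boxplot by passing `boxplot_data` to the `plt.boxplot` method. After creating the boxplot, you can use the `plt.show()` command to open it in a new window.

With the matplotlib version used for this tutorial, the resulting boxplot is not very informative, and is generally not what we'd expect.

Why is this?

It's because the boxplot function created a separate boxplot for each row, instead of a separate boxplot for each column (as desired). This behavior depends on your matplotlib version: more recent releases may already read a DataFrame column by column, in which case no fix is needed.

If you do get one boxplot per row, the solution is quite simple. We just need to call the `transpose` method on `boxplot_data` within the boxplot function, as in `plt.boxplot(boxplot_data.transpose())`.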
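If you would rather not depend on how a given matplotlib version reads a DataFrame, one option is to pass a plain NumPy array, which `plt.boxplot` documents as one box per column. This is a sketch of that alternative, not the approach used in the original tutorial; the stand-in rows and the `Agg` backend line are assumptions so it runs offline.

```python
import matplotlib
# Assumption: non-interactive backend so the sketch runs without a display;
# remove this line and call plt.show() to open a window.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in rows; replace with the `data` DataFrame loaded earlier.
data = pd.DataFrame({
    "sepalLength": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepalWidth": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petalLength": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petalWidth": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

boxplot_data = data.drop("species", axis=1)
columns = list(boxplot_data.columns)

# A 2D NumPy array is drawn as one box per column.
result = plt.boxplot(boxplot_data.to_numpy())

# Boxes sit at x positions 1..N by default, so label them with column names.
plt.xticks(range(1, len(columns) + 1), columns)

print(len(result["boxes"]), "boxes, one per feature")
```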

Scatterplots

Scatterplots can be created in matplotlib using the `plt.scatter` method.

Like boxplots, scatterplots can only be created using numerical data. We will not need to create a separate dataset in this case because the `plt.scatter` method requires us to name specific columns for both the `x` and `y` axes. Because of this, we can work directly with our original `data` DataFrame and pass in particular column names in square brackets.

As an example, here is how you would plot `sepalLength` on the x-axis and `sepalWidth` on the y-axis using the `plt.scatter` method:

``plt.scatter(x=data['sepalLength'], y=data['sepalWidth'])``

In this particular case, you do not need to actually include the `x=` and `y=` specifications within the method's parameters. The following code generates an identical scatterplot:

``plt.scatter(data['sepalLength'], data['sepalWidth'])``
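The equivalence of the two calls can be checked directly. The sketch below draws both versions on separate figures and compares the point coordinates that matplotlib stored; the stand-in rows and the `Agg` backend line are assumptions so the example runs offline.

```python
import matplotlib
# Assumption: non-interactive backend so the sketch runs without a display;
# remove this line and call plt.show() to open a window.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in rows; replace with the `data` DataFrame loaded earlier.
data = pd.DataFrame({
    "sepalLength": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepalWidth": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

# Keyword form: x and y are named explicitly.
keyword_points = plt.scatter(x=data["sepalLength"], y=data["sepalWidth"])

# Positional form on a fresh figure: the first argument is x, the second is y.
plt.figure()
positional_points = plt.scatter(data["sepalLength"], data["sepalWidth"])

# get_offsets() returns the (x, y) pairs each scatterplot holds.
same = np.array_equal(keyword_points.get_offsets(),
                      positional_points.get_offsets())
print("Identical points:", same)
```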

Let's move on to learning about how to create histograms in Python using matplotlib.

Histograms

Histograms are bar charts that show the frequency of observations across a data distribution. We can create histograms in Python using matplotlib with the `plt.hist` method.

As an example, let's see what the distribution looks like within the `petalLength` feature of the Iris data set:

``plt.hist(data['petalLength'])``

As you can see, there seems to be a high degree of concentration for `petalLength` values around 1 and 5.
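To read the concentration off as numbers rather than from the chart, you can use the values that `plt.hist` returns: the count in each bin and the bin edges. This is a minimal sketch with hypothetical stand-in values (an assumption so it runs offline), so the printed counts will differ from those of the full data set.

```python
import matplotlib
# Assumption: non-interactive backend so the sketch runs without a display;
# remove this line and call plt.show() to open a window.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in values; replace with data['petalLength'] from earlier.
data = pd.DataFrame({"petalLength": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1]})

# plt.hist returns the bin counts, the bin edges, and the drawn bar artists.
counts, bin_edges, _ = plt.hist(data["petalLength"])

# Print each bin's range next to the number of observations it holds.
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {int(count)}")
```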

You can also create histograms that plot multiple features at once. These different features will be identified by different colors within the histogram.

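As an example, here's how we would plot every feature from the Iris data set (excluding `species`, since it is non-numerical) in a histogram. As with the boxplot, the `transpose` call is needed when matplotlib treats each row of the DataFrame as a separate data set; if your matplotlib version already reads the DataFrame column by column, pass it without `transpose`: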

``plt.hist(data.drop('species',axis=1).transpose())``

This is a nice visualization, but it is pretty useless without knowing which colors represent each feature.

To fix this, let's add a legend:

```python
# Plot all four numerical features, then label each color with its column name
plt.hist(data.drop('species',axis=1).transpose())
plt.legend(data.drop('species', axis=1).columns)
```
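An alternative that does not depend on how matplotlib reads a DataFrame is to pass a list with one array per feature and attach the names through the `label` argument, so `plt.legend()` can pick them up. This is a sketch of that variation rather than the tutorial's own code; the stand-in rows and the `Agg` backend line are assumptions so it runs offline.

```python
import matplotlib
# Assumption: non-interactive backend so the sketch runs without a display;
# remove this line and call plt.show() to open a window.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in rows; replace with the `data` DataFrame loaded earlier.
data = pd.DataFrame({
    "sepalLength": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepalWidth": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petalLength": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petalWidth": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

features = data.drop("species", axis=1)
columns = list(features.columns)

# One 1D array per feature: each becomes its own colored series.
series = [features[name].to_numpy() for name in columns]
counts, bin_edges, _ = plt.hist(series, label=columns)

# The legend reads the labels passed to plt.hist.
legend = plt.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```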

General guidelines for styling Python plots

So far in this tutorial, we have learned how to create basic boxplots, scatterplots, and histograms in Python using matplotlib.

We have not discussed how to make these charts visually appealing.

Accordingly, I want to conclude this article by discussing some general guidelines for styling your matplotlib visualizations.

Here's the plot we will be using as an example in this last section of this tutorial:

``plt.hist(data['sepalWidth'])``

How to add titles to matplotlib visualizations

You can add a title to a matplotlib visualization with the `plt.title` method.

As an example, here's how you would add the title `A Histogram of Sepal Widths from the Iris Data Set` to our sample histogram:

```python
plt.hist(data['sepalWidth'])
plt.title('A Histogram of Sepal Widths from the Iris Data Set')
```

How to label the x-axis in matplotlib visualizations

You can label the x-axis of a matplotlib visualization with the `plt.xlabel` method.

As an example, here's how we could give our x-axis the label `Sepal Width`:

```python
plt.hist(data['sepalWidth'])
plt.title('A Histogram of Sepal Widths from the Iris Data Set')
plt.xlabel('Sepal Width')
```

How to label the y-axis in matplotlib visualizations

Just like with the x-axis, we can label the y-axis of a matplotlib visualization with the `plt.ylabel` method.

Here's how we would add the label `Frequency` to the y-axis of our histogram:

```python
plt.hist(data['sepalWidth'])
plt.title('A Histogram of Sepal Widths from the Iris Data Set')
plt.xlabel('Sepal Width')
plt.ylabel('Frequency')
```
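If you want to confirm what a plot is currently labeled with, the current axes object exposes matching getters. The sketch below applies the three styling calls and reads the text back through `plt.gca()`; the stand-in values and the `Agg` backend line are assumptions so it runs offline.

```python
import matplotlib
# Assumption: non-interactive backend so the sketch runs without a display;
# remove this line and call plt.show() to open a window.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in values; replace with the `data` DataFrame loaded earlier.
data = pd.DataFrame({"sepalWidth": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7]})

plt.hist(data["sepalWidth"])
plt.title("A Histogram of Sepal Widths from the Iris Data Set")
plt.xlabel("Sepal Width")
plt.ylabel("Frequency")

# plt.gca() returns the axes that the plt.* calls above acted on.
ax = plt.gca()
print("Title:  ", ax.get_title())
print("x label:", ax.get_xlabel())
print("y label:", ax.get_ylabel())
```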

How to change the size of matplotlib visualizations

The last styling tool that we will explore is how to resize matplotlib visualizations.

The height and width of a matplotlib canvas can be changed by passing a `figsize` tuple to the `plt.figure` method. The default tuple in current matplotlib releases is `(6.4, 4.8)`, which means that the figure is 6.4 inches wide and 4.8 inches tall, although some environments override this default.

Here's how you could increase the size of the figure to 10 inches wide and 8 inches tall:

```python
# Create the resized figure first so the histogram is drawn on it
plt.figure(figsize=(10, 8))
plt.hist(data['sepalWidth'])
plt.title('A Histogram of Sepal Widths from the Iris Data Set')
plt.xlabel('Sepal Width')
plt.ylabel('Frequency')
```
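To check the size that was actually applied, you can keep the figure object that `plt.figure` returns and ask it for its dimensions. This is a minimal sketch; the stand-in values and the `Agg` backend line are assumptions so it runs offline, and the default printed at the start depends on your own matplotlib configuration.

```python
import matplotlib
# Assumption: non-interactive backend so the sketch runs without a display;
# remove this line and call plt.show() to open a window.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in values; replace with the `data` DataFrame loaded earlier.
data = pd.DataFrame({"sepalWidth": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7]})

# The default size configured in this environment.
print("Default figsize:", plt.rcParams["figure.figsize"])

# plt.figure returns the new figure, which becomes the current figure.
fig = plt.figure(figsize=(10, 8))
plt.hist(data["sepalWidth"])

# get_size_inches() reports the width and height in inches.
width, height = fig.get_size_inches()
print(f"Figure is {width} inches wide and {height} inches tall")
```

Note that calling `plt.figure` again after plotting would open a second, empty figure rather than resize the first one.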

Final thoughts

In this tutorial, you learned how to create boxplots, scatterplots, and histograms in Python using matplotlib. You also learned the basics of how to add titles and axis labels to matplotlib plots and how to change their size.

Data visualization is a highly in-demand field for Python developers. This tutorial should set you on a path towards becoming an experienced practitioner of the matplotlib library.
