A Complete Beginner’s Guide To Data Visualization In Python & JS

Data visualization, in the simplest terms, is a way to encode data as visual elements. Used well, visualizations let us quickly perceive and understand phenomena that would take much longer to grasp from text-based descriptions alone.

Data visualization with Python and JS is important because it helps an analyst recognize trends, patterns, or other useful information that may not be obvious from the raw numbers alone.

Python offers some of the best plotting libraries available. Each comes with its own features, so you can create plots that suit your needs and preferences.

The following are some of the most popular plotting libraries available for Python:

Matplotlib
Seaborn
Pandas Visualization
Plotly
ggplot
Bokeh
Altair

In this article, however, we will focus on Pandas Visualization, Seaborn, and Matplotlib, and look at how to use particular features of each library.

Since this is a beginner's guide, we will first concentrate on the syntax for each plot and then look at the resulting graphs.

Datasets

For this article, we will use the Iris and Wine Reviews datasets, since both are freely available. To load them, we use the pandas *read_csv* function.

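A minimal sketch of the loading step, assuming the Iris file has no header row (as in the UCI copy) and the Wine Reviews CSV stores a row index in its first, unnamed column. The column names, the helpers load_iris and load_wine_reviews, and the tiny inline samples are hypothetical stand-ins so the snippet runs without downloading anything.

```python
import io

import pandas as pd

# Assumed column layout for the header-less Iris file: four measurements + class label.
IRIS_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]


def load_iris(source):
    """Read the header-less Iris CSV from a path or file-like object (hypothetical helper)."""
    return pd.read_csv(source, names=IRIS_COLUMNS)


def load_wine_reviews(source):
    """Read the Wine Reviews CSV, using its first column as the row index (hypothetical helper)."""
    return pd.read_csv(source, index_col=0)


# Tiny inline samples standing in for the real files.
iris_sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
wine_sample = io.StringIO(
    ",country,points,price\n"
    "0,Italy,87,\n"
    "1,Portugal,87,15.0\n"
    "2,US,87,14.0\n"
)

iris = load_iris(iris_sample)
wine_reviews = load_wine_reviews(wine_sample)
print(iris.head())
print(wine_reviews.head())
```

With real files you would pass a path instead, for example load_iris("iris.csv"); that file name is only an example.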

Matplotlib

Introduced by John Hunter in 2002, Matplotlib is a Python data visualization library that gives you a great deal of freedom in plotting. It is written in Python, builds on the NumPy library, and can be used in Python and IPython shells, Jupyter notebooks, and web application servers.

Matplotlib supports a wide variety of plots, such as bar charts, scatter plots, line charts, and histograms, all of which help users spot patterns, trends, and correlations.

Keep in mind that Matplotlib is a low-level library with a MATLAB-like interface. This gives you a lot of freedom, at the cost of having to write more code.

You can install Matplotlib with either conda or pip:

conda install matplotlib

pip install matplotlib

By convention, Matplotlib's *pyplot* module is imported under the name plt:

import matplotlib.pyplot as plt

Scatter plot

To make a scatter plot in Matplotlib, use the *scatter* method. A common pattern is to create a figure and axes with *plt.subplots* and then use the axes object to set the plot's title and axis labels.
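A minimal sketch of that pattern, assuming the Iris columns are named sepal_length, sepal_width, and class. A handful of sample rows stands in for the full dataset, and the Agg backend line only exists so the snippet runs without a display; in a notebook you can drop it and call plt.show() instead.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A few sample rows standing in for the Iris data loaded earlier.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# plt.subplots returns a Figure and an Axes; all drawing happens on the Axes.
fig, ax = plt.subplots()
ax.scatter(iris["sepal_length"], iris["sepal_width"])

# Title and axis labels are set on the Axes object.
ax.set_title("Iris Dataset")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
```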

To make the plot more informative and easier to read, you can color each data point according to its class.
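One way to do this, sketched below with the same assumed sample rows, is to group the data by class and call *scatter* once per group, so each class gets its own color and legend entry. The class-to-color mapping is an arbitrary choice, and grouping per class (rather than coloring point by point) is simply one reasonable approach.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A few sample rows standing in for the Iris data loaded earlier.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# Map each class label to a color (the specific colors are arbitrary).
colors = {"Iris-setosa": "r", "Iris-versicolor": "g", "Iris-virginica": "b"}

fig, ax = plt.subplots()
# One scatter call per class gives each class its own color and legend entry.
for label, group in iris.groupby("class"):
    ax.scatter(group["sepal_length"], group["sepal_width"],
               color=colors[label], label=label)

ax.set_title("Iris Dataset")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
ax.legend()
```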

Line chart

Matplotlib creates line charts with the *plot* function. You can also plot multiple columns in the same graph; the simplest way is to loop through the columns you are interested in and plot each one in turn.
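A minimal sketch of that loop, assuming the four numeric Iris columns used above and using each row's position as the x value. The sample rows are stand-ins for the real dataset.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A few sample rows standing in for the Iris data loaded earlier.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# Every column except the class label gets its own line.
columns = iris.columns.drop("class")
# The x-axis is simply the row position of each sample.
x_data = range(len(iris))

fig, ax = plt.subplots()
for column in columns:
    ax.plot(x_data, iris[column], label=column)

ax.set_title("Iris Dataset")
ax.legend()
```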

Histogram

A histogram shows how often values, or ranges of values, occur in a dataset. To create one in Matplotlib, use the *hist* function. For this example, we return to the Wine Reviews dataset.
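A minimal sketch, assuming the Wine Reviews data has a numeric points column holding each review's score. The scores below are made up so the snippet runs on its own. By default *hist* splits the range of values into 10 equal-width bins.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up scores standing in for the Wine Reviews "points" column.
wine_reviews = pd.DataFrame({"points": [82, 85, 85, 86, 87, 87, 87, 88, 88, 90, 91, 94]})

fig, ax = plt.subplots()
# hist bins the values and draws one bar per bin; it returns the count in
# each bin, the bin edges, and the drawn bar artists.
counts, bin_edges, patches = ax.hist(wine_reviews["points"])

ax.set_title("Wine Review Scores")
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")
```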

Bar chart

One thing to understand about bar charts in Matplotlib is that they won't calculate the frequency of each class for you. You first have to count the classes yourself, for example with the pandas *value_counts* method.

To create the bar chart itself, use the *bar* method. Keep in mind that bar charts work best for categorical data with few categories (fewer than about 30); with more classes than that, the chart quickly becomes cluttered and hard to read.
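A minimal sketch combining both steps, again assuming a points column in the Wine Reviews data and using made-up scores. *value_counts* does the counting, *sort_index* orders the bars by score rather than by frequency, and *bar* draws one bar per distinct score.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up scores standing in for the Wine Reviews "points" column.
wine_reviews = pd.DataFrame({"points": [85, 87, 87, 88, 88, 88, 90, 90, 92]})

# bar does not count anything itself, so count each score first,
# then sort by score so the bars appear in order along the x-axis.
data = wine_reviews["points"].value_counts().sort_index()

fig, ax = plt.subplots()
ax.bar(data.index, data.values)

ax.set_title("Wine Review Scores")
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")
```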

Pandas visualization

Pandas Visualization is an easy-to-use plotting interface built on top of Matplotlib. Because its API is higher level than Matplotlib's, you can often achieve similar results with less code.

With Pandas Visualization, you can plot directly from pandas DataFrame and Series objects.

To install pandas, use either pip or conda:

pip install pandas

conda install pandas

Scatter plot

To make a scatter plot, call <dataset>.plot.scatter() and pass it two arguments: the name of the x column and the name of the y column. You can optionally pass a title for the plot as well.
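A minimal sketch, assuming the same Iris column names and sample rows as the Matplotlib examples. *DataFrame.plot.scatter* accepts the column names through its x and y parameters, forwards title to the underlying plot call, and returns the Matplotlib Axes it drew on.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import pandas as pd

# A few sample rows standing in for the Iris data loaded earlier.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

# x and y are column names; title is optional.
ax = iris.plot.scatter(x="sepal_length", y="sepal_width", title="Iris Dataset")
```

Compared with the Matplotlib version, there is no need to create the figure or set the axis labels by hand.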

The resulting scatter plot uses the x and y column names as its axis labels automatically.