Data Visualisation: An attractive way to present data!

Hello, Folks!

I am a student practising Data Science, Machine Learning, Deep Learning and Artificial Intelligence on cloud computing platforms.

In this article, we are going to discuss basic Data Visualisation techniques and how to implement them in Python in a few simple steps.

What is Data Visualisation?

It can be defined in two ways:

First, in simple words, it is a technique for representing data in pictorial form.

Second, in technical terms, it is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualisation tools provide an accessible way to see and understand trends, outliers, and patterns in data.

Many Python libraries can be used for data visualisation. If you are a beginner, start with seaborn and Matplotlib; these are the two I am going to use here.

So, let's go through the requirements first.

Install each library with the pip command shown below it.

Libraries Used:

For Data Visualisation:

Seaborn

pip install seaborn

Matplotlib

pip install matplotlib

For Data Analysis and Manipulation:

Pandas

pip install pandas
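To confirm that the three libraries installed correctly, you can ask Python which versions it finds. This is a minimal sketch using the standard-library importlib.metadata module (Python 3.8+); the helper name installed_versions is my own, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed: run the matching pip command
    return result


versions = installed_versions(["seaborn", "matplotlib", "pandas"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If any line prints NOT INSTALLED, rerun the corresponding pip command in the same environment your notebook uses.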

Dataset:

Iris Dataset

Download the Iris dataset and save it as iris.csv in the same folder as your .ipynb notebook file, because the code below loads it by that file name.

Let's Start!

Import the required libraries and load the dataset.

Then call head() on the DataFrame to print its first five rows and check that the data was imported properly.

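import pandas as pd
import warnings
warnings.filterwarnings("ignore")  # hide library warnings to keep the notebook output clean
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="white", color_codes=True)  # global seaborn theme with a white background
iris = pd.read_csv("iris.csv")  # iris.csv must be in the notebook's folder
iris.head()  # first five rows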

	Id	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm	Species
0	1	5.1	3.5	1.4	0.2	Iris-setosa
1	2	4.9	3.0	1.4	0.2	Iris-setosa
2	3	4.7	3.2	1.3	0.2	Iris-setosa
3	4	4.6	3.1	1.5	0.2	Iris-setosa
4	5	5.0	3.6	1.4	0.2	Iris-setosa
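head() only shows a few rows, so a few more quick checks help confirm that read_csv parsed the file as expected. The sketch below rebuilds just the five rows printed above as a stand-in for iris.csv so it runs on its own; on the full file, shape should be (150, 6), since there are 50 rows for each of the three species.

```python
import pandas as pd

# Stand-in for pd.read_csv("iris.csv"): only the five rows printed above,
# so the snippet runs without the file.
iris = pd.DataFrame(
    {
        "Id": [1, 2, 3, 4, 5],
        "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
        "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
        "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4],
        "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2],
        "Species": ["Iris-setosa"] * 5,
    }
)

print(iris.shape)          # (rows, columns)
print(iris.dtypes)         # measurements should be float64, Species object
print(iris.isna().sum())   # number of missing values per column
print(iris.head(3))        # head(n) shows the first n rows; the default n is 5
```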

Now, check how many samples belong to each species with the value_counts() method.

iris["Species"].value_counts()

Output:

Iris-virginica     50
Iris-versicolor    50
Iris-setosa        50
Name: Species, dtype: int64
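Note that count() and value_counts() answer different questions: count() returns the number of non-missing values in each column, while value_counts() tallies how often each distinct value occurs, which is what gives the 50/50/50 split above. A small sketch on a made-up column (not the Iris data) makes the difference visible:

```python
import pandas as pd

# Made-up Species column with one missing entry, for illustration only.
df = pd.DataFrame(
    {"Species": ["Iris-setosa", "Iris-setosa", "Iris-virginica", None]}
)

# count(): number of non-missing values in each column.
non_missing = df.count()
print(non_missing)

# value_counts(): occurrences of each distinct value (missing values dropped by default).
per_species = df["Species"].value_counts()
print(per_species)
```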

Now, let's draw a scatter plot with SepalLengthCm on the x-axis and SepalWidthCm on the y-axis.

iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm")

Output:

*c* argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with *x* & *y*.  Please use the *color* keyword-argument or provide a 2-D array with a single row if you intend to specify the same RGB or RGBA value for all points.





<AxesSubplot:xlabel='SepalLengthCm', ylabel='SepalWidthCm'>
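The message above is a Matplotlib warning about the *c* argument it received; the plot itself is still drawn. As the message suggests, passing the color keyword explicitly is one way to avoid it. A minimal sketch, using the five rows printed earlier in place of the full file; "tab:blue" is an arbitrary colour choice:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The five rows printed earlier, standing in for the full iris.csv.
iris = pd.DataFrame(
    {
        "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
        "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
    }
)

# `color` gives every point the same colour, as the warning recommends.
ax = iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm", color="tab:blue")
plt.close(ax.figure)  # free the figure; in a notebook you would just let it display
```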

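A joint plot shows the same scatter plot together with the distribution of each variable in the margins:

sns.jointplot(x="SepalLengthCm", y="SepalWidthCm", data=iris, height=5)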

Output:

<seaborn.axisgrid.JointGrid at 0x11a2a170>

Colouring the points by species with FacetGrid shows how the three classes separate:

sns.FacetGrid(iris, hue="Species", height=5) \
   .map(plt.scatter, "SepalLengthCm", "SepalWidthCm") \
   .add_legend()

Output:

<seaborn.axisgrid.FacetGrid at 0x11b8ced0>
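Conceptually, hue="Species" splits the rows by species and draws each group in its own colour with its own legend entry. The sketch below does the same by hand with a pandas groupby loop and plain Matplotlib. It only illustrates the idea and is not how seaborn is implemented internally; the toy values are made up.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy measurements (NOT real Iris data) for two species.
toy = pd.DataFrame(
    {
        "SepalLengthCm": [5.0, 5.2, 6.4, 6.6],
        "SepalWidthCm": [3.4, 3.6, 2.9, 3.0],
        "Species": ["setosa", "setosa", "virginica", "virginica"],
    }
)

fig, ax = plt.subplots()
# One scatter call per species: each call takes the next colour in the cycle,
# which is roughly what hue="Species" automates.
for species, group in toy.groupby("Species"):
    ax.scatter(group["SepalLengthCm"], group["SepalWidthCm"], label=species)
ax.set_xlabel("SepalLengthCm")
ax.set_ylabel("SepalWidthCm")
legend = ax.legend()
plt.close(fig)
```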

Next, let's look at PetalLengthCm for each species, first as a box plot and then with a strip plot of the individual points overlaid on it (Species on the x-axis, PetalLengthCm on the y-axis).

sns.boxplot(x="Species", y="PetalLengthCm", data=iris)

Output:

<AxesSubplot:xlabel='Species', ylabel='PetalLengthCm'>
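Each box summarises one species: it spans the first to third quartile (Q1 to Q3), the line inside marks the median, and the whiskers reach the most extreme points within 1.5 × IQR (the interquartile range, Q3 minus Q1) of the box; anything further out is drawn as an individual outlier. pandas can reproduce those quartiles and fences, as in this sketch on made-up values:

```python
import pandas as pd

# Made-up toy petal lengths (NOT real Iris data).
toy = pd.DataFrame(
    {
        "Species": ["A"] * 5 + ["B"] * 5,
        "PetalLengthCm": [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 4.5, 5.0, 5.5, 9.0],
    }
)

grouped = toy.groupby("Species")["PetalLengthCm"]
q1 = grouped.quantile(0.25)
median = grouped.median()
q3 = grouped.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the box edges (the default whisker length).
summary = pd.DataFrame(
    {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }
)
print(summary)

# Values outside the fences are the points a box plot draws separately.
bounds = toy.join(summary[["lower_fence", "upper_fence"]], on="Species")
is_outlier = (bounds["PetalLengthCm"] < bounds["lower_fence"]) | (
    bounds["PetalLengthCm"] > bounds["upper_fence"]
)
outliers = bounds[is_outlier]
print(outliers)
```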

ax = sns.boxplot(x="Species", y="PetalLengthCm", data=iris)
# Overlay every observation; linewidth > 0 is needed for the gray point edges to show
ax = sns.stripplot(x="Species", y="PetalLengthCm", data=iris, jitter=True, edgecolor="gray", linewidth=1)

Output:

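Now, let's plot the same data as a violin plot, which shows the full shape of each species' distribution (a kernel density estimate) rather than just its quartiles.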

sns.violinplot(x="Species", y="PetalLengthCm", data=iris)

Output:

<AxesSubplot:xlabel='Species', ylabel='PetalLengthCm'>

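To compare the species' petal-length distributions directly, overlay one kernel density estimate (KDE) curve per species:

sns.FacetGrid(iris, hue="Species", height=6) \
   .map(sns.kdeplot, "PetalLengthCm") \
   .add_legend()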

Output:

<seaborn.axisgrid.FacetGrid at 0x11c23130>
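A KDE curve smooths the data by placing a small Gaussian "bump" on every observation and averaging them; the bandwidth sets how wide each bump is, and seaborn's kdeplot uses Scott's rule for it by default. The from-scratch NumPy sketch below shows the idea on the five petal lengths printed earlier. gaussian_kde_1d is a hypothetical helper, not seaborn's code, which handles many more cases.

```python
import numpy as np


def gaussian_kde_1d(samples, grid):
    """Evaluate a 1-D Gaussian kernel density estimate on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the bandwidth: sample std * n ** (-1/5).
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, centred on that sample, then averaged.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)


# Petal lengths of the five rows printed earlier.
petal_length = [1.4, 1.4, 1.3, 1.5, 1.4]
grid = np.linspace(0.0, 3.0, 601)
density = gaussian_kde_1d(petal_length, grid)

# A density should integrate to about 1 (simple Riemann sum over the grid).
step = grid[1] - grid[0]
area = density.sum() * step
print(round(area, 3))
```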

Now, draw a pair plot of the dataset. The Id column is dropped first because it is just a row number, not a measurement:

sns.pairplot(iris.drop("Id", axis=1), hue="Species", height=3)

Output:

<seaborn.axisgrid.PairGrid at 0x11c17a90>

Setting diag_kind="kde" explicitly requests KDE curves on the diagonal panels:

sns.pairplot(iris.drop("Id", axis=1), hue="Species", height=3, diag_kind="kde")

Output:

<seaborn.axisgrid.PairGrid at 0x124a8550>
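A pair plot is a square grid with one row and one column per numeric feature: each off-diagonal panel is a scatter plot of two features, and each diagonal panel shows the distribution of a single feature (diag_kind controls whether that is a histogram or a KDE curve). A hand-built version of the same layout in plain Matplotlib, using the five rows printed earlier and histograms on the diagonal, might look like this sketch:

```python
import itertools

import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

# The five rows printed earlier (measurements only; Id and Species left out).
features = pd.DataFrame(
    {
        "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
        "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
        "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4],
        "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2],
    }
)
columns = list(features.columns)
k = len(columns)

fig, axes = plt.subplots(k, k, figsize=(2 * k, 2 * k))
for i, j in itertools.product(range(k), range(k)):
    ax = axes[i, j]
    if i == j:
        # Diagonal: distribution of a single feature.
        ax.hist(features[columns[i]])
    else:
        # Off-diagonal: feature of column j (x) against feature of row i (y).
        ax.scatter(features[columns[j]], features[columns[i]], s=10)
    if i == k - 1:
        ax.set_xlabel(columns[j])  # label x only on the bottom row
    if j == 0:
        ax.set_ylabel(columns[i])  # label y only on the left column
fig.tight_layout()
plt.close(fig)
```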

pandas can also draw the box plots itself, with one panel per measurement, grouped by species:

iris.drop("Id", axis=1).boxplot(by="Species", figsize=(12, 6))

Output:

array([[<AxesSubplot:title={'center':'PetalLengthCm'}, xlabel='[Species]'>,
        <AxesSubplot:title={'center':'PetalWidthCm'}, xlabel='[Species]'>],
       [<AxesSubplot:title={'center':'SepalLengthCm'}, xlabel='[Species]'>,
        <AxesSubplot:title={'center':'SepalWidthCm'}, xlabel='[Species]'>]],
      dtype=object)

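The pandas.plotting module offers a few multivariate plots that work directly on a DataFrame. Andrews curves draw each sample as a curve, coloured by species:

from pandas.plotting import andrews_curves
andrews_curves(iris.drop("Id", axis=1), "Species")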

Output:

<AxesSubplot:>
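Each Andrews curve is a finite Fourier series whose coefficients are one sample's measurements, f(t) = x1/√2 + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..., plotted for t between -π and π. Samples with similar measurements therefore trace similar curves. The NumPy sketch below evaluates that formula for a single row; andrews_curve is a hypothetical helper, not a pandas function, and pandas' own implementation may differ in details such as the number of points sampled.

```python
import numpy as np


def andrews_curve(x, t):
    """Andrews curve of one sample x = (x1, x2, ..., xk), evaluated at angles t."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    f = np.full_like(t, x[0] / np.sqrt(2))  # constant term x1 / sqrt(2)
    for i, coeff in enumerate(x[1:]):
        harmonic = i // 2 + 1  # frequencies 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 0:
            f += coeff * np.sin(harmonic * t)
        else:
            f += coeff * np.cos(harmonic * t)
    return f


# First row printed earlier: SepalLength, SepalWidth, PetalLength, PetalWidth.
sample = [5.1, 3.5, 1.4, 0.2]
t = np.linspace(-np.pi, np.pi, 200)
curve = andrews_curve(sample, t)
```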

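Parallel coordinates give each feature its own vertical axis and connect each sample's values with a line, coloured by species:

from pandas.plotting import parallel_coordinates
parallel_coordinates(iris.drop("Id", axis=1), "Species")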

Output:

<AxesSubplot:>
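pandas' parallel_coordinates plots the raw values, so when one feature has a much larger range than the others it can squash them into a flat band. Rescaling every feature to [0, 1] before plotting is a common remedy. A sketch, with a hypothetical min_max_scale helper and made-up toy data:

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates


def min_max_scale(frame, class_column):
    """Rescale every column except class_column to [0, 1].

    A constant column would divide by zero and become NaN.
    """
    features = frame.drop(columns=class_column)
    scaled = (features - features.min()) / (features.max() - features.min())
    scaled[class_column] = frame[class_column]
    return scaled


# Made-up toy rows (NOT real Iris data) with very different feature ranges.
toy = pd.DataFrame(
    {
        "SmallFeature": [0.1, 0.2, 0.3, 0.4],
        "LargeFeature": [100.0, 300.0, 200.0, 400.0],
        "Species": ["a", "a", "b", "b"],
    }
)
scaled = min_max_scale(toy, "Species")
ax = parallel_coordinates(scaled, "Species")
plt.close(ax.figure)
```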

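RadViz arranges the features around a circle and places each sample inside it according to its feature values:

from pandas.plotting import radviz
radviz(iris.drop("Id", axis=1), "Species")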

Output:

<AxesSubplot:>
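In RadViz, each feature gets an anchor at evenly spaced angles on the unit circle. A sample's features are first rescaled to [0, 1], and the sample is drawn at the average of the anchor positions weighted by those values, as if each feature pulled it towards its anchor with a spring of that strength. The NumPy sketch below computes those positions following that description; radviz_positions is a hypothetical helper, and it only computes coordinates without drawing anything.

```python
import numpy as np
import pandas as pd


def radviz_positions(features):
    """2-D RadViz coordinates for each row of a numeric DataFrame.

    A row whose scaled values are all zero would divide by zero.
    """
    # 1. Rescale every feature to [0, 1].
    scaled = (features - features.min()) / (features.max() - features.min())
    # 2. Place one anchor per feature, evenly spaced on the unit circle.
    k = features.shape[1]
    angles = 2 * np.pi * np.arange(k) / k
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])  # shape (k, 2)
    # 3. Each sample sits at the anchors' average, weighted by its scaled values.
    weights = scaled.to_numpy()
    return weights @ anchors / weights.sum(axis=1, keepdims=True)


# Made-up toy rows (NOT real Iris data).
toy = pd.DataFrame(
    {"A": [1.0, 0.0, 0.5], "B": [0.0, 1.0, 0.5], "C": [0.0, 0.0, 1.0]}
)
positions = radviz_positions(toy)
print(positions)
```

A row dominated by a single feature lands on (or near) that feature's anchor, which is why well-separated species form separate clusters in the RadViz plot.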

Thank You!
