Hello, Folks!
I am a student practicing Data Science, Machine Learning, Deep Learning and Artificial Intelligence on cloud computing platforms.
In this article, I will walk through the basic Data Visualisation techniques and show how to implement each of them in a few simple steps.
What is Data Visualisation?
There are two ways to define it:
First, in simple words, it is a technique for representing data in pictorial form.
Second, in technical words, it is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualisation tools provide an accessible way to see and understand trends, outliers, and patterns in data.
There are many libraries for data visualisation. If you are a beginner, start with seaborn and matplotlib. These are the two I am going to use here.
So, let's go through the requirements first.
Install each library with the pip command listed below it.
Libraries Used:
For Data Visualisation:
- SeaBorn
pip install seaborn
- MatPlotLib
pip install matplotlib
For Data Analysis and Manipulation:
- Pandas
pip install pandas
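Before moving on, it can help to confirm that the three packages really are installed in the environment your notebook uses. The snippet below is a small sketch that relies only on the Python standard library. The helper name `installed_versions` is made up for this illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# The three packages installed with pip above.
report = installed_versions(["seaborn", "matplotlib", "pandas"])
for name, ver in report.items():
    print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```

If any line prints NOT INSTALLED, run the matching pip command again in the same environment.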
Dataset:
Download the dataset (iris.csv) and keep it in the same folder as your .ipynb file.
Let's Start!
Import the required libraries and load the dataset.
Then, with the object_name.head() method, we can print the first five rows of the dataset to check whether it was imported properly.
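import warnings

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hide library warnings to keep the notebook output short
warnings.filterwarnings("ignore")
# Use a white background for all seaborn plots
sns.set(style="white", color_codes=True)

# Load the dataset and preview the first five rows
iris = pd.read_csv("iris.csv")
iris.head()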
| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
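Beyond head(), a few extra checks give more confidence that the file loaded correctly. The sketch below rebuilds the five rows shown above as a small stand-in DataFrame so that it runs without the CSV file. The name `sample` is hypothetical, and with the real data you would call the same methods on iris.

```python
import pandas as pd

# Stand-in for the real data: the five rows printed by iris.head() above.
sample = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
    "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4],
    "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2],
    "Species": ["Iris-setosa"] * 5,
})

print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # the four measurement columns should be numeric

# Total number of missing cells across the whole table.
missing = sample.isnull().sum().sum()
print("missing values:", missing)
```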
Now, check how many rows each species has in the dataset with the value_counts() method.
iris["Species"].value_counts()
Output:
Iris-virginica 50
Iris-versicolor 50
Iris-setosa 50
Name: Species, dtype: int64
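Every species has 50 rows in the output above, so the classes are balanced. When they are not, proportions are often easier to read than raw counts, and value_counts() can report them with normalize=True. A minimal sketch on a made-up Series (not the real Species column):

```python
import pandas as pd

# Made-up labels for illustration only.
species = pd.Series(
    ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 2 + ["Iris-virginica"],
    name="Species",
)

counts = species.value_counts()                # absolute count per class
shares = species.value_counts(normalize=True)  # fraction of rows per class

print(counts)
print(shares)
```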
Now, let's draw a scatter plot with SepalLengthCm on the x-axis and SepalWidthCm on the y-axis.
iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm")
Output:
*c* argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with *x* & *y*. Please use the *color* keyword-argument or provide a 2-D array with a single row if you intend to specify the same RGB or RGBA value for all points.
<AxesSubplot:xlabel='SepalLengthCm', ylabel='SepalWidthCm'>
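A joint plot draws the same scatter plot and adds the distribution of each variable along the margins.
# Note: the `size` argument was renamed to `height` in seaborn 0.9
sns.jointplot(x="SepalLengthCm", y="SepalWidthCm", data=iris, height=5)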
Output:
<seaborn.axisgrid.JointGrid at 0x11a2a170>
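To colour the points by species, use a FacetGrid with the hue argument.
# Note: the `size` argument was renamed to `height` in seaborn 0.9
sns.FacetGrid(iris, hue="Species", height=5) \
   .map(plt.scatter, "SepalLengthCm", "SepalWidthCm") \
   .add_legend()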
Output:
<seaborn.axisgrid.FacetGrid at 0x11b8ced0>
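For comparison, here is a rough sketch of a scatter plot coloured by species using plain matplotlib and a loop over groupby. It uses a small hand-typed DataFrame called `demo` (a hypothetical name) instead of iris.csv so that it runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A handful of hand-typed rows for illustration; use `iris` for the real data.
demo = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "Species": ["Iris-setosa", "Iris-setosa",
                "Iris-versicolor", "Iris-versicolor",
                "Iris-virginica", "Iris-virginica"],
})

fig, ax = plt.subplots(figsize=(5, 5))
# One scatter call per species; matplotlib picks a new colour for each call.
for name, group in demo.groupby("Species"):
    ax.scatter(group["SepalLengthCm"], group["SepalWidthCm"], label=name)

ax.set_xlabel("SepalLengthCm")
ax.set_ylabel("SepalWidthCm")
ax.legend(title="Species")
```

The seaborn version is shorter, but the loop makes it clearer that each species is simply drawn as its own group of points.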
Now, let's draw a box plot and a strip plot with Species on the x-axis and PetalLengthCm on the y-axis.
sns.boxplot(x="Species", y="PetalLengthCm", data=iris)
Output:
<AxesSubplot:xlabel='Species', ylabel='PetalLengthCm'>
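A box plot shows the median and the quartiles of each group. By default, seaborn and matplotlib extend the whiskers to the most extreme points that lie within 1.5 times the interquartile range (IQR) of the box, and draw anything further out as individual outlier points. As a sketch of what is being drawn, the snippet below computes those numbers with pandas for a small made-up series.

```python
import pandas as pd

# Made-up petal lengths for one species, with one unusually large value.
values = pd.Series([1.3, 1.4, 1.4, 1.5, 1.5, 1.6, 1.7, 3.0])

q1 = values.quantile(0.25)
median = values.quantile(0.50)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Fences: points beyond these are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points that are still inside the fences.
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1:.3f} median={median:.3f} Q3={q3:.3f} IQR={iqr:.3f}")
print(f"whiskers: {whisker_low} to {whisker_high}")
print("outliers:", outliers.tolist())
```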
# Draw the box plot first, then overlay the individual points with some jitter
ax = sns.boxplot(x="Species", y="PetalLengthCm", data=iris)
ax = sns.stripplot(x="Species", y="PetalLengthCm", data=iris, jitter=True, edgecolor="gray")
Output:
Now, let's plot the same data as a violin plot.
sns.violinplot(x="Species", y="PetalLengthCm", data=iris)
Output:
<AxesSubplot:xlabel='Species', ylabel='PetalLengthCm'>
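A KDE plot shows the estimated distribution of PetalLengthCm for each species.
# Note: the `size` argument was renamed to `height` in seaborn 0.9
sns.FacetGrid(iris, hue="Species", height=6) \
   .map(sns.kdeplot, "PetalLengthCm") \
   .add_legend()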
Output:
<seaborn.axisgrid.FacetGrid at 0x11c23130>
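The smooth curves drawn by kdeplot are kernel density estimates. As a rough illustration of the idea (not seaborn's actual implementation, which also chooses the bandwidth automatically), the sketch below places a Gaussian bump on every data point and averages them. The function name `gaussian_kde_1d`, the sample values, and the fixed bandwidth are all assumptions made for this example.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian kernels centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] = standardised distance between grid point i and sample j
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)


# Made-up petal lengths forming two clusters.
samples = [1.3, 1.4, 1.5, 1.5, 4.5, 4.7, 5.0]
grid = np.linspace(-2.0, 9.0, 1101)
density = gaussian_kde_1d(samples, grid, bandwidth=0.4)

# A density should integrate to roughly 1 over a wide enough range.
step = grid[1] - grid[0]
area = density.sum() * step
print(f"area under curve: {area:.4f}")
print(f"highest density near x = {grid[density.argmax()]:.2f}")
```

A smaller bandwidth gives a bumpier curve that follows individual points, while a larger one gives a smoother curve that can hide separate clusters.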
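Now draw a pair plot of the dataset, which plots every pair of features against each other.
# Note: the `size` argument was renamed to `height` in seaborn 0.9
sns.pairplot(iris.drop("Id", axis=1), hue="Species", height=3)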
Output:
<seaborn.axisgrid.PairGrid at 0x11c17a90>
Setting diag_kind="kde" draws density curves on the diagonal.
sns.pairplot(iris.drop("Id", axis=1), hue="Species", height=3, diag_kind="kde")
Output:
<seaborn.axisgrid.PairGrid at 0x124a8550>
iris.drop("Id", axis=1).boxplot(by="Species", figsize=(12, 6))
Output:
array([[<AxesSubplot:title={'center':'PetalLengthCm'}, xlabel='[Species]'>,
        <AxesSubplot:title={'center':'PetalWidthCm'}, xlabel='[Species]'>],
       [<AxesSubplot:title={'center':'SepalLengthCm'}, xlabel='[Species]'>,
        <AxesSubplot:title={'center':'SepalWidthCm'}, xlabel='[Species]'>]],
      dtype=object)
from pandas.plotting import andrews_curves
andrews_curves(iris.drop("Id", axis=1), "Species")
Output:
<AxesSubplot:>
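Andrews curves turn each row into a curve by using its feature values as the coefficients of a Fourier series, f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..., for t between -pi and pi. Rows with similar measurements produce similar curves, which is why the species tend to form visible bands. The sketch below evaluates that formula with NumPy for two hand-typed rows. `andrews_curve` is a name chosen for this example, not a pandas function.

```python
import numpy as np


def andrews_curve(row, t):
    """Evaluate f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + ..."""
    row = np.asarray(row, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, row[0] / np.sqrt(2.0))
    for i, x in enumerate(row[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            result += x * np.sin(harmonic * t)
        else:
            result += x * np.cos(harmonic * t)
    return result


t = np.linspace(-np.pi, np.pi, 200)
# Two hand-typed rows: sepal length, sepal width, petal length, petal width.
curve_a = andrews_curve([5.1, 3.5, 1.4, 0.2], t)
curve_b = andrews_curve([6.3, 3.3, 6.0, 2.5], t)
print(curve_a[:3])
print(curve_b[:3])
```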
from pandas.plotting import parallel_coordinates
parallel_coordinates(iris.drop("Id", axis=1), "Species")
Output:
<AxesSubplot:>
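Parallel coordinates draw every column against one shared vertical axis, so a column with a larger numeric range can visually dominate the others. One common option, which the code above does not use, is to min-max scale each numeric column to the range 0 to 1 before plotting. A minimal sketch with hand-typed values follows; `demo` and `scaled` are hypothetical names.

```python
import pandas as pd

demo = pd.DataFrame({
    "SepalLengthCm": [5.1, 7.0, 6.3],
    "PetalWidthCm": [0.2, 1.4, 2.5],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

numeric = demo.drop("Species", axis=1)
# Rescale every numeric column so its minimum maps to 0 and its maximum to 1.
scaled = (numeric - numeric.min()) / (numeric.max() - numeric.min())
scaled["Species"] = demo["Species"]
print(scaled)

# With matplotlib installed, the scaled table could then be plotted with:
# from pandas.plotting import parallel_coordinates
# parallel_coordinates(scaled, "Species")
```

Whether to scale is a judgment call: scaling makes the shapes of the lines easier to compare, but the axis no longer shows the original centimetre values.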
from pandas.plotting import radviz
radviz(iris.drop("Id", axis=1), "Species")
Output:
<AxesSubplot:>
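RadViz places one anchor per feature evenly around a unit circle and positions each row at the weighted average of those anchors, using the row's min-max normalised feature values as the weights. A row is therefore pulled toward the features in which it scores relatively high. The snippet is a NumPy sketch of that calculation. `radviz_points` is a hypothetical helper, and the input rows are hand-typed for illustration.

```python
import numpy as np


def radviz_points(data):
    """Return (n_rows, 2) RadViz coordinates for an (n_rows, n_features) array.

    Assumes every column has at least two distinct values and that no row
    is the minimum of every column (both cases would divide by zero).
    """
    data = np.asarray(data, dtype=float)
    # Min-max normalise each feature (column) to the range 0 to 1.
    span = data.max(axis=0) - data.min(axis=0)
    weights = (data - data.min(axis=0)) / span
    # One anchor per feature, evenly spaced on the unit circle.
    n_features = data.shape[1]
    angles = 2 * np.pi * np.arange(n_features) / n_features
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])
    # Each row is the weighted average of the anchors.
    return weights @ anchors / weights.sum(axis=1, keepdims=True)


# Hand-typed rows: sepal length, sepal width, petal length, petal width.
data = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]
points = radviz_points(data)
print(points)
```

In this toy input the first row is the minimum of every column except sepal width, so it lands exactly on the sepal width anchor.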
Thank You!