Visualizing High Dimensional Data




Using Hypertools – A Python Toolbox

Source: https://github.com/ContextLab/hypertools

Data visualization helps identify hidden patterns, associations, and trends across the columns of a dataset. We create charts, plots, and graphs to understand what the data is about and how its columns relate to one another.

Low-dimensional data is easy to visualize, but high-dimensional data is much harder to analyze or plot because a chart can show only a few dimensions at once. As Geoff Hinton famously put it: “To deal with hyper-planes in a 14-dimensional space, visualize a 3D space and say ‘fourteen’ very loudly. Everyone does it.”

But what if I told you that there is a Python toolbox that not only creates visually appealing plots but also handles dimensionality reduction in a single function call?

Hypertools is an open-source Python toolbox that creates visualizations from high-dimensional datasets, performing the dimensionality reduction itself. It is built mainly on top of matplotlib, scikit-learn, and seaborn. In this article, we will explore some of the visualizations that we can create using hypertools.
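To get a feel for what "reducing the dimensionality" means before a plot is drawn, here is a minimal NumPy sketch of a PCA-style projection from 14 dimensions down to 3. This is only an illustration of the idea, not hypertools' own code: the function name pca_reduce and the synthetic data are made up for this example, and the assumption that a PCA-like projection to three dimensions resembles what hypertools does by default should be checked against its documentation.

```python
import numpy as np

def pca_reduce(data, ndims=3):
    """Project the rows of `data` onto their top `ndims` principal components.

    Hypothetical helper for illustration; not part of hypertools.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)  # remove the per-column mean
    # SVD of the centered matrix: the rows of vt are the principal axes,
    # ordered from largest to smallest explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:ndims].T

# synthetic data (an assumption for the example): 200 samples, 14 features
rng = np.random.default_rng(0)
high_dim = rng.normal(size=(200, 14))

low_dim = pca_reduce(high_dim, ndims=3)
print(low_dim.shape)  # (200, 3)
```

The three resulting columns are what a 3D scatter plot would actually show; every plot in the rest of this article relies on some reduction of this kind happening behind the scenes.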

Let’s get started…

Installing required libraries

We will start by installing hypertools using pip. The command below does that.

pip install hypertools

Importing required libraries

In this step, we will import the library that we will use to create the visualizations.

import hypertools as hyp

Creating Visualizations

Now we will create different visualizations and see how hypertools works.

1. Basic Plot

# load example data
geo = hyp.load('weights_sample')
# plot each observation as a point
geo.plot(fmt='.')
Source: By Author

2. Cluster Plot

# load example data
geo = hyp.load('mushrooms')
# plot, coloring the points by k-means cluster (10 clusters)
geo.plot(n_clusters=10)
Source: By Author
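The n_clusters argument asks hypertools to group the points with k-means and color them by cluster. The sketch below shows the idea behind k-means in plain NumPy. It is an assumption-laden illustration, not the library's implementation (which, as far as I know, delegates to scikit-learn): the function name kmeans_labels, the fixed iteration count, and the two synthetic blobs are all invented for this example.

```python
import numpy as np

def kmeans_labels(data, n_clusters, n_iter=20, seed=0):
    """Assign each row of `data` to one of `n_clusters` groups with plain k-means.

    Hypothetical helper for illustration; not part of hypertools.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # start from n_clusters distinct data points chosen at random
    centers = data[rng.choice(len(data), size=n_clusters, replace=False)]

    def assign(current_centers):
        # squared distance from every point to every center
        dists = ((data[:, None, :] - current_centers[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for k in range(n_clusters):
            members = data[labels == k]
            if len(members):  # keep the old center if a cluster is empty
                centers[k] = members.mean(axis=0)
    return assign(centers)

# synthetic data (an assumption for the example): two well-separated blobs
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 5))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(50, 5))
labels = kmeans_labels(np.vstack([blob_a, blob_b]), n_clusters=2)
print(labels[:5], labels[-5:])
```

Labels like these are what end up deciding the color of each point in the cluster plot.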

3. Corpus Plots

This plot is used for textual datasets.

# a small corpus of short text samples
text_samples = ['i like cats alot', 'cats r pretty cool', 'cats are better than dogs',
                'dogs rule the haus', 'dogs are my jam', 'dogs are a mans best friend',
                'i haz a cheezeburger?']
# plot
hyp.plot(text_samples, '*', corpus=text_samples)
Source: By Author
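Text cannot be plotted directly, so it first has to be turned into numeric vectors. The sketch below shows the simplest version of that step, a bag-of-words count matrix built with the standard library. Hypertools' own text pipeline is more involved (as I understand it, word counts are followed by a topic model), so treat this only as an illustration of the first step; the function name bag_of_words and the three sample sentences are made up for this example.

```python
from collections import Counter

def bag_of_words(docs):
    """Return (vocabulary, count matrix) for a list of strings.

    Hypothetical helper for illustration; not part of hypertools.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # one column per distinct word, in alphabetical order
    vocab = sorted({word for doc in tokenized for word in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[word] for word in vocab])
    return vocab, rows

docs = ['cats are cool', 'dogs are cool', 'cats and dogs']
vocab, matrix = bag_of_words(docs)
print(vocab)   # ['and', 'are', 'cats', 'cool', 'dogs']
print(matrix)  # [[0, 1, 1, 1, 0], [0, 1, 0, 1, 1], [1, 0, 1, 0, 1]]
```

Each sentence is now a row of numbers, which can be reduced and plotted like any other dataset.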

4. UMAP

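from sklearn import datasets
# load the digits dataset, keeping only the classes 0 to 5
digits = datasets.load_digits(n_class=6)
data = digits.data
# use the digit labels (as strings) to color the points
hue = digits.target.astype('str')
# reduce to 2 dimensions with UMAP and plot
hyp.plot(data, '.', reduce='UMAP', hue=hue, ndims=2)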
Source: By Author

5. Animated Plots

# load example data
geo = hyp.load('weights_avg')
# plot as an animation; chemtrails=True leaves a faint trace of the path so far
geo.plot(animate=True, chemtrails=True)
Source: By Author

Go ahead and try this with different datasets to create beautiful visualizations and interpret your data. If you run into any difficulty, please let me know in the responses section.

This article is in collaboration with Piyush Ingale.

Before You Go

Thanks for reading! If you want to get in touch with me, feel free to reach me at hmix13@gmail.com or through my LinkedIn profile. You can view my GitHub profile for different data science projects and package tutorials. Also, feel free to explore my profile and read the other articles I have written on data science.
