Time to Choose TensorFlow Data over ImageDataGenerator




What is from_tensor_slices?

To understand how the from_tensor_slices method works, let’s start by loading the CIFAR-10 data¹.

import matplotlib.pyplot as plt
import numpy as np
import time
import tensorflow as tf
from tensorflow.keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('check shapes: ', x_train.shape, y_train.shape, x_test.shape, y_test.shape)
>>> check shapes: (50000, 32, 32, 3) (50000, 1) (10000, 32, 32, 3) (10000, 1)

We turn the labels into a categorical (one-hot) representation using to_categorical, then hold out 20% of the training set for validation with a stratified split.

train_lab_categorical = tf.keras.utils.to_categorical(y_train, num_classes=10, dtype='uint8')
test_lab_categorical = tf.keras.utils.to_categorical(y_test, num_classes=10, dtype='uint8')

from sklearn.model_selection import train_test_split

train_im, valid_im, train_lab, valid_lab = train_test_split(
    x_train, train_lab_categorical, test_size=0.20,
    stratify=train_lab_categorical, random_state=40, shuffle=True)
print("validation labels shape: ", valid_lab.shape)
>>> validation labels shape:  (10000, 10)
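As a small illustration of what the categorical representation looks like, the sketch below one-hot encodes a few integer labels with plain NumPy. The `one_hot` helper and the sample labels are hypothetical and only meant for illustration; for integer class labels, to_categorical produces the same kind of array.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: one-hot encode integer labels of shape (n,) or (n, 1)."""
    flat = np.asarray(labels).reshape(-1)             # flatten the (n, 1) layout to (n,)
    return np.eye(num_classes, dtype="uint8")[flat]   # pick one identity-matrix row per label

# Made-up labels in the same (n, 1) layout that CIFAR-10 uses.
sample_labels = np.array([[3], [0], [9]])
encoded = one_hot(sample_labels, num_classes=10)

print(encoded.shape)                # (3, 10)
print(np.argmax(encoded, axis=1))   # [3 0 9]
```

Taking np.argmax along the last axis recovers the original integer label, which is also how the plotting code later in this section maps a label back to a class name.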

We now combine the images and labels to create ‘Dataset’ objects, as below:

training_data = tf.data.Dataset.from_tensor_slices((train_im, train_lab))
validation_data = tf.data.Dataset.from_tensor_slices((valid_im, valid_lab))
test_data = tf.data.Dataset.from_tensor_slices((x_test, test_lab_categorical))
print('check types;\n', type(training_data))
>>> check types;
<class 'tensorflow.python.data.ops.dataset_ops.TensorSliceDataset'>

The type alone tells us little, so let’s figure out what these ‘Dataset’ objects are and how we can use them. First, we check the type specification of a dataset element using the element_spec property.

print(training_data.element_spec)
>>> (TensorSpec(shape=(32, 32, 3), dtype=tf.float64, name=None), TensorSpec(shape=(10,), dtype=tf.uint8, name=None))

This is easier to read. We see that from_tensor_slices preserves the structure of the input tensors: each element of the dataset consists of an image (shape (32, 32, 3)) and its corresponding label (shape (10,)). The next question is how to access the elements of a ‘Dataset’ object. We can create an iterator and fetch elements with next, as below:

train_iter_im, train_iter_label = next(iter(training_data))
print(train_iter_im.numpy().shape, train_iter_label.numpy().shape)
>>> (32, 32, 3) (10,)

Instead of combining next, iter and .numpy(), we can use as_numpy_iterator, which returns an iterator that yields every element of the dataset as NumPy arrays, as below:

train_iter_im1, train_iter_label1 = next(training_data.as_numpy_iterator())
print(train_iter_im1.shape, train_iter_label1.shape)
>>> (32, 32, 3) (10,)
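To see the slicing behaviour in isolation, here is a minimal sketch on made-up arrays. The names `toy_images`, `toy_labels` and `toy_data` are hypothetical and not part of the CIFAR-10 pipeline above, and the sketch assumes TensorFlow 2.x. It shows that from_tensor_slices splits its inputs along the first axis, so four stacked images become four dataset elements, each keeping the (image, label) structure.

```python
import numpy as np
import tensorflow as tf

# Made-up data: 4 "images" of shape (2, 2, 3) and 4 one-hot labels over 3 classes.
toy_images = np.zeros((4, 2, 2, 3), dtype="float32")
toy_labels = np.eye(3, dtype="uint8")[[0, 2, 1, 0]]

toy_data = tf.data.Dataset.from_tensor_slices((toy_images, toy_labels))

# Each element keeps the (image, label) tuple structure of the input.
image_spec, label_spec = toy_data.element_spec
print(image_spec.shape, label_spec.shape)

# as_numpy_iterator yields NumPy arrays; the leading axis (size 4) became the element count.
elements = list(toy_data.as_numpy_iterator())
print(len(elements))

# Decode each one-hot label back to its integer class.
decoded = [int(np.argmax(label)) for _, label in elements]
print(decoded)
```

Materializing a dataset with list(...) is fine for a toy example like this one, but it loads every element into memory, so for large datasets iterating lazily is usually the safer choice.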

Let’s visualize some training images and their corresponding labels, as below:

class_types = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']  # from cifar-10 website
check_list = list(training_data.as_numpy_iterator())
fig = plt.figure(figsize=(10,10))
for i in range(12):
    plt.subplot(4,3,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(check_list[i][0], cmap='gray')
    plt.xlabel(class_types[np.argmax(check_list[i][1])], fontsize=13)
plt.tight_layout()
plt.show()
Example Images from CIFAR-10 using code above. (Source: CIFAR-10 Data)
