Brief Recap of NAS
As introduced in my previous post, any NAS method has three parts (as shown in Figure 1 below): search space, search strategy, and performance estimation.
The search strategy, which is the core of any NAS method in Figure 1, needs some signal that tells it whether a given architecture is good or bad.
In Figure 1, the search strategy sends an architecture, A, to the performance estimation block, which returns a performance metric to the search strategy.
Using this performance metric, the search strategy updates its internal mechanism to navigate the neural architecture search space.
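The loop between the search strategy and the performance estimation block can be sketched in a few lines of Python. This is only an illustration, not the method of any particular paper: the search space of (depth, width) pairs, the scoring formula inside `estimate_performance`, and the use of plain random search are all assumptions made for the example. Random search is used because it is the simplest strategy: its only "internal mechanism" is the best architecture seen so far.

```python
import random

# Hypothetical search space: each architecture is a (depth, width) pair.
SEARCH_SPACE = [(depth, width) for depth in (2, 4, 8) for width in (16, 32, 64)]

def estimate_performance(architecture):
    """Placeholder for the performance estimation block.

    Returns a made-up score. A real estimator would train the network
    and measure its accuracy on held-out data.
    """
    depth, width = architecture
    return 1.0 - 1.0 / (depth * width)

def random_search(num_trials, seed=0):
    """Search strategy: sample architectures and keep the best one seen."""
    rng = random.Random(seed)
    best_architecture, best_score = None, float("-inf")
    for _ in range(num_trials):
        architecture = rng.choice(SEARCH_SPACE)      # propose an architecture A
        score = estimate_performance(architecture)   # ask for its performance metric
        if score > best_score:                       # update the strategy's state
            best_architecture, best_score = architecture, score
    return best_architecture, best_score

best_architecture, best_score = random_search(num_trials=20)
```

More elaborate strategies (evolution, reinforcement learning) differ in how they use the returned score, but they call the performance estimation block in the same way.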
An immediate question comes to mind: “How do we measure the performance of a neural architecture?”
Let’s try to understand using an example.
Let’s say we have a task for which we want to design a neural network architecture. Such a task typically comes with a dataset that is divided into two sets: a training set and a test set. This is shown in Figure 2, where the training set block is drawn larger to denote that there is normally more training data than test data.
Now, to get the performance of a neural architecture, A, follow these steps:
1. Divide the training set into two sets: a training set and a validation set, as shown in Figure 3.
2. Train the neural architecture, A, on the training set for a fixed number of epochs. After training, you will have a neural architecture with trained weights.
3. Evaluate the accuracy of the trained neural architecture, A, on the validation set. This becomes the performance metric that is returned by the “Performance Estimation” block in Figure 1.
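The three steps above can be sketched as follows. To keep the example runnable without any deep learning library, a toy perceptron stands in for the neural architecture A, and the dataset is synthetic. The function names, the 80/20 split, the learning rate, and the number of epochs are all assumptions chosen for illustration.

```python
import random

def split_train_validation(training_set, validation_fraction=0.2, seed=0):
    """Step 1: split the original training set into training and validation parts."""
    examples = list(training_set)
    random.Random(seed).shuffle(examples)
    n_val = int(len(examples) * validation_fraction)
    return examples[n_val:], examples[:n_val]

def predict(model, x1, x2):
    weights, bias = model
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0

def train(train_set, num_epochs, learning_rate=0.1):
    """Step 2: train for a fixed number of epochs (a toy perceptron stands in for A)."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(num_epochs):
        for (x1, x2), label in train_set:
            error = label - predict((weights, bias), x1, x2)
            weights[0] += learning_rate * error * x1
            weights[1] += learning_rate * error * x2
            bias += learning_rate * error
    return weights, bias

def accuracy(model, examples):
    """Step 3: fraction of examples the trained model classifies correctly."""
    correct = sum(predict(model, x1, x2) == label for (x1, x2), label in examples)
    return correct / len(examples)

def estimate_performance(training_set, num_epochs=10):
    """Performance estimation: split, train, then score on the validation set."""
    train_set, validation_set = split_train_validation(training_set)
    model = train(train_set, num_epochs)
    return accuracy(model, validation_set)

# Synthetic, made-up data: the label is 1 when x1 + x2 > 0.
rng = random.Random(1)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
dataset = [((x1, x2), 1 if x1 + x2 > 0 else 0) for x1, x2 in points]
validation_accuracy = estimate_performance(dataset)
```

Note that the test set is never touched here: the validation set is carved out of the training data so that the test set stays reserved for the final evaluation of the chosen architecture.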
Now, what is the bottleneck of the NAS algorithm?
It is the “Performance Estimation” block in Figure 1: it demands huge computational resources and makes the architecture search take a huge amount of time.
In the previous example, let’s say it takes 1 hour to train any neural architecture on the training set. Let’s also say that the search strategy in Figure 1 needs to evaluate at least 10,000 architectures in the provided search space to give a good result.
Finally, let’s assume that the time taken to evaluate the accuracy of the neural architecture on the validation set is negligible.
So, in total, we need 1 × 10,000 = 10,000 hours to perform all the training. These 10,000 hours translate to about 416.67 days (more than a year).
Thus, we need more than a year to perform the architecture search for a given task, and this is the bottleneck for any Neural Architecture Search algorithm.
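The arithmetic of this example is easy to reproduce. The snippet below is a small sketch that uses the numbers assumed above (1 hour per architecture, 10,000 architectures, negligible validation time); the constant and function names are made up for illustration.

```python
HOURS_PER_ARCHITECTURE = 1   # training time assumed in the example
NUM_ARCHITECTURES = 10_000   # architectures evaluated in the example
HOURS_PER_DAY = 24

def search_cost_days(hours_per_architecture, num_architectures):
    """Total sequential training time in days, ignoring validation time."""
    total_hours = hours_per_architecture * num_architectures
    return total_hours / HOURS_PER_DAY

sequential_days = search_cost_days(HOURS_PER_ARCHITECTURE, NUM_ARCHITECTURES)
print(f"{sequential_days:.2f} days")  # prints "416.67 days"
```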
Papers like Regularized Evolution for Image Classifier Architecture Search and Learning Transferable Architectures for Scalable Image Recognition took 3150 GPU days and 1800 GPU days, respectively.
This bottleneck is a very interesting research direction for anyone beginning their research journey in a Ph.D. or M.Sc. The easy solution is to use multiple devices in parallel for the performance estimation, but such resources are typically available only to big companies. So, an upcoming researcher may try to find ways to solve this bottleneck without the use of multiple devices.