Satellite View Images Enhancement with Deep Learning (Super-Resolution)
My name is Erfan Alimohammadi, and I was a data scientist at the Balad Maps company. In this article, I will explain one of the problems we faced with Balad's satellite imagery.
What was our problem?
In the Balad app, in addition to the standard map, a satellite map is also available.
Both the default map data and the satellite map data are stored in tiles:
When you are zoomed out to a view of all of Iran, it is enough to download general information about the country; details such as house numbers are not needed. Similarly, when viewing the map at the level of Tehran, there is no need to download data for another city such as Yazd. This is why the data is divided in two ways. First, data with a specific level of detail is created for each zoom level of the map. Second, the data of each zoom level is divided among geographic regions, so that when you look at one part of Tehran, only the data for that part of Tehran is downloaded.
Image 3 — Each red square is a tile containing map data. When you see a general view of Iran, the tiles cover large areas and contain only general information. When you reach the city level, the tiles contain more detailed information, such as street names and house numbers.
Our map is what is commonly known as a tiled web map. Our satellite images are stored in tiles of 256 by 256 pixels, and when you zoom in on a specific part of the satellite map, the 256×256 image for that area is loaded. Balad purchases satellite images from other companies (domestic and foreign companies that specialize in imaging and mapping). For various reasons, we do not have access to some of the high-resolution images, so some satellite imagery is unavailable in Balad at the closest zoom levels. One of the main reasons for this is the high cost of obtaining and purchasing these images. As a result, when the user zooms in far enough, no higher-quality image is loaded; the user sees a blurred image, which causes dissatisfaction. For example, a user who wants to see the three images below more closely will encounter blurred images:
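The tiling scheme described above follows the standard "slippy map" convention used by most tiled web maps (the article does not spell out Balad's exact scheme, so treat this as an illustrative sketch): at zoom level z, the world is split into 2^z × 2^z tiles, and the tile covering a given coordinate can be computed directly.

```python
import math

def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Map a latitude/longitude pair to slippy-map tile indices at a zoom level."""
    n = 2 ** zoom                       # number of tiles along each axis
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# Tehran (~35.69 N, 51.39 E): at a low zoom one tile covers a huge region,
# while at a high zoom each tile covers only a few city blocks.
print(latlon_to_tile(35.6892, 51.3890, 5))
print(latlon_to_tile(35.6892, 51.3890, 15))
```

Doubling the zoom index quadruples the number of tiles, which is why detailed imagery for every zoom level becomes expensive to store and purchase.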
This limitation in image quality is a common issue faced by many map and satellite image providers. However, Balad is constantly working on finding ways to improve the image quality and availability for its users.
What methods are there for image scaling?
In the case of scaling satellite images in a normal way, methods are used to fill in information that we don’t already have. For example, if we want to convert a 2×2 image to a 4×4 image, we need to create 16 new numbers from the 4 previous numbers and there is no unique way to do this.
There are classical methods for this task that are used by default in almost all image software.
As you can see in image 7, the task amounts to filling the 12 empty cells in the new table. This type of problem is known as interpolation, because we want to predict what happens in the intermediate spaces. One approach is to assume a constant slope between points 10, 20, 30, and 40, creating a color gradient in the final image. Another is to fill the cells near point 10 with 10, the cells near point 20 with 20, the cells near point 30 with 30, and the cells near point 40 with 40. These two methods are known as linear interpolation and nearest-neighbor interpolation, respectively. Other well-known interpolation methods, such as bicubic interpolation, are also used for this task.
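As a concrete sketch, the two methods above can be applied to the 2×2 example with values 10, 20, 30, 40 (pure Python, no image library, just to show the arithmetic):

```python
def nearest_neighbor_upscale(img, factor):
    """Upscale a 2-D grid by repeating each source pixel (nearest neighbor)."""
    h, w = len(img), len(img[0])
    return [[img[r // factor][c // factor]
             for c in range(w * factor)]
            for r in range(h * factor)]

def bilinear_upscale(img, factor):
    """Upscale by linearly blending the four surrounding source pixels."""
    h, w = len(img), len(img[0])
    out_h, out_w = h * factor, w * factor
    out = []
    for r in range(out_h):
        # Map the output pixel back to fractional source coordinates.
        src_r = r * (h - 1) / (out_h - 1)
        r0, fr = int(src_r), src_r - int(src_r)
        r1 = min(r0 + 1, h - 1)
        row = []
        for c in range(out_w):
            src_c = c * (w - 1) / (out_w - 1)
            c0, fc = int(src_c), src_c - int(src_c)
            c1 = min(c0 + 1, w - 1)
            top = img[r0][c0] * (1 - fc) + img[r0][c1] * fc
            bot = img[r1][c0] * (1 - fc) + img[r1][c1] * fc
            row.append(round(top * (1 - fr) + bot * fr))
        out.append(row)
    return out

# The article's example: a 2x2 block of values 10/20/30/40 scaled to 4x4.
small = [[10, 20], [30, 40]]
print(nearest_neighbor_upscale(small, 2))  # blocks of repeated values
print(bilinear_upscale(small, 2))          # smooth gradient between corners
```

Nearest neighbor produces hard blocks of repeated values, while bilinear interpolation produces a gradient: exactly the two behaviors described above.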
These methods are good, but they are very general solutions. Some images may be enlarged very well by them, yet they are not necessarily the best methods for enlarging satellite images. When the scope of a problem is clear and limited, it may be better to look for a solution that works well on that specific kind of input.
We decided to use machine learning to solve our problem, because machine learning can specialize in a particular distribution of images. If we show the computer images before and after enlargement, and let it learn from many examples of small and large image pairs, we may find a better solution than the general ones.
Along the way, we tried different methods from deep learning and convolutional neural networks. Since neural networks, somewhat like the human brain, learn from examples, they can lead to remarkable results.
After researching and studying published articles in this field, we found that this problem can be solved both by supervised learning and unsupervised learning. We evaluated several different ways to solve this problem. In the next section, we will explain the details of using a supervised method that had a promising result.
Our proposed method takes inspiration from the JPEG image compression algorithm. In the JPEG format, to reduce file size, the original pixels are not saved as they are; instead, they undergo transformations that result in a smaller image. JPEG does not work on images in the RGB format, but first converts them into a different color space known as YCbCr.
In the YCbCr color space, unlike RGB, a pixel's color is not described by its amounts of red, green, and blue. Instead, the grayscale (luminance) component of the pixel is stored first and named Y. Then a value named Cb records how much the pixel's color should be shifted toward or away from blue, and a value named Cr records the same for red.
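The conversion itself is a fixed linear transform. A per-pixel sketch using JPEG's standard full-range BT.601 coefficients:

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """JPEG's (full-range BT.601) RGB -> YCbCr conversion for a single pixel."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b            # luminance
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference chroma
    return y, cb, cr

# A pure gray pixel carries no chroma: Cb and Cr sit at the neutral value 128.
print(rgb_to_ycbcr(100, 100, 100))   # approximately (100.0, 128.0, 128.0)
```

Note how a gray pixel keeps all of its information in Y alone, with Cb and Cr at the neutral midpoint. This is the property the next section exploits.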
Since the human eye recognizes objects mainly through their luminance, the JPEG algorithm pays special attention to this. JPEG compresses the Cb and Cr components much more aggressively than the Y component, so the quality of the grayscale part of the image is largely preserved. As a result, the luminance of important objects and edges remains nearly untouched, and the human eye can still recognize the objects well.
Even if we do a lot of compression in JPEG, the main edges are still recognizable. (Source: English Wikipedia)
We turned to deep learning with inspiration from JPEG, since our goal was likewise to produce an image in which important objects and edges are preserved. First, we converted the image to the YCbCr color space and passed only the grayscale part through our neural network. The architecture we used was a fully convolutional neural network that received a 128×128 grayscale image as input and produced a 256×256 image as output. Our goal was to maximize the similarity between the generated image and the actual large image.
This model handled only the grayscale part of the image. For upscaling the Cb and Cr parts, we used the classical methods: we upscaled them with bilinear interpolation, and then merged the three parts to obtain the final image.
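The overall pipeline can be sketched as follows. The article does not publish the network's weights or exact architecture, so `super_resolve_y` is a placeholder for the trained model; here a plain bilinear resize stands in for it so the plumbing runs end to end (using Pillow for the color-space conversions and resizing):

```python
from PIL import Image

def upscale_tile(tile: Image.Image, super_resolve_y) -> Image.Image:
    """Upscale a 128x128 tile to 256x256: learned model on Y, bilinear on Cb/Cr."""
    y, cb, cr = tile.convert("YCbCr").split()       # separate the three channels
    y_big = super_resolve_y(y)                       # neural network (placeholder)
    cb_big = cb.resize((256, 256), Image.BILINEAR)   # classical upscaling
    cr_big = cr.resize((256, 256), Image.BILINEAR)
    # Merge the three upscaled channels and convert back to RGB for display.
    return Image.merge("YCbCr", (y_big, cb_big, cr_big)).convert("RGB")

# Stand-in "model": a bilinear resize in place of the trained network.
fake_model = lambda y: y.resize((256, 256), Image.BILINEAR)
tile = Image.new("RGB", (128, 128), (90, 120, 60))
out = upscale_tile(tile, fake_model)
print(out.size)  # (256, 256)
```

In the real system, only `super_resolve_y` is learned; the chroma channels take the cheap classical path, which is exactly the division of labor described above.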
Sample Outputs
Evaluating the proposed method
Typically, to assess the quality of an image relative to an ideal image, the peak signal-to-noise ratio (PSNR) is used: a measure of the ratio between the maximum possible power of a signal and the power of the corrupting noise that degrades it.
In summary, the smaller the color difference between the pixels of the generated image and the corresponding pixels of the original image, the higher the PSNR. This criterion helped us compare the various methods we had for solving our problem.
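PSNR is computed directly from the mean squared error between the two images. A minimal sketch for grayscale pixel grids with 8-bit values:

```python
import math

def psnr(original, reconstructed, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized pixel grids."""
    flat_a = [p for row in original for p in row]
    flat_b = [p for row in reconstructed for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return float("inf")   # identical images
    return 10 * math.log10(max_value ** 2 / mse)

a = [[50, 60], [70, 80]]
b = [[52, 58], [71, 79]]   # small per-pixel errors -> high PSNR
print(psnr(a, b))
print(psnr(a, a))          # identical images -> infinite PSNR
```

Lower per-pixel error means lower MSE and therefore higher PSNR, which is why it serves as a convenient single number for ranking upscaling methods.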
The proposed method achieved the highest PSNR among the methods we tried. When we used the RGB color space, we got a lower PSNR. Also, when we passed all three YCbCr components of the image through the network, the colors of the output image became slightly unnatural, which can be explained at least in part by the network altering the color components. Another advantage of the proposed method was faster training, because the neural network only received one-third of the information at its input.
Conclusion
In conclusion, we presented a deep learning method that enhances the quality of satellite images. It allowed us to provide a better experience for users working with maps, without incurring the high cost of purchasing higher-resolution imagery.