Satellite View Images Enhancement with Deep Learning (Super-Resolution)



My name is Erfan Alimohammadi, and I was a data scientist at Balad Maps. In this article, I explain one of the problems we faced with Balad's satellite imagery.

Image 1 — A mixture of the default and satellite views in Balad.

What was our problem?

In the Balad app, in addition to the default map, you can also use the satellite map.

Image 2 — In the Android Balad app, opening the map display settings lets you use the satellite map instead of the default map.

Both the default map and the satellite map data are stored in tiles:

When viewing Iran at a low zoom level, it is enough to download general information about the country; more detailed data such as house numbers is unnecessary. Likewise, when viewing the map at the level of Tehran, there is no need to download data for another city such as Yazd. This is why the data is divided in two ways. First, data of a specific precision is generated for each zoom level of the map. Second, the data of each zoom level is divided among regions, so that when you look at one part of Tehran, only the data for that part of Tehran is downloaded.

Image 3 — Each red square is a tile containing map data. In a country-wide view of Iran, the tiles cover large areas and hold general information; at the city level, the tiles hold more detailed information such as street names and house numbers.
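For the curious, the way tiled web maps index the world can be sketched with the standard "slippy map" formula. Balad's exact tiling parameters are not given in the article, so this is a generic illustration of the scheme:

```python
import math

def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Convert a latitude/longitude pair to slippy-map tile indices
    (x, y) at a given zoom level, using Web Mercator."""
    lat = math.radians(lat_deg)
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return x, y

# The same point in Tehran lands in very different tiles depending on zoom:
print(latlon_to_tile(35.6892, 51.3890, 5))   # country-level view, few tiles
print(latlon_to_tile(35.6892, 51.3890, 16))  # street-level view, millions of tiles
```

Each extra zoom level doubles the tile grid in both directions, which is why a tile at zoom `z` splits into four tiles at zoom `z + 1`.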

Our map is what is commonly known as a tiled web map: the satellite images are stored in tiles of 256 × 256 pixels. When you zoom in on a part of the satellite map, the 256 × 256 image for that area is loaded. Balad purchases satellite images from other companies (domestic and foreign firms specializing in imaging and mapping). For various reasons, we do not have access to some of the high-resolution images, so for certain close-up views no satellite image is available in Balad. One of the main reasons is the high cost of acquiring and licensing this imagery. As a result, when the user zooms in far enough, no higher-quality image is loaded, and the user sees a blurry picture, which causes dissatisfaction. For example, a user who wants to look more closely at the three images below will encounter blurred results:

Images 4 and 5 — Three views of the same location; as the zoom level increases, the image quality decreases.

This limitation in image quality is a common issue faced by many map and satellite image providers. However, Balad is constantly working on finding ways to improve the image quality and availability for its users.

What methods are there for image scaling?

When upscaling satellite images in the usual way, we need methods that fill in information we do not actually have. For example, to convert a 2×2 image into a 4×4 image, we must produce 16 numbers from the original 4, and there is no unique way to do this.

Image 6 — Upscaling a 2×2 image.

There are classical methods for this task that are used by default in software everywhere.

Image 7 — One proposed method for enlarging a 2×2 photo is to fix the colors of the four corner pixels and fill in the remaining cells.

As you can see in Image 7, one proposed method is to fill in only the 12 empty cells of the new table. This type of problem is known as interpolation, because we want to predict what happens in the intermediate spaces. One approach is to assume a constant slope between the values 10, 20, 30, and 40, creating a color gradient in the final image. Another is to fill the cells near the 10 with 10, those near the 20 with 20, those near the 30 with 30, and those near the 40 with 40. These two methods are known as linear interpolation and nearest-neighbor interpolation, respectively. Other well-known interpolation methods, such as bicubic interpolation, are also used for this task.
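The two schemes just described can be sketched directly on the 2×2 example from the text. This is a minimal NumPy illustration, not Balad's production code:

```python
import numpy as np

# The 2x2 example from the text: four known pixel values.
small = np.array([[10, 20],
                  [30, 40]], dtype=float)

# Nearest-neighbour: each new pixel copies the closest original pixel.
nearest = small.repeat(2, axis=0).repeat(2, axis=1)

def bilinear_upscale(img, factor):
    """Bilinear interpolation: blend the four surrounding source pixels
    with a constant slope, producing a smooth gradient."""
    h, w = img.shape
    out = np.empty((h * factor, w * factor))
    for i in range(h * factor):
        for j in range(w * factor):
            # map the output pixel centre back into source coordinates
            y = min((i + 0.5) / factor - 0.5, h - 1)
            x = min((j + 0.5) / factor - 0.5, w - 1)
            y0, x0 = int(max(y, 0)), int(max(x, 0))
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = max(y, 0) - y0, max(x, 0) - x0
            top = img[y0, x0] * (1 - dx) + img[y0, x1] * dx
            bot = img[y1, x0] * (1 - dx) + img[y1, x1] * dx
            out[i, j] = top * (1 - dy) + bot * dy
    return out

print(nearest)                      # blocky: 10s, 20s, 30s, 40s in quadrants
print(bilinear_upscale(small, 2))   # smooth gradient between the four values
```

Nearest-neighbor keeps hard edges but looks blocky; bilinear looks smooth but blurs edges, which is exactly the blur users complained about on our satellite tiles.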

These methods work, but they are very general solutions: a method that enlarges some images of the world very well is not necessarily the best method for enlarging satellite images. When the scope of a problem is clear and limited, it may be better to look for a solution that works well on our specific kind of input.

We decided to use machine learning to solve our problem, because machine learning can adapt to a specific kind of input. If we show the computer images before and after enlargement and let it learn from many pairs of small and large images, we may find a better solution than the general ones.

On this path, we tried different methods based on deep learning and convolutional neural networks. Since neural networks learn from examples, somewhat like the human brain, they can lead to impressive results.

After researching published work in this field, we found that the problem can be tackled with both supervised and unsupervised learning. We evaluated several different approaches; in the next section, we explain the details of a supervised method that gave promising results.

Our proposed method takes inspiration from the JPEG image compression algorithm. In the JPEG format, to reduce file size, the original pixels are not stored as they are; instead they undergo transformations that make the image smaller. JPEG does not process images in the RGB format, but first converts them into a different color space known as YCbCr.

In the YCbCr color space, unlike RGB, a pixel's color is not described by its amounts of red, green, and blue. Instead, the grayscale (luminance) part of each pixel is stored first and called Y. Then one value, called Cb, records how much the pixel should be shifted toward blue, and another, called Cr, records how much it should be shifted toward red.
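This split can be written down concretely. The sketch below uses the standard BT.601 full-range constants that JPEG-style pipelines commonly use (an illustration of the transform, not our exact production code):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """BT.601 full-range RGB -> YCbCr conversion.
    Y is the grayscale (luma) part; Cb and Cr shift the colour
    toward blue and red around a neutral value of 128."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

# A pure-gray pixel keeps only its Y value; Cb and Cr sit at the neutral 128.
gray_pixel = np.array([[[100.0, 100.0, 100.0]]])
print(rgb_to_ycbcr(gray_pixel))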

Image 8 — Separating the Y, Cb, and Cr parts of the top image yields the three lower images. The important property of this color space is that the Y part contains all the objects and important edges visible to the human eye: even after conversion to grayscale, the sky, mountains, buildings, and lawn are still recognizable. (Source: English Wikipedia)

Since the human eye recognizes objects mainly by their luminance, the JPEG algorithm pays special attention to this. JPEG compresses the Cb and Cr parts of the image much more heavily than the Y part, so that the quality of the grayscale part is barely reduced. The luminance of important objects and edges remains largely untouched, and the human eye can still recognize the objects well.
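The "compress the chroma harder" idea is typically implemented as chroma subsampling. A minimal sketch of 4:2:0 subsampling, the variant most JPEG encoders default to (an illustration, not the full codec):

```python
import numpy as np

def subsample_420(plane):
    """4:2:0 chroma subsampling: average each 2x2 block into one value,
    quartering the data in a Cb or Cr plane while Y is kept full-size."""
    h, w = plane.shape
    return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

cb = np.arange(16, dtype=float).reshape(4, 4)
print(subsample_420(cb))  # a 2x2 plane of block averages
```

The Y plane skips this step entirely, which is exactly why edges survive heavy compression: the eye's edge information lives in Y, and Y keeps its full resolution.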


As you can see in Image 9, even if we do a lot of compression in JPEG, the main edges are still recognizable. (Source: English Wikipedia)

Taking inspiration from JPEG, we turned to deep learning: our goal was likewise to produce an image in which the important objects and edges are preserved. First, we converted the image to the YCbCr color space and passed only the grayscale part through our neural network. The architecture we used was a fully convolutional neural network, which received a 128×128 grayscale image as input and produced a 256×256 image as output. Our goal was to maximize the similarity between the generated image and the actual large image.

Image 10 — A fully convolutional neural network is a network in which all internal layers are convolutional. It receives an image as input and produces another image in the output. (Source: Towards Data Science)
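The article does not specify the exact architecture, but the two operations such a super-resolution network composes ("same"-padded convolutions, followed by a sub-pixel or pixel-shuffle upsampling step) can be sketched in plain NumPy with untrained weights. This is a toy forward pass, not our trained model:

```python
import numpy as np

def conv2d(img, kernel):
    """'Same'-padded 2-D convolution on a single channel: the basic
    building block of a fully convolutional network."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def pixel_shuffle(channels):
    """Rearrange 4 feature maps of size HxW into one 2Hx2W image:
    the 'sub-pixel' upsampling used at the end of many SR networks."""
    c, h, w = channels.shape
    assert c == 4
    out = np.zeros((2 * h, 2 * w))
    out[0::2, 0::2] = channels[0]
    out[0::2, 1::2] = channels[1]
    out[1::2, 0::2] = channels[2]
    out[1::2, 1::2] = channels[3]
    return out

# Toy forward pass: one (untrained) conv layer producing 4 maps, then shuffle.
y = np.random.rand(128, 128)                     # grayscale (Y) input tile
kernels = [np.random.rand(3, 3) for _ in range(4)]
features = np.stack([conv2d(y, k) for k in kernels])
upscaled = pixel_shuffle(features)
print(upscaled.shape)  # (256, 256)
```

A real network stacks many such convolutional layers and learns the kernels by minimizing the difference between its output and the true high-resolution tile; the shapes, however, work exactly as above.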

This model handled only the grayscale part of the image. To upscale the Cb and Cr parts, we used the classical methods: we upscaled them with bilinear interpolation and then merged the three parts to obtain the final image.
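The final merge back to RGB is the inverse of the YCbCr transform. A minimal sketch, again using the standard BT.601 constants as an illustration:

```python
import numpy as np

def ycbcr_to_rgb(ycbcr):
    """Inverse BT.601 transform: combine a (network-upscaled) Y plane with
    (bilinearly-upscaled) Cb/Cr planes back into an RGB image."""
    y  = ycbcr[..., 0]
    cb = ycbcr[..., 1] - 128.0
    cr = ycbcr[..., 2] - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255)

# Neutral chroma (Cb = Cr = 128) gives back a pure gray pixel.
print(ycbcr_to_rgb(np.array([[[100.0, 128.0, 128.0]]])))
```

Because the chroma planes carry so little edge information, the cheap bilinear upscaling of Cb and Cr is barely visible in the merged result; the sharpness comes from the network's Y plane.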

Sample Outputs

Image 11 — Enlarging the images of Image 4 gave us the blurred images of Image 5. As you can see, these three images, the output of our neural network, have higher quality and are less blurred. The colors also remain natural.
Image 5 — The same Image 5 from earlier, shown again for easier comparison with Image 11. The difference between the two is clearly noticeable.

How we evaluated the proposed method

Typically, to assess the quality of an image relative to an ideal image, the peak signal-to-noise ratio (PSNR) is used, which is a measure of the ratio of the maximum possible power of an image to the power of corrupting noise that affects the quality of that image.

Image 12 — Three similar images, increasingly degraded from right to left: the image on the right is the original, the middle one is moderately compressed, and the left one is heavily compressed. Since such quality changes can be hard to judge visually, we use the PSNR criterion. The PSNR of the middle image is 45.53 dB, and the PSNR of the left image is 31.45 dB. (Source: English Wikipedia)

In summary, the smaller the color differences between the pixels of the reconstructed image and the corresponding pixels of the original, the higher the PSNR. This criterion let us compare the various methods we tried for solving our problem.
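PSNR itself is a one-liner over the mean squared error. A minimal sketch of the criterion, for 8-bit images whose maximum value is 255:

```python
import numpy as np

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in decibels: higher means the
    reconstruction is closer to the original."""
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

a = np.full((4, 4), 100.0)
b = a.copy()
b[0, 0] = 110.0   # one pixel off by 10 grey levels
print(psnr(a, b))
```

Because PSNR is a per-pixel difference measure, it rewards outputs whose pixels closely match the ground-truth tile, which is exactly what our training objective optimized.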

The proposed method achieved the highest PSNR among the methods we compared. When we used the RGB color space, we got a lower PSNR. Also, when we passed all three YCbCr components through the network, the colors of the output image became slightly unnatural, which can be explained by the network altering the color components. Another advantage of the proposed method was faster training, since the network receives only one-third of the image data as input.


In conclusion, we presented a deep learning method that enhances the quality of aerial images. It let us give users a better experience with the map without incurring the high cost of purchasing higher-resolution aerial imagery.


