Crop identification using satellite imagery: Introduction


Geo-intelligence in the service of agriculture

The world of technology is witnessing staggering growth in the field of geo-intelligence. This growth is powered primarily by two factors. On one hand, data and technology have improved: satellite imagery is increasingly easy to obtain, and big data and machine learning have advanced appreciably. On the other hand, there is a growing need to solve complex and important problems, and numerous private players are rising to the occasion. One such problem area is agriculture.

Agriculture, especially in India, faces a number of challenges: poor application of precision farming, sub-par farm yield, weak resource planning by farmers, small and scattered farms, and disaggregated farmer-level and farm-level statistics. Together, these leave stakeholders with returns well below their market potential.

Catering to all stakeholders: farms, farmers and buyers

With its rich network, DeHaat is aptly positioned to solve these pain points. The guiding principle here is that when farm-level and crop-level data are adequately assimilated, scientific advisory to farmers, farm monitoring systems, yield forecasting and acreage estimation all improve. This results in better returns for producers, and the aggregated information also benefits institutional buyers.

At DeHaat, we are undertaking unstructured data mining from openly accessible satellite imagery for the purpose of crop identification, farm segmentation, crop health estimation and yield forecasting. The first step in this series of solutions is crop identification. This article discusses how this is accomplished using satellite imagery with the help of machine learning.

Crop identification: Concept

Put simply, the objective is to build a system that, when queried with a location, farm or site, returns the identity of the crop growing there.

Naive conceptualization of the solution

How can location information tell us anything about crop identity? It can if we use the sensors on the satellite to capture the biochemistry on the ground. This is the knowledge that underpins remote sensing, and here is what is happening:

  • Chlorophyll in healthy vegetation absorbs blue and red light to fuel photosynthesis
  • A healthy plant with more chlorophyll reflects more near-infrared energy than an unhealthy plant
  • The most interesting part is this: a plant’s pattern of absorption and reflection across visible and infrared wavelengths is characteristic of the plant. This leads to the hypothesis that multi-spectral images should be able to register this signature and help identify a crop (see the illustration below)

Comparison of Enhanced Vegetation Index (EVI) for Cassava and Sugarcane (Credit: Bendini, H. and Sanches et al. (2016), Using Landsat 8 image time series for crop mapping in a region of Cerrado, Brazil (ISPRS))
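To make the idea of a spectral signature concrete, here is a minimal sketch of how two common vegetation indices, NDVI and the EVI named in the caption above, are computed from band reflectances. The EVI coefficients are the commonly used defaults; the reflectance values are hypothetical numbers chosen for illustration, not measurements from this project.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from reflectances in the 0..1 range."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, soil=1.0):
    """Enhanced Vegetation Index with the commonly used default coefficients."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + soil)


# Hypothetical per-pixel reflectances (illustration only, not project data)
pixels = {
    "dense canopy": {"blue": 0.03, "red": 0.04, "nir": 0.50},
    "sparse canopy": {"blue": 0.06, "red": 0.12, "nir": 0.25},
}

for name, px in pixels.items():
    print(
        f"{name}: NDVI={ndvi(px['nir'], px['red']):.2f}, "
        f"EVI={evi(px['nir'], px['red'], px['blue']):.2f}"
    )
```

Tracking such an index over a growing season gives a curve per crop, which is the kind of curve the illustration compares for cassava and sugarcane.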

Enter machine learning

Needless to say, the actual solution to a data science problem is shaped as much by the characteristics of the available data as by the objective. This is where one examines the ground truth (target variable) and the features in the observations (independent variables) available.

Ground truth datasets

This project has seen two phases of development. In the first phase, we were driven by the need to prove the efficacy of an ML solution; in the second phase, we are tuning our solution to actual business requirements.

In the first phase, we needed abundant, quality-assured data that could be obtained easily. These requirements are impressively met by the crop data library put together for the USA by the National Agricultural Statistics Service (NASS) under the USDA. This data product, called the Cropland Data Layer or CDL, can be analyzed using their platform CropScape. The CDL is a raster, geo-referenced, crop-specific land cover data layer at 30 m spatial resolution, created annually for the continental United States using moderate-resolution satellite imagery and extensive agricultural ground truth.

CDL data from 2019 (Credit: CropScape)

The producer’s accuracy (recall) and user’s accuracy (precision) are reported to be approximately 90%, which made the dataset a great candidate for testing the hypothesis stated above.
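For readers less familiar with the remote sensing terms, the sketch below shows how producer's and user's accuracy fall out of a confusion matrix. The class names and counts are hypothetical and are not CDL statistics.

```python
def per_class_accuracy(confusion, classes):
    """confusion[i][j] = samples whose reference class is i and mapped class is j."""
    result = {}
    for k, name in enumerate(classes):
        correct = confusion[k][k]
        reference_total = sum(confusion[k])              # row sum: all reference samples of class k
        mapped_total = sum(row[k] for row in confusion)  # column sum: all samples mapped as class k
        result[name] = {
            # producer's accuracy = recall: share of reference samples mapped correctly
            "producers_accuracy": correct / reference_total if reference_total else float("nan"),
            # user's accuracy = precision: share of mapped samples that are correct
            "users_accuracy": correct / mapped_total if mapped_total else float("nan"),
        }
    return result


# Hypothetical counts for illustration only
classes = ["corn", "soybean", "other"]
confusion = [
    [90, 6, 4],
    [8, 85, 7],
    [5, 5, 90],
]

for name, scores in per_class_accuracy(confusion, classes).items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```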

In the next phase of the project, the one being pursued currently, we are using ground truth data from India to tune our solutions. Here, the ground truth data is being prepared by the remote sensing team at DeHaat with the help of field agents.

Satellite imagery

Regular images are composed of 3 bands: Red, Green and Blue (from the visible spectrum). Satellite images capture light not just from the visible spectrum but also from beyond it; these are multi-spectral images. A multi-spectral image is one that captures image data within specific wavelength ranges across the electromagnetic spectrum. The spectral scope of Sentinel-2 is illustrated below:

There are several free and openly available satellite imagery sources to get data from: Sentinel-1 (synthetic aperture radar imaging data), Sentinel-2 (multi-spectral data), Landsat, etc.

In contrast to Landsat, Sentinel-2 provides several bands (notably R, G, B and NIR) at a resolution of 10 metres

In the current scope of the project, we use multi-spectral imagery from Sentinel-2, as it provides information at a finer resolution than Landsat.
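A quick back-of-the-envelope calculation shows why pixel size matters for small farms. The sketch assumes a hypothetical 1-hectare field and compares Sentinel-2's 10 m bands with the 30 m pixels of Landsat's multi-spectral bands.

```python
def pixels_per_field(field_area_ha, pixel_size_m):
    """Approximate number of whole-pixel equivalents covering a field of the given area."""
    area_m2 = field_area_ha * 10_000  # 1 hectare = 10,000 square metres
    return area_m2 / (pixel_size_m ** 2)


# Hypothetical field size, for illustration only
field_ha = 1.0
for sensor, pixel_size_m in [("Sentinel-2 (10 m)", 10), ("Landsat (30 m)", 30)]:
    print(f"{sensor}: ~{pixels_per_field(field_ha, pixel_size_m):.0f} pixels per {field_ha} ha field")
```

At the same field size, the finer grid yields nine times as many observations per field, which matters most where farms are small and scattered.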

Modelling scheme

The overall schematic for crop identification is illustrated in the figure. The following steps are covered:

  1. An area of interest (AOI) is taken that will be used to train the model
  2. The ground truth information is procured for the AOI
  3. Satellite imagery for the same AOI is fetched, and the relationship between the ground truth and the reflectance values is modeled
  4. The prediction is made for the same AOI. Model attributes such as the resolution of prediction (the smallest tile size that is predicted) and the output classes (binary or multi-class) influence this prediction
  5. Finally, the prediction is compared with the ground truth information to evaluate the performance of the model
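The steps above can be sketched end to end with a toy example. This is not the model used in the project: a nearest-centroid classifier stands in for the unspecified learner, and the crop names, [red, NIR] signatures and noise level are synthetic assumptions chosen only to make the flow runnable.

```python
import random


def fit_centroids(features, labels):
    """Step 3: model the label/reflectance relationship as one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, features):
    """Step 4: assign each sample to the class with the nearest centroid."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda y: sqdist(centroids[y], x)) for x in features]


def accuracy(predicted, truth):
    """Step 5: share of samples where prediction and ground truth agree."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Steps 1-2: a synthetic AOI with ground truth labels (hypothetical values)
rng = random.Random(0)
signatures = {"crop_a": [0.04, 0.45], "crop_b": [0.10, 0.25]}  # mean [red, nir]
features, labels = [], []
for crop, mean in signatures.items():
    for _ in range(50):
        features.append([m + rng.gauss(0, 0.02) for m in mean])
        labels.append(crop)

centroids = fit_centroids(features, labels)
predicted = predict(centroids, features)
print(f"accuracy on the training AOI: {accuracy(predicted, labels):.2f}")
```

Because the prediction here is scored on the same AOI that was used for training, as in the steps listed, the score is likely to be optimistic; scoring on a held-out area would be a stricter check.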

Resolution of prediction

In our approach, we favor object-based classification over pixel-based classification. To do this, we first cut the given ground truth raster into tiles and then translate the pixel-level labels into tile-level labels based on the dominant class in the tile (the class occupying >50% of the tile's pixels).

By controlling the size of the tile that is classified, we control the spatial resolution of our model's predictions. In the current round of experiments we have fixed the tile size at 8 × 8 pixels, which at the CDL's 30 m pixel size is 240 m × 240 m, i.e. a spatial resolution of roughly 250 m × 250 m.
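The tile-labelling rule can be sketched in a few lines of plain Python. The function name, the choice to drop partial tiles at the raster edge, and the use of `None` for tiles without a majority class are assumptions made for this illustration; the article does not say how such tiles are handled. The class codes in the example are arbitrary.

```python
from collections import Counter


def tile_labels(raster, tile_size=8, min_share=0.5):
    """Map each tile_size x tile_size tile to its dominant class.

    raster is a list of rows of per-pixel class codes. A tile gets a label only
    if one class covers more than min_share of its pixels; otherwise it gets
    None (an assumption of this sketch). Partial edge tiles are skipped.
    """
    rows, cols = len(raster), len(raster[0])
    labels = {}
    for r0 in range(0, rows - tile_size + 1, tile_size):
        for c0 in range(0, cols - tile_size + 1, tile_size):
            pixels = [
                raster[r][c]
                for r in range(r0, r0 + tile_size)
                for c in range(c0, c0 + tile_size)
            ]
            label, count = Counter(pixels).most_common(1)[0]
            dominant = count / len(pixels) > min_share
            labels[(r0 // tile_size, c0 // tile_size)] = label if dominant else None
    return labels


# An 8 x 16 pixel raster with arbitrary class codes 1 and 5 (illustration only)
raster = [[1] * 8 + [1] * 3 + [5] * 5 for _ in range(8)]
print(tile_labels(raster))

# Tile footprint on the ground for 30 m ground truth pixels
print(f"tile footprint: {8 * 30} m x {8 * 30} m")
```

In the example, the left tile is entirely class 1 and the right tile is 62.5% class 5, so each clears the >50% threshold.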

