Spatial-Temporal ConvLSTM for Crash Prediction

Image by author

A unique deep learning approach for accident prediction

This post is an empirical study that uses a ConvLSTM deep-learning model and ArcPy to predict next-day crash risk locations from time sequences of crash feature data. Traffic crash prediction is formulated as a spatial-temporal sequence forecasting problem in which both the input and the prediction target are spatial-temporal sequences. The convolutional LSTM (ConvLSTM) approach builds an end-to-end trainable model for crash prediction. The results show that the ConvLSTM network can capture the spatial-temporal correlations of traffic accidents, that is, when and where they happen.

Problems: Traffic accidents lead to severe injuries, casualties and huge economic losses. The ability to predict the risk of traffic accidents in a spatial-temporal context is important for preventing accidents, both for the public and for government officials. However, predicting traffic accidents is very challenging, not only because accidents have multiple contributing factors (e.g. human, time, geometric and environmental) but also because the events are rare and the data sets are sparse. Traditional accident prediction commonly applies statistical regression such as Poisson, Negative Binomial (NB) and multivariate regression. These methods often fail when dealing with complex and highly nonlinear data, such as the spatial-temporal correlations of accidents (when, where and why). Classic machine learning approaches, e.g. XGBoost, SVM and RandomForest classifiers, take engineered features as input and can rank how important each feature is for the predicted probability. Features in machine learning are arrays of numbers in a multi-dimensional space. To learn more about feature engineering in machine learning, please read one of the best stories from Daniel Wilson.

Moreover, time and seasonality can play an important role in determining when and where accidents are likely to happen. Using ArcPy in ArcGIS Pro, we can aggregate crash locations in certain time windows. Click to see examples of crash locations in 3D scenes.
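To make the idea of aggregating crashes into time windows concrete, here is a minimal sketch in plain NumPy rather than ArcPy. The function name `count_by_window`, the sample timestamps and the window sizes are illustrative assumptions, not the code used in this project.

```python
import numpy as np

def count_by_window(timestamps, start, window_hours, n_windows):
    """Count crashes in consecutive fixed-length time windows.

    Hypothetical helper: timestamps are ISO strings, `start` is the
    beginning of the first window, and windows are `window_hours` long.
    """
    ts = np.asarray(timestamps, dtype="datetime64[m]")
    origin = np.datetime64(start, "m")
    # Minutes elapsed since the first window opened, as plain integers.
    minutes = (ts - origin).astype("timedelta64[m]").astype(np.int64)
    idx = minutes // (window_hours * 60)
    # Ignore crashes that fall outside the requested range of windows.
    keep = (idx >= 0) & (idx < n_windows)
    return np.bincount(idx[keep], minlength=n_windows)

# Made-up crash times for illustration only.
crashes = ["2020-12-12T07:15", "2020-12-12T08:40",
           "2020-12-12T17:05", "2020-12-13T01:30"]

daily = count_by_window(crashes, "2020-12-12T00:00", 24, 2)
two_hourly = count_by_window(crashes, "2020-12-12T00:00", 2, 12)
```

The same crash records can be summarized at different resolutions (daily or two-hourly here) just by changing the window length.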

Solution: So, can we apply the classic machine learning models to find the important causal features and then propose a practical deep-learning network to predict when and where accidents happen? The answer is yes. The ConvLSTM model is one of the most interesting deep-learning models used to predict the next frame of a video or image sequence. The original research was done for precipitation nowcasting. To better understand the ConvLSTM model, let's first look at the ordinary LSTM network. A Long Short-Term Memory (LSTM) network is a type of artificial recurrent neural network (RNN) that has feedback connections. It can process not only single data points (such as images) but also entire sequences of data (such as speech or video). Below is one of the best explanatory videos, credited to Michael Phil, for his Illustrated Guide to LSTM. You can also read his blog.

Does it work? Now, let us see what ConvLSTM is. Simply put, ConvLSTM is a convolutional neural network (CNN) combined with an LSTM network. Instead of a plain sequence of vectors, its input is a sequence of image-like grids, and the matrix multiplications inside the LSTM are replaced by convolutions, which are well suited to images and videos. This combination lets ConvLSTM capture the underlying local spatial-temporal correlations. See figure 1 below for the key equations of ConvLSTM and a diagram of its inner structure.
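For reference, the ConvLSTM cell as formulated in the original paper [1] can be written as follows, where $*$ is the convolution operator, $\circ$ is the element-wise (Hadamard) product, $\sigma$ is the sigmoid function, $X_t$ is the input frame, $H_t$ the hidden state and $C_t$ the cell state:

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

The only structural difference from a fully connected LSTM is that the input-to-state and state-to-state transitions use convolutions, so the states $H_t$ and $C_t$ keep the spatial layout of the input grid.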

figure 1. Image from

How it works: The goal of this empirical research is to use previous sequences of accident data to predict crash locations for a fixed-length time period in Cobb County with a ConvLSTM network. Each accident has a recorded time and location coordinate. Imagine each time period (for example, every two hours) as a picture frame in spatial-temporal space. By stacking these crash picture frames and feeding them into the ConvLSTM network, we can predict a fixed-length run of future frames, i.e. the traffic accident frames for the next several hours. To fit the model, each frame is formulated as a 3D tensor, like an image, and frames are stacked into sequences of 4D tensors with the features as the fourth dimension: (sequence, imageH, imageW, features).

The entire Cobb County region is partitioned into a grid of 45 x 41 (height x width) square bins, 1845 in total with 0.25 mile each, using ArcPy. In this initial project, I use seven-day crash sequences as model inputs and predict the next day's crash image map, because county-level crash data is very sparse and has a strong weekly pattern. Another project will predict crashes for the next two hours using the same model and algorithm.

Applying the XGBoost and RandomForest models identified 11 important features that cause traffic accidents in Cobb. These include time-invariant variables, such as road length, road curvature, average slope and population density, and time-variant variables, such as weather, time of day, and accident occurrence and location. These features form the fourth dimension of the 4D tensor (7, 45, 41, 11). Three years of Cobb County accident data were processed and batched into the ConvLSTM model, with 90% of the sequences used for training and 10% for testing. Each sequence is shifted by one day, giving a total of 1096 sequences for training and testing. The output prediction is the set of bins, out of the 1845, where crashes happen on the next day (see figure 2, which shows stacked county crash bin maps; the blue bins are the ones where accidents happen in that time period).
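The windowing described above (seven daily frames in, the next day's crash map out, shifted one day at a time) can be sketched with NumPy. This is a minimal illustration on random data, not the project's pipeline: the function name `make_sequences`, the choice of channel 0 as the crash indicator and the chronological 90/10 split are all assumptions.

```python
import numpy as np

SEQ_LEN, H, W, F = 7, 45, 41, 11  # tensor shape stated in the text

def make_sequences(daily_frames, seq_len=SEQ_LEN):
    """Pair each run of `seq_len` daily frames with the next day's crash map.

    daily_frames has shape (days, H, W, F). Channel 0 is assumed to hold
    the binary crash indicator (an assumption made for this sketch).
    """
    n = daily_frames.shape[0] - seq_len
    # Window i covers days i .. i+seq_len-1; consecutive windows shift by one day.
    X = np.stack([daily_frames[i:i + seq_len] for i in range(n)])
    # Target i is the crash channel of day i+seq_len, the day after window i.
    y = daily_frames[seq_len:, :, :, :1]
    return X, y

# Random stand-in for 30 days of engineered feature frames.
rng = np.random.default_rng(0)
frames = (rng.random((30, H, W, F)) > 0.98).astype(np.float32)

X, y = make_sequences(frames)

# Chronological 90/10 split (the text gives the ratio, not the split method).
split = int(0.9 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
```

Each input sample then has shape (7, 45, 41, 11) and each target has shape (45, 41, 1), one value per grid bin.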

figure 2. Image by author

Build the model, train the data and make the risk prediction

Behind the scenes: A typical deep-learning workflow is to engineer the dataset, build the model, train it, make predictions and deploy the model. The model was trained for 50 epochs with binary cross-entropy loss and the Adam optimizer (in practice, it can be trained for several hundred epochs). The results are very promising. The image gallery above displays some of these steps, except the data engineering part. A sample from Dec 12th, 2020, which was not used in training, was randomly selected to compare the ground truth with the prediction. Comparing the two visually at a certain probability threshold shows that the model predicts most of the crash locations and patterns on that day (see figure 3).
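As a rough illustration of the build and compile steps, here is a minimal sketch that assumes TensorFlow/Keras, which this post does not name as the framework. Only the input shape, the binary cross-entropy loss, the Adam optimizer and the 50 epochs come from the text; the single ConvLSTM layer, the 32 filters, the 3x3 kernel and the 1x1 output convolution are illustrative choices.

```python
import tensorflow as tf

def build_model(seq_len=7, height=45, width=41, n_features=11):
    """Small ConvLSTM network mapping a week of frames to one risk map."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len, height, width, n_features)),
        # return_sequences=False keeps only the state after the last day.
        tf.keras.layers.ConvLSTM2D(filters=32, kernel_size=(3, 3),
                                   padding="same", return_sequences=False),
        tf.keras.layers.BatchNormalization(),
        # 1x1 convolution with sigmoid gives a crash probability per bin.
        tf.keras.layers.Conv2D(filters=1, kernel_size=(1, 1),
                               activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

model = build_model()

# Training would then look like this, given arrays shaped
# (n, 7, 45, 41, 11) for inputs and (n, 45, 41, 1) for targets:
# model.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test))
```

With `padding="same"` the output keeps the 45 x 41 grid layout, so each output cell can be read directly as the predicted risk for one bin.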

figure 3. Image by author
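The visual comparison between ground truth and thresholded prediction can also be summarized numerically. The sketch below counts hits, false alarms and misses per bin; the function name, the 0.5 threshold and the tiny example arrays are assumptions for illustration and are not results from this project.

```python
import numpy as np

def compare_to_truth(prob_map, truth_map, threshold=0.5):
    """Compare a thresholded probability map with the observed crash map."""
    pred = np.asarray(prob_map) >= threshold
    truth = np.asarray(truth_map).astype(bool)
    hits = int(np.sum(pred & truth))           # predicted and observed
    false_alarms = int(np.sum(pred & ~truth))  # predicted, not observed
    misses = int(np.sum(~pred & truth))        # observed, not predicted
    precision = hits / (hits + false_alarms) if hits + false_alarms else 0.0
    recall = hits / (hits + misses) if hits + misses else 0.0
    return {"hits": hits, "false_alarms": false_alarms, "misses": misses,
            "precision": precision, "recall": recall}

# Toy 2 x 3 grid standing in for the 45 x 41 county grid.
prob = np.array([[0.9, 0.2, 0.6],
                 [0.1, 0.7, 0.05]])
truth = np.array([[1, 0, 0],
                  [0, 1, 1]])

summary = compare_to_truth(prob, truth, threshold=0.5)
```

Because crashes are rare, precision and recall on the crash bins are usually more informative than overall accuracy, which is dominated by empty bins.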

Results: With a few lines of ArcPy code, we can add the model output records to the grid feature class and map them in ArcGIS Pro. The predictions are easier to interpret there, displayed alongside symbolized roads and other features.
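Before the output can be joined to the grid feature class, the predicted map has to be flattened into one record per bin. The sketch below shows that step in plain Python and NumPy; the row-major `bin_id` numbering, the field names and the 0.5 threshold are hypothetical and would need to match the actual grid. In ArcGIS Pro, records like these could then be written to the feature class with a cursor such as `arcpy.da.UpdateCursor`.

```python
import numpy as np

def prediction_records(prob_map, threshold=0.5):
    """Flatten an (H, W) probability map into one record per grid bin.

    bin_id is assumed to be row-major (row * width + col); the real grid
    feature class may number its bins differently.
    """
    h, w = prob_map.shape
    records = []
    for row in range(h):
        for col in range(w):
            p = float(prob_map[row, col])
            records.append({
                "bin_id": row * w + col,
                "row": row,
                "col": col,
                "risk": p,
                "flag": int(p >= threshold),  # 1 if the bin is flagged as risky
            })
    return records

# Toy prediction: a single high-risk bin on the 45 x 41 grid.
prob_map = np.zeros((45, 41))
prob_map[2, 3] = 0.8

records = prediction_records(prob_map)
flagged = [r["bin_id"] for r in records if r["flag"] == 1]
```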

Image by author

Why it matters: This empirical research project shows how ArcPy and deep-learning techniques such as ConvLSTM can be combined into a promising solution for crash prediction. The workflows and data processes can be further enhanced with more engineered features and data, then automated, deployed and mapped in Esri ArcGIS Pro desktop or published as an ArcGIS Enterprise service. The same methodology can also be applied to other spatial-temporal problems such as traffic flow, rainfall and crime prediction, and to other uses that benefit humanity.

The world today presents us with many challenges. However, with the power of AI and deep learning, we can work together to seek solutions and make the world a much better place to live. Thank you for reading this story map. Please send me a comment to


[1] Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo, Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. arXiv:1506.04214 [cs.CV].

[2] Zhuoning Yuan, Xun Zhou, Tianbao Yang, Hetero-ConvLSTM: A Deep Learning Approach to Traffic Accident Prediction on Heterogeneous Spatio-Temporal Data. KDD 2018, August 19–23, 2018, London, United Kingdom.

[3] Sobhan Moosavi, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, Radu Teodorescu, Rajiv Ramnath, Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights. arXiv:1909.09638 [cs.LG].

[4] Jeremiah Roland, Peter D. Way, Connor Firat, Thanh-Nam Doan, Mina Sartipi, Modeling and predicting vehicle accident occurrence in Chattanooga, Tennessee. Accident Analysis and Prevention 149 (2021) 105860.

[5] ESRI ArcPy

[6] Daniel Wilson, Feature Engineering for Car Crash Prediction. Story Map May 19, 2020
