How a pretrained TabTransformer performs in the real world


We recently put a fancy deep learning model for tabular data to the test. We already wrote a blog post about the TabTransformer paper and its fundamental ideas a while ago, but at ML6, pretty ideas are not enough: we also want real results on real problems.

Boston House Prices with mayonnaise

The data we ran our tests on comes from the Belgian Federation of Notaries (FedNot), which owns a large dataset of Belgian house prices. So, it’s a bit like the Boston House Prices data, a classic machine learning 101 dataset, but better and Belgian.

The data comes from different sources. We’ve combined public datasets like OpenStreetMap with internal pseudonymised FedNot databases.

To predict the value of a given house, we’ll use a subset of the available features (a rough sketch of how these columns split into categorical and continuous inputs follows the list):

  • Physical house description: building height, parcel surface area, building surface area, building type (open, half open, closed)
  • A feature from the time dimension: the days between the house sale and a reference date of 1 January 2014
  • Location information: geohashes, postal code, province, region
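To make the input format concrete, here is a minimal sketch of how that feature subset splits into the two groups the TabTransformer expects: categorical and continuous columns. The column names below are our own placeholders, not the actual FedNot field names.

```python
import pandas as pd

# Placeholder column names -- the real FedNot feature names differ.
CATEGORICAL = ["building_type", "geohash", "postal_code", "province", "region"]
CONTINUOUS = ["building_height", "parcel_surface", "building_surface", "days_since_reference"]
TARGET = "price"

def split_features(df: pd.DataFrame):
    """Split a dataframe into the categorical block, the continuous block
    and the target, since the TabTransformer takes categorical and
    continuous features as two separate inputs."""
    return df[CATEGORICAL], df[CONTINUOUS], df[TARGET]
```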

For our further experiments, we sample the data and split it into three chunks (a small sketch of this split follows the list):

  1. 5000 rows for a supervised training set.
  2. 3000 rows for the test set on which we will evaluate the models.
  3. Some 300 000 rows for unsupervised learning. This means we ignore the prices for this chunk. If your flabber is gasted right now because of this surprise dataset, just read on.
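The exact sampling code isn’t part of this post, but as a rough illustration, assuming a single shuffled dataframe `df` with a `price` column:

```python
import pandas as pd

def make_splits(df: pd.DataFrame, seed: int = 42):
    """Shuffle once, then carve off the three chunks described above."""
    df = df.sample(frac=1.0, random_state=seed)
    train = df.iloc[:5_000]                                 # supervised training set
    test = df.iloc[5_000:8_000]                             # held-out evaluation set
    unsup = df.iloc[8_000:308_000].drop(columns=["price"])  # prices ignored for pretraining
    return train, test, unsup
```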

Not your average kind of model

Now, let’s see what this TabTransformer is all about.

The TabTransformer architecture. (paper)

The main selling point of this model is that it contains transformer blocks. Just like in NLP, the transformer blocks learn contextual embeddings. However, in this case, the inputs aren’t words but categorical features.

What’s more, you can train the transformer with the same unsupervised techniques as in NLP! (See GPT, BERT, …) We will use the 300k unlabeled examples to pretrain the transformer layers (so, without the price). That should improve performance when we have only a small amount of labeled data.

You can read more about this model in our previous blog post.

Thanks to Phil Wang, there is an implementation of the TabTransformer model architecture in PyTorch. We’ll use that one.
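As a minimal sketch of what that looks like, based on the package’s documented interface (the cardinalities and hyperparameters below are placeholders, not the settings we used for the FedNot data):

```python
import torch
import torch.nn as nn
from tab_transformer_pytorch import TabTransformer

# Placeholder cardinalities for the 5 categorical features
# (building type, geohash, postal code, province, region).
model = TabTransformer(
    categories = (3, 2000, 1200, 11, 3),  # unique values per categorical column
    num_continuous = 4,                   # height, parcel surface, building surface, days
    dim = 32,                             # embedding dimension
    dim_out = 1,                          # single regression output: the price
    depth = 6,                            # number of transformer blocks
    heads = 8,                            # attention heads
    attn_dropout = 0.1,
    ff_dropout = 0.1,
    mlp_hidden_mults = (4, 2),            # hidden sizes of the final MLP head
    mlp_act = nn.ReLU(),
)

x_categ = torch.randint(0, 3, (8, 5))  # a batch of 8 rows, 5 categorical features
x_cont = torch.randn(8, 4)             # 4 continuous features
pred = model(x_categ, x_cont)          # -> shape (8, 1): the predicted prices
```

The categorical features pass through the transformer blocks to produce contextual embeddings, which are concatenated with the (normalised) continuous features and fed into a final MLP.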

All that remains is implementing the unsupervised pretraining phase. In the TabTransformer paper, the authors propose following the approach of ELECTRA.
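In that scheme, a fraction of each row’s categorical values is replaced by random values drawn from the same column, and a classifier on top of the contextual embeddings learns to flag which values were replaced; no prices are needed. The sketch below is our own illustration: `encoder` stands in for the transformer blocks (assumed to return one embedding per categorical feature), and where the paper uses a separate binary classifier per column, we keep a single shared head for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def corrupt(x_categ: torch.Tensor, cardinalities, p: float = 0.3):
    """Replace a fraction p of the categorical values with random values
    drawn column-wise, and return the corruption mask as labels."""
    mask = torch.rand(x_categ.shape) < p
    random_vals = torch.stack(
        [torch.randint(0, c, (x_categ.shape[0],)) for c in cardinalities],
        dim=1,
    )
    corrupted = torch.where(mask, random_vals, x_categ)
    return corrupted, mask.float()

class ReplacedValueDetector(nn.Module):
    """Binary head on top of the per-feature contextual embeddings:
    was this value replaced or is it the original one?"""
    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, embeddings):                  # (batch, n_features, dim)
        return self.head(embeddings).squeeze(-1)    # (batch, n_features)

def pretraining_step(encoder, detector, x_categ, cardinalities, optimizer):
    """One ELECTRA-style update on unlabeled rows: only the categorical
    features are used, never the price."""
    corrupted, labels = corrupt(x_categ, cardinalities)
    logits = detector(encoder(corrupted))           # encoder: ids -> embeddings
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After pretraining on the 300k unlabeled rows, the transformer weights are kept and the model is fine-tuned on the 5,000 labeled rows.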
