Predicting Price of Electricity using Machine Learning



Context

The dataset contains the price of electricity for a data center, along with factors that might affect that price.

Source: https://www.google.com/url?sa=i&url=https%3A%2F%2Fenergyanalyst.co.uk%2Fan-introduction-to-electricity-price-forecasting%2F&psig=AOvVaw1t-aFXyN9_Yu1mp_vlapPR&ust=1623500065408000&source=images&cd=vfe&ved=0CA0QjhxqFwoTCNj4p4LHj_ECFQAAAAAdAAAAABAO

Column Descriptions:

DateTime: String, defines date and time of sample

Holiday: String, gives name of holiday if day is a bank holiday

HolidayFlag: integer, 1 if day is a bank holiday, zero otherwise

DayOfWeek: integer (0–6), day of the week, where 0 is Monday

WeekOfYear: integer, running week within year of this date

Day: integer, day of the date

Month: integer, month of the date

Year: integer, year of the date

PeriodOfDay: integer, denotes half hour period of day (0–47)

ForecastWindProduction: the forecasted wind production for this period

SystemLoadEA: the national load forecast for this period

SMPEA: the price forecast for this period

ORKTemperature: the actual temperature measured at Cork airport

ORKWindspeed: the actual windspeed measured at Cork airport

CO2Intensity: the actual CO2 intensity in (g/kWh) for the electricity produced

ActualWindProduction: the actual wind energy production for this period

SystemLoadEP2: the actual national system load for this period

SMPEP2: the actual price of this time period, the value to be forecasted
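
As a small illustration of the PeriodOfDay column, the sketch below converts a half-hour index into a clock time. It assumes that period 0 starts at midnight, which the column description does not state explicitly; the helper name period_to_time is hypothetical and not part of the original notebook.

```python
def period_to_time(period: int) -> str:
    """Convert a half-hour period index (0-47) to an HH:MM start time.

    Assumption: period 0 begins at 00:00 and each period lasts 30 minutes.
    """
    if not 0 <= period <= 47:
        raise ValueError("period must be between 0 and 47")
    hours, half = divmod(period, 2)
    return f"{hours:02d}:{30 * half:02d}"


print(period_to_time(0))   # 00:00
print(period_to_time(35))  # 17:30
print(period_to_time(47))  # 23:30
```

A mapping like this can help when reading plots of price against PeriodOfDay, since the evening peak then shows up as a time rather than an index.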

Research paper

https://www.sciencedirect.com/science/article/pii/S030626191830196X

Importing the necessary libraries

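import pickle as pkl
from math import sqrt

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # used for the line plot and figure sizing below
import seaborn as sns  # used for the heatmap and distribution plot below
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
import keras
from keras.models import Sequential
from keras.layers import Dense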

Reading the dataset

df = pd.read_csv("/content/electricity_prices.csv", na_values=['?'])
df.head()
df.shape

(38014, 18)

The dataset contains 38014 rows and 18 columns.

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38014 entries, 0 to 38013
Data columns (total 18 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   DateTime                38014 non-null  object
 1   Holiday                 38014 non-null  object
 2   HolidayFlag             38014 non-null  int64
 3   DayOfWeek               38014 non-null  int64
 4   WeekOfYear              38014 non-null  int64
 5   Day                     38014 non-null  int64
 6   Month                   38014 non-null  int64
 7   Year                    38014 non-null  int64
 8   PeriodOfDay             38014 non-null  int64
 9   ForecastWindProduction  38009 non-null  float64
 10  SystemLoadEA            38012 non-null  float64
 11  SMPEA                   38012 non-null  float64
 12  ORKTemperature          37719 non-null  float64
 13  ORKWindspeed            37715 non-null  float64
 14  CO2Intensity            38007 non-null  float64
 15  ActualWindProduction    38009 non-null  float64
 16  SystemLoadEP2           38012 non-null  float64
 17  SMPEP2                  38012 non-null  float64
dtypes: float64(9), int64(7), object(2)
memory usage: 5.2+ MB

Checking the Null values

df.isnull().sum()

DateTime                    0
Holiday                     0
HolidayFlag                 0
DayOfWeek                   0
WeekOfYear                  0
Day                         0
Month                       0
Year                        0
PeriodOfDay                 0
ForecastWindProduction      5
SystemLoadEA                2
SMPEA                       2
ORKTemperature            295
ORKWindspeed              299
CO2Intensity                7
ActualWindProduction        5
SystemLoadEP2               2
SMPEP2                      2
dtype: int64

ForecastWindProduction, SystemLoadEA, SMPEA, ORKTemperature, ORKWindspeed, CO2Intensity, ActualWindProduction, SystemLoadEP2 and SMPEP2 have null values.

Removing the null values

df = df.dropna()
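
To make the effect of na_values=['?'] and dropna() concrete, here is a minimal sketch on a tiny in-memory CSV. The three rows are made up for illustration and are not taken from the real dataset; only the column names come from the article.

```python
import io

import pandas as pd

# Tiny made-up CSV; the values are illustrative, not from the real dataset.
csv_text = """SMPEA,ORKTemperature,SMPEP2
49.26,6.0,54.32
?,5.0,54.23
46.98,?,54.23
"""

# '?' is parsed as NaN, so the affected columns stay numeric.
demo = pd.read_csv(io.StringIO(csv_text), na_values=["?"])
null_counts = demo.isnull().sum()

# dropna() removes every row that has at least one missing value.
clean = demo.dropna()

print(null_counts)
print(len(demo), "->", len(clean), "rows")
```

Note that dropna() discards a whole row even if only one column is missing, so the number of rows removed can be smaller than the sum of the per-column null counts when several gaps fall in the same row.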

Plotting the target feature

plt.plot("SMPEP2", data=df)

Correlation plot of Independent attributes

plt.figure(figsize=(9,7))
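# numeric_only=True skips the string columns (DateTime, Holiday), which recent pandas versions refuse to correlate
sns.heatmap(df.corr(numeric_only=True), annot=True, square=True, fmt='.1f', cbar=False);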
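
A heatmap can be hard to read when the goal is to pick features, so it may help to list each column's correlation with the target directly. The sketch below does this on synthetic data; the numbers are randomly generated for illustration, and only the column names are borrowed from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: the relationships below are invented for illustration.
rng = np.random.default_rng(0)
n = 200
load = rng.normal(4000, 500, n)
wind = rng.normal(1000, 300, n)
price = 0.02 * load - 0.01 * wind + rng.normal(0, 1, n)

toy = pd.DataFrame({
    "SystemLoadEP2": load,
    "ActualWindProduction": wind,
    "SMPEP2": price,
    "Holiday": ["None"] * n,  # string column, excluded from the correlation
})

# Correlation of every numeric column with the target, strongest first.
target_corr = (
    toy.select_dtypes("number")
    .corr()["SMPEP2"]
    .drop("SMPEP2")
    .sort_values(key=abs, ascending=False)
)
print(target_corr)
```

On the real DataFrame the same expression, applied to df instead of toy, would give a ranked list that can be compared against the feature set chosen in the next step.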

Distribution plot of Target feature

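# distplot is deprecated in recent seaborn releases; histplot with kde=True draws a comparable plot
sns.histplot(df['SMPEP2'], kde=True)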

Splitting the independent features and target feature

X = df[['ActualWindProduction', 'SystemLoadEP2', 'SMPEA', 'SystemLoadEA', 'ForecastWindProduction', 
'DayOfWeek', 'Year', 'ORKWindspeed', 'CO2Intensity', 'PeriodOfDay']]
y = df['SMPEP2']

Train-Validation Split (90% Train set and 10% Validation set)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1, random_state = 42)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
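
The scaler is fitted on the training split only and then reused on the held-out split, so no statistics from the held-out rows leak into preprocessing. A minimal NumPy sketch of the same computation, on made-up numbers, looks like this:

```python
import numpy as np

# Made-up feature matrices: three training rows and one held-out row.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[4.0, 40.0]])

# Statistics come from the training rows only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuses the training mean and std

print(train_scaled.mean(axis=0))  # approximately [0, 0]
print(test_scaled)
```

The held-out values are generally not centred on zero after this transform, which is expected: they are expressed in units of the training distribution.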

Model Building using Neural networks

model = keras.Sequential([
    keras.layers.Dense(512, activation="relu", input_shape=[10]),
    keras.layers.Dense(800, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(1024, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(1, activation='linear'),
])
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_4 (Dense) (None, 512) 5632
_________________________________________________________________
dense_5 (Dense) (None, 800) 410400
_________________________________________________________________
dropout_2 (Dropout) (None, 800) 0
_________________________________________________________________
dense_6 (Dense) (None, 1024) 820224
_________________________________________________________________
dropout_3 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_7 (Dense) (None, 1) 1025
=================================================================
Total params: 1,237,281
Trainable params: 1,237,281
Non-trainable params: 0
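
The parameter counts in the summary can be checked by hand: a Dense layer with n_in inputs and n_out units has (n_in + 1) * n_out parameters, one weight per input plus one bias per unit. Dropout layers add none. A short sketch of that arithmetic:

```python
# Input width followed by the width of each Dense layer in the model above.
layer_sizes = [10, 512, 800, 1024, 1]

# (inputs + 1 bias) * units for each consecutive pair of layers.
params = [
    (n_in + 1) * n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

print(params)       # [5632, 410400, 820224, 1025]
print(sum(params))  # 1237281
```

With roughly 1.2 million parameters against about 38,000 rows, the dropout layers and early stopping used here carry most of the burden of limiting overfitting.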

Compiling the model

model.compile(loss='mse', optimizer='adam', metrics=['mse','mae'])

Fitting the model with Early stopping and restoring the best weights

early_stopping = keras.callbacks.EarlyStopping(patience=10, min_delta=0.001, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    batch_size=50,
    epochs=100,
    callbacks=[early_stopping],
    verbose=1,
)
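
To show what patience and min_delta mean in practice, here is a simplified pure-Python sketch of the stopping rule. It is not Keras's actual implementation; the function name early_stop_epoch and the loss values are hypothetical. It assumes the monitored quantity is a validation loss where lower is better.

```python
def early_stop_epoch(val_losses, patience=10, min_delta=0.001):
    """Return (epoch at which training stops, epoch with the best loss).

    Simplified sketch: an epoch counts as an improvement only if the loss
    drops by more than min_delta below the best value seen so far.
    """
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses; 3.4995 improves on 3.5 by less than min_delta.
losses = [5.0, 4.0, 3.5, 3.6, 3.4995, 3.7, 3.8]
stopped_at, best_at = early_stop_epoch(losses, patience=3)
print(stopped_at, best_at)  # 5 2
```

With restore_best_weights=True, the model ends up with the weights from the best epoch (epoch 2 in this toy run) rather than those from the epoch where training stopped.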

Evaluating the model on the test set

from sklearn.metrics import mean_absolute_error,r2_score
predictions = model.predict(X_test)
print(f"MAE: {mean_absolute_error(y_test, predictions)}")

print(f"R2_score: {r2_score(y_test, predictions)}")

MAE: 10.99616479512835
R2_score: 0.5678318389935775
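
For reference, the two reported metrics can be written out in a few lines of NumPy. The sketch below also computes RMSE, which the imports at the top (mean_squared_error, sqrt) suggest was intended but which the article never reports. The sample values are made up. The ravel() calls matter because a Keras predict call returns an array of shape (n, 1), which would otherwise broadcast against a 1-D target.

```python
import numpy as np


def _flat(values):
    # Flatten to 1-D so (n, 1) predictions line up with (n,) targets.
    return np.asarray(values, dtype=float).ravel()


def mae(y_true, y_pred):
    return float(np.mean(np.abs(_flat(y_true) - _flat(y_pred))))


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((_flat(y_true) - _flat(y_pred)) ** 2)))


def r2(y_true, y_pred):
    y_true, y_pred = _flat(y_true), _flat(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)


# Made-up prices and predictions; predictions are shaped (n, 1) on purpose.
y_true = np.array([50.0, 60.0, 70.0, 80.0])
y_pred = np.array([[52.0], [58.0], [73.0], [77.0]])

print(mae(y_true, y_pred))   # 2.5
print(rmse(y_true, y_pred))
print(r2(y_true, y_pred))
```

An R2 of about 0.57 therefore means the network explains a little over half of the variance in SMPEP2 on the held-out rows.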

XGBoost Regressor Model

from xgboost import XGBRegressor
model2 = XGBRegressor(n_estimators = 8000, max_depth=17, eta=0.1, subsample=0.7, colsample_bytree=0.8)
model2.fit(X_train, y_train)
pred = model2.predict(X_test)
r2_score(y_test, pred)

R2 Score is 0.6137340486115418

mean_absolute_error(y_test, pred)

Mean Absolute error is 9.415647364373736
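
To put the two models side by side, the reported scores can be compared directly. The figures below are copied from the outputs above; the variable names are only illustrative.

```python
# Scores reported above for the neural network and the XGBoost regressor.
nn_mae, xgb_mae = 10.99616479512835, 9.415647364373736
nn_r2, xgb_r2 = 0.5678318389935775, 0.6137340486115418

# Relative reduction in MAE and absolute gain in R2 when moving to XGBoost.
mae_reduction = (nn_mae - xgb_mae) / nn_mae
r2_gain = xgb_r2 - nn_r2

print(f"MAE reduction: {mae_reduction:.1%}")
print(f"R2 gain: {r2_gain:.3f}")
```

On this split the XGBoost model lowers the mean absolute error by roughly 14% relative to the neural network, although a single random split is a thin basis for a firm ranking.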

Final Predictions of the XGBoost Model

pred

array([ 45.052734,  58.09632 ,  66.80695 , ...,  63.155807,  40.944473,
       182.29193 ], dtype=float32)

Dump model to pickle file

import pickle as pkl

# Serialize the fitted XGBoost model; the context manager closes the file.
with open("train_classifier", "wb") as pkl_out:
    pkl.dump(model2, pkl_out)
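
Loading the model back is the mirror image of the dump. The sketch below round-trips a plain dictionary as a stand-in for the fitted regressor, so that it runs without xgboost installed; the dictionary contents and the temporary path are hypothetical, and a fitted model object would be written and read the same way.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model; a real run would pickle the regressor itself.
stand_in_model = {"weights": [0.5, -0.2], "bias": 1.0}

# Write to a temporary directory so the example leaves the working directory untouched.
path = os.path.join(tempfile.mkdtemp(), "train_classifier")

with open(path, "wb") as pkl_out:
    pickle.dump(stand_in_model, pkl_out)

with open(path, "rb") as pkl_in:
    restored = pickle.load(pkl_in)

print(restored == stand_in_model)  # True
```

Pickle files can execute arbitrary code when loaded, so only unpickle files from a source you trust. The StandardScaler fitted earlier would also need to be saved if the model is to be used on new, unscaled data.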

Conclusion

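With this analysis we can predict SMPEP2, the actual electricity price for a given half-hour period, and use those forecasts to inform future business strategy. On the held-out 10% split, the XGBoost regressor (R2 of about 0.61, MAE of about 9.42) outperformed the neural network (R2 of about 0.57, MAE of about 11.00).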
