Folder restructure

This commit is contained in:
Radu C. Martin 2021-07-20 22:55:56 +02:00
parent c2a857a0e8
commit ffabe92245
16 changed files with 12 additions and 12 deletions

\begin{titlepage}
\begin{center}
\includegraphics[width=0.5\linewidth]{Logo-EPFL.png}\par
\vspace{5cm}
{\huge \bf
\setstretch{1.25}
Inter-seasonal Performance of Gaussian Process-based
Model Predictive Control of Buildings
\par}
\vspace{2cm}
{\LARGE \bf Master Project\par}
\vspace{6cm}
{\Large Radu C. Martin}\par
\vspace{2cm}
{\large Professor: Colin Jones}\par
{\large Supervisor: Manuel Pascal Koch}\par
\vspace{0.5cm}
{\large\today}
\end{center}
\end{titlepage}

\section{Introduction}
\subsection{Motivation}
Buildings are a major consumer of energy, with more than 25\% of the total
energy consumed in the EU coming from residential
buildings~\cite{tsemekiditzeiranakiAnalysisEUResidential2019}. Combined with a
steady increase in energy demand and stricter requirements on energy
efficiency~\cite{europeancommission.jointresearchcentre.EnergyConsumptionEnergy2018},
this amplifies the need for more accessible means of regulating energy usage of
new and existing buildings.
Data-driven methods of building identification and control prove very useful
through their ease of implementation, forgoing the need for more complex
physics-based models. On the flip side, additional attention is required in the
design of these control schemes, as the results can vary greatly from one
implementation to another.
Gaussian Processes have been previously used to model building dynamics, but
they are usually limited by a fixed computational budget. This limits the
approaches that can be taken for identification and update of said models.
Learning-based \acrshort{gp} models have also been used previously in the context of
autonomous racing cars~\cite{kabzanLearningBasedModelPredictive2019}, but there
the Sparse \acrshort{gp} model was built on top of a white-box model and was only
responsible for fitting the unmodeled dynamics.
\subsection{Previous Research}
With the increase in computational power and availability of data over time,
the accessibility of data-driven methods for system identification and control
has also risen significantly.
The idea of using Gaussian Processes as regression models for control of dynamic
systems is not new, and has already been explored a number of times. A general
description of their use, along with the necessary theory and some example
implementations is given in~\cite{kocijanModellingControlDynamic2016}.
In~\cite{pleweSupervisoryModelPredictive2020}, a \acrlong{gp} Model with a
\acrlong{rq} Kernel is used for temperature set point optimization.
Gaussian Processes for building control have also been studied before in the
context of Demand Response~\cite{nghiemDatadrivenDemandResponse2017,
jainLearningControlUsing2018}, where the buildings are used for their heat
capacity in order to reduce the stress on energy providers during peak load
times.
There are, however, multiple limitations to these approaches.
In~\cite{nghiemDatadrivenDemandResponse2017} the model is identified only once,
ignoring subsequent changes in weather or plant parameters, which could lead to
different dynamics. This is addressed in~\cite{jainLearningControlUsing2018} by
re-identifying the model every two weeks using new information. Another
limitation is the scalability of \acrshort{gp}s, which become computationally
prohibitive when too much data is added.
Outside of the context of building control, Sparse \acrlong{gp}es have been used
in autonomous racing in order to complement the physics-based model by fitting
the unmodeled dynamics of the
system~\cite{kabzanLearningBasedModelPredictive2019}.
\subsection{Contribution}
The ability to learn the plant's behaviour in new regions is very helpful in
maintaining model performance over time, as the plant's behaviour starts
deviating and the originally identified model is pushed further and further into
extrapolated regions.
This project combines online learning control schemes with \acrlong{gp} Models
through the implementation of \acrlong{svgp} Models. \acrshort{svgp}s provide a
means of extending the use of \acrshort{gp}s to larger datasets, thus enabling
the periodic re-training of the model to include all the historically available
data.
\subsection{Project Outline}
The main body of work can be divided into two parts: the development of a
computer model of the \pdome\ building, and the synthesis, validation and
comparison of multiple control schemes using both classical \acrshort{gp}s and
\acrshort{svgp}s.
Section~\ref{sec:gaussian_processes} provides the mathematical background for
understanding \acrshort{gp}s, as well as a broad-strokes definition of
\acrshort{svgp}s and their differences from the classical implementation of
\acrlong{gp}es. This information is later used to compare their performance and
to outline their respective pros and cons.
Section~\ref{sec:CARNOT} goes into the details of the implementation of the
\pdome\ computer model. The structure of the CARNOT model is described, and all
the parameters required for the simulation are either directly sourced from
existing literature, computed from available values or estimated using publicly
available data from \textit{Google Maps}~\cite{GoogleMaps}.
Following that, Section~\ref{sec:hyperparameters} discusses the choice of the
hyperparameters for the \acrshort{gp} and \acrshort{svgp} models respectively in
the context of their multi-step ahead simulation performance.
The \acrlong{ocp} implemented in the \acrshort{mpc} controller is described in
Section~\ref{sec:mpc_problem}, followed by a description of the complete
controller implementation in Python in Section~\ref{sec:implementation}.
Section~\ref{sec:results} presents and analyzes the results of full-year
simulations for both the \acrshort{gp} and \acrshort{svgp} models. A few
variations on the initial \acrshort{svgp} model are further analyzed in order to
identify the most important parameters for long-term operation.
Finally, Section~\ref{sec:conclusion} reviews all the different implementations
and discusses possible improvements to both the current \acrshort{gp} and
\acrshort{svgp} models.
\clearpage

\section{Gaussian Processes Background}\label{sec:gaussian_processes}
The \acrfull{gp} is a member of the \textit{kernel machines} class of algorithms
in the field of machine learning.
The formal definition~\cite{rasmussenGaussianProcessesMachine2006} of
\acrlong{gp} states that:
\begin{displayquote}
A Gaussian Process is a collection of random variables, any finite number of
which have a joint Gaussian distribution.
\end{displayquote}
A \acrshort{gp} is completely specified by a mean and a covariance function:
\begin{equation}
\begin{aligned}
m(\mathbf{x}) &= \mathbb{E}[f(\mathbf{x})] \\
k(\mathbf{x}, \mathbf{x'}) &= \mathbb{E}[(f(\mathbf{x}) -
m(\mathbf{x}))(f(\mathbf{x'}) - m(\mathbf{x'}))]
\end{aligned}
\end{equation}
This notation is commonly abbreviated as:
\begin{equation}
f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x'}))
\end{equation}
Making the simplifying assumption of $m(\mathbf{x}) = 0$, the joint
distribution of the training outputs $\mathbf{f}$ and the test outputs
$\mathbf{f_*}$ under the given prior is:
\begin{equation}
\begin{bmatrix}
\mathbf{f} \\
\mathbf{f_*} \\
\end{bmatrix} \sim
\mathcal{N}\left(
\mathbf{0},
\begin{bmatrix}
K(X, X) & K(X, X_*) \\
K(X_*, X) & K(X_*, X_*) \\
\end{bmatrix}
\right)
\end{equation}
In the case of noisy observations, assuming $y = f + \epsilon$ with $\epsilon
\sim \mathcal{N}(0, \sigma_n^2)$, we have the following joint distribution:
\begin{equation}
\begin{bmatrix}
\mathbf{y} \\
\mathbf{f_*} \\
\end{bmatrix} \sim
\mathcal{N}\left(
\mathbf{0},
\begin{bmatrix}
K(X, X) + \sigma_n^2 I& K(X, X_*) \\
K(X_*, X) & K(X_*, X_*) \\
\end{bmatrix}
\right)
\end{equation}
which, for the rest of the section, will be used in the abbreviated form:
\begin{equation}
\begin{bmatrix}
\mathbf{y} \\
\mathbf{f_*} \\
\end{bmatrix} \sim
\mathcal{N}\left(
\mathbf{0},
\begin{bmatrix}
K + \sigma_n^2 I& K_*^T \\
K_* & K_{**} \\
\end{bmatrix}
\right)
\end{equation}
\subsection{Parameter learning}
"Training" a \acrshort{gp} is the process of finding the kernel parameters that
best explain the data. This is done by maximizing the probability density
function for the observations $y$, also known as the marginal likelihood:
\begin{equation}\label{eq:gp_likelihood}
p(y) = \frac{1}{\sqrt{(2\pi)^{n}\det{\left(K + \sigma_n^2I\right)}}}
\exp{\left(-\frac{1}{2}y^T\left(K + \sigma_n^2I\right)^{-1}y\right)}
\end{equation}
In order to simplify computation in actual \acrshort{gp} implementations, the
value that gets maximized is the log of Equation~\ref{eq:gp_likelihood}, the log
marginal likelihood:
\begin{equation}\label{eq:gp_log_likelihood}
\log(p(y)) = - \frac{1}{2}\log{\left(
\det{\left(
K + \sigma_n^2I
\right)}
\right)}
- \frac{1}{2}y^T\left(
K + \sigma_n^2I
\right)^{-1}y
- \frac{n}{2}\log{\left(2\pi\right)}
\end{equation}
\subsection{Prediction}
Given the proper covariance matrices $K$ and $K_*$, predictions on new points
can be made as follows:
\begin{equation}
\begin{aligned}
\mathbf{f_*} = \mathbb{E}\left(f_*|X, \mathbf{y}, X_*\right) &=
K_*\left(K + \sigma_n^2I\right)^{-1}\mathbf{y} \\
cov(\mathbf{f_*}) &= K_{**} - K_*\left(K +\sigma_n^2I\right)^{-1}K_*^T \\
\end{aligned}
\end{equation}
The extension of these predictions to a non-zero mean \acrshort{gp} comes
naturally, by applying the zero-mean \acrshort{gp} to the \textit{difference}
between the observations and the fixed mean function:
\begin{equation}
\bar{\mathbf{f}}_* = \mathbf{m}(X_*) + K_*\left(K +
\sigma_n^2I\right)^{-1}(\mathbf{y} - \mathbf{m}(X)) \\
\end{equation}
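To make the preceding algebra concrete, the following minimal Python/NumPy
sketch implements the posterior mean and covariance equations for a zero-mean
\acrshort{gp}, assuming the \acrlong{se} kernel introduced in the next
subsection; all names and values are illustrative, not part of the project's
actual implementation.
\begin{verbatim}
import numpy as np

def se_kernel(A, B, variance=1.0, lengthscale=1.0):
    """Squared Exponential kernel matrix for row-wise inputs A, B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_predict(X, y, X_star, sigma_n=0.1):
    """Posterior mean and covariance at the test inputs X_star."""
    K = se_kernel(X, X) + sigma_n**2 * np.eye(len(X))
    K_star = se_kernel(X_star, X)          # K_*
    K_ss = se_kernel(X_star, X_star)       # K_**
    alpha = np.linalg.solve(K, y)          # (K + sigma_n^2 I)^{-1} y
    mean = K_star @ alpha
    cov = K_ss - K_star @ np.linalg.solve(K, K_star.T)
    return mean, cov

# Toy usage: fit a noisy sine and predict on a finer grid
X = np.linspace(0, 6, 30)[:, None]
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(30)
mean, cov = gp_predict(X, y, np.linspace(0, 6, 100)[:, None])
\end{verbatim}
In practice the inverse is computed through a Cholesky factorization of $K +
\sigma_n^2I$ for numerical stability; the direct solve above keeps the sketch
close to the equations.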
\subsection{Kernels}\label{sec:Kernels}
The choice of the kernel is an important part of any kernel machine algorithm.
It serves to shape the behaviour of the \acrshort{gp} by imposing a desired
level of smoothness on the resulting functions, a periodicity, linearity, etc.
This extends the use cases of \acrshort{gp} models while including any available
prior information about the system to be modeled.
For the purpose of identifying a dynamical system, a few kernels are popular
choices~\cite{kocijanModellingControlDynamic2016}:
\subsubsection*{Squared Exponential Kernel}
This kernel is used when the system to be modelled is assumed to be smooth and
continuous. The basic version of the \acrshort{se} kernel has the following form:
\begin{equation}
k(\mathbf{x}, \mathbf{x'}) = \sigma^2 \exp{\left(- \frac{1}{2}\frac{\norm{\mathbf{x} -
\mathbf{x'}}^2}{l^2}\right)}
\end{equation}
with the model variance $\sigma^2$ and lengthscale $l$ as parameters.
The lengthscale indicates how fast the correlation diminishes as the two points
move apart from each other.
This function can be modified to use an arbitrary positive semi-definite matrix
$\Lambda^{-1}$, which imposes a different weight on each combination of the
dimensions of $\mathbf{x}$ and $\mathbf{x'}$:
\begin{equation}
\begin{aligned}
k(\mathbf{x}, \mathbf{x'})
&= \sigma^2\exp{\left[-\frac{1}{2} (\mathbf{x} - \mathbf{x'})^T \Lambda^{-1}
(\mathbf{x} - \mathbf{x'})\right]} \\
&= \sigma^2 \exp{\left[-\frac{1}{2}\sum_{d=1}^D w_d(x_d - x'_d)^2\right]}
\end{aligned}
\end{equation}
where $w_d = \frac{1}{l_d^2},\ d = 1,\dots,D$, with $D$ being the dimension of
the data.
The special case of $\Lambda^{-1} = \text{diag}{\left([l_1^{-2},\dots,l_D^{-2}]\right)}$
is equivalent to implementing a different lengthscale for each regressor.
This can be used to assess the relative importance of each regressor through the
value of the hyperparameters. This is the \acrfull{ard} property.
\subsubsection*{Rational Quadratic Kernel}
The \acrfull{rq} Kernel can be interpreted as an infinite sum of \acrshort{se}
kernels with different lengthscales. It has the same smooth behaviour as the
\acrlong{se} Kernel, but can take into account the difference in function
behaviour for large scale vs small scale variations.
\begin{equation}
k(\mathbf{x}, \mathbf{x'}) = \sigma^2 {\left(1+\frac{1}{2}\frac{\norm{\mathbf{x} -
\mathbf{x'}}^2}{\alpha l^2}\right)}^{-\alpha}
\end{equation}
When extended with \acrshort{ard} behaviour, it becomes:
\begin{equation}
k(\mathbf{x}, \mathbf{x'})
= \sigma^2{\left[1 + \frac{1}{2\alpha} {(\mathbf{x} - \mathbf{x'})}^T \Lambda^{-1}
(\mathbf{x} - \mathbf{x'})\right]}^{-\alpha} \\
\end{equation}
\subsection{Extending GPs to larger datasets}
There exist multiple methods of scaling \acrshort{gp}s to be used on more data,
without inquiring the penalty of inverting the covariance matrix. An overview
and comparison of multiple methods is given
at~\cite{liuUnderstandingComparingScalable2019}.
For the scope of this project, the choice of using the \acrfull{svgp} models has
been made, since it provides a very good balance of scalability, capability,
robustness and controllability~\cite{liuUnderstandingComparingScalable2019}.
The \acrlong{svgp} was first introduced
by~\textcite{hensmanGaussianProcessesBig2013} as a way to scale the use of
\acrshort{gp}s to large datasets. A detailed explanation of the mathematics
behind \acrshort{svgp}s is given in~\cite{yiSparseVariationalGaussian2021}, with
a very brief summary presented below.
\subsubsection{Summarizing the training data}
Trivially downsampling the available data to a size that fits within the
computational budget is not the best solution, since it implies discarding large
amounts of potentially useful information.
In order to reduce the dataset size without this downside, a new set of random
variables $f(X_s)$, usually denoted $f_s$, is introduced, with the requirement
that this new dataset has a size $n_s$ smaller than the size $n$ of the original
dataset.
The $X_s$ are called \textit{inducing locations}, and $f_s$ --- \textit{inducing
random variables}. They summarize the data in the sense that a model trained on
this new dataset should be able to generate the original dataset with high
probability.
The multivariate Gaussian distribution is used to establish the relationship
between $f_s$ and $f$, which will serve the role of the prior, now called the
sparse prior:
\begin{equation}
\begin{bmatrix}
\mathbf{f}(X) \\
\mathbf{f}(X_s) \\
\end{bmatrix} \sim
\mathcal{N}\left(
\mathbf{0},
\begin{bmatrix}
K(X, X) & K(X, X_s) \\
K(X_s, X) & K(X_s, X_s) \\
\end{bmatrix}
\right)
\end{equation}
\subsubsection{Evidence Lower Bound}\label{sec:elbo}
Computing the log likelihood is one of the most expensive parts of model
training, due to inversion of the kernel matrix term ${\left(K +
\sigma_n^2I\right)}^{-1}$ (cf. Equation~\ref{eq:gp_likelihood}) with an
algorithmic complexity of $\mathcal{O}(n^3)$.
In order to solve this problem, the log likelihood equation
(Equation~\ref{eq:gp_log_likelihood}) used for training a
classical \acrshort{gp} is replaced with an approximate value, that is
computationally tractable on larger sets of data.
The following derivation of the \acrshort{elbo} is based on the one presented
in~\cite{yangUnderstandingVariationalLower}.
Assume $X$ to be the observations, and $Z$ the set of parameters of the
\acrshort{gp} model, also known as the latent variables. The posterior
distribution of the latent variables can be written as follows:
\begin{equation}
p(Z|X) = \frac{p(X|Z)p(Z)}{p(X)} = \frac{p(X|Z)p(Z)}{\int_Z p(X,Z)}
\end{equation}
The main idea of Variational Bayesian methods is to replace the true posterior
distribution $p(Z|X)$ with an approximation $q(Z)$ that is easier to compute,
but is as close as possible to the original.
One measure of the closeness of the two distributions is the \acrfull{kl}
divergence, which for variational inference takes the following form:
\vspace{5pt}
\begin{equation}
\begin{aligned}
KL\left[q(Z)||p(Z|X)\right]
&= \int_Z q(Z)\log{\left(\frac{q(Z)}{p(Z|X)}\right)} \\
&= - \int_Z q(Z)\log{\left(\frac{p(Z|X)}{q(Z)}\right)} \\
&= - \left(
\int_Z q(Z)\log{\left(\frac{p(X,Z)}{q(Z)}\right)}
- \int_Z q(Z)\log{\left(p(X)\right)}
\right) \\
&= - \int_Z q(Z) \log{\left(\frac{p(X,Z)}{q(Z)}\right)}
+ \log{\left(p(X)\right)}\int_Z q(Z) \\
&= -L + \log{\left(p(X)\right)} \\
\end{aligned}
\end{equation}
\vspace{5pt}
where $L$ is the \acrfull{elbo}. Rearranging this equation we get:
\begin{equation}
L = \log{\left(p(X)\right)} - KL\left[q(Z)||p(Z|X)\right]
\end{equation}
Since the \acrshort{kl} divergence is always $\geq 0$, we get $L \leq
\log{\left(p(X)\right)}$, i.e.\ $L$ is a lower bound on the log probability of
the observations.
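In practice, $q$ is parametrized by the inducing locations and a free-form
Gaussian over the inducing variables, and the \acrshort{elbo} is maximized with
a gradient-based optimizer. The following sketch illustrates this, assuming the
GPflow library; the dataset, its size and the number of inducing points are
illustrative only.
\begin{verbatim}
import numpy as np
import tensorflow as tf
import gpflow

# Illustrative dataset: N noisy observations of a 1-D function
N, M = 10000, 50                      # M: number of inducing points
X = np.random.rand(N, 1)
Y = np.sin(12 * X) + 0.1 * np.random.randn(N, 1)

# Inducing locations are initialized on a random subset of the inputs
Z = X[np.random.choice(N, M, replace=False)].copy()

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=Z,
    num_data=N,
)

# Maximize the ELBO, i.e. minimize the negative ELBO
loss = model.training_loss_closure((X, Y))
opt = tf.optimizers.Adam(learning_rate=0.01)
for _ in range(2000):
    opt.minimize(loss, model.trainable_variables)
\end{verbatim}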
\subsection{Gaussian Process Models for Dynamical
Systems}\label{sec:gp_dynamical_system}
In the context of Dynamical Systems Identification and Control, Gaussian
Processes are used to represent different model structures, ranging from state
space and \acrshort{nfir} structures, to the more complex \acrshort{narx},
\acrshort{noe} and \acrshort{narmax}.
The general form of an \acrfull{narx} model is as follows:
\begin{equation}
\hat{y}(k) =
f(w(k-1),\dots,w(k-l_w),y(k-1),\dots,y(k-l_y),u(k-1),\dots,u(k-l_u))
\end{equation}
where $w$ are the exogenous inputs with a maximum lag of $l_w$, $u$ are the model
inputs with a maximum lag of $l_u$, and $y$ are the model outputs with a maximum
lag of $l_y$. This structure is a prediction model, since it uses the measured
values of $y$ as regressors.
The \acrfull{noe} uses the past predictions $\hat{y}$ for future predictions:
\begin{equation}
\hat{y}(k) =
f(w(k-1),\dots,w(k-l_w),\hat{y}(k-1),\dots,\hat{y}(k-l_y),u(k-1),\dots,u(k-l_u))
\end{equation}
Due to its use for multi-step ahead simulation of system behaviour, as opposed
to only predicting one state ahead using current information, the \acrshort{noe}
structure can be considered a \textit{simulation model}.
In order to get the best simulation results from a \acrshort{gp} model, the
\acrshort{noe} structure would have to be employed. Due to the high algorithmic
complexity of training and evaluating \acrshort{gp} models, this approach is
computationally intractable. In practice, a \acrshort{narx} model is trained
instead and validated through multi-step ahead prediction.
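As a concrete illustration, the sketch below (Python/NumPy, with hypothetical
names) assembles the \acrshort{narx} regressor matrix from measured 1-D time
series; for an \acrshort{noe}-style simulation, the measured outputs in the
regressor would be replaced step by step with the model's own past predictions.
\begin{verbatim}
import numpy as np

def build_narx_regressors(w, u, y, l_w, l_u, l_y):
    """Stack lagged exogenous inputs w, inputs u and outputs y into
    a regressor matrix R and target vector t, so that t[i] = y(k) is
    predicted from w(k-1..k-l_w), u(k-1..k-l_u), y(k-1..k-l_y)."""
    start = max(l_w, l_u, l_y)
    rows, targets = [], []
    for k in range(start, len(y)):
        row = np.concatenate([
            w[k - l_w:k][::-1],   # w(k-1), ..., w(k-l_w)
            u[k - l_u:k][::-1],   # u(k-1), ..., u(k-l_u)
            y[k - l_y:k][::-1],   # y(k-1), ..., y(k-l_y)
        ])
        rows.append(row)
        targets.append(y[k])
    return np.array(rows), np.array(targets)
\end{verbatim}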
\clearpage

\section{CARNOT model}\label{sec:CARNOT}
In order to better analyze the different model training and update methods, it
was decided to replace the physical \pdome\ building with a computer model.
This allows for faster-than-real-time simulations, as well as perfectly
reproducible weather conditions and building responses, enabling the direct
comparison of different control schemes over longer periods of time.
The model is designed using the CARNOT
toolbox~\cite{lohmannEinfuehrungSoftwareMATLAB} for Simulink. It is based on the
CARNOT default \textit{Room Radiator} model, with the following changes:
\begin{itemize}
\item Only one of the two default rooms is used
\item The outside walls are replaced with windows
\item A window is added on the roof as an equivalent for the four skylights
presented in Section~\ref{sec:Physical_dimensions}
\end{itemize}
The resulting schema for the \pdome\ building is presented in
Figure~\ref{fig:CARNOT_polydome}. The following sections will focus on detailing
the choice of all the necessary model parameters.
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Images/polydome_room_model.pdf}
\caption{Simulink Schema of the CARNOT \pdome\ building model}
\label{fig:CARNOT_polydome}
\end{figure}
\clearpage
Finally, the Simulink model is completed by adding a \textit{Weather Data File}
containing weather measurements for a whole year, and a \textit{Weather
Prediction} block responsible for sending weather predictions to the MPC.\@ The
controller itself is defined in Python and is connected to Simulink via three
TCP/IP sockets. Details on the implementation are presented in
Section~\ref{sec:implementation}.
\subsection{Physical dimensions}\label{sec:Physical_dimensions}
The \pdome\ building is a dome-shaped, single-zone building serving the role
of classroom for audiences of around one hundred students.
Most of the necessary measurements are already available from the
\citetitle*{nattererPolydomeTimberShell1993} article by
\textcite{nattererPolydomeTimberShell1993}. It presents the \pdome\ geometry as
having a floor area of 25m $\times$ 25m, a maximum height of the building of 7m,
walls of a maximum height of 3m. The roof is a spherical cap with a radius of
approximately 28m.
One particularity of the \pdome\ is the presence of four large skylights on the
roof. An estimate of their size has been made using images from \textit{Google
Maps Satellite View}~\cite{GoogleMaps}, with the included measurement tool. These
measurements are presented in Figure~\ref{fig:Google_Maps_Skylights}. The
skylights are measured to be squares of edge 2.5m.
\begin{figure}[ht]
\centering
\vspace{-10pt}
\includegraphics[width = 0.75\textwidth]{Images/google_maps_polydome_skylights}
\caption{Google Maps Satellite view of the \pdome\ building}
\label{fig:Google_Maps_Skylights}
\end{figure}
The only remaining missing piece of information is the \textit{stem wall}
height, which is the height of the walls from the ground on which the dome
ceiling directly resides. Due to the limited campus access, this measure has
also been approximated from a Google Maps image, presented in
Figure~\ref{fig:Google_Maps_Streetview}. An object of known size has been used
as reference, after which the following measurements have been done in
\citetitle{kimballGIMPGNUImage}~\cite{kimballGIMPGNUImage} using the
\textit{Measure Tool}.
The chosen reference object is the \pdome\ \acrshort{hvac} system, the full
description of which is presented in Section~\ref{sec:HVAC_parameters}, and
which has a known height of 2061mm \cite{aermecRoofTopManuelSelection}.
\clearpage
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Images/polydome_streetview_annotated}
\caption{Google Maps StreetView view of the \pdome\ building}
\label{fig:Google_Maps_Streetview}
\end{figure}
The graphical analysis resulted in the measurements presented in
Table~\ref{tab:GIMP_measurements}:
\begin{table}[ht]
%\vspace{-8pt}
\centering
\begin{tabular}{||c c c c||}
\hline
Object & Size [px] & Size[mm] & Size[m]\\
\hline \hline
\acrshort{hvac} height & 70 & 2100 & 2.1 \\
Building height & 230 & 6900 & 6.9 \\
Stem wall & 45 & 1350 & 1.35 \\
Dome height & 185 & 5550 & 5.55 \\
\hline
\end{tabular}
\caption{Calculated Dome Dimensions}
\label{tab:GIMP_measurements}
\end{table}
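The underlying conversion is a simple proportionality: the \acrshort{hvac}
unit's 70 px correspond to its known height of roughly 2100mm, giving a scale of
30mm per pixel, which is then applied to the other pixel counts. A minimal
sketch of this conversion (plain Python, values from
Table~\ref{tab:GIMP_measurements}):
\begin{verbatim}
# Known reference: the HVAC unit is ~2100 mm tall and spans 70 px
scale_mm_per_px = 2100 / 70            # = 30 mm per pixel

measurements_px = {"building": 230, "stem wall": 45, "dome": 185}
for name, px in measurements_px.items():
    print(name, px * scale_mm_per_px / 1000, "m")
# building 6.9 m, stem wall 1.35 m, dome 5.55 m
\end{verbatim}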
To validate the results shown in Table~\ref{tab:GIMP_measurements}, it is
possible to compare the total \pdome\ building height measured in GIMP with the
value presented beforehand. The graphical measure of 6.9m is indeed very close
to the described height of 7m.
These measurements can be used to set the size parameters of all the nodes in
the model:
\begin{itemize}
\item Side windows with a size of 2m $\times$ 25m each
\item Roof window with a size of 5m $\times$ 5m, cumulative surface of all
the skylights
\item Ceiling with a size of 25m $\times$ 25m
\item Floor with a size of 25m $\times$ 25m
\end{itemize}
\subsection{Internal volume}
The \pdome\ building has a structure that is mostly based on a dome shape, with
the difference that the dome portion of the building does not reach the ground,
but stands above it at a height of approximately $1.35$m (cf.
Table~\ref{tab:GIMP_measurements}), with the large side windows extending to the
ground and creating a \textit{stem wall} for the dome to sit on.
The total internal volume of the \pdome\ building can therefore be approximated
by summing the volumes of the two elements: the stem and the dome.
Geometrically, a dome is a portion of a sphere cut off by a plane. Its volume
can therefore be computed in a similar fashion to that of a complete sphere.
A sphere can be regarded as the volume of rotation of the function $f(x)$ (cf.
Equation~\ref{eq:rot_func}) around the x axis, where $R$ is the sphere radius
and $x \in [0, 2R]$:
\begin{equation}\label{eq:rot_func}
f(x) = \sqrt{R^2 - (x-R)^2} = \sqrt{2Rx - x^2}
\end{equation}
It is therefore possible to compute the volume of a dome by integrating $f(x)$
up to the dome height $h$, as presented in Equation~\ref{eq:rot_vol}.
\begin{equation}\label{eq:rot_vol}
\begin{aligned}
V &= \pi \int_0^h f(x)^2 dx \\
&= \pi \int_0^h (2Rx - x^2)dx \\
&= \pi \left( Rx^2 - \frac{1}{3}x^3 \right)_0^h \\
&= \frac{\pi h^2}{3} (3R - h) \\
\end{aligned}
\end{equation}
The volume of the \pdome\ dome is computed by first finding the radius of the
complete sphere, i.e.\ the radius of curvature (cf.
Equation~\ref{eq:radius_curvature}, where $h$ is the height of the dome and $r$
is the radius of the base of the dome), and plugging it into
Equation~\ref{eq:rot_vol}.
\begin{equation}\label{eq:radius_curvature}
R_c = \frac{r^2 + h^2}{2h}
\end{equation}
The volume of the dome is then given by:
\begin{equation}
V_d = \frac{1}{3} \pi h^2 (3R_c - h) = \frac{1}{6} \pi h (3r^2 + h^2)
\end{equation}
The stem of the \pdome\ is approximated by a box with a square base of edge
$l_s = 25$m and the stem wall height $h_s$, so its volume can be calculated as:
\begin{equation}
V_s = l_s^2 \times h_s
\end{equation}
The total volume of the building is then given as:
\begin{equation}
V = V_d + V_s = \frac{1}{6} \pi h (3r^2 + h^2) + l_s^2 \times h_s
\end{equation}
Numerically, considering a dome diameter of 28m, a dome height of 5.55m and a stem
wall edge of 25m, we get the approximate volume of the building:
\begin{equation}\label{eq:numerical_volume}
V = V_d + V_s = 1798.22m^3 + 843.75m^3 = 2641.97m^3 \approx 2650m^3
\end{equation}
The value presented in Equation~\ref{eq:numerical_volume} is used directly in
the \textit{room\_node} element of the CARNOT model (cf.
Figure~\ref{fig:CARNOT_polydome}), as well as the calculation of the Air
Exchange Rate, presented in Section~\ref{sec:Air_Exchange_Rate}.
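A short numerical check of Equation~\ref{eq:numerical_volume} (plain Python,
names illustrative):
\begin{verbatim}
from math import pi

r, h = 28 / 2, 5.55    # dome base radius [m] and dome height [m]
l_s, h_s = 25, 1.35    # stem wall edge [m] and stem wall height [m]

V_d = pi * h * (3 * r**2 + h**2) / 6   # dome volume, ~1798.2 m^3
V_s = l_s**2 * h_s                     # stem volume,  843.75 m^3
print(V_d + V_s)                       # ~2642 m^3, rounded to 2650 m^3
\end{verbatim}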
\subsection{Furniture}
The main function of the \pdome\ building is serving as a classroom for around
one hundred students. It has wood furniture consisting of chairs and tables, as
well as a wooden stage in the center of the building, meant to be used for
presentations. The building also contains a smaller room, housing all the
necessary technical equipment (cf. Figure~\ref{fig:Google_Maps_Streetview}).
The most accurate way of including information on the furniture in the CARNOT
model would be to manually compute the mass, volume and materials for each of
the chairs, tables, etc.\ but due to the restricted access to the building a
simpler approximation has been made.
\textcite{johraNumericalAnalysisImpact2017} present a methodology to model the
furniture in buildings for different materials, as well as an \textit{equivalent
indoor content material} that is meant to approximate the furniture content of
an office building. These values for mass content, surface area, material
density and thermal conductivity have been taken as the basis for the \pdome\
furniture content approximation, with the assumption that, since the \pdome\ is
still mostly empty, it has approximately a quarter of the furniture present in a
fully furnished office.
The full set of furniture is, therefore, approximated in the CARNOT model as a
wall, with the numerical values for mass, surface, thickness and volume
presented below.
\subsubsection*{Surface}
% surface:
% 1/4 * 1.8 [m2/m2 of floor space] * 625 m2 surface = 140 m2
% 140 m2 = [7 20] m [height width]
The equivalent material is taken to have a surface of 1.8 $\text{m}^2$ for each
$\text{m}^2$ of floor area~\cite{johraNumericalAnalysisImpact2017}. With a floor
area of 625 $\text{m}^2$, and assuming the furnishing of the building is a
quarter that of a fully-furnished office, the \pdome\ furniture equivalent wall
has a surface area of:
\begin{equation}
S_f = \frac{1}{4} \cdot 1.8 \left[\frac{\text{m}^2}{\text{m}^2}\right]
\cdot 625\ \left[\text{m}^2\right] = 281.25 \left[\text{m}^2\right]
\end{equation}
Since a wall has two large faces, the total surface is divided between them,
giving each face a surface of:
\begin{equation}
S_{f, \text{face}} =
\frac{1}{2} \times S_f \approx 140 \left[\text{m}^2\right]
\end{equation}
\subsubsection*{Mass}
% mass:
% 1/4 * 40 [kg/m2 of floor space] * 625 m2 surface = 6250 kg
The mass of the furniture equivalent wall is computed using the same
methodology, considering 40 kg of furniture mass per $\text{m}^2$ of floor
surface.
\begin{equation}
M_f = \frac{1}{4} \cdot 40 \left[\frac{\text{kg}}{\text{m}^2}\right] \cdot
625\ \left[\text{m}^2\right] = 6250\ \left[\text{kg}\right]
\end{equation}
\subsubsection*{Volume}
% volume:
% 6250[kg]/600[kg/m3] = 10.41 [m3]
The volume of the furniture equivalent wall is calculated with the help of the
equivalent mass and densities:
\begin{equation}
V_f = \frac{M_f}{\rho_f} = \frac{6250}{600}
\left[\frac{\text{kg}}{\text{kg}/\text{m}^3}\right]
= 10.41\ \left[\text{m}^3\right]
\end{equation}
\subsubsection*{Thickness}
% thickness:
%10.41[m3]/140[m2] = 0.075m = 7.5cm
The last parameter of the furniture equivalent wall is computed by dividing its
volume by the surface:
\begin{equation}
h_f = \frac{V_f}{S_{f, \text{face}}}
= \frac{10.41}{140} \left[\frac{m^3}{m^2}\right]
= 0.075\ \left[\text{m}\right] = 7.5\ \left[\text{cm}\right]
\end{equation}
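The four derived quantities can be summarized in a few lines of Python (a
sketch; the values mirror the equations above):
\begin{verbatim}
floor_area = 625            # m^2
furnishing_fraction = 1/4   # Polydome assumed a quarter-furnished office

S_f = furnishing_fraction * 1.8 * floor_area  # total surface, 281.25 m^2
S_face = S_f / 2                              # per wall face, ~140 m^2
M_f = furnishing_fraction * 40 * floor_area   # mass, 6250 kg
V_f = M_f / 600                               # volume at 600 kg/m^3, ~10.4 m^3
h_f = V_f / S_face                            # thickness, ~0.075 m
\end{verbatim}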
\subsection{Material properties}
In order to better simulate the behaviour of the real \pdome\ building, it is
necessary to approximate the building materials and their properties as closely
as possible. This section details the arguments for the choice of parameters for
each of the CARNOT nodes' properties.
\subsubsection{Windows}
The windows are made of two glass panes of thickness 4mm each.
The values of the heat transfer coefficient (U-factor) can vary greatly for
different window technologies, but an educated guess can be made on the lower
and upper bounds based on the age and location of the building.
Window manufacturers state
the U-factor for double glass windows to be between 2.8 \(\frac{W}{m^2K}\) for
older manufacturing techniques and 1.2 \(\frac{W}{m^2K}\) for newer
models~\cite{WhatAreTypical2018}.
The US Energy Department states that typical U-factor values for modern window
installations are in the range of 0.2--1.2
\(\frac{W}{m^2K}\)~\cite{GuideEnergyEfficientWindows}.
The European flat glass association claims that the maximum allowable U-factor
value for new window installations in the private sector buildings in
Switzerland is 1.5
\(\frac{W}{m^2K}\)~\cite{glassforeuropeMinimumPerformanceRequirements2018}.
Considering the aforementioned values, and the fact the \pdome\ building was
built in 1993~\cite{nattererModelingMultilayerBeam2008}, the default U-factor of
1.8 \(\frac{W}{m^2K}\) has been deemed appropriate.
The total solar energy transmittance $[g]$ and the transmittance in the visible
range $[v_g]$ can vary in the range of 0.1--0.9 depending on manufacturing
technology, tint, etc. The average values of 0.7 and 0.65, respectively, have
been chosen for this case.
The values for glass density and heat capacity have been left at their default
values of 2500 \(\frac{kg}{m^3}\) and 1008 \(\frac{J}{kgK}\) respectively.
\subsubsection{Roof and Floor}
%%% Roof
% [5cm wood, 10cm insulation, 5cm wood]
% Conductivity for each material
% Heat capacity for each material
% Density for each material
The roof structure has been assumed to be made out of 10 cm of insulation,
enclosed on each side by 5 cm of wood.
%%% Floor
% [5cm wood, 10cm insulation, 20cm concrete]
% Conductivity for each material
% Heat capacity for each material
% Density for each material
The floor composition has been taken as consisting of, from top to bottom, 5 cm
wood, 10 cm insulation followed by 20 cm of concrete.
All the necessary values to characterise these materials have been taken
from~\cite{BuildingsHeatTransferData} and are presented in
Table~\ref{tab:material_properties}:
\begin{table}[ht]
%\vspace{-8pt}
\centering
\begin{tabular}{||c c c c||}
\hline
Material & Thermal Conductivity $k$ $[\frac{W}{mK}]$ & Specific Heat
Capacity $c$ $[\frac{J}{kgK}]$ & Density $\rho$ $[\frac{kg}{m^3}]$
\\
\hline \hline
Plywood & 0.12 & 1210 & 540 \\
Insulation & 0.03 & 1200 & 40 \\
Concrete & 1.0 & 700 & 2500 \\
\hline
\end{tabular}
\caption{Material properties for roof and floor elements}
\label{tab:material_properties}
\end{table}
\subsection{HVAC parameters}\label{sec:HVAC_parameters}
The \pdome\ is equipped with an \textit{AERMEC RTY-04} \acrshort{hvac} system.
According to the manufacturer's manual~\cite{aermecRoofTopManuelSelection}, this
\acrshort{hvac} houses two compressors of power 11.2 kW and 8.4 kW respectively,
an external ventilator of power 1.67 kW, and a reflow ventilator of power 2 kW.
The unit has a typical \acrlong{eer} (\acrshort{eer}, cooling efficiency) of
4.9--5.1 and a \acrlong{cop} (\acrshort{cop}, heating efficiency) of 5.0, for a
maximum cooling capacity of 64.2 kW.
One particularity of this \acrshort{hvac} unit is that during summer only one
of the two compressors is running. This results in a higher \acrshort{eer} in
the cases where the full cooling capacity is not required.
\subsubsection*{Ventilation}
According to the manufacturer manual \cite{aermecRoofTopManuelSelection}, the
\acrshort{hvac} unit's external fan has an air flow ranging between 4900
$\text{m}^3/\text{h}$ and 7000 $\text{m}^3/\text{h}$.
\subsubsection*{Air Exchange Rate}\label{sec:Air_Exchange_Rate}
The air exchange rate, also known as `air changes per hour', represents the
number of times per hour the air in a room is replaced with new air. It can be
computed by dividing the air flow through the room by the room volume:
\begin{equation}
\text{Air exchange rate} = \frac{\text{Air flow}}{\text{Total volume}}
\end{equation}
In the case of the \pdome\ and its \acrshort{hvac}, this results in an air
exchange rate range of:
\begin{equation}
\begin{aligned}
\text{Air exchange rate} &= \frac{4900}{2650}
- \frac{7000}{2650} &\left[\frac{\text{m}^3/\text{h}}{\text{m}^3}\right]\\
&= 1.85 - 2.64 &\left[\frac{1}{\text{h}}\right]
\end{aligned}
\end{equation}
As will be shown in Section~\ref{sec:CARNOT_experimental}, the external fan is
continuously running. The \textit{Air exchange rate} has therefore been fixed to
a value of 2.5 for the duration of the whole simulation. The real airflow could,
of course, vary depending on a multitude of external factors, but estimating it
would require more precise measurements.
\subsection{Validating against experimental data}\label{sec:CARNOT_experimental}
In order to confirm the validity of the model, it is necessary to compare the
CARNOT model's behaviour against that of the real \pdome\ building.
Section~\ref{sec:CARNOT_expdata} presents the available experimental data,
Section~\ref{sec:CARNOT_WDB} details the transformation of the available data
into CARNOT's WDB format, and Section~\ref{sec:CARNOT_comparison} details a
qualitative analysis of the differences.
\subsubsection{The available experimental data}\label{sec:CARNOT_expdata}
All the experimental data used for the validation of the CARNOT model has been
collected previously for another
study~\cite{fabiettiMultitimeScaleCoordination2018}, where it has been used to
identify a State Space model for the \pdome\ building.
The data has been collected in the time span of June to August 2017, and is
divided into seven different experiments, as presented in
Table~\ref{tab:exp_dates}. The available measurements are the \textit{Outside
Temperature}, \textit{Solar Irradiation}, \textit{Electrical power consumption}
of the \acrshort{hvac}, and two measurements of \textit{Inside Temperature} in
different parts of the room.
\begin{table}[ht]
\centering
\begin{tabular}{||c c c||}
\hline
Exp no. & Start Date & End Date\\
\hline \hline
1 & 01.06.2017 20:00 & 03.06.2017 17:00 \\
2 & 10.06.2017 16:00 & 12.06.2017 06:00 \\
3 & 16.06.2017 20:00 & 19.06.2017 06:00 \\
4 & 19.06.2017 20:00 & 22.06.2017 06:00 \\
5 & 30.06.2017 20:00 & 03.07.2017 06:00 \\
6 & 07.07.2017 20:00 & 10.07.2017 06:00 \\
7 & 13.06.2017 20:00 & 20.06.2017 06:00 \\
\hline
\end{tabular}
\caption{\pdome\ experimental measurements}
\label{tab:exp_dates}
\end{table}
\clearpage
As mentioned previously, the external fan of the \acrshort{hvac} is constantly
running. This can be seen in Figure~\ref{fig:Polydome_electricity} as the
electricity consumption of the \acrshort{hvac} has a baseline of 1.67 kW.
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Plots/Fan_baseline.pdf}
\caption{Electrical Power consumption of the \pdome\ \acrshort{hvac} for Experiment 7}
\label{fig:Polydome_electricity}
\end{figure}
Figure~\ref{fig:Polydome_electricity} also gives an insight into the workings of
the \acrshort{hvac} when it comes to the combination of the two available
compressors. The instruction manual of the
\acrshort{hvac}~\cite{aermecRoofTopManuelSelection} notes that in summer only
one of the compressors is running. This allows for a larger \acrshort{eer} value
and thus better performance. We can see that this is the case for most of the
experiment, where the power consumption caps at around 6~kW, which is the
maximum power consumption of one compressor. There are, however, moments during
the first part of the experiment where the power momentarily peaks over the 6~kW
limit, and goes as high as around 9~kW. This most probably happens when the
\acrshort{hvac} decides that the difference between the set point temperature
and the actual measured values is too large to compensate with a single
compressor.
Figure~\ref{fig:Polydome_exp7_settemp} presents the values of the set point
temperature and the measured internal temperature.
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Plots/Exp_settemp.pdf}
\caption{Measured vs set point temperature of the \acrshort{hvac} for Experiment 7}
\label{fig:Polydome_exp7_settemp}
\end{figure}
A few observations can be made on these measurements. First, the second
compressor is indeed turned on during the first part of the experiment, when the
set point differs greatly from the measured temperature. Second, for the
beginning of Experiment 7, as well as the majority of the other experiments, the
set point temperature is the value that gets changed in order to excite the
system, and since the \acrshort{hvac}'s controller is on during identification,
it will oscillate between using one or two compressors. Lastly, it is possible
to notice that the \acrshort{hvac} is not turned on during the night, with the
exception of the external fan, which continues running.
\subsubsection{The CARNOT WDB weather data format}\label{sec:CARNOT_WDB}
For a correct simulation of the building behaviour, CARNOT requires not only the
detailed definition of the building blocks/nodes, but also a very detailed set
of data on the weather conditions. This set includes detailed information on the
sun's position throughout the simulation (zenith and azimuth angles), the
\acrfull{dhi} and \acrfull{dni}, direct and diffuse solar radiation on surface,
as well as information on the ambient temperature, humidity, precipitation,
pressure, wind speed and direction, etc. A detailed overview of each
measurement necessary for a simulation is given in the CARNOT user
manual~\cite{CARNOTManual}.
In order to compare the CARNOT model's performance to that of the real \pdome,
it is necessary to simulate the CARNOT model under the same set of conditions as
the ones present during the experimental data collection. To do this, all the
missing values required by the simulation have to be filled in. In some cases,
such as the sun angles, it is possible to compute the exact values; in other
cases the real data is not available and has to be inferred from the available
data.
The information on the zenith and azimuth solar angles can be computed exactly
if the position and elevation of the building are known. The GPS coordinates and
elevation information are found using a map~\cite{ElevationFinder}. With that
information available, the zenith and azimuth angles, as well as the
\acrfull{aoi}, are computed using the Python pvlib
library~\cite{f.holmgrenPvlibPythonPython2018}.
As opposed to the solar angles, which can be computed exactly from the available
information, the Solar Radiation Components (DHI and DNI) have to be estimated
from the available measurements of GHI, zenith angles (Z) and datetime
information. \textcite{erbsEstimationDiffuseRadiation1982} present an empirical
relationship, known as the Erbs model, between the diffuse fraction DF and the
clearness index $K_t$, the ratio of GHI to extraterrestrial irradiance. The DF
is then used to compute DHI and DNI as follows:
\begin{equation}
\begin{aligned}
\text{DHI} &= \text{DF} \times \text{GHI} \\
\text{DNI} &= \frac{\text{GHI} - \text{DHI}}{\cos{\text{Z}}}
\end{aligned}
\end{equation}
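A minimal sketch of this pipeline with pvlib (the coordinates, timestamps and
GHI values are placeholders, not the actual \pdome\ data):
\begin{verbatim}
import pandas as pd
from pvlib import solarposition, irradiance

# Placeholder location and period
lat, lon, alt = 46.52, 6.57, 400.0
times = pd.date_range("2017-06-01", "2017-06-03",
                      freq="15min", tz="Europe/Zurich")

# Exact solar angles from position, elevation and time
solpos = solarposition.get_solarposition(times, lat, lon, altitude=alt)

# Estimate DHI and DNI from measured GHI with the Erbs model
ghi = pd.Series(500.0, index=times)   # placeholder GHI measurements
comps = irradiance.erbs(ghi, solpos["zenith"], times)
dhi, dni = comps["dhi"], comps["dni"]
\end{verbatim}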
All the other parameters related to solar irradiance, such as the in-plane
irradiance components and the in-plane diffuse irradiance from the sky and the
ground, are computed using the Python pvlib library.
The values that can be neither calculated nor approximated from the available
data, such as the precipitation, wind direction, incidence angles in place of
vertical and main/secondary surface axis, have been replaced with the default
CARNOT placeholder value of -9999. The relative humidity, cloud index, pressure
and wind speed have been fixed to the default CARNOT values of 50\%, 0.5, 96300
Pa, 0 $\text{m}/\text{s}$ respectively.
\subsubsection{\pdome\ and CARNOT model comparison}\label{sec:CARNOT_comparison}
With the \acrshort{wdb} data compiled, we can now turn to simulating the CARNOT
model and compare its behaviour to that of the real \pdome\ building.
Unfortunately, one crucial piece of information is still missing: the amount of
heat that the \acrshort{hvac} either pumps into or takes out of the building at
any point in time. This value could be approximated from the electrical power
consumption and the \acrshort{eer}/\acrshort{cop} values, given that it is known
whether the \acrshort{hvac} is in heating or cooling mode.
Since this information is lacking in the existing experimental datasets, the
heat supplied to or extracted from the system has been inferred from the
electrical energy use, the measured building temperature and the \acrshort{hvac}
temperature set point, with the assumption that the \acrshort{hvac} is in
cooling mode whenever the measurements are higher than the set point
temperature, and in heating mode otherwise. As can already be seen in
Figure~\ref{fig:Polydome_exp7_settemp}, this is a very strong assumption that is
not necessarily always correct. It works well when the measurements are very
different from the set point, as is the case in the first part of the
experiment, but it breaks down in the second part, where the set point
temperature remains fixed and it is purely the \acrshort{hvac}'s job to regulate
the temperature.
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Plots/CARNOT_comparison_1.pdf}
\includegraphics[width = \textwidth]{Plots/CARNOT_comparison_2.pdf}
\caption{Measured vs CARNOT simulated temperature for the Experimental
Datasets}
\label{fig:CARNOT_simulation_validation}
\end{figure}
The results of the seven simulations are presented in
Figure~\ref{fig:CARNOT_simulation_validation}. Overall, the simulated
temperature has the same behaviour as the real \pdome\ measurements. A more
detailed inspection shows that for most of the experiments, the simulated
temperature is much more volatile than the true measurements. This could be due
to an overestimated value of the Air Exchange Rate, underestimated amount of
furniture in the building or, more probably, miscalculation of the
\acrshort{hvac}'s heating/cooling mode. Of note is the large difference in
behaviour for Experiments 5 and 6. In fact, for these experiments, the values of
the electrical power consumption differ greatly in shape from the ones presented
in the other datasets, which could indicate erroneous measurements or some other
underlying problem with the data.
Finally, it is possible to conclude that the CARNOT building behaves comparably
to the real \pdome, even if it does not perfectly reproduce its behaviour. These
differences could come from multiple factors --- missing information that had to
be inferred and/or approximated, such as the Air Exchange Rate, the heat
provided/extracted, the amount of furniture in the building, the overall shape
and size of the building, as well as possible errors in the experimental data
used for validation. A more detailed analysis of the building parameters would
have to be done in order to find and eliminate these discrepancies.
\clearpage

\section{Choice of Hyperparameters}\label{sec:hyperparameters}
This section will discuss and try to validate the choice of all the
hyperparameters necessary for the training of a \acrshort{gp} model to capture
the CARNOT building's behaviour.
The class of black-box models is very versatile, being able to capture plant
behaviour directly from data. This comes in contrast to white-box and grey-box
modelling techniques, which require much more physical insight into the plant's
behaviour.
On the flip side, this versatility comes at the cost of having to properly
define the model hyperparameters: the number of regressors, the number of
autoregressive lags for each class of inputs, and the shape of the covariance
function all have to be taken into account when designing a \acrshort{gp} model.
These choices directly influence the resulting model behaviour and how well it
generalizes, and indirectly influence the computational cost, which grows with
a larger number of regressors and more complex kernel functions.
As described in Section~\ref{sec:gp_dynamical_system}, for the purpose of this
project, the \acrlong{gp} model will be trained using the \acrshort{narx}
structure. This already presents an important choice in the selection of
regressors and their respective autoregressive lags.
The output of the model has been chosen as the \textit{temperature measured
inside} the CARNOT building. This is a suitable choice for the \acrshort{ocp}
defined in Section~\ref{sec:mpc_problem}, where the goal is to track the inside
temperature of the building as closely as possible.
The input of the \acrshort{gp} model coincides with the input of the CARNOT
building, namely the \textit{heat} passed to the idealized \acrshort{hvac},
which is held constant at each step.
The choice of the exogenous inputs turned out to be more complex. The CARNOT
\acrshort{wdb} format (cf. Section~\ref{sec:CARNOT_WDB}) consists of information
on all the solar angles, the different components of solar radiation, wind speed
and direction, temperature, precipitation, etc. All of this information is
required for CARNOT to function properly.
Including all of this information in the \acrshort{gp}'s exogenous inputs would
come with a few downsides. First, depending on the number of lags chosen for the
exogenous inputs, the number of inputs to the \acrshort{gp} could be very large:
an exogenous input vector of 10 elements with 2 lags would yield 20 inputs for
the \acrshort{gp} model. This is very computationally expensive both for
training and for using the model, as its algorithmic complexity is
$\mathcal{O}(n^3)$.
Second, this information may not always be available in experimental
implementations on real buildings, where legacy equipment might already be
installed, or where budget restrictions call for simpler equipment. An example
of this is the experimental datasets used for validation of the CARNOT model,
where the only available weather information is the \acrshort{ghi} and the
measurement of the outside temperature. This would also be a limitation when
obtaining the weather predictions for the next steps during real-world
experiments.
Last, while very verbose information such as the solar angles and the components
of the solar radiation is very useful for CARNOT, which simulates each node
individually knowing its absolute position, this information would not always
benefit the \acrshort{gp} model, at least not comparably to the additional
computational complexity.
For the exogenous inputs the choice has therefore been made to take the
\textit{Global Solar Irradiance}\footnotemark and \textit{Outside Temperature
Measurement}.
\footnotetext{Using the \acrshort{ghi} makes sense in the case of the \pdome\
building which has windows facing all directions, as opposed to a room which
would have windows on only one side.}
\subsection{The Kernel}
The covariance function is an important choice when creating a \acrshort{gp}. A
properly chosen kernel can impose a desired prior behaviour on the
\acrshort{gp}, such as continuity of the function and its derivatives,
periodicity, linearity, etc. On the flip side, choosing the wrong kernel can
make computations more expensive, require more data to learn the proper
behaviour, or be outright numerically unstable and/or give erroneous
predictions.
The \acrlong{se} kernel (cf. Section~\ref{sec:Kernels}) is very versatile,
theoretically being able to fit any continuous function given enough data. When
including the \acrshort{ard} behaviour, it also gives an insight into the
relative importance of each regressor, through their respective lengthscales.
Many different kernels have been used when identifying models for building
thermal control, such as a pure Rational Quadratic
Kernel~\cite{pleweSupervisoryModelPredictive2020}, a combination of
\acrshort{se}, \acrshort{rq} and a Linear
Kernel~\cite{jainLearningControlUsing2018}, and the Squared Exponential Kernel
together with kernels from the Mat\'ern family~\cite{massagrayThermalBuildingModelling2016}.
For the purpose of this project, the choice has been made to use the
\textit{\acrlong{se} Kernel}, as it provides a very good balance of versatility
and computational complexity for the modelling of the CARNOT building.
\subsection{Lengthscales}\label{sec:lengthscales}
The hyperparameters of the \acrshort{se} kernel can be useful when studying the
importance of regressors. The larger the distance between two inputs, the less
correlated they are. In fact, setting the kernel variance $\sigma^2 = 1$, we can
compute the correlation of two inputs located one, two and three lengthscales
apart.
\begin{table}[ht]
\centering
\begin{tabular}{||c c ||}
\hline
$\norm{\mathbf{x} - \mathbf{x}'}$ &
$\exp{(-\frac{1}{2}\times\frac{\norm{\mathbf{x} - \mathbf{x}'}^2}{l^2})}$ \\
\hline \hline
$1l$ & 0.606 \\
$2l$ & 0.135 \\
$3l$ & 0.011 \\
\hline
\end{tabular}
\caption{Correlation of inputs relative to their distance}
\label{tab:se_correlation}
\end{table}
From Table~\ref{tab:se_correlation} it can be seen that at 3 lengthscales
apart, the inputs are already almost uncorrelated. In order to better visualize
this difference, the \textit{relative lengthscale importance} is introduced:
\begin{equation}
\lambda = \frac{1}{\sqrt{l}}
\end{equation}
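A small sketch reproducing the correlations of Table~\ref{tab:se_correlation}
and the relative importance values (Python/NumPy; the fitted lengthscales are
illustrative):
\begin{verbatim}
import numpy as np

# Correlation of two inputs 1, 2 and 3 lengthscales apart (sigma^2 = 1)
for d in (1, 2, 3):
    print(d, np.exp(-0.5 * d**2))   # 0.6065, 0.1353, 0.0111

# Relative lengthscale importance for a set of fitted ARD lengthscales
lengthscales = np.array([0.8, 3.2, 12.5])   # illustrative values
relative_importance = 1 / np.sqrt(lengthscales)
\end{verbatim}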
Another indicator of model behaviour is the variance of the identified
\acrshort{se} kernel. The expected value of the variance is around the variance
of the inputs. An extremely high or extremely low value of the variance could
mean a numerically unstable model.
Table~\ref{tab:GP_hyperparameters} presents the relative lengthscale importances
and the variance for different combinations of the exogenous input lags ($l_w$),
the controlled input lags ($l_u$) and the output lags ($l_y$) for a classical
\acrshort{gp} model.
% TODO: [Lengthscales] Explain lags and w1,1, w2, etc.
\begin{table}[ht]
%\vspace{-8pt}
\centering
\resizebox{\columnwidth}{!}{%
\begin{tabular}{||c c c|c|c c c c c c c c c c c||}
\hline
\multicolumn{3}{||c|}{Lags} & Variance &\multicolumn{11}{c||}{Kernel
lengthscales relative importance} \\
$l_w$ & $l_u$ & $l_y$ & $\sigma^2$ &$\lambda_{w1,1}$ & $\lambda_{w1,2}$ &
$\lambda_{w1,3}$ & $\lambda_{w2,1}$ & $\lambda_{w2,2}$ &
$\lambda_{w2,3}$ & $\lambda_{u1,1}$ & $\lambda_{u1,2}$ &
$\lambda_{y1,1}$ & $\lambda_{y1,2}$ & $\lambda_{y1,3}$\\
\hline \hline
1 & 1 & 1 & 0.11 & 0.721 & & & 2.633 & & & 0.569 & & 2.645 & & \\
1 & 1 & 2 & 22.68 & 0.222 & & & 0.751 & & & 0.134 & & 3.154 & 3.073 & \\
1 & 1 & 3 & 0.29 & 0.294 & & & 1.303 & & & 0.356 & & 2.352 & 1.361 & 2.045 \\
1 & 2 & 1 & 7.55 & 0.157 & & & 0.779 & & & 0.180 & 0.188 & 0.538 & & \\
1 & 2 & 3 & 22925.40 & 0.018 & & & 0.053 & & & 0.080 & 0.393 & 0.665 & 0.668 & 0.018 \\
2 & 1 & 2 & 31.53 & 0.010 & 0.219 & & 0.070 & 0.719 & & 0.123 & & 3.125 & 3.044 & \\
2 & 1 & 3 & 0.44 & 0.007 & 0.251 & & 0.279 & 1.229 & & 0.319 & & 2.705 & 1.120 & 2.510 \\
3 & 1 & 3 & 0.56 & 0.046 & 0.064 & 0.243 & 0.288 & 1.151 & 0.233 & 0.302 & & 2.809 & 1.086 & 2.689 \\
3 & 2 & 2 & 1.65 & 0.512 & 0.074 & 0.201 & 0.161 & 1.225 & 0.141 & 0.231 & 0.331 & 0.684 & 0.064 & \\
\hline
\end{tabular}%
}
\caption{GP hyperparameter values for different autoregressive lags}
\label{tab:GP_hyperparameters}
\end{table}
In general, the results of Table~\ref{tab:GP_hyperparameters} show that the
past outputs are important when predicting future values. The past inputs are
also of importance, with the exception of the models with very high variance,
where the relative importances stay almost constant across all the inputs. It
can also be seen that, for the exogenous inputs, the outside temperature ($w2$)
is generally more important than the solar irradiation ($w1$). When more
autoregressive lags are used for the exogenous inputs, the more recent
information is usually more important in the case of the solar irradiation,
while the second-to-last measurement is preferred for the outside temperature.
For the classical \acrshort{gp} model the appropriate choice of lags would be
$l_u = 1$ and $l_y = 3$ with $l_w$ taking the values of either 1, 2 or 3,
depending on the results of further analysis.
As for the \acrlong{svgp}, the results for the classical \acrshort{gp} (cf.
Table~\ref{tab:GP_hyperparameters}) are not necessarily representative of the
relationships between the regressors of the \acrshort{svgp} model, since the
dataset used for training is composed of the \textit{inducing variables}, which
are not the real data, but a set of parameters chosen by the training algorithm
so as to best generate the original data.
Therefore, to better understand the behaviour of the \acrshort{svgp} models,
the same computations as in Table~\ref{tab:GP_hyperparameters} have been made;
the results are presented in Table~\ref{tab:SVGP_hyperparameters}:
\begin{table}[ht]
%\vspace{-8pt}
\centering
\resizebox{\columnwidth}{!}{%
\begin{tabular}{||c c c|c|c c c c c c c c c c c||}
\hline
\multicolumn{3}{||c|}{Lags} & Variance &\multicolumn{11}{c||}{Kernel
lengthscales relative importance} \\
$l_w$ & $l_u$ & $l_y$ & $\sigma^2$ &$\lambda_{w1,1}$ & $\lambda_{w1,2}$ &
$\lambda_{w1,3}$ & $\lambda_{w2,1}$ & $\lambda_{w2,2}$ &
$\lambda_{w2,3}$ & $\lambda_{u1,1}$ & $\lambda_{u1,2}$ &
$\lambda_{y1,1}$ & $\lambda_{y1,2}$ & $\lambda_{y1,3}$\\
\hline \hline
1 & 1 & 1 & 0.2970 & 0.415 & & & 0.748 & & & 0.675 & & 0.680 & & \\
1 & 1 & 2 & 0.2717 & 0.430 & & & 0.640 & & & 0.687 & & 0.559 & 0.584 & \\
1 & 1 & 3 & 0.2454 & 0.455 & & & 0.589 & & & 0.671 & & 0.522 & 0.512 & 0.529 \\
1 & 2 & 1 & 0.2593 & 0.310 & & & 0.344 & & & 0.534 & 0.509 & 0.597 & & \\
1 & 2 & 3 & 0.2139 & 0.330 & & & 0.368 & & & 0.537 & 0.447 & 0.563 & 0.410 & 0.363 \\
2 & 1 & 2 & 0.2108 & 0.421 & 0.414 & & 0.519 & 0.559 & & 0.680 & & 0.525 & 0.568 & \\
2 & 1 & 3 & 0.1795 & 0.456 & 0.390 & & 0.503 & 0.519 & & 0.666 & & 0.508 & 0.496 & 0.516 \\
3 & 1 & 3 & 0.1322 & 0.432 & 0.370 & 0.389 & 0.463 & 0.484 & 0.491 & 0.666 & & 0.511 & 0.501 & 0.526 \\
3 & 2 & 2 & 0.1228 & 0.329 & 0.317 & 0.325 & 0.334 & 0.337 & 0.331 &
0.527 & 0.441 & 0.579 & 0.435 & \\
\hline
\end{tabular}%
}
\caption{SVGP hyperparameter values for different autoregressive lags}
\label{tab:SVGP_hyperparameters}
\end{table}
The results of Table~\ref{tab:SVGP_hyperparameters} are not very surprising, even
if very different from the classical \acrshort{gp} case. The kernel variance is
always of a reasonable value, and the relative importance of the lengthscales is
relatively constant across the board. It is certainly harder to interpret these
results as pertaining to the relevance of the chosen regressors. For the
\acrshort{svgp} model, the choice of the autoregressive lags has been made
purely on the values of the loss functions, presented in
Table~\ref{tab:SVGP_loss_functions}.
\subsection{Loss functions}
The most important metric for measuring the performance of a model is the value
of the loss function, computed on a dataset separate from the one used for
training.
There exist a number of different loss functions, each focusing on different
aspects of a model's performance. A selection of loss functions used in
identification of Gaussian Process models is presented
below~\cite{kocijanModellingControlDynamic2016}.
The \acrfull{rmse} is a very commonly used performance measure. As the name
suggests, it computes the root of the mean squared error:
\begin{equation}\label{eq:rmse}
\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^N \left(y_i -
E(\hat{y}_i)\right)^{2}}
\end{equation}
This performance metric is very useful when training a model whose goal is
solely to minimize the difference between the measured values and the ones
predicted by the model.
A variant of the \acrshort{mse} is the \acrfull{smse}, which normalizes the
\acrlong{mse} by the variance of the output values of the validation dataset.
\begin{equation}\label{eq:smse}
\text{SMSE} = \frac{1}{N}\frac{\sum_{i=1}^N \left(y_i -
E(\hat{y}_i)\right)^{2}}{\sigma_y^2}
\end{equation}
While the \acrshort{rmse} and the \acrshort{smse} are very good at ensuring the
predicted mean value of the Gaussian Process is close to the measured values of
the validation dataset, the confidence of the Gaussian Process prediction is
completely ignored. Two models predicting the same mean values but with very
different confidence intervals would be equivalent according to these
performance metrics.
The \acrfull{lpd} is a performance metric which takes into account not only the
mean value of the GP prediction, but the entire distribution:
\begin{equation}
\text{LPD} = \frac{1}{2} \ln{\left(2\pi\right)} + \frac{1}{2N}
\sum_{i=1}^N\left(\ln{\left(\sigma_i^2\right)} + \frac{\left(y_i -
E(\hat{y}_i)\right)^{2}}{\sigma_i^2}\right)
\end{equation}
where $\sigma_i^2$ is the model's output variance at the \textit{i}-th step.
The \acrshort{lpd} scales the error of the mean value prediction $\left(y_i -
E(\hat{y}_i)\right)^{2}$ by the variance $\sigma_i^2$. This means that the
overconfident models get penalized more than the more conservative models for
the same mean prediction error, leading to models that better represent
the real system.
The \acrfull{msll} is obtained by subtracting the loss of the model that
predicts using a Gaussian with the mean $E(\boldsymbol{y})$ and variance
$\sigma_y^2$ of the measured data from the model \acrshort{lpd} and taking the
mean of the obtained result:
\begin{equation}
\text{MSLL} = \frac{1}{2N}\sum_{i=1}^N\left[
\ln{\left(\sigma_i^2\right) + \frac{\left(y_i -
E\left(\hat{y}_i\right)\right)^2}{\sigma_i^2}}
\right] - \frac{1}{2N}\sum_{i=1}^N\left[
\ln{\left(\sigma_y^2\right) + \frac{\left(y_i -
E\left(\boldsymbol{y}\right)\right)^2}{\sigma_y^2}}
\right]
\end{equation}
The \acrshort{msll} is approximately zero for simple models and negative for
better ones.
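For reference, all four metrics can be computed directly from the predicted
means and variances. The following numpy sketch implements the definitions
above; the variable names are illustrative:
\begin{verbatim}
import numpy as np

def loss_metrics(y, mean, var):
    """Compute RMSE, SMSE, LPD and MSLL for GP predictions.

    y    -- measured outputs of the validation dataset
    mean -- predicted means E(y_hat)
    var  -- predicted variances sigma_i^2
    """
    se = (y - mean) ** 2

    rmse = np.sqrt(np.mean(se))
    smse = np.mean(se) / np.var(y)

    # LPD: mean prediction error scaled by the model's own variance
    lpd = 0.5 * np.log(2 * np.pi) + 0.5 * np.mean(np.log(var) + se / var)

    # MSLL: LPD relative to a trivial model that always predicts the
    # mean and variance of the measured data
    trivial = np.log(np.var(y)) + (y - np.mean(y)) ** 2 / np.var(y)
    msll = 0.5 * np.mean(np.log(var) + se / var) - 0.5 * np.mean(trivial)

    return rmse, smse, lpd, msll
\end{verbatim}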
Table~\ref{tab:GP_loss_functions} and Table~\ref{tab:SVGP_loss_functions}
present the values of the different loss functions for the same lag combinations
as the ones analyzed in Section~\ref{sec:lengthscales} for the classical
\acrshort{gp} and the \acrshort{svgp} models respectively:
\begin{table}[ht]
%\vspace{-8pt}
\centering
\begin{tabular}{||c c c|c c c c||}
\hline
\multicolumn{3}{||c|}{Lags} & \multicolumn{4}{c||}{Loss functions}\\
$l_w$ & $l_u$ & $l_y$ & RMSE & SMSE & MSLL & LPD\\
\hline \hline
1 & 1 & 1 & 0.3464 & 0.36394 & 20.74 & 21.70 \\
1 & 1 & 2 & 0.1415 & 0.06179 & -9.62 & -8.67 \\
1 & 1 & 3 & 0.0588 & 0.01066 & -8.99 & -8.03 \\
1 & 2 & 1 & 0.0076 & 0.00017 & 71.83 & 72.79 \\
1 & 2 & 3 & \textbf{0.0041} & \textbf{0.00005} & 31.25 & 32.21 \\
2 & 1 & 2 & 0.1445 & 0.06682 & -9.57 & -8.61 \\
2 & 1 & 3 & 0.0797 & 0.02033 & -10.94 & -9.99 \\
3 & 1 & 3 & 0.0830 & 0.02219 & \textbf{-11.48} & \textbf{-10.53} \\
3 & 2 & 2 & 0.0079 & 0.00019 & 58.30 & 59.26 \\
\hline
\end{tabular}
\caption{GP Loss function values for different autoregressive lags}
\label{tab:GP_loss_functions}
\end{table}
For the classical \acrshort{gp} model (cf. Table~\ref{tab:GP_loss_functions}) a
number of lag combinations give rise to models with very large
\acrshort{msll}/\acrshort{lpd} values. This might indicate that those models are
overconfident, either due to the very large kernel variance parameter or to the
specific lengthscale combinations. The model with the best
\acrshort{rmse}/\acrshort{smse} metrics, \model{1}{2}{3}, had very poor
\acrshort{msll} and \acrshort{lpd} values, as well as by far the largest
variance of all the combinations. On the contrary, the \model{3}{1}{3} model has
the best \acrshort{msll} and \acrshort{lpd} performance, while still maintaining
small \acrshort{rmse} and \acrshort{smse} values. The drawback of this set of
lags is the large number of regressors, which leads to much more expensive
computations. Other good choices are \model{2}{1}{3} and \model{1}{1}{3}, which
perform well on all four metrics while being cheaper from a computational
perspective. In order to make a more informed choice of hyperparameters, the
simulation performance of all three combinations has been analyzed.
\begin{table}[ht]
%\vspace{-8pt}
\centering
\begin{tabular}{||c c c|c c c c||}
\hline
\multicolumn{3}{||c|}{Lags} & \multicolumn{4}{c||}{Loss functions}\\
$l_w$ & $l_u$ & $l_y$ & RMSE & SMSE & MSLL & LPD\\
\hline \hline
1 & 1 & 1 & 0.3253 & 0.3203 & 228.0278 & 228.9843 \\
1 & 1 & 2 & 0.2507 & 0.1903 & 175.5525 & 176.5075 \\
1 & 1 & 3 & 0.1983 & 0.1192 & 99.7735 & 100.7268 \\
1 & 2 & 1 & 0.0187 & 0.0012 & -9.5386 & -8.5836 \\
1 & 2 & 3 & \textbf{0.0182} & \textbf{0.0011} & \textbf{-10.2739} &
\textbf{-9.3206} \\
2 & 1 & 2 & 0.2493 & 0.1884 & 165.0734 & 166.0284 \\
2 & 1 & 3 & 0.1989 & 0.1200 & 103.6753 & 104.6287 \\
3 & 1 & 3 & 0.2001 & 0.1214 & 104.4147 & 105.3681 \\
3 & 2 & 2 & 0.0206 & 0.0014 & -9.9360 & -8.9826 \\
\hline
\end{tabular}
\caption{SVGP Loss function values for different autoregressive lags}
\label{tab:SVGP_loss_functions}
\end{table}
The results for the \acrshort{svgp} model, presented in
Table~\ref{tab:SVGP_loss_functions}, are much less ambiguous. The \model{1}{2}{3}
model has the best performance according to all four metrics, with most of the
other combinations scoring much worse on the \acrshort{msll} and \acrshort{lpd}
loss functions. It has therefore been chosen as the model for the full-year
simulations.
\subsection{Validation of hyperparameters}\label{sec:validation_hyperparameters}
The validation step has the purpose of testing the viability of the trained
models. If choosing a model according to loss function values on a new dataset
is a way of minimizing the possibility of overfitting the model to the training
data, validating the model by analyzing its multi-step prediction performance
ensures the model has learned the correct dynamics and is useful in simulation
scenarios.
The following subsections analyze the performance of the trained \acrshort{gp}
and \acrshort{svgp} models over 20-step ahead predictions. For the \acrshort{gp}
model the final choice of parameters is made according to the simulation
performance. The simulation performance of the \acrshort{svgp} model is compared
to that of the classical models while speculating on the possible reasons for
the discrepancies.
\subsubsection{Conventional Gaussian Process}
The simulation performance of the three lag combinations chosen for the
classical \acrlong{gp} models has been analyzed, with the results presented in
Figures~\ref{fig:GP_113_multistep_validation},~\ref{fig:GP_213_multistep_validation}
and~\ref{fig:GP_313_multistep_validation}. For reference, the one-step ahead
predictions for the training and test datasets are presented in
Appendix~\ref{apx:hyperparams_gp}.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/GP_113_-1pts_test_prediction_20_steps.pdf}
\vspace{-25pt}
\caption{20-step ahead simulation for \model{1}{1}{3}}
\label{fig:GP_113_multistep_validation}
\end{figure}
In the case of the simplest model (cf.
Figure~\ref{fig:GP_113_multistep_validation}), the predictions are overall quite
good. Large deviations from the true values start appearing at around 15 steps,
which could impose an additional limit on the size of the control horizon of the
\acrlong{ocp}.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/GP_213_-1pts_test_prediction_20_steps.pdf}
\vspace{-25pt}
\caption{20-step ahead simulation for \model{2}{1}{3}}
\label{fig:GP_213_multistep_validation}
\end{figure}
The more complex model, presented in
Figure~\ref{fig:GP_213_multistep_validation}, has much better prediction
performance, with only two predictions out of a total of twenty-five diverging
at the later steps. Except for this late-stage divergence, it proves to be the
best simulation model.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/GP_313_-1pts_test_prediction_20_steps.pdf}
\vspace{-25pt}
\caption{20-step ahead simulation for \model{3}{1}{3}}
\label{fig:GP_313_multistep_validation}
\end{figure}
Lastly, \model{3}{1}{3} (cf. Figure~\ref{fig:GP_313_multistep_validation}) has
much worse simulation performance than the other two models. This could hint at
overfitting of the model on the training data.
\clearpage
This is consistent with the \acrshort{rmse} and \acrshort{smse} results found in
Table~\ref{tab:GP_loss_functions}, and can also be seen in
Appendix~\ref{apx:hyperparams_gp}, Figure~\ref{fig:GP_313_test_validation},
where \model{3}{1}{3} performs much worse on the testing dataset predictions
than the other two models.
The performance of the three models in simulation mode is consistent with the
previously found results. It is of note that neither the model that scored best
on the \acrshort{rmse}/\acrshort{smse}, \model{1}{2}{3}, nor the one with the
best \acrshort{msll}/\acrshort{lpd}, performs best in a simulation scenario. The
former suffers from numerical instability, with training/prediction often
failing depending on the inputs. As for the latter, focusing only on the
\acrshort{msll}/\acrshort{lpd} performance metrics can lead to very conservative
models that give good and confident one-step ahead predictions while still being
unable to fit the true behaviour of the plant.
Overall, the \model{2}{1}{3} performed the best in the simulation scenario,
while still having good performance on all loss functions. In implementation,
however, this model turned out to be very unstable, and the more conservative
\model{1}{1}{3} model was used instead.
\subsubsection{Sparse and Variational Gaussian Process}
For the \acrshort{svgp} models, only the performance of \model{1}{2}{3} was
investigated, since it had the best performance according to all four loss
metrics.
As a first validation step, it is of note that the \acrshort{svgp} model was
able to accurately reproduce the training dataset with only 150 inducing
locations (cf. Appendix~\ref{apx:hyperparams_svgp}). It also performs about as
well as the better \acrshort{gp} models for the one step prediction on the
testing datasets.
In the case of the simulation performance, presented in
Figure~\ref{fig:SVGP_multistep_validation}, two things are of particular
interest. First, all 25 simulations have good overall behaviour --- none of them
exhibit erratic behaviour --- which is a good indicator of a lack of
overfitting. Second, this behaviour is indicative of a more conservative model
than the ones identified for the classical \acrshort{gp}. It is also possible to
conclude that, given the same amount of data and the correct choice of
regressors, the classical \acrshort{gp} models can better learn the plant
behaviour.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/SVGP_123_test_prediction_20_steps.pdf}
\caption{20-step ahead simulation for \model{1}{2}{3}}
\label{fig:SVGP_multistep_validation}
\end{figure}
\clearpage

\section{The MPC Problem}\label{sec:mpc_problem}
The \acrlong{ocp} to be solved was chosen in such a way as to make the analysis
of the models' performance more straightforward. The objective is to track a
defined reference temperature as closely as possible, while ensuring the heat
input stays within the HVAC capacity. The \textit{zero-variance} method is used
for multi-step prediction with an existing \acrshort{gp}. While it ignores the
propagation of uncertainty over multi-step ahead predictions, despite its
simplicity this method has been proven to
work~\cite{kocijanModellingControlDynamic2016,jainLearningControlUsing2018,pleweSupervisoryModelPredictive2020}.
The optimization problem is therefore defined as follows:
\begin{subequations}\label{eq:optimal_control_problem}
\begin{align}
& \text{minimize}
& & \sum_{i=0}^{N-1} (\bar{y}_{t+i} - y_{ref, t})^2 \\
& \text{subject to}
& & \bar{y}_{t+i} = K_*K^{-1}\mathbf{x}_{t+i-1} \\
&&& \mathbf{x}_{t+i-1} = \left[\mathbf{w}_{t+i-1},\quad
\mathbf{u}_{t+i-1},\quad \mathbf{y}_{t+i-1}\right]^T \\
\label{eq:components}
&&& u_{t+i} \in \mathcal{U}
\end{align}
\end{subequations}
where $y_{ref, t}$ is the reference temperature at time $t$, $\mathcal{U}$ is
the set of allowed inputs, and $\mathbf{x}_{t}$ is the GP input vector at time
$t$, composed of the exogenous autoregressive inputs $\mathbf{w}_{t}$, the
autoregressive controlled inputs $\mathbf{u}_{t}$ and the autoregressive outputs
$\mathbf{y}_{t}$.
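As an illustration, a minimal sketch of the zero-variance recursion is given
below. The \texttt{gp\_mean} function and the exact ordering of the lags are
assumptions of the example; only the structure, feeding the predicted mean back
as an autoregressive input without propagating the variance, reflects the method
described above:
\begin{verbatim}
import numpy as np

def simulate_zero_variance(gp_mean, w_seq, u_seq, y_init, n_steps):
    """Multi-step prediction with the zero-variance method.

    gp_mean -- function returning the GP posterior mean for one input
    w_seq   -- predicted disturbance lags, one row per step
    u_seq   -- planned control input lags, one row per step
    y_init  -- the l_y most recent output measurements (oldest first)
    """
    y_lags = list(y_init)
    predictions = []
    for i in range(n_steps):
        x = np.concatenate([w_seq[i], u_seq[i], y_lags])
        y_next = gp_mean(x)             # only the mean is propagated
        predictions.append(y_next)
        y_lags = y_lags[1:] + [y_next]  # shift the output lags
    return predictions
\end{verbatim}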
\subsection{Temperature reference}\label{sec:reference_temperature}
The temperature reference for the controller has been taken as the mean value of
the SIA~180:2014~\cite{sia180:2014ProtectionThermiqueProtection2014} temperature
norm. It imposes a range of temperatures that are allowed for residential
buildings based on the rolling 48h average outside temperature.
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Images/sia_180_2014.png}
\caption{The SIA 180:2014 norm for residential building
temperatures~\cite{sia180:2014ProtectionThermiqueProtection2014}}
\label{fig:sia_temperature_norm}
\end{figure}
\clearpage

\section{Implementation}\label{sec:implementation}
This section goes into the details of the implementation of the Simulink plant
and Python controller setup.
A high-level view of the setup is presented in Figure~\ref{fig:setup_diagram}.
The Simulink model's main responsibility is running the CARNOT simulation. It
also has the task of providing the \acrshort{mpc} with information on the
weather forecast, since the weather information for the simulation comes from a
CARNOT \acrshort{wdb} object. A detailed view of all the information available
in the \acrshort{wdb} object is given in Section~\ref{sec:CARNOT_WDB}.
\begin{figure}[ht]
\centering
\includegraphics[width = 0.5\textwidth]{Images/setup_diagram.pdf}
\caption{Block diagram of the Simulink plant and Python Controller}
\label{fig:setup_diagram}
\end{figure}
\subsection{Simulink Model}
The secondary functions of the Simulink model are providing the weather
predictions and communicating with the Python controller. A complete schematic
of the Simulink setup is presented in Figure~\ref{fig:Simulink_complete}.
\begin{figure}[ht]
\centering
\includegraphics[width = 0.75\textwidth]{Images/polydome_python.pdf}
\caption{Simulink diagram of the Complete Simulation}
\label{fig:Simulink_complete}
\end{figure}
The communication between Simulink and the controller is done over three
separate TCP/IP sockets: one for sending the control signal, one for reading the
temperature measurement, and one for reading the weather forecast. This is
mainly due to a Simulink limitation whereby only one signal can be transferred
over a single socket. This implementation also has the benefit of providing an
additional layer of abstraction between the controller and the controlled plant:
as long as the measurements, the actuators and the weather prediction can
communicate over TCP, these elements can all be implemented completely
separately, which is much closer to a real-life implementation.
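A minimal sketch of the controller side of this communication is given below,
assuming one socket per signal, native byte order and doubles as the wire
format; the addresses, ports and the \texttt{compute\_control} call are
hypothetical:
\begin{verbatim}
import socket
import struct

# Hypothetical addresses; Simulink opens one TCP socket per signal
# (the forecast socket is omitted for brevity)
MEASUREMENT = ("localhost", 30000)   # room temperature measurement
CONTROL     = ("localhost", 30001)   # heating/cooling control signal

def read_doubles(sock, n):
    """Read exactly n IEEE-754 doubles from a socket."""
    buf = b""
    while len(buf) < 8 * n:
        buf += sock.recv(8 * n - len(buf))
    return struct.unpack(f"{n}d", buf)

meas_sock = socket.create_connection(MEASUREMENT)
ctrl_sock = socket.create_connection(CONTROL)

while True:
    (temperature,) = read_doubles(meas_sock, 1)
    u = compute_control(temperature)        # hypothetical controller call
    ctrl_sock.sendall(struct.pack("d", u))
\end{verbatim}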
With this structure, the only information received and sent by the Python
controller is the actual sampled data, without any additional information.
While the controller needs to know the control horizon in order to read the
correct amount of weather prediction data and to properly generate the
optimization problem, the discrete/continuous transitions happen on the Simulink
side. This simplifies adjusting the sampling time, with the downside of making
it harder to include metadata such as hour of the day, day of the week, etc.\ in
the \acrlong{gp} Model.
The weather prediction is done using the information present in the CARNOT
\acrshort{wdb} object. Since the sampling time and control horizon of the
controller can be adjusted, the required weather predictions can lie within an
arbitrary time interval. At each sampling point, the weather measurements are
piece-wise linearly interpolated over the span of time ranging from the most
recent measurement to the first measurement after the last required prediction
time. This provides a better approximation than pure linear interpolation
between the starting and ending points, while retaining a simple implementation.
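This interpolation amounts to a single numpy call, sketched below with
illustrative argument names:
\begin{verbatim}
import numpy as np

def interpolate_forecast(t_meas, w_meas, t_pred):
    """Piece-wise linear interpolation of the weather signal w_meas,
    measured at times t_meas, evaluated at the prediction times
    t_pred requested by the controller."""
    return np.interp(t_pred, t_meas, w_meas)
\end{verbatim}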
\subsection{Gaussian Processes}
As described in Section~\ref{sec:gaussian_processes}, both training and
evaluating a \acrshort{gp} has an algorithmic complexity of $\mathcal{O}(n^3)$.
This means that naive implementations can get too expensive in terms of
computation time very quickly.
To keep the \acrshort{gp} computations from becoming a bottleneck, a very fast
implementation of \acrlong{gp} Models was used, in the form of
GPflow~\cite{matthewsGPflowGaussianProcess2017}. It is based on
TensorFlow~\cite{tensorflow2015-whitepaper}, which provides very efficient
implementations of all the necessary linear algebra operations. Another benefit
of this implementation is the very simple use of additional computational
resources, such as a GPU or TPU.
\subsubsection{Classical Gaussian Process training}
For the training of the classical \acrshort{gp} models, the Scipy optimizer
provided by GPflow was used. By default, it uses the `L-BFGS-B' optimization
method, which runs until a local minimum of the negative log likelihood is
found. Since in the present implementation the \acrshort{gp} models are trained
only once, on a small amount of initial data, this approach was not only
sufficient but also faster than the more involved training procedure used for
the \acrshort{svgp} models.
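A minimal GPflow training sketch is shown below; the dataset shapes are
illustrative, corresponding to five days of scaled regressors at a 15-minute
sampling time:
\begin{verbatim}
import gpflow
import numpy as np

# Placeholder data: 480 samples with the scaled regressors as columns
X = np.random.rand(480, 7)
Y = np.random.rand(480, 1)

kernel = gpflow.kernels.SquaredExponential(lengthscales=np.ones(X.shape[1]))
model = gpflow.models.GPR(data=(X, Y), kernel=kernel)

# L-BFGS-B (the Scipy default) runs until a local minimum of the
# negative log likelihood is found
gpflow.optimizers.Scipy().minimize(model.training_loss,
                                   model.trainable_variables)
\end{verbatim}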
\subsubsection{Sparse and Variational Gaussian Process training}
The \acrshort{svgp} models have a more involved optimization procedure due to
several factors. First, when training an \acrshort{svgp} model, the optimization
objective is the value of the \acrshort{elbo} (cf. Section~\ref{sec:elbo}).
After trying several implementations, the more complex \textit{Adam} optimizer
turned out to provide much faster convergence than the other optimizers. Second,
when updating the model once per day with all the historical information, the
training dataset keeps growing. To combat this, the sparse model was trained on
minibatches of 1000 datapoints for 10000 iterations. Evaluating the
\acrshort{elbo} on minibatches provides an unbiased estimate of its actual
value, given enough training iterations. This implementation has the benefit of
constant training time, which becomes important later in the simulation, when
the training dataset is much larger than the initial amount of data.
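A sketch of this training loop, following the standard GPflow minibatch pattern,
is given below; the placeholder dataset and the learning rate are illustrative:
\begin{verbatim}
import gpflow
import numpy as np
import tensorflow as tf

# Placeholder dataset; in the real setup X holds the scaled regressors
X = np.random.rand(10000, 7)
Y = np.random.rand(10000, 1)
Z = X[np.random.choice(len(X), 150, replace=False), :]  # inducing locations

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(lengthscales=np.ones(X.shape[1])),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=Z,
    num_data=len(X),  # needed for an unbiased minibatch ELBO estimate
)

# Minibatches of 1000 points drawn from the full training set
dataset = tf.data.Dataset.from_tensor_slices((X, Y)).repeat().shuffle(len(X))
train_iter = iter(dataset.batch(1000))
training_loss = model.training_loss_closure(train_iter)

optimizer = tf.optimizers.Adam(learning_rate=0.01)  # illustrative step size

@tf.function
def optimization_step():
    optimizer.minimize(training_loss, model.trainable_variables)

for _ in range(10000):  # constant cost, independent of the dataset size
    optimization_step()
\end{verbatim}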
\subsection{Optimal Control Problem}
The \acrlong{ocp} has been implemented using the
CasADi~\cite{anderssonCasADiSoftwareFramework2019} algorithmic differentiation
framework. It provides an interface between a high-level definition of the
optimization problems, and the very efficient low-level solvers.
The implementation of the \acrshort{ocp} defined in
Equation~\ref{eq:optimal_control_problem} has a few particularities, discussed
in the following subsections.
\subsubsection{Integrating GPflow models in CasADi}
The first particularity of implementing the \acrshort{ocp} with CasADi is the
use of CasADi callback objects, whose purpose is the integration of external
functions into CasADi. Generally, using callbacks is not advised, because each
call to the external function incurs additional overhead. These callbacks
usually cannot directly provide information on the forward/reverse
sensitivities, which CasADi uses to drastically reduce computation time. In the
absence of this information, CasADi has to perform many evaluations around a
point in order to approximate the gradients. TensorFlow, however, keeps track of
the gradients of the whole computational graph, which can be accessed at a cost
only slightly higher than the evaluation cost of the function itself.
Integrating this gradient information into the CasADi callbacks reduces the
number of function calls by around an order of magnitude, which already
drastically reduces computation time.
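A minimal sketch of such a callback is given below. For brevity it enables
CasADi's built-in finite differences instead of exposing the TensorFlow
gradients (which would be done by additionally implementing the callback's
reverse-sensitivity hooks); the \texttt{predict\_mean} argument is a placeholder
for the external model evaluation:
\begin{verbatim}
import casadi as ca
import numpy as np

class GPMeanCallback(ca.Callback):
    """Wraps an external model evaluation (e.g. a GPflow posterior
    mean) so it can be used inside a CasADi expression graph."""

    def __init__(self, name, dim, predict_mean):
        ca.Callback.__init__(self)
        self.dim = dim
        self.predict_mean = predict_mean        # external function
        self.construct(name, {"enable_fd": True})

    def get_n_in(self): return 1
    def get_n_out(self): return 1

    def get_sparsity_in(self, i):
        return ca.Sparsity.dense(self.dim, 1)   # one GP input vector

    def get_sparsity_out(self, i):
        return ca.Sparsity.dense(1, 1)          # predicted mean

    def eval(self, arg):
        x = np.array(arg[0]).reshape(1, -1)
        return [float(self.predict_mean(x))]
\end{verbatim}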
Another significant speed improvement comes from transforming the Python calls
to TensorFlow into native tf-functions. This change incurs a small overhead the
first time the optimization problem is run, since all the TensorFlow functions
have to be compiled before execution, but afterwards speeds up the execution by
around another order of magnitude.
The last optimization applied to the CasADi implementation is the use of the
MA27 linear solver provided by the HSL optimization
library~\cite{HSLCollectionFortran}. This change results in a speedup of around
10\% compared to using the default MUMPS solver. While not as drastic as the
other improvements, this still provides a significant reduction in the runtime
of the full-year simulation.
\subsubsection{Sparse Implementation of the Optimization Problem}
The optimization problem as presented in
Equation~\ref{eq:optimal_control_problem} becomes strongly nonlinear very
quickly. Due to the autoregressive structure of the \acrshort{gp}, the predicted
temperature at time $t$ is passed as an input to the model at time $t+1$. A
simple recursive implementation of the optimization problem becomes intractable
after only 3--4 prediction steps. In order to solve this problem, a new
\acrshort{ocp} is introduced. It has a much sparser structure, in exchange for a
larger number of variables, and turns out to be much faster to solve than the
original problem.
Let $w_l$, $u_l$, and $y_l$ be the lengths of the state vector components
$\mathbf{w}$, $\mathbf{u}$, $\mathbf{y}$ (cf. Equation~\ref{eq:components}).
Also, let $X$ be the matrix of all the system states over the optimization
horizon and $W$ be the matrix of the predicted disturbances for all the future
steps. The original \acrlong{ocp} can then be rewritten using index notation as:
\begin{subequations}\label{eq:sparse_optimal_control_problem}
\begin{align}
& \text{minimize}
& & \sum_{i=2}^{N + 1} \left(X[i, w_l + u_l + 1] - y_{ref, t}\right)^2 \\
& \text{subject to}
& & X[i+1, w_l + u_l + 1] = K_*K^{-1}X[i, :] \quad \text{for} \quad
i\in[1, N]\\
    &&& X[i+1, w_l + u_l + 2: ] = X[i, w_l+ u_l + 1: w_l + u_l + y_l - 1]\\
&&& X[i, 1:w_l] = W[i, :] \\
&&& X[i+1, w_l + 2: w_l + u_l] = X[i, w_l + 1: w_l + u_l - 1] \\
&&& X[:, w_l + 1] \in \mathcal{U}
\end{align}
\end{subequations}
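A sketch of this sparse formulation using CasADi's Opti stack is given below,
written out for the lags of the chosen \model{1}{2}{3} model; the column
ordering (most recent entries last), the HVAC capacity and the
\texttt{predict\_mean} callback are assumptions of the example:
\begin{verbatim}
import casadi as ca

# Columns of X: [w1(t), w2(t), u(t-1), u(t), y(t-2), y(t-1), y(t)]
nx, N = 7, 8

opti = ca.Opti()
X = opti.variable(N + 1, nx)     # all GP input vectors over the horizon
W = opti.parameter(N + 1, 2)     # interpolated weather forecast
x0 = opti.parameter(1, nx)       # current measured state
y_ref = opti.parameter()         # reference temperature
u_max = 10.0                     # hypothetical HVAC capacity

gp_mean = GPMeanCallback("gp_mean", nx, predict_mean)  # callback from above

opti.minimize(ca.sumsqr(X[1:, 6] - y_ref))             # tracking cost
opti.subject_to(X[0, :] == x0)
for i in range(N):
    opti.subject_to(X[i + 1, 6] == gp_mean(X[i, :].T)) # GP one-step mean
    opti.subject_to(X[i + 1, 4:6] == X[i, 5:7])        # shift output lags
    opti.subject_to(X[i + 1, 2] == X[i, 3])            # shift input lag
    opti.subject_to(X[i + 1, 0:2] == W[i + 1, :])      # disturbance forecast
opti.subject_to(opti.bounded(0, X[:, 3], u_max))       # input constraints

opti.solver("ipopt")  # MA27 can be selected through the solver options
# set the parameter values with opti.set_value(...) before opti.solve()
\end{verbatim}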
\subsection{Model Identification and Update using a Python server}
The Python server is responsible for the control part of the simulation. It
decides which controller is active, is responsible for training and updating the
\acrshort{gp} and \acrshort{svgp} models, and keeps track of all the
intermediate results for analysis.
At the beginning of the experiment there is no information available on the
building's thermal behaviour. For this part of the simulation, the server
switches to a \acrshort{pi} controller with random disturbances until it gathers
enough data to train a \acrshort{gp} model. This ensures that the building is
sufficiently excited to capture its dynamics, while maintaining the temperature
within an acceptable range (approximately 15--25 $\degree$C).
Once enough data has been captured, all the features are first scaled to the
range $[-1, 1]$ in order to reduce the possibility of numerical instabilities.
The Python server then trains the \acrshort{gp} model and switches to tracking
the appropriate SIA 180:2014 reference temperature (cf.
Section~\ref{sec:reference_temperature}).
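The scaling is a simple min-max transformation based on the training ranges,
sketched below:
\begin{verbatim}
import numpy as np

def scale(x, x_min, x_max):
    """Min-max scale a feature to [-1, 1] using its training range."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def unscale(z, x_min, x_max):
    """Map a scaled value back to physical units."""
    return 0.5 * (z + 1.0) * (x_max - x_min) + x_min
\end{verbatim}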
For the case of the \acrshort{svgp}, a new model is trained once enough data is
gathered. The implementations tested were updated once a day, either on the
whole historical dataset or on a window of the last five days of data.
\clearpage

\section{Full-year Simulation Results}\label{sec:results}
This section focuses on the presentation and interpretation of the year-long
simulation of the control schemes presented previously. All the control schemes
analyzed in this Section have used a sampling time of 15 minutes and a control
horizon of 8 steps.
Section~\ref{sec:GP_results} analyzes the results of a conventional
\acrlong{gp} Model trained on the first five days of gathered data. The model
is then used for the rest of the year, with the goal of tracking the defined
reference temperature.
Section~\ref{sec:SVGP_results} goes into details on the analysis of the learning
scheme using a \acrshort{svgp} Model. In this scenario, the model is first
trained on the first five days of data, and updates every day at midnight with
the new information gathered from closed-loop operation.
\subsection{Conventional Gaussian Processes}\label{sec:GP_results}
The first simulation, to be used as a baseline comparison with the
\acrshort{svgp} Models developed further, consists of using a `static'
\acrshort{gp} model trained on five days worth of experimental data. This model
is then employed for the rest of the year.
With a sampling time of 15 minutes, the model is trained on 480 points of data.
An identification dataset of this size is enough to learn the behaviour of the
plant without becoming too expensive to handle numerically. The current
implementation takes roughly 1.5 seconds of computation time per step. For
reference, identifying a model on 15 days' worth of experimental data (1440
points) brings the simulation time to approximately 11--14 seconds per step,
around eight times slower, reflecting the polynomial growth of the
\acrshort{gp}'s computational cost with the size of the dataset.
The results of the simulation are presented in
Figure~\ref{fig:GP_fullyear_simulation}. Overall, the performance of this model
is not very good. The tracked temperature presents an offset of around 0.5
$\degree$C in the stable part of the simulation, and the offset becomes much
larger once the reference temperature starts moving away from its initial
constant value. The controller becomes completely unstable around mid-July, and
only regains some stability around mid-October. It is also of note that from
mid-October to end-December the controller performs very similarly to the
beginning of the year, namely January to end-February.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/4_GP_480pts_12_averageYear_fullyear.pdf}
\caption{GP full year simulation}
\label{fig:GP_fullyear_simulation}
\end{figure}
This very large difference in performance could be explained by the change in
weather during the year. The winter months of the beginning and end of the year
exhibit similar performance. The spring months already make the controller less
stable than at the start of the year, while the drastic temperature changes in
the summer make the controller completely unstable.
Figure~\ref{fig:GP_fullyear_abserr} presents the absolute error measured at each
step of the simulation over the course of the year. The mean absolute error is
1.33 $\degree$C, with the largest deviations occurring in late summer, where the
absolute error reaches extreme values, and the `best' performance occurring
during the winter months.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/4_GP_480pts_12_averageYear_abserr.pdf}
\caption{GP full year absolute error}
\label{fig:GP_fullyear_abserr}
\end{figure}
Figure~\ref{fig:GP_first_model_performance} analyzes the 20-step ahead
simulation performance of the identified model over the course of the year. At
experimental step 250, the controller is still gathering data. It is therefore
expected that the identified model will be capable of reproducing this data. At
step 500, 20 steps after identification, the model correctly steers the internal
temperature towards the reference temperature. On the flip side, already at
experimental steps 750 and 1000, only 9 days into the simulation, the model is
unable to properly simulate the behaviour of the plant, with the maximum
difference at the end of the simulation reaching 0.75 $\degree$C and 1.5
$\degree$C respectively.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/4_GP_480pts_12_averageYear_first_model_performance.pdf}
\caption{GP model performance}
\label{fig:GP_first_model_performance}
\end{figure}
This large difference in performance could be explained by the change in outside
weather (solar irradiance and outside temperature --- the exogenous inputs)
compared to the conditions present during the training phase. It can be seen in
Figure~\ref{fig:Dataset_outside_temperature} that already 500 points into the
simulation both the GHI and the outside temperature are outside of their
training ranges, with the latter exhibiting a much larger variation.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/Exogenous_inputs_fullyear.pdf}
\caption{Exogenous inputs for the simulation}
\label{fig:Dataset_outside_temperature}
\end{figure}
Finally, it is possible to conclude that this approach does not perform well due
to several causes:
\begin{itemize}
\item The size of the training dataset is limited by the computation budget
\item The model does not extrapolate correctly the information on
disturbances
\item The model stays fixed for the duration of the year, being unable to
adapt to new weather conditions.
\end{itemize}
These problems could be solved in several ways. Periodically re-identifying the
model to fit the current weather pattern would be quite cumbersome, due to the
repeated need to disturb the building in order to sufficiently excite it.
Keeping the whole historical dataset of measurements quickly renders the problem
intractable. More complex solutions, such as maintaining a fixed-size data
dictionary (deleting points once they no longer help the predictions and adding
new ones as they are deemed useful) or compiling the training dataset from
multiple experiments in different weather conditions, could dramatically improve
model performance, but are also more complex to implement.
\subsection{Sparse and Variational Gaussian Process}\label{sec:SVGP_results}
The \acrlong{svgp} models are set up in a similar way as described before. The
model is first identified using five days' worth of experimental data collected
using a \acrshort{pi} controller and a random disturbance signal. The difference
lies in the fact that the \acrshort{svgp} model gets re-identified every night
at midnight using the newly accumulated data from closed-loop operation.
The results of this setup are presented in
Figure~\ref{fig:SVGP_fullyear_simulation}. It can already be seen that this
setup performs much better than the initial one. The only large deviations from
the reference temperature are due to cold weather, when the \acrshort{hvac}'s
limited heat capacity is unable to maintain the proper temperature.
Additionally, the \acrshort{svgp} controller takes around 250--300 ms of
computation time per simulation step, decreasing the computational cost of the
original \acrshort{gp} by a factor of six.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_fullyear.pdf}
\caption{SVGP full year simulation}
\label{fig:SVGP_fullyear_simulation}
\end{figure}
\clearpage
Comparing the absolute error of the measured vs.\ reference temperature over the
duration of the experiment (cf. Figure~\ref{fig:SVGP_fullyear_abserr}) with that
of the original experiment, the mean absolute error is reduced from 1.33
$\degree$C to only 0.05 $\degree$C, with the majority of the values being lower
than 0.4 $\degree$C.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_abserr.pdf}
\caption{SVGP full year absolute error}
\label{fig:SVGP_fullyear_abserr}
\end{figure}
Figures~\ref{fig:SVGP_first_model_performance},
~\ref{fig:SVGP_later_model_performance}
and~\ref{fig:SVGP_last_model_performance} show the 20-step simulation
performance of three different models, identified at three different stages of
the experiment. Each model has been set to simulate 25 consecutive experimental
steps starting at steps 250, 500, 10750 and 11000.
The initial model (cf. Figure~\ref{fig:SVGP_first_model_performance}),
identified after the first five days, has the worst performance. It is unable to
correctly simulate even the learning dataset. This behaviour is similar to that
observed in Figure~\ref{fig:SVGP_multistep_validation}
(cf. Section~\ref{sec:validation_hyperparameters}), where the \acrshort{svgp}
model performed worse than the equivalent \acrshort{gp} trained on the same
dataset. It also performs worse than the initial \acrshort{gp} model on the rest
of the simulations, being unable to correctly predict the heating to the
reference at step 500, and having maximum errors of around 10 $\degree$C for the
simulations starting at steps 10750 and 11000.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_first_model_performance.pdf}
\caption{SVGP first model performance}
\label{fig:SVGP_first_model_performance}
\end{figure}
\clearpage
Figure~\ref{fig:SVGP_later_model_performance} shows the performance of the 100th
trained model (i.e.\ the model trained on April 15). This model performs much
better in all simulations. It is able to correctly simulate the 20-step
behaviour of the plant over all the experimental steps in the first two cases.
It still has a noticeable error when predicting the behaviour of the plant on
new data (i.e.\ the simulations starting at steps 10750 and 11000), but much
less than before. This hints that the \acrshort{svgp} model's performance
improves throughout the year, but that it requires much more data than the
classical \acrshort{gp} model to capture the building dynamics.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_later_model_performance.pdf}
\caption{SVGP later model performance}
\label{fig:SVGP_later_model_performance}
\end{figure}
The last model is trained on the whole-year dataset.
Figure~\ref{fig:SVGP_last_model_performance} shows its performance for the same
situation described before. The model is able to predict the plant's behaviour
at steps 250 and 500 even better than before, as well as predict the behaviour
at steps 10750 and 11000 with maximum error of 0.6 $\degree$C and 0.1 $\degree$C
respectively.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_last_model_performance.pdf}
\caption{SVGP last model performance}
\label{fig:SVGP_last_model_performance}
\end{figure}
The analysis of the model evolution as more data is gathered already gives very
good insight into the strengths and weaknesses of this approach. The initial
model is unable to correctly extrapolate the plant's behaviour in new regions of
the state space. Also, given the same amount of data, the \acrshort{svgp} model
captures less information about the plant dynamics than the equivalent
\acrshort{gp} model. On the flip side, re-training the model every day with new
information mitigates this by adding data from new regions as they are
discovered, while maintaining constant training and evaluation cost.
A more in depth analysis of the evolution of the \acrshort{svgp} hyperparameters
over the duration of the experiment is presented in
Section~\ref{sec:lengthscales_results}.
A few questions arise naturally after investigating the performance of this
control scheme:
\begin{itemize}
\item If the model is able to correctly understand data gathered in
closed-loop operation, will the performance deteriorate drastically if
the first model is trained on less data?
\item How much information can the model extract from closed-loop operation?
Would a model trained on a window of only the last five days of
closed-loop operation data be able to perform correctly?
\end{itemize}
These questions are further analyzed in Sections~\ref{sec:svgp_96pts}
and~\ref{sec:svgp_window} respectively.
\clearpage
\subsubsection{Lengthscales}\label{sec:lengthscales_results}
Figure~\ref{fig:SVGP_evol_importance} provides a deeper insight into the
evolution of the relative importance of the \acrshort{svgp} regressors over the
course of the full-year simulation\footnotemark. A few remarks are immediate:
the importance of most hyperparameters changes drastically over the first few
iterations, then settles into a steadier pace of change until around the month
of July, after which most of the hyperparameters remain constant for the rest of
the simulation. This behaviour could be explained by the model learning new
regions of the state space (i.e.\ the span of the \acrshort{ghi} and outside
temperature) over the first months as these values change, and remaining more
constant once it has gathered information on these different operating points.
\footnotetext{The evolution of the \textit{hyperparameters} is provided for
reference in Annex~\ref{anx:hyperparams_evol}.}
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_evol_importance.pdf}
\caption{Evolution of SVGP model parameters}
\label{fig:SVGP_evol_importance}
\end{figure}
As seen in Figure~\ref{fig:SVGP_evol_importance}, the variance of the
\acrshort{se} kernel steadily decreases until reaching a plateau, which
signifies an increase in the confidence of the model. The hyperparameters
corresponding to the exogenous inputs --- $w1,1$ and $w2,1$ --- become generally
less important for future predictions over the course of the year, with the
importance of $w1,1$, the \acrlong{ghi}, climbing back up over the last, colder
months. This might be due to the fact that during the colder months the
\acrshort{ghi} is the only exogenous input able to \textit{provide} additional
heat to the system.
A similar trend can be observed for the evolution of the input's
hyperparameters, with the exception that the first lag of the controlled input
($u1,1$) remains the most important over the course of the year.
For the lags of the measured output it can be seen that, over the course of the
year, the importance of the first lag decreases, while that of the second and
third lag increase --- until all three reach relatively similar values.
Another interesting comparison is provided by looking at the possible values of
the \acrshort{se} kernel components. Since all the values are normalized to the
$[-1, 1]$ range, no two points can be more than a distance of 2 apart in any
single dimension. It is then possible to plot the kernel term due to each
regressor as a function of the distance between two points. This is done in
Figure~\ref{fig:SVGP_first_covariance} for the first identified model and in
Figure~\ref{fig:SVGP_last_covariance} for the last. In both cases the kernel
terms behave mostly linearly, except when two points are close to each other,
where the correlation remains stronger before it starts diminishing.
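These per-regressor curves follow directly from the \acrshort{se} kernel: each
term has the form sketched below, evaluated over the distances achievable with
scaled inputs:
\begin{verbatim}
import numpy as np

def se_term(r, variance, lengthscale):
    """Contribution of a single regressor to the SE kernel as a
    function of the distance r between two scaled input values."""
    return variance * np.exp(-0.5 * (r / lengthscale) ** 2)

# With all inputs scaled to [-1, 1], per-dimension distances stay below 2
r = np.linspace(0.0, 2.0, 200)
\end{verbatim}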
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_first_covariance.pdf}
\caption{SVGP model first covariance parameters}
\label{fig:SVGP_first_covariance}
\end{figure}
As for the last model, it can be noted that only the scale of the kernel terms
changes, with their shape remaining consistent with the first identified model.
This means that the model does not become much more complex as data is gathered;
instead, the same general structure is kept, with further refinements made as
data is added to the system.
\clearpage
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_last_covariance.pdf}
\caption{SVGP model last covariance parameters}
\label{fig:SVGP_last_covariance}
\end{figure}
One question that arises given these mostly linear kernel terms is how well an
\acrshort{svgp} model would perform with a linear kernel. Intuition hints that
it should still be able to track the reference temperature, albeit less
precisely, since it lacks the \acrshort{se} kernel's stronger correlation
between nearby points. This is further investigated in
Section~\ref{sec:svgp_linear}.
\clearpage
\subsection{SVGP with one day of starting data}\label{sec:svgp_96pts}
As previously discussed in Section~\ref{sec:SVGP_results}, the \acrshort{svgp}
model is able to properly adapt given new information, refining its
understanding of the plant's dynamics over time.
Analyzing the results of a simulation with only one day's worth of initial
training data (cf. Figures~\ref{fig:SVGP_96pts_fullyear_simulation}
and~\ref{fig:SVGP_96pts_abserr}), it is notable that the model performs almost
identically to the one identified in the previous sections. This highlights one
of the practical benefits of the \acrshort{svgp} implementation compared to the
classical \acrlong{gp}: it is possible to start with a rougher controller
trained on less data and refine it over time, reducing the need for cumbersome
and potentially costly initial experiments for gathering data.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/6_SVGP_96pts_inf_window_12_averageYear_fullyear.pdf}
\caption{One Day SVGP full year simulation}
\label{fig:SVGP_96pts_fullyear_simulation}
\end{figure}
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/6_SVGP_96pts_inf_window_12_averageYear_abserr.pdf}
\caption{One Day SVGP Absolute Error}
\label{fig:SVGP_96pts_abserr}
\end{figure}
\clearpage
\subsection{SVGP with a five days moving window}\label{sec:svgp_window}
This section presents the results of running a different control scheme. As was
the case for the base \acrshort{svgp} model, the model is first trained on five
days' worth of data, with the difference that each new model is identified using
only the last five days' worth of data. This should provide insight into whether
the \acrshort{svgp} model is able to learn the plant dynamics from closed-loop
operation alone.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/5_SVGP_480pts_480pts_window_12_averageYear_fullyear.pdf}
\caption{Windowed SVGP full year simulation}
\label{fig:SVGP_480window_fullyear_simulation}
\end{figure}
As can be seen in Figure~\ref{fig:SVGP_480window_fullyear_simulation}, this
model is unable to consistently track the reference temperature. In fact, five
days after identification, the model forgets all the initial data and becomes
unstable. This instability then generates enough excitation of the plant for the
model to learn its behaviour again. This cycle repeats every five days, when the
controller becomes unstable. In the stable regions, however, the controller is
able to track the reference temperature.
\clearpage
\subsection{SVGP with Linear Kernel}\label{sec:svgp_linear}
The last model to be investigated is the \acrshort{svgp} with a linear kernel.
As suggested previously, the kernel terms of the originally identified
\acrshort{svgp} model are not very complex, raising the question of whether a
purely linear kernel could suffice to capture the plant's behaviour.
Figure~\ref{fig:SVGP_linear_fullyear_simulation} shows the results of the
full-year simulation. While this controller is still able to track the reference
temperature, it exhibits a much larger variance in the measured values than the
\acrshort{se} kernel \acrshort{svgp} model. This confirms the previous suspicion
that a purely linear model is unable to capture the more nuanced details of the
CARNOT model dynamics.
\begin{figure}[ht]
\centering
\includegraphics[width =
\textwidth]{Plots/10_SVGP_480pts_inf_window_12_averageYear_LinearKernel_fullyear.pdf}
\caption{Linear SVGP full year simulation}
\label{fig:SVGP_linear_fullyear_simulation}
\end{figure}
\clearpage
\subsection{Comparative analysis}
Presented in Table~\ref{tab:Model_comparations} are the Mean Error, Error
Variance and Mean Absolute Error for the full year simulation for the three
stable \acrshort{svgp} models, as well as the classical \acrshort{gp} model.
\begin{table}[ht]
%\vspace{-8pt}
\centering
\begin{tabular}{||c c c c||}
\hline
Model & Mean Error [$\degree$C] & Error Variance [$\degree$C$^2$] & Mean
Absolute Error [$\degree$C]\\
\hline \hline
GP & 5.08 & 6.88 & 1.330 \\
SVGP (5 days) & -0.06 & 0.25 & 0.055 \\
SVGP (1 day) & -0.04 & 0.24 & 0.050 \\
SVGP (Linear)& -0.03 & 0.29 & 0.093 \\
\hline
\end{tabular}
\caption{Full-year model performance comparison}
\label{tab:Model_comparations}
\end{table}
The worst performing model, as noted previously, is the \acrshort{gp} model. The
\acrshort{svgp} with a linear kernel results in a stable model with a mean error
very close to zero, i.e.\ no constant bias/offset. This model has the highest
error variance of all the identified \acrshort{svgp} models, which was also
noted beforehand from qualitative observations. It is therefore possible to
conclude that a linear kernel does not suffice for properly modeling the
dynamics of the CARNOT model.
The two \acrshort{svgp} models with \acrlong{se} kernels perform the best. They
have comparable performance, with very small differences in mean absolute error
and error variance. This leads to the conclusion that the \acrshort{svgp} models
can be deployed with less explicit identification data, and that they will
continue to improve over the course of the year, as the building passes through
different regions of the state space and more data is collected.
However, these results do not discredit the use of \acrlong{gp} models in a
multi-seasonal setting. As shown before, given the same amount of data and
ignoring the computational cost, they perform better than the alternative
\acrshort{svgp} models. The poor initial performance could be mitigated by
sampling the identification data at different points in time during multiple
experiments, by updating a fixed-size dataset based on the gained information,
or by designing the kernel more cleverly to include prior information.
\clearpage

\section{Conclusion}\label{sec:conclusion}
The aim of this project was to analyze the performance of \acrshort{gp} based
controllers for use in longer lasting implementations, where differences in
building behaviour become important compared to the initially available data.
First, the performance of a classical \acrshort{gp} model trained on five days'
worth of experimental data was analyzed. Initially, this model performed very
well, both in one-step ahead prediction and in multi-step ahead simulation over
new, unseen data. With the change in weather, however, the model shifted from
operating in the interpolated regions of the initial weather data to the
extrapolated ones. In this scenario the model was unable to properly predict the
\pdome\ behaviour and, as a consequence, the \acrshort{mpc} controller became
unstable.
Following that, several \acrshort{svgp} implementations were analyzed. The
initial behaviour exhibited during parameter identification (cf.
Section~\ref{sec:hyperparameters}) showed that the \acrshort{svgp} model was
less capable of capturing building dynamics only based on the initial
experimental dataset, possibly due to the \acrshort{elbo} approximation of the
true log likelihood. While the \acrshort{svgp} model remained stable over the
course of the 20-step ahead simulation, in the later steps it drifted much
further from the real values than the equivalent \acrshort{gp} model.
However, during the full-year simulation, this downside of the \acrshort{svgp}
model was compensated by adding new data to the model training dataset each
night at midnight. The model performance continuously improved over the course
of the simulation, providing much better results overall.
To better analyze the learning behaviour of the \acrshort{svgp} models, three
variations of the initial \acrshort{svgp} were also simulated. The first
variation consisted of training the initial model on only one day's worth of
experimental data, as opposed to five days in the first case. This model was
then regularly updated every night at midnight, just as in the initial case. It
turned out to provide very comparable results to the initial model, leading to
the conclusion that the \acrshort{svgp} model can be initially deployed using
much less training data, and will still be able to correctly capture the
building dynamics on subsequent updates.
The second variation of the \acrshort{svgp} model was re-trained using a rolling
window of the last five days' worth of data, in order to assess the model's
ability to learn the proper building dynamics based only on closed-loop
operation data. The full-year simulation showed that every time the model was
trained using \textit{only} closed-loop operation data it turned unstable. This
prompted a much higher excitation of the building over the following day, which
in turn provided enough information to train a good model, lasting until this
information was too old to be included in the training window, at which point
the model would turn unstable again.
In the last variation, the \acrshort{svgp} model was trained using a linear
kernel. This model turned out to perform worse overall than the \acrshort{se}
kernel model since it was unable to capture the more nuanced, non-linear
behaviour of the building.
\subsection{Further Research}\label{sec:further_research}
Section~\ref{sec:results} has presented and compared the results of a full-year
simulation for a classical \acrshort{gp} model, as well as a few incarnations of
\acrshort{svgp} models. The results show that the \acrshort{svgp} models have
much better performance, mainly due to the possibility of updating the model
throughout the year. The \acrshort{svgp} models also present a computational
cost advantage, both in training and in evaluation, due to the several
approximations shown in Section~\ref{sec:gaussian_processes}.
Focusing on the \acrlong{gp} models, there are several ways of improving their
performance, as noted previously: a more varied identification dataset, and a
smart update of a fixed-size data dictionary according to information gain,
could mitigate the present problems.
Using a Sparse \acrshort{gp} without replacing the maximum log likelihood
with the \acrshort{elbo} could improve performance of the \acrshort{gp} model at
the expense of training time.
An additional change that could be made is the inclusion of as much prior
information as possible, through a more refined kernel, as well as through
priors on the model hyperparameters when available. This approach, however, goes
against the `spirit' of black-box approaches, since significant insight into the
physics of the plant is required in order to properly model and implement this
information.
On the \acrshort{svgp} side, several changes could also be proposed, which were
not properly addressed in this work. First, the number of inducing locations was
chosen experimentally, until the model was found to accurately reproduce the
manually collected experimental data. In order to better use the available
computational resources, this value could be found programmatically, minimizing
evaluation time while still providing good performance. Another possibility is
the periodic re-evaluation of this value as new data comes in, since as more and
more data is collected the model becomes more complex, and in general more
inducing locations could be necessary to properly reproduce the training data.
Finally, none of the presented controllers take into account occupancy rates or
adapt to possible changes in the real building, such as adding or removing
furniture, deteriorating insulation and so on. The presented update methods only
deal with adding information on behaviour in different state space regions,
i.e.\ \textit{learning}. Their ability to \textit{adapt} to changes in the
actual plant's behaviour should be further addressed.
\section*{Acknowledgements}
I would like to thank Manuel Koch for the great help provided during the course
of the project, from the basics of CARNOT modelling to helping me better compare
the performance of different controllers, as well as Prof.\ Colin Jones, whose
insights always pointed me in the right direction while still allowing me to
discover everything on my own.
\clearpage

View file

@ -0,0 +1,42 @@
\section{Further Research}
Section~\ref{sec:results} has presented and compared the results of a full-year
simulation for a classical \acrshort{gp} model, as well as a few incarnations of
\acrshort{svgp} models. The results show that the \acrshort{svgp} have much
better performance, mainly due to the possibility of updating the model
throughout the year. The \acrshort{svgp} models also present a computational
cost advantage both in training and in evaluation, due to several approximations
shown in Section~\ref{sec:gaussian_processes}.
Focusing on the \acrlong{gp} models, there could be several ways of improving
its performance, as noted previously: a more varied identification dataset and
smart update of a fixed-size data dictionary according to information gain,
could mitigate the present problems.
Using a Sparse \acrshort{gp} without replacing the maximum log likelihood
with the \acrshort{elbo} could improve performance of the \acrshort{gp} model at
the expense of training time.
An additional change that could be made is inclusion of the most amount of prior
information possible through setting a more refined kernel, as well as adding
prior information on all the model hyperparameters when available. This approach
however goes against the "spirit" of black-box approaches, since significant
insight into the physics of the plant is required in order to properly model and
implement this information.
On the \acrshort{svgp} side, several changes could also be proposed, which were
not properly addressed in this work. First, the size of the inducing dataset was
chosen experimentally until it was found to accurately reproduce the manually
collected experimental data. In order to better use the available computational
resources, this value could be found programmatically in a way to minimize
evaluation time, while still providing good performance. Another possibility is
the periodic re-evaluation of this value when new data comes in, since as more
and more data is collected the model becomes more complex, and in general more
inducing locations could be necessary to properly reproduce the training data.
Finally, none of the presented controllers take into account occupancy rates or
adapt to possible changes in the real building, such as adding or removing
furniture, deteriorating insulation and so on. The presented update methods only
deals with adding information on behaviour in different state space regions, i.e
\textit{learning}, and their ability to \textit{adapt} to changes in the actual
plant's behaviour should be further addressed.

View file

@ -0,0 +1,34 @@
\section{Conclusion}
The aim of this project was to analyse the performance of \acrshort{gp} based
controllers for use in longer lasting implementations, where differences in
building behaviour become important compared to the initially available data.
First, the performance of a classical \acrshort{gp} model trained on 5 days
worth of experimental data was analysed. This model turned out to be unable to
correctly extrapolate building behaviour as the weather changed throughout the
year.
Several \acrshort{svgp} implementations were then analysed. They provide
important benefits over the classical models, such as the ability to scale
easily as new data is added, and a much reduced computational effort. They do,
however, present some downsides, namely an increased number of hyperparameters,
through the choice of the number of inducing locations, as well as worse
performance than the classical \acrshort{gp} implementation given the same
amount of data.

Finally, possible improvements to the current implementations have been
discussed, noting that classical \acrshort{gp} implementations could also be
adapted to the \textit{learning control} paradigm, even if doing so could turn
out to be more involved and more computationally expensive than the
\acrshort{svgp} alternative.
\section*{Acknowledgements}
I would like to thank Manuel Pascal Koch for the great help provided during the
course of the project, from the basics of CARNOT modelling to the comparison of
the performance of the different controllers, as well as Prof.\ Colin Jones,
whose insights always pointed me in the right direction while still allowing me
to discover everything on my own.
\clearpage

View file

@ -0,0 +1,78 @@
\clearpage
\section{Hyperparameter validation for the classical GP}\label{apx:hyperparams_gp}
\subsection{\texorpdfstring{\model{1}{1}{3}}{113}}
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Plots/GP_113_training_performance.pdf}
\caption{Prediction performance of \model{1}{1}{3} on the training dataset}
\label{fig:GP_train_validation}
\end{figure}
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Plots/GP_113_test_performance.pdf}
\caption{Prediction performance of \model{1}{1}{3} on the test dataset}
\label{fig:GP_test_validation}
\end{figure}
\clearpage
\subsection{\texorpdfstring{\model{2}{1}{3}}{213}}
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Plots/GP_213_training_performance.pdf}
\caption{Prediction performance of \model{2}{1}{3} on the training dataset}
\label{fig:GP_213_train_validation}
\end{figure}
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Plots/GP_213_test_performance.pdf}
\caption{Prediction performance of \model{2}{1}{3} on the test dataset}
\label{fig:GP_213_test_validation}
\end{figure}
\clearpage
\subsection{\texorpdfstring{\model{3}{1}{3}}{313}}
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Plots/GP_313_training_performance.pdf}
\caption{Prediction performance of \model{3}{1}{3} on the training dataset}
\label{fig:GP_313_train_validation}
\end{figure}
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Plots/GP_313_test_performance.pdf}
\caption{Prediction performance of \model{3}{1}{3} on the test dataset}
\label{fig:GP_313_test_validation}
\end{figure}
\clearpage
\section{Hyperparameter validation for the SVGP}\label{apx:hyperparams_svgp}
\subsection{\texorpdfstring{\model{1}{2}{3}}{123}}
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Plots/SVGP_123_training_performance.pdf}
\caption{Prediction performance of \model{1}{2}{3} on the training dataset}
\label{fig:SVGP_train_validation}
\end{figure}
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Plots/SVGP_123_test_performance.pdf}
\caption{Prediction performance of \model{1}{2}{3} on the test dataset}
\label{fig:SVGP_test_validation}
\end{figure}
\clearpage

View file

@ -0,0 +1,10 @@
\section{Evolution of the SVGP hyperparameters}\label{anx:hyperparams_evol}
\begin{figure}[ht]
\centering
\includegraphics[width = \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_evol_hyperparameters.pdf}
\caption{Evolution of SVGP hyperparameters}
\label{fig:SVGP_evol_hyperparameters}
\end{figure}