Folder restructure
This commit is contained in:
parent c2a857a0e8
commit ffabe92245
16 changed files with 12 additions and 12 deletions

27 Sections/00_First_Page.tex Normal file
@@ -0,0 +1,27 @@
\begin{titlepage}
    \begin{center}

        \includegraphics[width=0.5\linewidth]{Logo-EPFL.png}\par
        \vspace{5cm}

        {\huge \bf
        \setstretch{1.25}
        Inter-seasonal Performance of Gaussian Process-based
        Model Predictive Control of Buildings
        \par}

        \vspace{2cm}
        {\LARGE \bf Master Project\par}

        \vspace{6cm}
        {\Large Radu C. Martin}\par
        \vspace{2cm}

        {\large Professor: Colin Jones}\par
        {\large Supervisor: Manuel Pascal Koch}\par

        \vspace{0.5cm}
        {\large\today}

    \end{center}
\end{titlepage}

108 Sections/10_Introduction.tex Normal file
@@ -0,0 +1,108 @@
\section{Introduction}

\subsection{Motivation}

Buildings are a major consumer of energy, with more than 25\% of the total
energy consumed in the EU coming from residential
buildings~\cite{tsemekiditzeiranakiAnalysisEUResidential2019}. Combined with a
steady increase in energy demand and stricter requirements on energy
efficiency~\cite{europeancommission.jointresearchcentre.EnergyConsumptionEnergy2018},
this amplifies the need for more accessible means of regulating the energy usage
of new and existing buildings.

Data-driven methods of building identification and control prove very useful
through their ease of implementation, forgoing the need for more complex
physics-based models. On the flip side, additional attention is required in the
design of these control schemes, as the results can vary greatly from one
implementation to another.

Gaussian Processes have been previously used to model building dynamics, but
they are usually limited by a fixed computational budget. This limits the
approaches that can be taken for identification and update of said models.
Learning \acrshort{gp} models have also been used in the context of
autonomous racing cars~\cite{kabzanLearningBasedModelPredictive2019}, but there
the Sparse \acrshort{gp} model was built on top of a white-box model and was only
responsible for fitting the unmodeled dynamics.

\subsection{Previous Research}
With the increase in computational power and availability of data over time,
the accessibility of data-driven methods for system identification and control
has also risen significantly.

The idea of using Gaussian Processes as regression models for control of dynamic
systems is not new, and has already been explored a number of times. A general
description of their use, along with the necessary theory and some example
implementations, is given in~\cite{kocijanModellingControlDynamic2016}.
In~\cite{pleweSupervisoryModelPredictive2020}, a \acrlong{gp} Model with a
\acrlong{rq} Kernel is used for temperature set point optimization.

Gaussian Processes for building control have also been studied before in the
context of Demand Response~\cite{nghiemDatadrivenDemandResponse2017,
jainLearningControlUsing2018}, where the buildings are used for their heat
capacity in order to reduce the stress on energy providers during peak load
times.

There are, however, multiple limitations to these approaches.
In~\cite{nghiemDatadrivenDemandResponse2017} the model is only identified once,
ignoring changes in weather or plant parameters, which could lead to different
dynamics. This is addressed in~\cite{jainLearningControlUsing2018} by
re-identifying the model every two weeks using new information. Another
limitation is the scalability of the \acrshort{gp}s, which become
prohibitively expensive from a computational point of view when too much data is
added.

Outside of the context of building control, Sparse \acrlong{gp}es have been used
in autonomous racing in order to complement the physics-based model by fitting
the unmodeled dynamics of the
system~\cite{kabzanLearningBasedModelPredictive2019}.

\subsection{Contribution}

The ability to learn the plant's behaviour in new regions is very helpful in
maintaining model performance over time, as the plant's behaviour starts
deviating and the originally identified model goes further and further into the
extrapolated regions.

This project tries to combine the use of online learning control schemes with
\acrlong{gp} Models by implementing \acrlong{svgp} Models. \acrshort{svgp}s
provide a means of extending the use of \acrshort{gp}s to larger datasets, thus
enabling the periodic re-training of the model to include all the historically
available data.

\subsection{Project Outline}

The main body of work can be divided into two parts: the development of a computer
model of the \pdome\ building and the synthesis, validation and comparison of
multiple control schemes using both classical \acrshort{gp}s, as well as
\acrshort{svgp}s.

Section~\ref{sec:gaussian_processes} provides the mathematical background for
understanding \acrshort{gp}s, as well as a broad-strokes definition of
\acrshort{svgp}s and their differences from the classical implementation of
\acrlong{gp}es. This information is later used for comparing their performance
and outlining their respective pros and cons.

Section~\ref{sec:CARNOT} goes into the details of the implementation of the
\pdome\ computer model. The structure of the CARNOT model is described, and all
the parameters required for the simulation are either directly sourced from
existing literature, computed from available values or estimated using publicly
available data from \textit{Google Maps}~\cite{GoogleMaps}.

Following that, Section~\ref{sec:hyperparameters} discusses the choice of the
hyperparameters for the \acrshort{gp} and \acrshort{svgp} models respectively in
the context of their multi-step ahead simulation performance.

The \acrlong{ocp} implemented in the \acrshort{mpc} controller is described in
Section~\ref{sec:mpc_problem}, followed by a description of the complete
controller implementation in Python in Section~\ref{sec:implementation}.

Section~\ref{sec:results} presents and analyzes the results of full-year
simulations for both \acrshort{gp} and \acrshort{svgp} models. A few variations
on the initial \acrshort{svgp} model are further analyzed in order to identify
the most important parameters for long-term operation.

Finally, Section~\ref{sec:conclusion} provides a review of all the different
implementations and discusses possible improvements to both the current
\acrshort{gp} and \acrshort{svgp} models.

\clearpage

0 Sections/20_Previous_Research.tex Normal file

351 Sections/30_Gaussian_Processes_Background.tex Normal file
@@ -0,0 +1,351 @@
\section{Gaussian Processes Background}\label{sec:gaussian_processes}

The \acrfull{gp} is a member of the \textit{kernel machines} class of algorithms
in the field of machine learning.

The formal definition~\cite{rasmussenGaussianProcessesMachine2006} of a
\acrlong{gp} states that:

\begin{displayquote}
    A Gaussian Process is a collection of random variables, any finite number of
    which have a joint Gaussian distribution.
\end{displayquote}

A \acrshort{gp} is completely specified by a mean and a covariance function:

\begin{equation}
    \begin{aligned}
        m(\mathbf{x}) &= \mathbb{E}[f(\mathbf{x})] \\
        k(\mathbf{x}, \mathbf{x'}) &= \mathbb{E}[(f(\mathbf{x}) -
        m(\mathbf{x}))(f(\mathbf{x'}) - m(\mathbf{x'}))]
    \end{aligned}
\end{equation}

This notation is commonly abbreviated as:

\begin{equation}
    f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x'}))
\end{equation}

Making the simplifying assumption of $m(\mathbf{x}) = 0$, we have the joint
distribution of the training outputs $\mathbf{f}$ and the test outputs
$\mathbf{f_*}$ according to the given prior:

\begin{equation}
    \begin{bmatrix}
        \mathbf{f} \\
        \mathbf{f_*} \\
    \end{bmatrix} \sim
    \mathcal{N}\left(
    \mathbf{0},
    \begin{bmatrix}
        K(X, X) & K(X, X_*) \\
        K(X_*, X) & K(X_*, X_*) \\
    \end{bmatrix}
    \right)
\end{equation}

In the case of noisy observations, assuming $y = f + \epsilon$ with $\epsilon
\sim \mathcal{N}(0, \sigma_n^2)$, we have the following joint distribution:

\begin{equation}
    \begin{bmatrix}
        \mathbf{y} \\
        \mathbf{f_*} \\
    \end{bmatrix} \sim
    \mathcal{N}\left(
    \mathbf{0},
    \begin{bmatrix}
        K(X, X) + \sigma_n^2 I & K(X, X_*) \\
        K(X_*, X) & K(X_*, X_*) \\
    \end{bmatrix}
    \right)
\end{equation}

which, for the rest of the section, will be used in the abbreviated form:

\begin{equation}
    \begin{bmatrix}
        \mathbf{y} \\
        \mathbf{f_*} \\
    \end{bmatrix} \sim
    \mathcal{N}\left(
    \mathbf{0},
    \begin{bmatrix}
        K + \sigma_n^2 I & K_*^T \\
        K_* & K_{**} \\
    \end{bmatrix}
    \right)
\end{equation}


\subsection{Parameter learning}

``Training'' a \acrshort{gp} is the process of finding the kernel parameters that
best explain the data. This is done by maximizing the probability density
function of the observations $y$, also known as the marginal likelihood:

\begin{equation}\label{eq:gp_likelihood}
    p(y) = \frac{1}{\sqrt{(2\pi)^{n}\det{\left(K + \sigma_n^2I\right)}}}
    \exp{\left(-\frac{1}{2}y^T\left(K + \sigma_n^2I\right)^{-1}y\right)}
\end{equation}

In order to simplify computation in actual \acrshort{gp} implementations, the
value that gets maximized is the log of Equation~\ref{eq:gp_likelihood}, the log
marginal likelihood:

\begin{equation}\label{eq:gp_log_likelihood}
    \log(p(y)) = - \frac{1}{2}\log{\left(
        \det{\left(
            K + \sigma_n^2I
        \right)}
    \right)}
    - \frac{1}{2}y^T\left(
        K + \sigma_n^2I
    \right)^{-1}y
    - \frac{n}{2}\log{\left(2\pi\right)}
\end{equation}
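
For illustration, Equation~\ref{eq:gp_log_likelihood} can be evaluated
numerically through a Cholesky factorization rather than an explicit inverse.
The following NumPy sketch is only an illustration of the formula (not the
implementation used in this project) and assumes the kernel matrix $K$ has
already been computed:

\begin{verbatim}
import numpy as np

def gp_log_marginal_likelihood(K, y, sigma_n):
    """Log marginal likelihood of a zero-mean GP (cf. eq:gp_log_likelihood)."""
    n = y.shape[0]
    Ky = K + sigma_n**2 * np.eye(n)                       # K + sigma_n^2 I
    L = np.linalg.cholesky(Ky)                            # Ky = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # Ky^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))            # log det(Ky)
    return -0.5 * y @ alpha - 0.5 * log_det - 0.5 * n * np.log(2 * np.pi)
\end{verbatim}

Training then amounts to maximizing this value with respect to the kernel
hyperparameters and $\sigma_n$, typically with a gradient-based optimizer.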
\subsection{Prediction}

Given the proper covariance matrices $K$ and $K_*$, predictions on new points
can be made as follows:

\begin{equation}
    \begin{aligned}
        \bar{\mathbf{f}}_* = \mathbb{E}\left[\mathbf{f_*} \mid X, \mathbf{y}, X_*\right] &=
        K_*\left(K + \sigma_n^2I\right)^{-1}\mathbf{y} \\
        \operatorname{cov}(\mathbf{f_*}) &= K_{**} - K_*\left(K +\sigma_n^2I\right)^{-1}K_*^T
    \end{aligned}
\end{equation}

The extension of these predictions to a non-zero mean \acrshort{gp} comes
naturally by applying the zero mean \acrshort{gp} to the \textit{difference}
between the observations and the fixed mean function:

\begin{equation}
    \bar{\mathbf{f}}_* = \mathbf{m}(X_*) + K_*\left(K +
    \sigma_n^2I\right)^{-1}(\mathbf{y} - \mathbf{m}(X))
\end{equation}

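The predictive equations above translate almost directly into code. The sketch
below (an illustration under the assumption of a zero mean and an isotropic
\acrshort{se} kernel, not the project implementation) returns the posterior
mean and covariance at the test points:

\begin{verbatim}
import numpy as np

def se_kernel(A, B, variance=1.0, lengthscale=1.0):
    """Squared Exponential kernel matrix between the rows of A and B."""
    sq_dist = (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :]
               - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_predict(X, y, X_star, sigma_n, **kp):
    """Posterior mean and covariance of a zero-mean GP at the points X_star."""
    Ky = se_kernel(X, X, **kp) + sigma_n**2 * np.eye(len(y))
    K_s = se_kernel(X_star, X, **kp)         # K_*
    K_ss = se_kernel(X_star, X_star, **kp)   # K_**
    mean = K_s @ np.linalg.solve(Ky, y)
    cov = K_ss - K_s @ np.linalg.solve(Ky, K_s.T)
    return mean, cov
\end{verbatim}
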
\subsection{Kernels}\label{sec:Kernels}
The choice of the kernel is an important part of any kernel-machine-class
algorithm. It serves the purpose of shaping the behaviour of the \acrshort{gp}
by imposing a desired level of smoothness on the resulting functions,
periodicity, linearity, etc. This extends the use cases of the \acrshort{gp}
models while including any available prior information about the system to be
modeled.

For the purpose of identifying a dynamical system, a few kernels are popular
choices~\cite{kocijanModellingControlDynamic2016}:


\subsubsection*{Squared Exponential Kernel}

This kernel is used when the system to be modelled is assumed to be smooth and
continuous. The basic version of the \acrshort{se} kernel has the following form:

\begin{equation}
    k(\mathbf{x}, \mathbf{x'}) = \sigma^2 \exp{\left(- \frac{1}{2}\frac{\norm{\mathbf{x} -
    \mathbf{x'}}^2}{l^2}\right)}
\end{equation}

Its parameters are the model variance $\sigma^2$ and the lengthscale $l$.

The lengthscale indicates how fast the correlation diminishes as the two points
get further apart from each other.

This function can be modified to use an arbitrary positive semi-definite matrix
$\Lambda^{-1}$, which imposes a different weight on each dimension of
$\mathbf{x} - \mathbf{x'}$:

\begin{equation}
    \begin{aligned}
        k(\mathbf{x}, \mathbf{x'})
        &= \sigma^2\exp{\left[-\frac{1}{2} (\mathbf{x} - \mathbf{x'})^T \Lambda^{-1}
        (\mathbf{x} - \mathbf{x'})\right]} \\
        &= \sigma^2 \exp{\left[-\frac{1}{2}\sum_{d=1}^D w_d(x_d - x'_d)^2\right]}
    \end{aligned}
\end{equation}
where $w_d = \frac{1}{l_d^2}; d = 1, \dots, D$, with $D$ being the dimension of the
data.

The special case of $\Lambda^{-1} = \text{diag}{\left([l_1^{-2},\dots,l_D^{-2}]\right)}$
is equivalent to implementing different lengthscales on different regressors.
This can be used to assess the relative importance of each regressor through the
value of the hyperparameters. This is the \acrfull{ard} property.

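As a concrete illustration of the \acrshort{ard} weighting (a sketch, not
project code), the kernel with one lengthscale per regressor can be written as:

\begin{verbatim}
import numpy as np

def se_ard_kernel(x, x_prime, variance, lengthscales):
    """SE kernel with ARD: one lengthscale l_d per input dimension."""
    w = 1.0 / np.asarray(lengthscales)**2        # w_d = 1 / l_d^2
    diff = np.asarray(x) - np.asarray(x_prime)
    return variance * np.exp(-0.5 * np.sum(w * diff**2))

# A change along a dimension with a large lengthscale barely affects the
# covariance, while the same change along a short-lengthscale dimension
# makes the points nearly uncorrelated:
print(se_ard_kernel([1.0, 1.0], [2.0, 1.0], 1.0, [100.0, 0.5]))  # ~1.0
print(se_ard_kernel([1.0, 1.0], [1.0, 2.0], 1.0, [100.0, 0.5]))  # ~0.14
\end{verbatim}
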
\subsubsection*{Rational Quadratic Kernel}

The \acrfull{rq} Kernel can be interpreted as an infinite sum of \acrshort{se}
kernels with different lengthscales. It has the same smooth behaviour as the
\acrlong{se} Kernel, but can take into account the difference in function
behaviour for large-scale versus small-scale variations.

\begin{equation}
    k(\mathbf{x}, \mathbf{x'}) = \sigma^2 \left(1+\frac{1}{2}\frac{\norm{\mathbf{x} -
    \mathbf{x'}}^2}{\alpha l^2}\right)^{-\alpha}
\end{equation}

When extended with \acrshort{ard} behaviour, it becomes:

\begin{equation}
    k(\mathbf{x}, \mathbf{x'})
    = \sigma^2\left[1 + \frac{1}{2\alpha} {(\mathbf{x} - \mathbf{x'})}^T \Lambda^{-1}
    (\mathbf{x} - \mathbf{x'})\right]^{-\alpha}
\end{equation}


\subsection{Extending GPs to larger datasets}

There exist multiple methods of scaling \acrshort{gp}s to be used on more data
without incurring the penalty of inverting the covariance matrix. An overview
and comparison of multiple methods is given
in~\cite{liuUnderstandingComparingScalable2019}.

For the scope of this project, the choice has been made to use \acrfull{svgp}
models, since they provide a very good balance of scalability, capability,
robustness and controllability~\cite{liuUnderstandingComparingScalable2019}.

The \acrlong{svgp} was first introduced
by~\textcite{hensmanGaussianProcessesBig2013} as a way to scale the use of
\acrshort{gp}s to large datasets. A detailed explanation of the mathematics of
\acrshort{svgp}s and the reasoning behind them is given
in~\cite{yiSparseVariationalGaussian2021}, with a very brief summary presented
below.

\subsubsection{Summarizing the training data}

Trivially downsampling the available data to a size that fits within the
computational budget is not the best solution, since it implies discarding large
amounts of potentially useful information.

In order to reduce the dataset size without the downside of discarding
potentially useful information, a new set of random variables $f(X_s)$, usually
denoted $f_s$, is introduced, with the requirement that this new dataset has
size $n_s$ smaller than the size $n$ of the original dataset.

The $X_s$ are called \textit{inducing locations}, and $f_s$ the \textit{inducing
random variables}. They summarize the data in the sense that a model trained on
this new dataset should be able to generate the original dataset with a high
probability.

The multivariate Gaussian distribution is used to establish the relationship
between $f_s$ and $f$, which will serve the role of the prior, now called the
sparse prior:

\begin{equation}
    \begin{bmatrix}
        \mathbf{f}(X) \\
        \mathbf{f}(X_s) \\
    \end{bmatrix} \sim
    \mathcal{N}\left(
    \mathbf{0},
    \begin{bmatrix}
        K(X, X) & K(X, X_s) \\
        K(X_s, X) & K(X_s, X_s) \\
    \end{bmatrix}
    \right)
\end{equation}

\subsubsection{Evidence Lower Bound}\label{sec:elbo}

Computing the log likelihood is one of the most expensive parts of model
training, due to inversion of the kernel matrix term ${\left(K +
\sigma_n^2I\right)}^{-1}$ (cf. Equation~\ref{eq:gp_likelihood}) with an
algorithmic complexity of $\mathcal{O}(n^3)$.

In order to solve this problem, the log likelihood equation
(Equation~\ref{eq:gp_log_likelihood}) used for training a
classical \acrshort{gp} is replaced with an approximate value that is
computationally tractable on larger sets of data.

The following derivation of the \acrshort{elbo} is based on the one presented
in~\cite{yangUnderstandingVariationalLower}.

Assume $X$ to be the observations, and $Z$ the set of parameters of the
\acrshort{gp} model, also known as the latent variables. The posterior
distribution of the latent variables can be written as follows:

\begin{equation}
    p(Z|X) = \frac{p(X|Z)p(Z)}{p(X)} = \frac{p(X|Z)p(Z)}{\int_Z p(X,Z)}
\end{equation}

The main idea of Variational Bayesian methods is to replace the true posterior
distribution $p(Z|X)$ with an approximation $q(Z)$ that is easier to compute,
but is as close as possible to the original.

One measure of the closeness of the two distributions is the \acrfull{kl}
divergence, which for variational inference takes the following form:

\vspace{5pt}
\begin{equation}
    \begin{aligned}
        KL\left[q(Z)||p(Z|X)\right]
        &= \int_Z q(Z)\log{\left(\frac{q(Z)}{p(Z|X)}\right)} \\
        &= - \int_Z q(Z)\log{\left(\frac{p(Z|X)}{q(Z)}\right)} \\
        &= - \left(
            \int_Z q(Z)\log{\left(\frac{p(X,Z)}{q(Z)}\right)}
            - \int_Z q(Z)\log{\left(p(X)\right)}
        \right) \\
        &= - \int_Z q(Z) \log{\left(\frac{p(X,Z)}{q(Z)}\right)}
            + \log{\left(p(X)\right)}\int_Z q(Z) \\
        &= -L + \log{\left(p(X)\right)} \\
    \end{aligned}
\end{equation}
\vspace{5pt}

where $L$ is the \acrfull{elbo}. Rearranging this equation we get:

\begin{equation}
    L = \log{\left(p(X)\right)} - KL\left[q(Z)||p(Z|X)\right]
\end{equation}

Since the KL divergence is always $\geq 0$, it follows that $L \leq
\log{\left(p(X)\right)}$, i.e.\ $L$ is a lower bound on the log probability of
the observations.

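To make the two preceding subsections concrete, the sketch below builds and
trains an \acrshort{svgp} model on a toy dataset using the GPflow library. The
choice of GPflow and the inducing-point initialization (a subset of the
training inputs) are assumptions made for this illustration only; the section
itself does not prescribe a particular implementation.

\begin{verbatim}
import numpy as np
import gpflow

# Toy data: X holds the regressors, Y the measured output (column vector).
X = np.random.rand(1000, 3)
Y = np.sin(X.sum(axis=1, keepdims=True)) + 0.05 * np.random.randn(1000, 1)

Z = X[:50].copy()   # n_s = 50 inducing locations, initialized from the data
model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(lengthscales=np.ones(3)),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=Z,
    num_data=X.shape[0],
)

# Training maximizes the ELBO (the training loss is its negative).
gpflow.optimizers.Scipy().minimize(
    model.training_loss_closure((X, Y)), model.trainable_variables
)
print("ELBO after training:", model.elbo((X, Y)).numpy())
\end{verbatim}
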
\subsection{Gaussian Process Models for Dynamical Systems}\label{sec:gp_dynamical_system}

In the context of Dynamical Systems Identification and Control, Gaussian
Processes are used to represent different model structures, ranging from state
space and \acrshort{nfir} structures, to the more complex \acrshort{narx},
\acrshort{noe} and \acrshort{narmax}.


The general form of an \acrfull{narx} model is as follows:

\begin{equation}
    \hat{y}(k) =
    f(w(k-1),\dots,w(k-l_w),y(k-1),\dots,y(k-l_y),u(k-1),\dots,u(k-l_u))
\end{equation}

where $w$ are the exogenous inputs with a maximum lag of $l_w$, $u$ are the model
inputs with a maximum lag of $l_u$, and $y$ are the model outputs with a maximum
lag of $l_y$. This structure is a prediction model, since it uses the measured
values of $y$ as regressors.

The \acrfull{noe} structure uses the past predictions $\hat{y}$ for future predictions:

\begin{equation}
    \hat{y}(k) =
    f(w(k-1),\dots,w(k-l_w),\hat{y}(k-1),\dots,\hat{y}(k-l_y),u(k-1),\dots,u(k-l_u))
\end{equation}

Due to its use for multi-step ahead simulation of system behaviour, as opposed
to only predicting one step ahead using current information, the \acrshort{noe}
structure can be considered a \textit{simulation model}.


In order to get the best simulation results from a \acrshort{gp} model, the
\acrshort{noe} structure would have to be employed. Due to the high algorithmic
complexity of training and evaluating \acrshort{gp} models, this approach is
computationally intractable. In practice a \acrshort{narx} model will be trained,
which will be validated through multi-step ahead prediction.

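In practice, building the \acrshort{narx} training set reduces to stacking
lagged copies of the signals. A minimal sketch, assuming one-dimensional series
for $w$, $u$ and $y$ (an illustration, not the project code):

\begin{verbatim}
import numpy as np

def build_narx_dataset(w, u, y, l_w, l_u, l_y):
    """Regressor matrix and targets for a NARX model.

    Each row is [w(k-1)..w(k-l_w), y(k-1)..y(k-l_y), u(k-1)..u(k-l_u)],
    and the corresponding target is y(k)."""
    start = max(l_w, l_u, l_y)
    rows, targets = [], []
    for k in range(start, len(y)):
        rows.append(np.concatenate([
            w[k - l_w:k][::-1],    # w(k-1), ..., w(k-l_w)
            y[k - l_y:k][::-1],    # y(k-1), ..., y(k-l_y)
            u[k - l_u:k][::-1],    # u(k-1), ..., u(k-l_u)
        ]))
        targets.append(y[k])
    return np.array(rows), np.array(targets)
\end{verbatim}

The same regressor layout is reused at simulation time; for the multi-step
validation, the measured $y(k-i)$ entries are simply replaced by the model's
own past predictions $\hat{y}(k-i)$.
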
\clearpage

614 Sections/40_CARNOT_model.tex Normal file
@@ -0,0 +1,614 @@
\section{CARNOT model}\label{sec:CARNOT}

In order to better analyze the different model training and update methods, it
was decided to replace the physical \pdome\ building with a computer model.
This allows for faster-than-real-time simulations, as well as perfectly
reproducing the weather conditions and building response for direct comparison
of different control schemes over longer periods of time.

The model is designed using the CARNOT
toolbox~\cite{lohmannEinfuehrungSoftwareMATLAB} for Simulink. It is based on the
CARNOT default \textit{Room Radiator} model, with the following changes:
\begin{itemize}
    \item Only one of the two default rooms is used
    \item The outside walls are replaced with windows
    \item A window is added on the roof as an equivalent for the four skylights
        presented in Section~\ref{sec:Physical_dimensions}
\end{itemize}

The resulting schema for the \pdome\ building is presented in
Figure~\ref{fig:CARNOT_polydome}. The following sections will focus on detailing
the choice of all the necessary model parameters.

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Images/polydome_room_model.pdf}
    \caption{Simulink Schema of the CARNOT \pdome\ building model}
    \label{fig:CARNOT_polydome}
\end{figure}

\clearpage

Finally, the Simulink model is completed by adding a \textit{Weather Data File}
containing weather measurements for a whole year, and a \textit{Weather
Prediction} block responsible for sending weather predictions to the MPC\@. The
controller itself is defined in Python and is connected to Simulink via three
TCP/IP sockets. Details on the implementation are presented in
Section~\ref{sec:implementation}.

\subsection{Physical dimensions}\label{sec:Physical_dimensions}

The \pdome\ building is a dome-shaped, single-zone building serving the role
of a classroom for audiences of around one hundred students.

Most of the necessary measurements are already available from the
\citetitle*{nattererPolydomeTimberShell1993} article by
\textcite{nattererPolydomeTimberShell1993}. It presents the \pdome\ geometry as
having a floor plan of 25m $\times$ 25m, a maximum building height of 7m, and
walls with a maximum height of 3m. The roof is a spherical cap with a radius of
approximately 28m.

One particularity of the \pdome\ is the presence of four large skylights on the
roof. An estimate of their size has been made using images from \textit{Google
Maps Satellite View}~\cite{GoogleMaps}, with the included measurement tool. These
measurements are presented in Figure~\ref{fig:Google_Maps_Skylights}. The
skylights are measured to be squares of edge 2.5m.

\begin{figure}[ht]
    \centering
    \vspace{-10pt}
    \includegraphics[width = 0.75\textwidth]{Images/google_maps_polydome_skylights}
    \caption{Google Maps Satellite view of the \pdome\ building}
    \label{fig:Google_Maps_Skylights}
\end{figure}

The only remaining missing piece of information is the \textit{stem wall}
height, which is the height of the walls from the ground on which the dome
ceiling directly resides. Due to the limited campus access, this measure has
also been approximated from a Google Maps image, presented in
Figure~\ref{fig:Google_Maps_Streetview}. An object of known size has been used
as reference, after which the following measurements have been done in
\citetitle{kimballGIMPGNUImage}~\cite{kimballGIMPGNUImage} using the
\textit{Measure Tool}.

The chosen reference object is the \pdome\ \acrshort{hvac} system, the full
description of which is presented in Section~\ref{sec:HVAC_parameters}, and
which has a known height of 2061mm~\cite{aermecRoofTopManuelSelection}.

\clearpage

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Images/polydome_streetview_annotated}
    \caption{Google Maps StreetView view of the \pdome\ building}
    \label{fig:Google_Maps_Streetview}
\end{figure}

The graphical analysis resulted in the measurements presented in
Table~\ref{tab:GIMP_measurements}:

\begin{table}[ht]
    %\vspace{-8pt}
    \centering
    \begin{tabular}{||c c c c||}
        \hline
        Object & Size [px] & Size [mm] & Size [m]\\
        \hline \hline
        \acrshort{hvac} height & 70 & 2100 & 2.1 \\
        Building height & 230 & 6900 & 6.9 \\
        Stem wall & 45 & 1350 & 1.35 \\
        Dome height & 185 & 5550 & 5.55 \\
        \hline
    \end{tabular}
    \caption{Calculated Dome Dimensions}
    \label{tab:GIMP_measurements}
\end{table}

For a validation of the results shown in Table~\ref{tab:GIMP_measurements} it
is possible to compare the total \pdome\ building height measured in GIMP with
the value presented beforehand. The graphical measure of 6.9m is indeed very
close to the described height of 7m.

These measurements can be used to set the size parameters of all the nodes in
the model:

\begin{itemize}
    \item Side windows with a size of 2m $\times$ 25m each
    \item Roof window with a size of 5m $\times$ 5m, the cumulative surface of all
        the skylights
    \item Ceiling with a size of 25m $\times$ 25m
    \item Floor with a size of 25m $\times$ 25m
\end{itemize}

\subsection{Internal volume}

The \pdome\ building has a structure that is mostly based on a dome shape, with
the difference that the dome portion of the building does not reach the ground,
but stands above it at a height of approximately $1.35$m (cf.
Table~\ref{tab:GIMP_measurements}), with the large side windows extending to the
ground and creating a \textit{stem wall} for the dome to sit on.

The total internal volume of the \pdome\ building can therefore be approximated
by summing the volumes of its two elements: the stem and the dome.

Geometrically, a dome is a portion of a sphere cut off by a plane. Its volume
can therefore be computed in a similar fashion to that of a complete sphere.

A sphere can be regarded as the volume of rotation of the function $f(x)$ (cf.
Equation~\ref{eq:rot_func}) around the $x$ axis, where $R$ is the sphere radius
and $x \in [0, 2R]$:

\begin{equation}\label{eq:rot_func}
    f(x) = \sqrt{R^2 - (x-R)^2} = \sqrt{2Rx - x^2}
\end{equation}

It is therefore possible to compute the volume of a dome by integrating $f(x)$
up to the dome height $h$, as presented in Equation~\ref{eq:rot_vol}.

\begin{equation}\label{eq:rot_vol}
    \begin{aligned}
        V &= \pi \int_0^h f(x)^2 dx \\
        &= \pi \int_0^h (2Rx - x^2)dx \\
        &= \pi \left( Rx^2 - \frac{1}{3}x^3 \right)_0^h \\
        &= \frac{\pi h^2}{3} (3R - h) \\
    \end{aligned}
\end{equation}

The volume of the \pdome\ dome is computed by first finding the radius of the
complete sphere, the radius of curvature (cf.
Equation~\ref{eq:radius_curvature}, where $h$ is the height of the dome and $r$
is the radius of the base of the dome) and plugging it into
Equation~\ref{eq:rot_vol}.

\begin{equation}\label{eq:radius_curvature}
    R_c = \frac{r^2 + h^2}{2h}
\end{equation}

The volume of the dome is then given by:

\begin{equation}
    V_d = \frac{1}{3} \pi h^2 (3R_c - h) = \frac{1}{6} \pi h (3r^2 + h^2)
\end{equation}

The stem of the \pdome\ is approximated as a box with a square base of edge
$l_s = 25$m and height $h_s$ equal to the stem wall height, and its volume can
therefore be calculated as:

\begin{equation}
    V_s = l_s^2 \times h_s
\end{equation}

The total volume of the building is then given as:

\begin{equation}
    V = V_d + V_s = \frac{1}{6} \pi h (3r^2 + h^2) + l_s^2 \times h_s
\end{equation}

Numerically, considering a dome diameter of 28m, a dome height of 5.55m and a stem
wall edge of 25m, we get the approximate volume of the building:
\begin{equation}\label{eq:numerical_volume}
    V = V_d + V_s = 1798.22\ \text{m}^3 + 843.75\ \text{m}^3 = 2641.97\ \text{m}^3 \approx 2650\ \text{m}^3
\end{equation}

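The arithmetic of Equation~\ref{eq:numerical_volume} can be reproduced in a few
lines of Python, shown here only as a numerical check of the values above:

\begin{verbatim}
from math import pi

r_dome, h_dome = 28.0 / 2, 5.55   # dome base radius [m] and height [m]
l_stem, h_stem = 25.0, 1.35       # stem wall edge [m] and height [m]

V_dome = pi * h_dome * (3 * r_dome**2 + h_dome**2) / 6   # ~1798.2 m^3
V_stem = l_stem**2 * h_stem                              # ~843.8 m^3
print(V_dome + V_stem)                                   # ~2642 m^3
\end{verbatim}
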
The value presented in Equation~\ref{eq:numerical_volume} is used directly in
the \textit{room\_node} element of the CARNOT model (cf.
Figure~\ref{fig:CARNOT_polydome}), as well as in the calculation of the Air
Exchange Rate, presented in Section~\ref{sec:Air_Exchange_Rate}.

\subsection{Furniture}

The main function of the \pdome\ building is serving as a classroom for around
one hundred students. It has wood furniture consisting of chairs and tables, as
well as a wooden stage in the center of the building, meant to be used for
presentations. The building also contains a smaller room, housing all the
necessary technical equipment (cf. Figure~\ref{fig:Google_Maps_Streetview}).

The most accurate way of including information on the furniture in the CARNOT
model would be to manually compute the mass, volume and materials for each of
the chairs, tables, etc., but due to the restricted access to the building a
simpler approximation has been made.

\textcite{johraNumericalAnalysisImpact2017} present a methodology to model the
furniture in buildings for different materials, as well as an \textit{equivalent
indoor content material} that is meant to approximate the furniture content of
an office building. These values for mass content, surface area, material
density and thermal conductivity have been taken as the basis for the \pdome\
furniture content approximation, with the assumption that, since the \pdome\ is
still mostly empty, it has approximately a quarter of the furniture present in a
fully furnished office.

The full set of furniture is, therefore, approximated in the CARNOT model as a
wall, with the numerical values for mass, surface, thickness and volume
presented below.

\subsubsection*{Surface}
% surface:
% 1/4 * 1.8 [m2/m2 of floor space] * 625 m2 surface = 140 m2
% 140 m2 = [7 20] m [height width]

The equivalent material is taken to have a surface of 1.8 $\text{m}^2$ per
$\text{m}^2$ of floor area~\cite{johraNumericalAnalysisImpact2017}. With a floor
area of 625 $\text{m}^2$, and assuming the furnishing of the building is a
quarter that of a fully-furnished office, the \pdome\ furniture equivalent wall
has a surface area of:

\begin{equation}
    S_f = \frac{1}{4} \cdot 1.8 \left[\frac{\text{m}^2}{\text{m}^2}\right]
    \cdot 625\ \left[\text{m}^2\right] = 281.25 \left[\text{m}^2\right]
\end{equation}

Since a wall has two large faces, the total surface gets divided between them,
giving each face a surface of:

\begin{equation}
    S_{f, \text{face}} =
    \frac{1}{2} \times S_f \approx 140 \left[\text{m}^2\right]
\end{equation}

\subsubsection*{Mass}
% mass:
% 1/4 * 40 [kg/m2 of floor space] * 625 m2 surface = 6250 kg

The mass of the furniture equivalent wall is computed using the same
methodology, considering 40 kg of furniture mass per $\text{m}^2$ of floor
surface.

\begin{equation}
    M_f = \frac{1}{4} \cdot 40 \left[\frac{\text{kg}}{\text{m}^2}\right]
    \cdot 625\ \left[\text{m}^2\right] = 6250\ \left[\text{kg}\right]
\end{equation}

\subsubsection*{Volume}
% volume:
% 6250[kg]/600[kg/m3] = 10.41 [m3]

The volume of the furniture equivalent wall is calculated with the help of the
equivalent mass and density:

\begin{equation}
    V_f = \frac{M_f}{\rho_f} = \frac{6250}{600}
    \left[\frac{\text{kg}}{\text{kg}/\text{m}^3}\right]
    = 10.41\ \left[\text{m}^3\right]
\end{equation}

\subsubsection*{Thickness}
% thickness:
%10.41[m3]/140[m2] = 0.075m = 7.5cm

The last parameter of the furniture equivalent wall is computed by dividing its
volume by the surface:

\begin{equation}
    h_f = \frac{V_f}{S_{f, \text{face}}}
    = \frac{10.41}{140} \left[\frac{\text{m}^3}{\text{m}^2}\right]
    = 0.075\ \left[\text{m}\right] = 7.5\ \left[\text{cm}\right]
\end{equation}

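The furniture-equivalent wall derived in this subsection can be summarized in a
short numerical check (the 600 kg/m$^3$ density is the equivalent-material value
used in the volume calculation above):

\begin{verbatim}
floor_area = 625.0          # m^2
fraction = 0.25             # a quarter of a fully furnished office

S_f = fraction * 1.8 * floor_area    # total surface, ~281 m^2
S_face = S_f / 2                     # surface per face, ~140 m^2
M_f = fraction * 40.0 * floor_area   # mass, 6250 kg
V_f = M_f / 600.0                    # volume, ~10.4 m^3
h_f = V_f / S_face                   # thickness, ~0.075 m
\end{verbatim}
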
\subsection{Material properties}

In order to better simulate the behaviour of the real \pdome\ building, it is
necessary to approximate the building materials and their properties as closely
as possible. This section details and motivates the choice of parameters for
each of the CARNOT nodes' properties.

\subsubsection{Windows}

The windows are made of two glass panes of thickness 4mm each.

The values of the heat transfer coefficient (U-factor) can vary greatly for
different window technologies, but an educated guess can be made on the lower
and upper bounds based on the age and location of the building.

Window manufacturers state
the U-factor for double glass windows to be between 2.8 \(\frac{W}{m^2K}\) for
older manufacturing techniques and 1.2 \(\frac{W}{m^2K}\) for newer
models~\cite{WhatAreTypical2018}.

The US Energy Department states that the
typical U-factor values for modern window installations are in the range of
0.2--1.2 \(\frac{W}{m^2K}\)~\cite{GuideEnergyEfficientWindows}.

The European flat glass association claims that the maximum allowable U-factor
value for new window installations in private sector buildings in
Switzerland is 1.5
\(\frac{W}{m^2K}\)~\cite{glassforeuropeMinimumPerformanceRequirements2018}.

Considering the aforementioned values, and the fact that the \pdome\ building was
built in 1993~\cite{nattererModelingMultilayerBeam2008}, the default U-factor of
1.8 \(\frac{W}{m^2K}\) has been deemed appropriate.

The total solar energy transmittance $[g]$ and transmittance in the visible
range $[v_g]$ can vary in the range of 0.1--0.9 depending on manufacturing
technology, tint, etc. The average values of 0.7 and 0.65 respectively have been
chosen for this case.

The values for glass density and heat capacity have been left at their default
values of 2500 \(\frac{kg}{m^3}\) and 1008 \(\frac{J}{kgK}\) respectively.

\subsubsection{Roof and Floor}

%%% Roof
% [5cm wood, 10cm insulation, 5cm wood]
% Conductivity for each material
% Heat capacity for each material
% Density for each material

The roof structure has been assumed to be made out of 10 cm of insulation,
enclosed on each side by 5 cm of wood.

%%% Floor
% [5cm wood, 10cm insulation, 20cm concrete]
% Conductivity for each material
% Heat capacity for each material
% Density for each material

The floor composition has been taken as consisting of, from top to bottom, 5 cm
of wood and 10 cm of insulation, followed by 20 cm of concrete.

All the necessary values to characterise these materials have been taken
from~\cite{BuildingsHeatTransferData} and are presented in
Table~\ref{tab:material_properties}:

\begin{table}[ht]
    %\vspace{-8pt}
    \centering
    \begin{tabular}{||c c c c||}
        \hline
        Material & Thermal Conductivity $k$ $[\frac{W}{mK}]$ & Specific Heat
        Capacity $c$ $[\frac{J}{kgK}]$ & Density $\rho$ $[\frac{kg}{m^3}]$
        \\
        \hline \hline
        Plywood & 0.12 & 1210 & 540 \\
        Insulation & 0.03 & 1200 & 40 \\
        Concrete & 1.0 & 700 & 2500 \\
        \hline
    \end{tabular}
    \caption{Material properties for roof and floor elements}
    \label{tab:material_properties}
\end{table}

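As a rough cross-check of these choices, the conductive U-values of the roof and
floor assemblies can be estimated from the layer thicknesses and the
conductivities of Table~\ref{tab:material_properties}. Surface film resistances
are ignored, so this is only an order-of-magnitude sketch added for
illustration:

\begin{verbatim}
def u_value(layers):
    """Conductive U-value of a layered wall; layers = [(thickness m, k W/mK)]."""
    R = sum(t / k for t, k in layers)   # total conduction resistance [m^2 K / W]
    return 1.0 / R

roof = [(0.05, 0.12), (0.10, 0.03), (0.05, 0.12)]   # wood, insulation, wood
floor = [(0.05, 0.12), (0.10, 0.03), (0.20, 1.0)]   # wood, insulation, concrete
print(u_value(roof), u_value(floor))                # ~0.24 and ~0.25 W/(m^2 K)
\end{verbatim}
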
\subsection{HVAC parameters}\label{sec:HVAC_parameters}

The \pdome\ is equipped with an \textit{AERMEC RTY-04} \acrshort{hvac} system.
According to the manufacturer's manual~\cite{aermecRoofTopManuelSelection}, this
\acrshort{hvac} houses two compressors of power 11.2 kW and 8.4 kW respectively,
an external ventilator of power 1.67 kW, and a reflow ventilator of power 2 kW.
The unit has a typical \acrlong{eer} (\acrshort{eer}, cooling efficiency) of
4.9--5.1 and a \acrlong{cop} (\acrshort{cop}, heating efficiency) of 5.0, for a
maximum cooling capacity of 64.2 kW.

One particularity of this \acrshort{hvac} unit is that during summer, only one
of the two compressors is running. This results in a higher \acrlong{eer} in
cases where the full cooling capacity is not required.

\subsubsection*{Ventilation}

According to the manufacturer manual~\cite{aermecRoofTopManuelSelection}, the
\acrshort{hvac} unit's external fan has an air flow ranging between 4900
$\text{m}^3/\text{h}$ and 7000 $\text{m}^3/\text{h}$.

\subsubsection*{Air Exchange Rate}\label{sec:Air_Exchange_Rate}

The air exchange rate, also known as `air changes per hour', represents the
number of times the air in a room gets replaced with new air. It can be
computed by dividing the air flow through the room by the room volume:

\begin{equation}
    \text{Air exchange rate} = \frac{\text{Air flow}}{\text{Total volume}}
\end{equation}

In the case of the \pdome\ and its \acrshort{hvac}, this results in an air
exchange rate in the range of:

\begin{equation}
    \begin{aligned}
        \text{Air exchange rate} &= \frac{4900}{2650}
        \text{ to } \frac{7000}{2650} &\left[\frac{\text{m}^3/\text{h}}{\text{m}^3}\right]\\
        &= 1.85 \text{ to } 2.64 &\left[\frac{1}{\text{h}}\right]
    \end{aligned}
\end{equation}

As will be shown in Section~\ref{sec:CARNOT_experimental}, the external fan
is continuously running. The \textit{Air exchange rate} has therefore been fixed
to a value of 2.5 for the duration of the whole simulation. The real airflow
could, of course, vary depending on a multitude of external factors, but
quantifying these would require more precise measurements.

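The bounds used above follow directly from the fan airflow and the building
volume (a one-line numerical check):

\begin{verbatim}
volume = 2650.0                   # m^3, cf. Equation eq:numerical_volume
for flow in (4900.0, 7000.0):     # m^3/h, fan airflow limits
    print(flow / volume)          # 1.85 and 2.64 air changes per hour
\end{verbatim}
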
\subsection{Validating against experimental data}\label{sec:CARNOT_experimental}

In order to confirm the validity of the model, it is necessary to compare the
CARNOT model's behaviour against that of the real \pdome\ building.

Section~\ref{sec:CARNOT_expdata} presents the available experimental data,
Section~\ref{sec:CARNOT_WDB} details the transformation of the available data
into CARNOT's WDB format, and Section~\ref{sec:CARNOT_comparison} details a
qualitative analysis of the differences.

\subsubsection{The available experimental data}\label{sec:CARNOT_expdata}

All the experimental data used for the validation of the CARNOT model has been
collected previously for another
study~\cite{fabiettiMultitimeScaleCoordination2018}, where it has been used to
identify a State Space model for the \pdome\ building.

The data has been collected in the time span of June to August 2017, and is
divided into seven different experiments, as presented in
Table~\ref{tab:exp_dates}. The available measurements are the \textit{Outside
Temperature}, \textit{Solar Irradiation}, \textit{Electrical power consumption}
of the \acrshort{hvac}, and two measurements of \textit{Inside Temperature} in
different parts of the room.

\begin{table}[ht]
    \centering
    \begin{tabular}{||c c c||}
        \hline
        Exp no. & Start Date & End Date\\
        \hline \hline
        1 & 01.06.2017 20:00 & 03.06.2017 17:00 \\
        2 & 10.06.2017 16:00 & 12.06.2017 06:00 \\
        3 & 16.06.2017 20:00 & 19.06.2017 06:00 \\
        4 & 19.06.2017 20:00 & 22.06.2017 06:00 \\
        5 & 30.06.2017 20:00 & 03.07.2017 06:00 \\
        6 & 07.07.2017 20:00 & 10.06.2017 06:00 \\
        7 & 13.06.2017 20:00 & 20.06.2017 06:00 \\
        \hline
    \end{tabular}
    \caption{\pdome\ experimental measurements}
    \label{tab:exp_dates}
\end{table}

\clearpage

As mentioned previously, the external fan of the \acrshort{hvac} is constantly
running. This can be seen in Figure~\ref{fig:Polydome_electricity}, where the
electricity consumption of the \acrshort{hvac} has a baseline of 1.67 kW.

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/Fan_baseline.pdf}
    \caption{Electrical Power consumption of the \pdome\ \acrshort{hvac} for Experiment 7}
    \label{fig:Polydome_electricity}
\end{figure}

Figure~\ref{fig:Polydome_electricity} also gives an insight into the workings of
the \acrshort{hvac} when it comes to the combination of the two available
compressors. The instruction manual of the
\acrshort{hvac}~\cite{aermecRoofTopManuelSelection} notes that in summer only
one of the compressors is running. This allows for a larger \acrshort{eer} value
and thus better performance. We can see that this is the case for most of the
experiment, where the power consumption caps at around 6~kW, which is the
maximum power consumption of one compressor. There are, however, moments during
the first part of the experiment where the power momentarily peaks over the 6~kW
limit, and goes as high as around 9~kW. This most probably happens when the
\acrshort{hvac} decides that the difference between the set point temperature
and the actual measured values is too large to compensate with a single
compressor.

Figure~\ref{fig:Polydome_exp7_settemp} presents the values of the set point
temperature and the measured internal temperature.

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/Exp_settemp.pdf}
    \caption{Measured vs set point temperature of the \acrshort{hvac} for Experiment 7}
    \label{fig:Polydome_exp7_settemp}
\end{figure}

A few observations can be made on these measurements. First, the second
compressor is indeed turned on during the first part of the experiment, when the
set point differs greatly from the measured temperature. Second, for the
beginning of Experiment 7, as well as the majority of the other experiments, the
set point temperature is the value that gets changed in order to excite the
system, and since the \acrshort{hvac}'s controller is on during identification,
it will oscillate between using one or two compressors. Lastly, it is possible
to notice that the \acrshort{hvac} is not turned on during the night, with the
exception of the external fan, which continues running.

\subsubsection{The CARNOT WDB weather data format}\label{sec:CARNOT_WDB}

For a correct simulation of the building behaviour, CARNOT requires not only the
detailed definition of the building blocks/nodes, but also a very detailed set
of data on the weather conditions. This set includes detailed information on the
sun's position throughout the simulation (zenith and azimuth angles), the
\acrfull{dhi} and \acrfull{dni}, the direct and diffuse solar radiation on the
surface, as well as information on the ambient temperature, humidity,
precipitation, pressure, wind speed and direction, etc. A detailed overview of
each measurement necessary for a simulation is given in the CARNOT user
manual~\cite{CARNOTManual}.

In order to compare the CARNOT model's performance to that of the real \pdome,
it is necessary to simulate the CARNOT model under the same set of conditions as
the ones present during the experimental data collection. In order to do this,
all the missing values that are required by the simulation have to be filled. In
some cases, such as the sun angles, it is possible to compute the exact values,
but in other cases the real data is not available, which means that it has to be
inferred from the available data.

The information on the zenith and azimuth solar angles can be computed exactly
if the position and elevation of the building are known. The GPS coordinates and
elevation information are found using a map~\cite{ElevationFinder}. With that
information available, the zenith and azimuth angles, as well as the
\acrfull{aoi}, are computed using the Python pvlib
library~\cite{f.holmgrenPvlibPythonPython2018}.

As opposed to the solar angles, which can be computed exactly from the available
information, the Solar Radiation Components (DHI and DNI) have to be estimated
from the available measurements of GHI, zenith angles (Z) and datetime
information. \textcite{erbsEstimationDiffuseRadiation1982} present an empirical
relationship, known as the Erbs model, between the diffuse fraction DF and the
ratio of GHI to extraterrestrial irradiance $K_t$. The DF is then used
to compute DHI and DNI as follows:

\begin{equation}
    \begin{aligned}
        \text{DHI} &= \text{DF} \times \text{GHI} \\
        \text{DNI} &= \frac{\text{GHI} - \text{DHI}}{\cos{\text{Z}}}
    \end{aligned}
\end{equation}

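A minimal sketch of this step with the pvlib library is given below; the site
coordinates and the GHI series are placeholders standing in for the actual
values and measurements, and the call signatures shown should be checked against
the pvlib documentation:

\begin{verbatim}
import pandas as pd
import pvlib

lat, lon, alt = 46.52, 6.57, 400.0   # placeholder site coordinates/elevation
times = pd.date_range("2017-07-13 20:00", "2017-07-20 06:00",
                      freq="15min", tz="Europe/Zurich")

solpos = pvlib.solarposition.get_solarposition(times, lat, lon, altitude=alt)
ghi = pd.Series(0.0, index=times)    # placeholder for the measured GHI

# Erbs decomposition: DHI and DNI from GHI, zenith angle and day of year.
erbs = pvlib.irradiance.erbs(ghi, solpos["zenith"], times)
dhi, dni = erbs["dhi"], erbs["dni"]
\end{verbatim}
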
All the other parameters related to solar irradiance, such as the in-plane
irradiance components and the in-plane diffuse irradiance from the sky and the
ground, are computed using the Python pvlib library.

The values that can be neither calculated nor approximated from the available
data, such as the precipitation, the wind direction, and the incidence angles in
the plane of the vertical and the main/secondary surface axes, have been
replaced with the default CARNOT placeholder value of -9999. The relative
humidity, cloud index, pressure and wind speed have been fixed to the default
CARNOT values of 50\%, 0.5, 96300 Pa and 0 $\text{m}/\text{s}$ respectively.

\subsubsection{\pdome\ and CARNOT model comparison}\label{sec:CARNOT_comparison}

With the \acrshort{wdb} data compiled, we can now turn to simulating the CARNOT
model and compare its behaviour to that of the real \pdome\ building.

Unfortunately, one crucial piece of information is still missing: the amount of
heat that the \acrshort{hvac} either pumps in or takes out of the building at
any point in time. This value could be approximated from the electrical power
consumption and the \acrshort{eer}/\acrshort{cop} values, provided it is known
whether the \acrshort{hvac} is in heating or cooling mode.

Since this information is lacking in the existing experimental datasets, the
heat supplied to or taken out of the system has been inferred from the
electrical energy use, the measured building temperature and the \acrshort{hvac}
temperature set point, with the assumption that the \acrshort{hvac} is in
cooling mode whenever the measurements are higher than the set point
temperature, and is in heating mode otherwise. As can already be seen in
Figure~\ref{fig:Polydome_exp7_settemp}, this is a very strong assumption that is
not necessarily always correct. It works well when the measurements are very
different from the set point, as is the case in the first part of the
experiment, but this assumption is false for the second part of the experiment,
where the set point temperature remains fixed and it is purely the
\acrshort{hvac}'s job to regulate the temperature.

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/CARNOT_comparison_1.pdf}
    \includegraphics[width = \textwidth]{Plots/CARNOT_comparison_2.pdf}
    \caption{Measured vs CARNOT simulated temperature for the Experimental
    Datasets}
    \label{fig:CARNOT_simulation_validation}
\end{figure}

The results of the seven simulations are presented in
Figure~\ref{fig:CARNOT_simulation_validation}. Overall, the simulated
temperature has the same behaviour as the real \pdome\ measurements. A more
detailed inspection shows that for most of the experiments, the simulated
temperature is much more volatile than the true measurements. This could be due
to an overestimated value of the Air Exchange Rate, an underestimated amount of
furniture in the building or, more probably, a miscalculation of the
\acrshort{hvac}'s heating/cooling mode. Of note is the large difference in
behaviour for Experiments 5 and 6. In fact, for these experiments, the
values for the electrical power consumption greatly differ in shape from the
ones presented in the other datasets, which could potentially mean erroneous
measurements, or some other underlying problem with the data.

Finally, it is possible to conclude that the CARNOT building behaves comparably
to the real \pdome, even if it does not perfectly reproduce its behaviour. These
differences could come from multiple factors: missing information that had
to be inferred and/or approximated, such as the Air Exchange Rate, the heat
provided/extracted, the amount of furniture in the building, the overall shape
and size of the building, as well as possible errors in the experimental data
used for validation. A more detailed analysis of the building parameters would
have to be done in order to find the reasons and eliminate these discrepancies.

\clearpage

511 Sections/50_Choice_of_Hyperparameters.tex Normal file
@@ -0,0 +1,511 @@
\section{Choice of Hyperparameters}~\label{sec:hyperparameters}
|
||||
|
||||
This section will discuss and try to validate the choice of all the
|
||||
hyperparameters necessary for the training of a \acrshort{gp} model to capture
|
||||
the CARNOT building's behaviour.
|
||||
|
||||
The class of black-box models is very versatile, being able to capture plant
|
||||
behaviour directly from data. This comes in contrast to white-box and grey-box
|
||||
modelling techniques, which require much more physical insight into the plant's
|
||||
behaviour.
|
||||
|
||||
On the flip side, this versatility comes at the cost of having to properly
|
||||
define the model hyperparameters: the number of regressors, the number of
|
||||
autoregressive lags for each class of inputs, the shape of the covariance
|
||||
function have to be taken into account when designing a \acrshort{gp} model.
|
||||
These choices have direct influence on the resulting model behaviour and where
|
||||
it can be generalized, as well as indirect influence in the form of more time
|
||||
consuming computations in the case of larger number of regressors and more
|
||||
complex kernel functions.
|
||||
|
||||
As described in Section~\ref{sec:gp_dynamical_system}, for the purpose of this
|
||||
project, the \acrlong{gp} model will be trained using the \acrshort{narx}
|
||||
structure. This already presents an important choice in the selection of
|
||||
regressors and their respective autoregressive lags.
|
||||
|
||||
The output of the model has been chosen as the \textit{temperature measured
|
||||
inside} the CARNOT building. This is a suitable choice for the \acrshort{ocp}
|
||||
defined in Section~\ref{sec:mpc_problem}, where the goal is tracking as close as
|
||||
possible the inside temperature of the building.
|
||||
|
||||
The input of the \acrshort{gp} model coincides with the input of the CARNOT
|
||||
building, namely the \textit{heat} passed to the idealized \acrshort{hvac},
|
||||
which is held constant at each step.
|
||||
|
||||
As for the exogenous inputs, the choice turned out to be more complex. The
CARNOT \acrshort{wdb} format (cf. Section~\ref{sec:CARNOT_WDB}) contains
information on all the solar angles, the different components of solar
radiation, wind speed and direction, temperature, precipitation, etc. All of
this information is required for CARNOT to function properly.

Including all of this information in the \acrshort{gp}'s exogenous inputs would
come with a few downsides. First, depending on the number of lags chosen for
the exogenous inputs, the number of inputs to the \acrshort{gp} could become
very large: an exogenous input vector of 10 elements with 2 lags would already
yield 20 inputs for the \acrshort{gp} model. This makes both training and using
the model very computationally expensive, given the model's algorithmic
complexity of $\mathcal{O}(n^3)$.

Second, this information may not always be available in experimental
implementations on real buildings, where legacy equipment might already be
installed, or where budget restrictions call for simpler equipment. An example
of this is the experimental datasets used for validation of the CARNOT model,
where the only available weather information is the \acrshort{ghi} and the
measurement of the outside temperature. This would also be a limitation when
getting the weather predictions for the next steps during real-world
experiments.

Last, while very detailed information, such as the solar angles and the
individual components of the solar radiation, is very useful for CARNOT, which
simulates each node individually knowing its absolute position, this
information would not always benefit the \acrshort{gp} model, at least not
comparably to the additional computational complexity it introduces.

For the exogenous inputs the choice has therefore been made to take the
\textit{Global Solar Irradiance}\footnotemark and the \textit{Outside
Temperature Measurement}.

\footnotetext{Using the \acrshort{ghi} makes sense in the case of the \pdome\
building which has windows facing all directions, as opposed to a room which
would have windows on only one side.}
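As an illustration of the resulting \acrshort{narx} structure, the sketch below
shows one possible way of assembling the regressor matrix from the recorded
signals. The function name, the signal variables and the ordering of the lags
are illustrative assumptions, not the exact implementation used in this
project.

\begin{verbatim}
import numpy as np

def build_narx_dataset(w, u, y, l_w, l_u, l_y):
    """Assemble NARX regressors [w lags | u lags | y lags] and targets.

    w: (T, n_w) exogenous inputs (GHI, outside temperature)
    u: (T,) heat input, y: (T,) measured inside temperature
    """
    l_max = max(l_w, l_u, l_y)
    X, Y = [], []
    for t in range(l_max, len(y)):
        row = np.concatenate([
            w[t - l_w:t].ravel(),   # exogenous lags
            u[t - l_u:t],           # controlled input lags
            y[t - l_y:t],           # output (temperature) lags
        ])
        X.append(row)
        Y.append(y[t])
    return np.array(X), np.array(Y).reshape(-1, 1)
\end{verbatim}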
\subsection{The Kernel}

The covariance function is an important choice when creating the \acrshort{gp}.
A properly chosen kernel can impose a desired prior behaviour on the
\acrshort{gp}, such as continuity of the function and its derivatives,
periodicity, linearity, etc. On the flip side, choosing the wrong kernel can
make computations more expensive, require more data to learn the proper
behaviour, or outright lead to numerical instability and erroneous predictions.

The \acrlong{se} kernel (cf. Section~\ref{sec:Kernels}) is very versatile,
theoretically being able to fit any continuous function given enough data. When
including the \acrshort{ard} behaviour, it also gives an insight into the
relative importance of each regressor, through their respective lengthscales.

Many different kernels have been used when identifying models for building
thermal control, such as a pure Rational Quadratic
Kernel~\cite{pleweSupervisoryModelPredictive2020}, a combination of
\acrshort{se}, \acrshort{rq} and Linear
Kernels~\cite{jainLearningControlUsing2018}, or the Squared Exponential Kernel
and Kernels from the Mat\'ern family~\cite{massagrayThermalBuildingModelling2016}.

For the purpose of this project, the choice has been made to use the
\textit{\acrlong{se} Kernel}, as it provides a very good balance of versatility
and computational complexity for the modelling of the CARNOT building.
\subsection{Lengthscales}\label{sec:lengthscales}

The hyperparameters of the \acrshort{se} kernel can be useful when studying the
importance of the regressors. The larger the distance between two inputs,
relative to the corresponding lengthscale, the less correlated they are. In
fact, setting the kernel variance $\sigma^2 = 1$, we can compute the
correlation of two inputs located one, two and three lengthscales apart.
\begin{table}[ht]
    \centering
    \begin{tabular}{||c c ||}
        \hline
        $\norm{\mathbf{x} - \mathbf{x}'}$ &
        $\exp{(-\frac{1}{2}\times\frac{\norm{\mathbf{x} - \mathbf{x}'}^2}{l^2})}$ \\
        \hline \hline
        $1l$ & 0.606 \\
        $2l$ & 0.135 \\
        $3l$ & 0.011 \\
        \hline
    \end{tabular}
    \caption{Correlation of inputs relative to their distance}
    \label{tab:se_correlation}
\end{table}
From Table~\ref{tab:se_correlation} it can be seen that at 3 lengthscales
apart, the inputs are already almost uncorrelated. In order to better visualize
this difference, the \textit{relative lengthscale importance} is introduced:

\begin{equation}
    \lambda = \frac{1}{\sqrt{l}}
\end{equation}
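In GPflow, the \acrshort{se} kernel with \acrshort{ard} exposes one lengthscale
per regressor, from which the quantities reported in
Tables~\ref{tab:GP_hyperparameters} and~\ref{tab:SVGP_hyperparameters} can be
computed directly. The sketch below assumes GPflow~2 and uses a placeholder
regressor matrix; the variable names are illustrative.

\begin{verbatim}
import gpflow
import numpy as np

X_train = np.zeros((480, 8))   # placeholder: NARX regressor matrix

# SE kernel with one lengthscale per regressor (ARD)
kernel = gpflow.kernels.SquaredExponential(
    lengthscales=np.ones(X_train.shape[1]))

# ... after training, the quantities reported in the tables:
lengthscales = kernel.lengthscales.numpy()
importance = 1.0 / np.sqrt(lengthscales)   # relative lengthscale importance
variance = kernel.variance.numpy()         # kernel variance sigma^2
\end{verbatim}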
Another indicator of model behaviour is the variance of the identified
\acrshort{se} kernel. The expected value of the variance is around the variance
of the training outputs. An extremely high or extremely low value of the
variance could mean a numerically unstable model.

Table~\ref{tab:GP_hyperparameters} presents the relative lengthscale
importances and the variance for different combinations of the exogenous input
lags ($l_w$), the controlled input lags ($l_u$) and the output lags ($l_y$) for
a classical \acrshort{gp} model. In the table headers, $\lambda_{w1,j}$ and
$\lambda_{w2,j}$ denote the relative importance of the $j$-th autoregressive
lag of the solar irradiation and of the outside temperature respectively, while
$\lambda_{u1,j}$ and $\lambda_{y1,j}$ refer to the lags of the heat input and
of the measured inside temperature.
\begin{table}[ht]
    %\vspace{-8pt}
    \centering
    \resizebox{\columnwidth}{!}{%
    \begin{tabular}{||c c c|c|c c c c c c c c c c c||}
        \hline
        \multicolumn{3}{||c|}{Lags} & Variance &\multicolumn{11}{c||}{Kernel
        lengthscales relative importance} \\
        $l_w$ & $l_u$ & $l_y$ & $\sigma^2$ &$\lambda_{w1,1}$ & $\lambda_{w1,2}$ &
        $\lambda_{w1,3}$ & $\lambda_{w2,1}$ & $\lambda_{w2,2}$ &
        $\lambda_{w2,3}$ & $\lambda_{u1,1}$ & $\lambda_{u1,2}$ &
        $\lambda_{y1,1}$ & $\lambda_{y1,2}$ & $\lambda_{y1,3}$\\
        \hline \hline
        1 & 1 & 1 & 0.11 & 0.721 & & & 2.633 & & & 0.569 & & 2.645 & & \\
        1 & 1 & 2 & 22.68 & 0.222 & & & 0.751 & & & 0.134 & & 3.154 & 3.073 & \\
        1 & 1 & 3 & 0.29 & 0.294 & & & 1.303 & & & 0.356 & & 2.352 & 1.361 & 2.045 \\
        1 & 2 & 1 & 7.55 & 0.157 & & & 0.779 & & & 0.180 & 0.188 & 0.538 & & \\
        1 & 2 & 3 & 22925.40 & 0.018 & & & 0.053 & & & 0.080 & 0.393 & 0.665 & 0.668 & 0.018 \\
        2 & 1 & 2 & 31.53 & 0.010 & 0.219 & & 0.070 & 0.719 & & 0.123 & & 3.125 & 3.044 & \\
        2 & 1 & 3 & 0.44 & 0.007 & 0.251 & & 0.279 & 1.229 & & 0.319 & & 2.705 & 1.120 & 2.510 \\
        3 & 1 & 3 & 0.56 & 0.046 & 0.064 & 0.243 & 0.288 & 1.151 & 0.233 & 0.302 & & 2.809 & 1.086 & 2.689 \\
        3 & 2 & 2 & 1.65 & 0.512 & 0.074 & 0.201 & 0.161 & 1.225 & 0.141 & 0.231 & 0.331 & 0.684 & 0.064 & \\
        \hline
    \end{tabular}%
    }
    \caption{GP hyperparameter values for different autoregressive lags}
    \label{tab:GP_hyperparameters}
\end{table}
In general, the results of Table~\ref{tab:GP_hyperparameters} show that the
past outputs are important when predicting future values. The past inputs also
matter, with the exception of the models with very high variance, where the
relative importances stay almost constant across all the inputs. For example,
it can be seen that for the exogenous inputs, the outside temperature ($w2$) is
generally more important than the solar irradiation ($w1$). When more
autoregressive lags are used for the exogenous inputs, the more recent
information is usually more important in the case of the solar irradiation,
while the second-to-last measurement is preferred for the outside temperature.

For the classical \acrshort{gp} model the appropriate choice of lags would be
$l_u = 1$ and $l_y = 3$ with $l_w$ taking the values of either 1, 2 or 3,
depending on the results of further analysis.

As for the case of the \acrlong{svgp}, the results for the classical
\acrshort{gp} (cf. Table~\ref{tab:GP_hyperparameters}) are not necessarily
representative of the relationships between the regressors of the
\acrshort{svgp} model, due to the fact that the dataset used for training is
composed of the \textit{inducing variables}, which are not the real data, but a
set of parameters chosen by the training algorithm in a way that best
reproduces the original data.

Therefore, to better understand the behaviour of the \acrshort{svgp} models,
the same computations as in Table~\ref{tab:GP_hyperparameters} have been made;
they are presented in Table~\ref{tab:SVGP_hyperparameters}:
\begin{table}[ht]
    %\vspace{-8pt}
    \centering
    \resizebox{\columnwidth}{!}{%
    \begin{tabular}{||c c c|c|c c c c c c c c c c c||}
        \hline
        \multicolumn{3}{||c|}{Lags} & Variance &\multicolumn{11}{c||}{Kernel
        lengthscales relative importance} \\
        $l_w$ & $l_u$ & $l_y$ & $\sigma^2$ &$\lambda_{w1,1}$ & $\lambda_{w1,2}$ &
        $\lambda_{w1,3}$ & $\lambda_{w2,1}$ & $\lambda_{w2,2}$ &
        $\lambda_{w2,3}$ & $\lambda_{u1,1}$ & $\lambda_{u1,2}$ &
        $\lambda_{y1,1}$ & $\lambda_{y1,2}$ & $\lambda_{y1,3}$\\
        \hline \hline
        1 & 1 & 1 & 0.2970 & 0.415 & & & 0.748 & & & 0.675 & & 0.680 & & \\
        1 & 1 & 2 & 0.2717 & 0.430 & & & 0.640 & & & 0.687 & & 0.559 & 0.584 & \\
        1 & 1 & 3 & 0.2454 & 0.455 & & & 0.589 & & & 0.671 & & 0.522 & 0.512 & 0.529 \\
        1 & 2 & 1 & 0.2593 & 0.310 & & & 0.344 & & & 0.534 & 0.509 & 0.597 & & \\
        1 & 2 & 3 & 0.2139 & 0.330 & & & 0.368 & & & 0.537 & 0.447 & 0.563 & 0.410 & 0.363 \\
        2 & 1 & 2 & 0.2108 & 0.421 & 0.414 & & 0.519 & 0.559 & & 0.680 & & 0.525 & 0.568 & \\
        2 & 1 & 3 & 0.1795 & 0.456 & 0.390 & & 0.503 & 0.519 & & 0.666 & & 0.508 & 0.496 & 0.516 \\
        3 & 1 & 3 & 0.1322 & 0.432 & 0.370 & 0.389 & 0.463 & 0.484 & 0.491 & 0.666 & & 0.511 & 0.501 & 0.526 \\
        3 & 2 & 2 & 0.1228 & 0.329 & 0.317 & 0.325 & 0.334 & 0.337 & 0.331 & 0.527 & 0.441 & 0.579 & 0.435 & \\
        \hline
    \end{tabular}%
    }
    \caption{SVGP hyperparameter values for different autoregressive lags}
    \label{tab:SVGP_hyperparameters}
\end{table}
The results of Table~\ref{tab:SVGP_hyperparameters} are not very surprising,
even if very different from the classical \acrshort{gp} case. The kernel
variance always takes a reasonable value, and the relative importance of the
lengthscales is relatively constant across the board. It is certainly harder to
interpret these results in terms of the relevance of the chosen regressors. For
the \acrshort{svgp} model, the choice of the autoregressive lags has therefore
been made purely on the values of the loss functions, presented in
Table~\ref{tab:SVGP_loss_functions}.
\subsection{Loss functions}

The most important metric for measuring the performance of a model is the value
of the loss function, computed on a dataset separate from the one used for
training.

There exist a number of different loss functions, each focusing on different
aspects of a model's performance. A selection of loss functions used in the
identification of Gaussian Process models is presented
below~\cite{kocijanModellingControlDynamic2016}.

The \acrfull{rmse} is a very commonly used performance measure. As the name
suggests, it computes the root of the mean squared error:

\begin{equation}\label{eq:rmse}
    \text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^N \left(y_i -
    E(\hat{y}_i)\right)^{2}}
\end{equation}

This performance metric is very useful when training a model whose goal is
solely to minimize the difference between the measured values and the ones
predicted by the model.

A variant of the \acrshort{mse} is the \acrfull{smse}, which normalizes the
\acrlong{mse} by the variance of the output values of the validation dataset:

\begin{equation}\label{eq:smse}
    \text{SMSE} = \frac{1}{N}\frac{\sum_{i=1}^N \left(y_i -
    E(\hat{y}_i)\right)^{2}}{\sigma_y^2}
\end{equation}

While the \acrshort{rmse} and the \acrshort{smse} are very good at ensuring the
predicted mean value of the Gaussian Process is close to the measured values of
the validation dataset, the confidence of the Gaussian Process prediction is
completely ignored. In this case, two models predicting the same mean values
but having very different confidence intervals would be equivalent according to
these performance metrics.

The \acrfull{lpd} is a performance metric which takes into account not only the
mean value of the GP prediction, but the entire distribution:

\begin{equation}
    \text{LPD} = \frac{1}{2} \ln{\left(2\pi\right)} + \frac{1}{2N}
    \sum_{i=1}^N\left(\ln{\left(\sigma_i^2\right)} + \frac{\left(y_i -
    E(\hat{y}_i)\right)^{2}}{\sigma_i^2}\right)
\end{equation}

where $\sigma_i^2$ is the model's output variance at the \textit{i}-th step.
The \acrshort{lpd} scales the error of the mean value prediction $\left(y_i -
E(\hat{y}_i)\right)^{2}$ by the variance $\sigma_i^2$. This means that
overconfident models get penalized more than conservative models for the same
mean prediction error, leading to models that better represent the real system.

The \acrfull{msll} is obtained by subtracting from the model's \acrshort{lpd}
the loss of a trivial model which predicts using a Gaussian with the mean
$E(\boldsymbol{y})$ and variance $\sigma_y^2$ of the measured data, and taking
the mean of the obtained result:

\begin{equation}
    \text{MSLL} = \frac{1}{2N}\sum_{i=1}^N\left[
        \ln{\left(\sigma_i^2\right)} + \frac{\left(y_i -
        E\left(\hat{y}_i\right)\right)^2}{\sigma_i^2}
    \right] - \frac{1}{2N}\sum_{i=1}^N\left[
        \ln{\left(\sigma_y^2\right)} + \frac{\left(y_i -
        E\left(\boldsymbol{y}\right)\right)^2}{\sigma_y^2}
    \right]
\end{equation}

The \acrshort{msll} is approximately zero for simple models and negative for
better ones.
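For reference, the four metrics can be computed directly from the predicted
means and variances. The sketch below is a straightforward NumPy transcription
of the expressions above; the function and variable names are illustrative.

\begin{verbatim}
import numpy as np

def gp_metrics(y, mean, var):
    """y: measured outputs; mean, var: GP predictive mean and variance."""
    err2 = (y - mean) ** 2
    rmse = np.sqrt(np.mean(err2))
    smse = np.mean(err2) / np.var(y)
    lpd = 0.5 * np.log(2 * np.pi) + 0.5 * np.mean(np.log(var) + err2 / var)
    # log loss of the trivial model predicting the dataset mean and variance
    trivial = np.log(np.var(y)) + (y - np.mean(y)) ** 2 / np.var(y)
    msll = 0.5 * np.mean(np.log(var) + err2 / var - trivial)
    return rmse, smse, lpd, msll
\end{verbatim}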
Table~\ref{tab:GP_loss_functions} and Table~\ref{tab:SVGP_loss_functions}
present the values of the different loss functions for the same lag
combinations as the ones analyzed in Section~\ref{sec:lengthscales} for the
classical \acrshort{gp} and the \acrshort{svgp} models respectively:
\begin{table}[ht]
    %\vspace{-8pt}
    \centering
    \begin{tabular}{||c c c|c c c c||}
        \hline
        \multicolumn{3}{||c|}{Lags} & \multicolumn{4}{c||}{Loss functions}\\
        $l_w$ & $l_u$ & $l_y$ & RMSE & SMSE & MSLL & LPD\\
        \hline \hline
        1 & 1 & 1 & 0.3464 & 0.36394 & 20.74 & 21.70 \\
        1 & 1 & 2 & 0.1415 & 0.06179 & -9.62 & -8.67 \\
        1 & 1 & 3 & 0.0588 & 0.01066 & -8.99 & -8.03 \\
        1 & 2 & 1 & 0.0076 & 0.00017 & 71.83 & 72.79 \\
        1 & 2 & 3 & \textbf{0.0041} & \textbf{0.00005} & 31.25 & 32.21 \\
        2 & 1 & 2 & 0.1445 & 0.06682 & -9.57 & -8.61 \\
        2 & 1 & 3 & 0.0797 & 0.02033 & -10.94 & -9.99 \\
        3 & 1 & 3 & 0.0830 & 0.02219 & \textbf{-11.48} & \textbf{-10.53} \\
        3 & 2 & 2 & 0.0079 & 0.00019 & 58.30 & 59.26 \\
        \hline
    \end{tabular}
    \caption{GP Loss function values for different autoregressive lags}
    \label{tab:GP_loss_functions}
\end{table}
For the classical \acrshort{gp} model (cf. Table~\ref{tab:GP_loss_functions}) a
number of different lag combinations give rise to models with very large
\acrshort{msll}/\acrshort{lpd} values. This might indicate that those models
are overconfident, either due to the very large kernel variance parameter or to
the specific lengthscale combinations. The model with the best
\acrshort{rmse}/\acrshort{smse} metrics, \model{1}{2}{3}, had very bad
\acrshort{msll} and \acrshort{lpd} metrics, as well as by far the largest
variance of all the combinations. On the contrary, the \model{3}{1}{3} model
has the best \acrshort{msll} and \acrshort{lpd} performance, while still
maintaining small \acrshort{rmse} and \acrshort{smse} values. The drawback of
this set of lags is the large number of regressors, which leads to much more
expensive computations. Other good choices for the combinations of lags are
\model{2}{1}{3} and \model{1}{1}{3}, which have good performance on all four
metrics, as well as being cheaper from a computational perspective. In order to
make a more informed choice of the best hyperparameters, the simulation
performance of all three combinations has been analyzed.
\begin{table}[ht]
    %\vspace{-8pt}
    \centering
    \begin{tabular}{||c c c|c c c c||}
        \hline
        \multicolumn{3}{||c|}{Lags} & \multicolumn{4}{c||}{Loss functions}\\
        $l_w$ & $l_u$ & $l_y$ & RMSE & SMSE & MSLL & LPD\\
        \hline \hline
        1 & 1 & 1 & 0.3253 & 0.3203 & 228.0278 & 228.9843 \\
        1 & 1 & 2 & 0.2507 & 0.1903 & 175.5525 & 176.5075 \\
        1 & 1 & 3 & 0.1983 & 0.1192 & 99.7735 & 100.7268 \\
        1 & 2 & 1 & 0.0187 & 0.0012 & -9.5386 & -8.5836 \\
        1 & 2 & 3 & \textbf{0.0182} & \textbf{0.0011} & \textbf{-10.2739} & \textbf{-9.3206} \\
        2 & 1 & 2 & 0.2493 & 0.1884 & 165.0734 & 166.0284 \\
        2 & 1 & 3 & 0.1989 & 0.1200 & 103.6753 & 104.6287 \\
        3 & 1 & 3 & 0.2001 & 0.1214 & 104.4147 & 105.3681 \\
        3 & 2 & 2 & 0.0206 & 0.0014 & -9.9360 & -8.9826 \\
        \hline
    \end{tabular}
    \caption{SVGP Loss function values for different autoregressive lags}
    \label{tab:SVGP_loss_functions}
\end{table}
The results for the \acrshort{svgp} model, presented in
Table~\ref{tab:SVGP_loss_functions}, are much less ambiguous. The
\model{1}{2}{3} model has the best performance according to all four metrics,
with most of the other combinations scoring much worse on the \acrshort{msll}
and \acrshort{lpd} loss functions. It has therefore been chosen as the model
for the full-year simulations.

\subsection{Validation of hyperparameters}\label{sec:validation_hyperparameters}
The validation step has the purpose of testing the viability of the trained
models. If choosing a model according to loss function values on a new dataset
is a way of minimizing the possibility of overfitting the model to the training
data, validating the model by analyzing its multi-step prediction performance
ensures that the model was able to learn the correct dynamics and is useful in
simulation scenarios.

The following subsections analyze the performance of the trained \acrshort{gp}
and \acrshort{svgp} models over 20-step ahead predictions. For the \acrshort{gp}
model the final choice of parameters is made according to the simulation
performance. The simulation performance of the \acrshort{svgp} model is then
compared to that of the classical models, and possible reasons for the
discrepancies are discussed.
\subsubsection{Conventional Gaussian Process}

The simulation performance of the three lag combinations chosen for the
classical \acrlong{gp} models has been analyzed, with the results presented in
Figures~\ref{fig:GP_113_multistep_validation},~\ref{fig:GP_213_multistep_validation}
and~\ref{fig:GP_313_multistep_validation}. For reference, the one-step ahead
predictions for the training and test datasets are presented in
Appendix~\ref{apx:hyperparams_gp}.

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/GP_113_-1pts_test_prediction_20_steps.pdf}
    \vspace{-25pt}
    \caption{20-step ahead simulation for \model{1}{1}{3}}
    \label{fig:GP_113_multistep_validation}
\end{figure}
In the case of the simplest model (cf.
Figure~\ref{fig:GP_113_multistep_validation}), overall the predictions are
quite good. Large deviations from the true values only start appearing at
around 15 steps. This could impose an additional limit on the size of the
control horizon of the \acrlong{ocp}.

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/GP_213_-1pts_test_prediction_20_steps.pdf}
    \vspace{-25pt}
    \caption{20-step ahead simulation for \model{2}{1}{3}}
    \label{fig:GP_213_multistep_validation}
\end{figure}

The more complex model, presented in
Figure~\ref{fig:GP_213_multistep_validation}, has a much better prediction
performance, with only two predictions out of a total of twenty-five diverging
at the later steps. Except for the late-stage divergence of these two
predictions, this proves to be the best simulation model.
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/GP_313_-1pts_test_prediction_20_steps.pdf}
    \vspace{-25pt}
    \caption{20-step ahead simulation for \model{3}{1}{3}}
    \label{fig:GP_313_multistep_validation}
\end{figure}

Lastly, \model{3}{1}{3} (cf. Figure~\ref{fig:GP_313_multistep_validation}) has a
much worse simulation performance than the other two models. This could hint at
an overfitting of the model to the training data.
\clearpage

This is consistent with the results found in Table~\ref{tab:GP_loss_functions}
for the \acrshort{rmse} and \acrshort{smse}, and with what can be seen in
Appendix~\ref{apx:hyperparams_gp}, Figure~\ref{fig:GP_313_test_validation},
where \model{3}{1}{3} has much worse performance on the testing dataset
predictions than the other two models.

The performance of the three models in simulation mode is consistent with the
previously found results. It is of note that neither the model that scored the
best on the \acrshort{rmse}/\acrshort{smse}, \model{1}{2}{3}, nor the one that
had the best \acrshort{msll}/\acrshort{lpd}, performs the best under a
simulation scenario. In the case of the former this is due to numerical
instability, with training and prediction often failing depending on the
inputs. In the case of the latter, focusing only on the
\acrshort{msll}/\acrshort{lpd} performance metrics can lead to very
conservative models that give good and confident one-step ahead predictions,
while still being unable to fit the true behaviour of the plant.

Overall, the \model{2}{1}{3} model performed the best in the simulation
scenario, while still having good performance on all loss functions. In
implementation, however, this model turned out to be very unstable, and the
more conservative \model{1}{1}{3} model was used instead.
\subsubsection{Sparse and Variational Gaussian Process}

For the \acrshort{svgp} models, only the performance of \model{1}{2}{3} was
investigated, since it had the best performance according to all four loss
metrics.

As a first validation step, it is of note that the \acrshort{svgp} model was
able to accurately reproduce the training dataset with only 150 inducing
locations (cf. Appendix~\ref{apx:hyperparams_svgp}). It also performs about as
well as the better \acrshort{gp} models for the one-step prediction on the
testing datasets.

In the case of the simulation performance, presented in
Figure~\ref{fig:SVGP_multistep_validation}, two things are of particular
interest. First, all 25 simulations have good overall behaviour --- none of
them starts to exhibit erratic behaviour --- which is a good indicator of a
lack of overfitting. Second, this behaviour is indicative of a more
conservative model than the ones identified as classical \acrshort{gp} models.
It is also possible to conclude that, given the same amount of data, the
classical \acrshort{gp} models can learn the plant behaviour better, provided
the correct choice of regressors.
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/SVGP_123_test_prediction_20_steps.pdf}
    \caption{20-step ahead simulation for \model{1}{2}{3}}
    \label{fig:SVGP_multistep_validation}
\end{figure}

\clearpage
50
Sections/60_The_MPC_Problem.tex
Normal file
\section{The MPC Problem}\label{sec:mpc_problem}

The \acrlong{ocp} to be solved was chosen in such a way as to make the analysis
of the models' performances more straightforward. The objective is to track a
defined reference temperature as closely as possible, while ensuring that the
heat input stays within the \acrshort{hvac} capacity. The
\textit{zero-variance} method is used for multi-step prediction when using an
existing \acrshort{gp}. This does ignore the propagation of uncertainty for
multi-step ahead prediction, but even with its simplicity, this method has been
proven to work well~\cite{kocijanModellingControlDynamic2016,
jainLearningControlUsing2018, pleweSupervisoryModelPredictive2020}.

The optimization problem is therefore defined as follows:

\begin{subequations}\label{eq:optimal_control_problem}
    \begin{align}
        & \text{minimize}
        & & \sum_{i=0}^{N-1} (\bar{y}_{t+i} - y_{ref, t})^2 \\
        & \text{subject to}
        & & \bar{y}_{t+i} = K_*K^{-1}\mathbf{x}_{t+i-1} \\
        &&& \mathbf{x}_{t+i-1} = \left[\mathbf{w}_{t+i-1},\quad
            \mathbf{u}_{t+i-1},\quad \mathbf{y}_{t+i-1}\right]^T
            \label{eq:components} \\
        &&& u_{t+i} \in \mathcal{U}
    \end{align}
\end{subequations}

where $y_{ref, t}$ is the reference temperature at time $t$, $\mathcal{U}$ is
the set of allowed inputs, and $\mathbf{x}_{t}$ is the GP input vector at time
$t$, composed of the exogenous autoregressive inputs $\mathbf{w}_{t}$, the
autoregressive controlled inputs $\mathbf{u}_{t}$ and the autoregressive
outputs $\mathbf{y}_{t}$.
\subsection{Temperature reference}\label{sec:reference_temperature}

The temperature reference for the controller has been taken as the mean value
of the temperature range imposed by the
SIA~180:2014~\cite{sia180:2014ProtectionThermiqueProtection2014} norm. The norm
defines a range of allowed temperatures for residential buildings based on the
rolling 48-hour average outside temperature.

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Images/sia_180_2014.png}
    \caption{The SIA 180:2014 norm for residential building
    temperatures~\cite{sia180:2014ProtectionThermiqueProtection2014}}
    \label{fig:sia_temperature_norm}
\end{figure}

\clearpage
201
Sections/70_Implementation.tex
Normal file
\section{Implementation}\label{sec:implementation}

This section goes into the details of the implementation of the Simulink plant
and Python controller setup.

A high-level view of the setup is presented in Figure~\ref{fig:setup_diagram}.
The Simulink model's main responsibility is running the CARNOT simulation. It
also has the task of providing the \acrshort{mpc} with information on the
weather forecast, since the weather information for the simulation comes from a
CARNOT \acrshort{wdb} object. A detailed view of all the information available
in the \acrshort{wdb} object is given in Section~\ref{sec:CARNOT_WDB}.

\begin{figure}[ht]
    \centering
    \includegraphics[width = 0.5\textwidth]{Images/setup_diagram.pdf}
    \caption{Block diagram of the Simulink plant and Python Controller}
    \label{fig:setup_diagram}
\end{figure}
\subsection{Simulink Model}

The secondary functions of the Simulink model are the weather prediction and
the communication with the Python controller. A complete schematic of the
Simulink setup is presented in Figure~\ref{fig:Simulink_complete}.

\begin{figure}[ht]
    \centering
    \includegraphics[width = 0.75\textwidth]{Images/polydome_python.pdf}
    \caption{Simulink diagram of the Complete Simulation}
    \label{fig:Simulink_complete}
\end{figure}
The communication between Simulink and the controller is done using three
separate TCP/IP sockets: one for sending the control signal, one for reading
the temperature measurement, and one for reading the weather forecast. This is
mainly due to a Simulink limitation which only allows a single signal to be
transferred over one socket. This implementation also has the benefit of
providing an additional layer of abstraction between the controller and the
controlled plant: as long as the measurements, the actuators and the weather
prediction can communicate over TCP, these elements can all be implemented
completely separately, which is much closer to a real-life implementation.
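To illustrate the structure of this interface, the sketch below shows one
possible shape of the Python side of the measurement and control sockets. The
port numbers, the framing (one double-precision value per sample) and the
\texttt{compute\_control} placeholder are assumptions made for the sake of the
example, not the exact protocol used by the Simulink blocks.

\begin{verbatim}
import socket
import struct

MEAS_PORT, CTRL_PORT = 30001, 30002      # hypothetical port numbers

def compute_control(temperature):
    return 0.0                           # placeholder: solve the OCP here

def serve(port):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("localhost", port))
    srv.listen(1)
    conn, _ = srv.accept()
    return conn

meas_conn = serve(MEAS_PORT)             # temperature measurements
ctrl_conn = serve(CTRL_PORT)             # control signal

while True:
    data = meas_conn.recv(8)             # one double per sampling instant
    if not data:
        break
    temperature = struct.unpack("d", data)[0]
    heat_input = compute_control(temperature)
    ctrl_conn.sendall(struct.pack("d", heat_input))
\end{verbatim}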
With this structure, the only information received and sent by the Python
controller is the actual sampled data, without any additional information.
While the controller needs information on the control horizon in order to read
the correct amount of data for the weather predictions and to properly generate
the optimization problem, the discrete/continuous transition and vice versa
happens on the Simulink side. This simplifies the adjustment of the sampling
time, with the downside of making it harder to include meta-data such as the
hour of the day, the day of the week, etc.\ in the \acrlong{gp} Model.
The weather prediction is done using the information present in the CARNOT
\acrshort{wdb} object. Since the sampling time and control horizon of the
controller can be adjusted, the required weather predictions can lie within an
arbitrary time interval. At each sampling point, the weather measurements are
piece-wise linearly interpolated over the span of time ranging from the most
recent measurement to the next measurement after the last required prediction
time. This provides a better approximation than pure linear interpolation
between the starting and ending points, while retaining a simple
implementation.
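A piece-wise linear interpolation of this kind is essentially a one-liner with
NumPy; the sketch below is a simplified stand-in for the Simulink-side logic,
with illustrative variable names.

\begin{verbatim}
import numpy as np

def predict_weather(t_meas, w_meas, t_query):
    """Piece-wise linear interpolation of the WDB weather channels.

    t_meas:  timestamps of the available measurements
    w_meas:  array of shape (len(t_meas), n_channels), e.g. GHI and T_out
    t_query: prediction times required by the controller
    """
    return np.column_stack(
        [np.interp(t_query, t_meas, w_meas[:, k])
         for k in range(w_meas.shape[1])])
\end{verbatim}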
\subsection{Gaussian Processes}

As described in Section~\ref{sec:gaussian_processes}, both training and
evaluating a \acrshort{gp} have an algorithmic complexity of
$\mathcal{O}(n^3)$. This means that naive implementations can very quickly
become too expensive in terms of computation time.

In order to keep this bottleneck as small as possible when dealing with
\acrshort{gp}s, a very fast implementation of \acrlong{gp} models was used, in
the form of GPflow~\cite{matthewsGPflowGaussianProcess2017}. It is based on
TensorFlow~\cite{tensorflow2015-whitepaper}, which provides very efficient
implementations of all the necessary linear algebra operations. Another benefit
of this implementation is the very simple use of additional computational
resources, such as GPUs or TPUs.
\subsubsection{Classical Gaussian Process training}

For the training of the classical \acrshort{gp} models, the Scipy optimizer
provided by GPflow was used. By default, it uses the `L-BFGS-B' optimization
method, which runs until a local minimum of the negative log likelihood is
found. Since in the present implementation the \acrshort{gp} models are trained
only once, with a small amount of initial data, this approach was not only
sufficient, but also faster than the more complex training procedure used for
the \acrshort{svgp} models.
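A minimal sketch of this training procedure with GPflow~2 is given below. The
random placeholder data stands in for the scaled NARX regressors and targets;
apart from that, only calls from the public GPflow API are used.

\begin{verbatim}
import gpflow
import numpy as np

X = np.random.rand(480, 8) * 2 - 1       # placeholder scaled NARX regressors
Y = np.random.rand(480, 1) * 2 - 1       # placeholder scaled temperatures

kernel = gpflow.kernels.SquaredExponential(lengthscales=np.ones(X.shape[1]))
model = gpflow.models.GPR(data=(X, Y), kernel=kernel)

opt = gpflow.optimizers.Scipy()           # L-BFGS-B by default
opt.minimize(model.training_loss, model.trainable_variables)

mean, var = model.predict_y(X[:10])       # one-step ahead predictions
\end{verbatim}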
\subsubsection{Sparse and Variational Gaussian Process training}

The \acrshort{svgp} models have a more involved optimization procedure due to
several factors. First, when training an \acrshort{svgp} model, the
optimization objective is the value of the \acrshort{elbo} (cf.
Section~\ref{sec:elbo}). After several implementations were tested, the more
complex \textit{Adam} optimizer turned out to provide much faster convergence
than the other optimizers. Second, in the case of updating the model once per
day with all the historical information, the training dataset keeps getting
larger each time. In order to combat this, the sparse model was trained on
minibatches of 1000 datapoints for 10000 iterations. Evaluating the
\acrshort{elbo} on minibatches provides an unbiased estimate of the actual
value, given enough training iterations. This implementation has the benefit of
taking constant training time, which becomes important later in the simulation,
where the training dataset is much larger than the initial amount of data.
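The corresponding GPflow~2 training loop could look as sketched below. The
placeholder data, the number of inducing points and the learning rate are
assumptions used for illustration; the minibatch size and iteration count
follow the values quoted above.

\begin{verbatim}
import gpflow
import numpy as np
import tensorflow as tf

X = np.random.rand(5000, 8) * 2 - 1       # placeholder historical regressors
Y = np.random.rand(5000, 1) * 2 - 1
N = X.shape[0]

model = gpflow.models.SVGP(
    gpflow.kernels.SquaredExponential(lengthscales=np.ones(X.shape[1])),
    gpflow.likelihoods.Gaussian(),
    inducing_variable=X[:150].copy(),     # 150 inducing locations
    num_data=N)

minibatches = tf.data.Dataset.from_tensor_slices((X, Y)) \
    .repeat().shuffle(N).batch(1000)
training_loss = model.training_loss_closure(iter(minibatches), compile=True)
optimizer = tf.optimizers.Adam(learning_rate=0.01)

for _ in range(10000):
    optimizer.minimize(training_loss, model.trainable_variables)
\end{verbatim}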
\subsection{Optimal Control Problem}

The \acrlong{ocp} has been implemented using the
CasADi~\cite{anderssonCasADiSoftwareFramework2019} algorithmic differentiation
framework. It provides an interface between a high-level definition of the
optimization problem and very efficient low-level solvers.

The implementation of the \acrshort{ocp} defined in
Equation~\ref{eq:optimal_control_problem} has a few particularities, discussed
in the following subsections.
\subsubsection{Integrating GPflow models in CasADi}

The first particularity of implementing the \acrshort{ocp} with CasADi is the
use of CasADi callback objects. The purpose of these objects is the integration
of external functions into CasADi. Generally, using callbacks is not advised,
because each call to the external function incurs additional overhead.
Moreover, these callbacks usually cannot directly provide information on the
forward/reverse sensitivities, which CasADi uses to drastically reduce
computation time. In the absence of this information, CasADi has to perform
many evaluations around each point in order to approximate the gradients.
TensorFlow, on the other hand, keeps track of the gradients of its entire
computational graph, which can be accessed at a cost only slightly higher than
the evaluation cost of the function itself.

Integrating this gradient information into the CasADi callbacks reduces the
number of function calls by around an order of magnitude, which already
drastically reduces computation time.
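The sketch below shows the general shape of such a callback wrapping the GP
posterior mean, which is assumed here to be available as a compiled
\texttt{tf.function}. It is a simplified illustration: the exact derivative
hookup described above, i.e.\ overriding the callback's
\texttt{has\_reverse}/\texttt{get\_reverse} methods with gradients obtained from
\texttt{tf.GradientTape}, is omitted, and the \texttt{enable\_fd} option is
shown instead as the fallback that lets CasADi approximate derivatives by
finite differences.

\begin{verbatim}
import casadi as ca
import numpy as np
import tensorflow as tf

class GPMeanCallback(ca.Callback):
    """Expose the GP posterior mean to CasADi as an external function."""

    def __init__(self, name, predict_mean, n_inputs):
        ca.Callback.__init__(self)
        self.predict_mean = predict_mean   # tf.function: (1, n) -> (1, 1)
        self.n_inputs = n_inputs
        self.construct(name, {"enable_fd": True})

    def get_n_in(self): return 1
    def get_n_out(self): return 1

    def get_sparsity_in(self, i):
        return ca.Sparsity.dense(self.n_inputs, 1)

    def get_sparsity_out(self, i):
        return ca.Sparsity.dense(1, 1)

    def eval(self, arg):
        x = np.array(arg[0]).reshape(1, -1)
        mean = self.predict_mean(tf.constant(x))
        return [float(np.asarray(mean).squeeze())]
\end{verbatim}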
Another significant speed improvement comes from transforming the Python calls
to TensorFlow into native \texttt{tf.function}s. This change incurs a small
overhead the first time the optimization problem is run, since all the
TensorFlow functions have to be compiled before execution, but afterwards it
speeds up the execution by around another order of magnitude.

The last optimization done to the CasADi implementation is the use of the MA27
linear solver provided by the HSL optimization
library~\cite{HSLCollectionFortran}. This change results in a speedup of around
10\% compared to using the default MUMPS solver. While not as drastic as the
other improvements, this still provides a significant reduction in the runtime
of the whole-year simulation.
\subsubsection{Sparse Implementation of the Optimization Problem}

The optimization problem as presented in
Equation~\ref{eq:optimal_control_problem} becomes very nonlinear quite fast. In
fact, due to the autoregressive structure of the \acrshort{gp}, the predicted
temperature at time $t$ is passed as an input to the model at time $t+1$. A
simple recursive implementation of the optimization problem becomes intractable
after only 3--4 prediction steps.

In order to solve this problem, a new \acrshort{ocp} is introduced. It has a
much sparser structure, in exchange for a larger number of variables. This
turns out to be much faster to solve than the original problem.

Let $w_l$, $u_l$, and $y_l$ be the lengths of the state vector components
$\mathbf{w}$, $\mathbf{u}$, $\mathbf{y}$ (cf. Equation~\ref{eq:components}).
Also, let $X$ be the matrix of all the system states over the optimization
horizon and $W$ be the matrix of the predicted disturbances for all the future
steps. The original \acrlong{ocp} can be rewritten using index notation as:

\begin{subequations}\label{eq:sparse_optimal_control_problem}
    \begin{align}
        & \text{minimize}
        & & \sum_{i=2}^{N + 1} \left(X[i, w_l + u_l + 1] - y_{ref, t}\right)^2 \\
        & \text{subject to}
        & & X[i+1, w_l + u_l + 1] = K_*K^{-1}X[i, :] \quad \text{for} \quad
            i\in[1, N]\\
        &&& X[i, w_l + u_l + 2: ] = X[i, w_l + u_l + 1: w_l + u_l + y_l - 1]\\
        &&& X[i, 1:w_l] = W[i, :] \\
        &&& X[i+1, w_l + 2: w_l + u_l] = X[i, w_l + 1: w_l + u_l - 1] \\
        &&& X[:, w_l + 1] \in \mathcal{U}
    \end{align}
\end{subequations}
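To make the structure of this reformulation concrete, the sketch below sets it
up with CasADi's Opti stack. It is only an illustration: the GP mean is
replaced by an arbitrary linear stand-in for $K_*K^{-1}$, the lag values and
input bounds are assumed, and the input-lag shifting constraint is omitted
since a single input lag is used here. At run time the parameters would be set
with \texttt{opti.set\_value} and the problem solved with \texttt{opti.solve}
at every sampling instant.

\begin{verbatim}
import casadi as ca

N = 8                                    # control horizon
w_l, u_l, y_l = 2, 1, 3                  # assumed lags
n_x = w_l + u_l + y_l

x_sym = ca.MX.sym("x", n_x)              # stand-in for the GP posterior mean
gp_mean = ca.Function("gp_mean", [x_sym], [ca.dot(ca.DM.rand(n_x), x_sym)])

opti = ca.Opti()
X = opti.variable(N + 1, n_x)            # stacked GP inputs over the horizon
W = opti.parameter(N + 1, w_l)           # predicted disturbances
y_ref = opti.parameter()                 # reference temperature
u_min, u_max = 0.0, 1.0                  # assumed (scaled) HVAC limits

cost = 0
for i in range(N):
    # GP prediction fills the most recent output slot of the next row
    opti.subject_to(X[i + 1, w_l + u_l] == gp_mean(X[i, :].T))
    # shift the older output lags
    opti.subject_to(
        X[i + 1, w_l + u_l + 1:] == X[i, w_l + u_l:w_l + u_l + y_l - 1])
    # disturbances are fixed to their predicted values
    opti.subject_to(X[i, :w_l] == W[i, :])
    # heat input constrained to the HVAC capacity
    opti.subject_to(opti.bounded(u_min, X[i, w_l], u_max))
    cost += (X[i + 1, w_l + u_l] - y_ref) ** 2

opti.minimize(cost)
opti.solver("ipopt")
\end{verbatim}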
\subsection{Model Identification and Update using a Python server}

The Python server is responsible for the control part of the simulation. It
decides which controller is active, is responsible for training and updating
the \acrshort{gp} and \acrshort{svgp} models, and keeps track of all the
intermediate results for analysis.

At the beginning of the experiment there is no information available on the
building's thermal behaviour. For this part of the simulation, the server
switches to a \acrshort{pi} controller with random disturbances until it
gathers enough data to train a \acrshort{gp} model. This ensures that the
building is sufficiently excited to capture its dynamics, while maintaining the
temperature within an acceptable range (approximately 15--25~$\degree$C).
Once enough data has been captured, all the features are first scaled to the
range $[-1, 1]$ in order to reduce the possibility of numerical instabilities.
The Python server then trains the \acrshort{gp} model and switches to tracking
the appropriate SIA 180:2014 reference temperature (cf.
Section~\ref{sec:reference_temperature}).
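The scaling itself is a simple min-max transformation; a sketch is given below,
with the scaling bounds assumed to be taken from the initial identification
dataset.

\begin{verbatim}
import numpy as np

def scale_to_unit(x, x_min, x_max):
    """Min-max scaling of each feature to the [-1, 1] range."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def unscale(x_scaled, x_min, x_max):
    return 0.5 * (x_scaled + 1.0) * (x_max - x_min) + x_min
\end{verbatim}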
For the case of the \acrshort{svgp}, a new model is trained once enough data is
gathered. The implementations tested were updated once a day, either on the
whole historical set of data, or on a window of the last five days of data.
\clearpage
485
Sections/80_Results.tex
Normal file
\section{Full-year Simulation Results}\label{sec:results}

This section focuses on the presentation and interpretation of the year-long
simulation of the control schemes presented previously. All the control schemes
analyzed in this section use a sampling time of 15 minutes and a control
horizon of 8 steps.

Section~\ref{sec:GP_results} analyzes the results of a conventional
\acrlong{gp} Model trained on the first five days of gathered data. The model
is then used for the rest of the year, with the goal of tracking the defined
reference temperature.

Section~\ref{sec:SVGP_results} goes into the details of the learning scheme
using an \acrshort{svgp} Model. In this scenario, the model is first trained on
the first five days of data, and is then updated every day at midnight with the
new information gathered from closed-loop operation.
\subsection{Conventional Gaussian Processes}\label{sec:GP_results}

The first simulation, to be used as a baseline comparison with the
\acrshort{svgp} Models developed further on, consists of using a `static'
\acrshort{gp} model trained on five days' worth of experimental data. This
model is then employed for the rest of the year.

With a sampling time of 15 minutes, the model is trained on 480 points of data.
This size of the identification dataset is enough to learn the behaviour of the
plant, without being too complex to handle from a numerical perspective. The
current implementation takes roughly 1.5 seconds of computation time per step.
For reference, identifying a model on 15 days' worth of experimental data (1440
points) makes the simulation take approximately 11--14 seconds per step, or
around eight times slower. This is consistent with the $\mathcal{O}(n^3)$
complexity of evaluating a \acrshort{gp}.

The results of the simulation are presented in
Figure~\ref{fig:GP_fullyear_simulation}. Overall, the performance of this model
is not very good. The tracked temperature presents an offset of around 0.5
$\degree$C in the stable part of the simulation. The offset becomes much larger
once the reference temperature starts moving from the initial constant value.
The controller becomes completely unstable around the middle of July, and only
regains some stability in the middle of October. It is also possible to note
that from mid-October to the end of December the controller has a very similar
performance to that exhibited at the beginning of the year, namely January to
the end of February.
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/4_GP_480pts_12_averageYear_fullyear.pdf}
    \caption{GP full year simulation}
    \label{fig:GP_fullyear_simulation}
\end{figure}

This very large difference in performance could be explained by the change in
weather during the year. The winter months of the beginning and end of the year
exhibit similar performance. The spring months already make the controller less
stable than at the start of the year, while the drastic temperature changes in
the summer make the controller completely unstable.
Figure~\ref{fig:GP_fullyear_abserr} presents the absolute error measured at
each step of the simulation over the course of the year. We can note a mean
absolute error of 1.33 $\degree$C, with the largest deviations occurring in
late summer, where the absolute error can reach extreme values, and the `best'
performance occurring during the winter months.

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/4_GP_480pts_12_averageYear_abserr.pdf}
    \caption{GP full year absolute error}
    \label{fig:GP_fullyear_abserr}
\end{figure}
Figure~\ref{fig:GP_first_model_performance} analyzes the 20-step ahead
simulation performance of the identified model over the course of the year. At
experimental step 250, the controller is still gathering data. It is therefore
expected that the identified model will be capable of reproducing this data. At
step 500, 20 steps after identification, the model correctly steers the
internal temperature towards the reference temperature. On the flip side,
already at experimental steps 750 and 1000, only 9 days into the simulation,
the model is unable to properly simulate the behaviour of the plant, with the
maximum difference at the end of the simulation reaching 0.75 $\degree$C and
1.5 $\degree$C respectively.

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/4_GP_480pts_12_averageYear_first_model_performance.pdf}
    \caption{GP model performance}
    \label{fig:GP_first_model_performance}
\end{figure}
This large difference in performance could be explained by the change in the
outside weather (solar irradiance and outside temperature --- the exogenous
inputs) compared to the conditions present during the training phase. It can be
seen in Figure~\ref{fig:Dataset_outside_temperature} that already at 500 points
into the simulation both the GHI and the Outside Temperature are outside of the
training ranges, with the latter exhibiting a much larger variation.

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/Exogenous_inputs_fullyear.pdf}
    \caption{Exogenous inputs for the simulation}
    \label{fig:Dataset_outside_temperature}
\end{figure}
Finally, it is possible to conclude that this approach does not perform well
due to several causes:

\begin{itemize}
    \item The size of the training dataset is limited by the computation
        budget
    \item The model does not correctly extrapolate the information on the
        disturbances
    \item The model stays fixed for the duration of the year, being unable to
        adapt to new weather conditions.
\end{itemize}

These problems could be solved in several ways, such as periodically
re-identifying the model to fit the current weather pattern. This approach
would be quite cumbersome due to the repeated need of disturbing the plant in
order to sufficiently excite it. Another approach would be to keep the whole
historical dataset of measurements, which quickly renders the problem
intractable. More complex solutions, such as keeping a fixed-size data
dictionary (where points are deleted once they no longer help the predictions
and new points are added as they are deemed useful), or compiling the training
dataset from multiple experiments in different weather conditions, could
dramatically improve model performance, but are more complex to implement.
\subsection{Sparse and Variational Gaussian Process}\label{sec:SVGP_results}

The \acrlong{svgp} models are set up in a similar way as described before. The
model is first identified using 5 days' worth of experimental data collected
using a \acrshort{pi} controller and a random disturbance signal. The
difference lies in the fact that the \acrshort{svgp} model gets re-identified
every night at midnight using the newly accumulated data from closed-loop
operation.

The results of this setup are presented in
Figure~\ref{fig:SVGP_fullyear_simulation}. It can already be seen that this
setup performs much better than the initial one. The only large deviations from
the reference temperature are due to cold weather, when the \acrshort{hvac}'s
limited heat capacity is unable to maintain the proper temperature.
Additionally, the \acrshort{svgp} controller takes around 250--300 ms of
computation time per simulation step, decreasing the computational cost
relative to the original \acrshort{gp} by a factor of six.
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_fullyear.pdf}
    \caption{SVGP full year simulation}
    \label{fig:SVGP_fullyear_simulation}
\end{figure}

\clearpage
Comparing the absolute error between the measured and reference temperatures
for the duration of the experiment (cf. Figure~\ref{fig:SVGP_fullyear_abserr})
with the one of the original experiment, the average absolute error is reduced
from 1.33 $\degree$C to only 0.05 $\degree$C, with the majority of the values
being lower than 0.4 $\degree$C.

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_abserr.pdf}
    \caption{SVGP full year absolute error}
    \label{fig:SVGP_fullyear_abserr}
\end{figure}
Figures~\ref{fig:SVGP_first_model_performance},
~\ref{fig:SVGP_later_model_performance}
and~\ref{fig:SVGP_last_model_performance} show the 20-step simulation
performance of three different models, identified at three different stages of
the experiment. They have all been set to simulate 25 consecutive experimental
steps, starting at steps 250, 500, 10750 and 11000.

The initial model (cf. Figure~\ref{fig:SVGP_first_model_performance}),
identified after the first five days, has the worst performance. It is unable
to correctly simulate even the learning dataset. This behaviour is similar to
that observed in Figure~\ref{fig:SVGP_multistep_validation}
(cf. Section~\ref{sec:validation_hyperparameters}), where the \acrshort{svgp}
model performed worse than the equivalent \acrshort{gp} trained on the same
dataset. It also performs worse than the initial \acrshort{gp} model in the
rest of the simulations, being unable to correctly predict the heating to the
reference at step 500, and having maximum errors of around 10 $\degree$C for
the simulations starting at 10750 and 11000 points.
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_first_model_performance.pdf}
    \caption{SVGP first model performance}
    \label{fig:SVGP_first_model_performance}
\end{figure}

\clearpage
Figure~\ref{fig:SVGP_later_model_performance} shows the performance of the
100th trained model (i.e.\ the model trained on April 15). This model performs
much better in all simulations. It is able to correctly simulate the 20-step
behaviour of the plant over all the experimental steps in the first two cases.
It still has a noticeable error when predicting the behaviour of the plant on
new data (i.e.\ simulations starting at steps 10750 and 11000), but it is much
smaller than before. This hints at the fact that the \acrshort{svgp} model's
performance improves throughout the year, but it does require much more data
than the classical \acrshort{gp} model to capture the building dynamics.
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_later_model_performance.pdf}
    \caption{SVGP later model performance}
    \label{fig:SVGP_later_model_performance}
\end{figure}
The last model is trained on the whole-year dataset.
Figure~\ref{fig:SVGP_last_model_performance} shows its performance for the same
situations described before. The model is able to predict the plant's behaviour
at steps 250 and 500 even better than before, as well as predict the behaviour
at steps 10750 and 11000 with maximum errors of 0.6 $\degree$C and 0.1
$\degree$C respectively.
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_last_model_performance.pdf}
    \caption{SVGP last model performance}
    \label{fig:SVGP_last_model_performance}
\end{figure}
The analysis of the model evolution as more data gets gathered already gives
very good insight into the strengths and weaknesses of this approach. The
initial model is unable to correctly extrapolate the plant's behaviour in new
regions of the state space. Also, given the same amount of data, the
\acrshort{svgp} model is able to capture less information about the plant
dynamics than the equivalent \acrshort{gp} model. On the flip side, re-training
the model every day with new information is able to mitigate this by adding
data from new regions as they get discovered, while maintaining a constant
training and evaluation cost.

A more in-depth analysis of the evolution of the \acrshort{svgp}
hyperparameters over the duration of the experiment is presented in
Section~\ref{sec:lengthscales_results}.

A few questions arise naturally after investigating the performance of this
control scheme:

\begin{itemize}
    \item If the model is able to correctly understand data gathered in
        closed-loop operation, will the performance deteriorate drastically if
        the first model is trained on less data?
    \item How much information can the model extract from closed-loop
        operation? Would a model trained on a window of only the last five
        days of closed-loop operation data be able to perform correctly?
\end{itemize}

These questions will be further analysed in Sections~\ref{sec:svgp_96pts}
and~\ref{sec:svgp_window} respectively.

\clearpage

\subsubsection{Lengthscales}\label{sec:lengthscales_results}
Figure~\ref{fig:SVGP_evol_importance} provides a deeper insight into the
evolution of the relative importance of the \acrshort{svgp} regressors over the
course of the full-year simulation\footnotemark. A few remarks are immediate:
the importance of most hyperparameters changes drastically over the first few
iterations, before settling into a steadier pace of change until around the
month of July, when most of the hyperparameters settle for the rest of the
simulation. This behaviour could be explained by the model learning new regions
of the state space (i.e.\ the span of the \acrshort{ghi} and the Outside
Temperature) over the first months as these values change, and remaining more
constant once it has already gathered information on these different operating
points.

\footnotetext{The evolution of the \textit{hyperparameters} is provided for
reference in Annex~\ref{anx:hyperparams_evol}.}

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_evol_importance.pdf}
    \caption{Evolution of SVGP model parameters}
    \label{fig:SVGP_evol_importance}
\end{figure}
As seen in Figure~\ref{fig:SVGP_evol_importance}, the variance of the
\acrshort{se} kernel steadily decreases until reaching a plateau, which
signifies the increase in confidence of the model. The hyperparameters
corresponding to the exogenous inputs --- $w1,1$ and $w2,1$ --- become
generally less important for future predictions over the course of the year,
with the importance of $w1,1$, the \acrlong{ghi}, climbing back up over the
last, colder months. This might be due to the fact that during the colder
months, the \acrshort{ghi} is the only way for the exogenous inputs to
\textit{provide} additional heat to the system.

A similar trend can be observed for the evolution of the controlled input's
hyperparameters, with the exception that the first lag of the controlled input
($u1,1$) remains the most important over the course of the year.

For the lags of the measured output it can be seen that, over the course of the
year, the importance of the first lag decreases, while that of the second and
third lags increases --- until all three reach relatively similar values.

Another interesting comparison is provided by looking at the possible values of
the \acrshort{se} kernel components. Since all the values are normalized to the
$-1$ to $1$ range, it is unlikely that any two points will be at a distance
greater than 2 from each other. It is then possible to plot the values of the
kernel terms due to each regressor as a function of their distance. This is
done in Figure~\ref{fig:SVGP_first_covariance} for the first identified model
and in Figure~\ref{fig:SVGP_last_covariance} for the last. It is clear that in
both cases the kernel terms behave mostly linearly, with the exception of two
points being very close to each other, where the correlation remains stronger
before it starts diminishing.
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_first_covariance.pdf}
    \caption{SVGP model first covariance parameters}
    \label{fig:SVGP_first_covariance}
\end{figure}
As for the last model, it can be noted that only the scale of the kernel terms
changes, with their shape remaining consistent with the first identified model.
This means that the model does not get much more complex as the data is
gathered, but instead the same general structure is kept, with further
refinements being done as data is added to the system.

\clearpage

\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_last_covariance.pdf}
    \caption{SVGP model last covariance parameters}
    \label{fig:SVGP_last_covariance}
\end{figure}
One question that could be addressed given these mostly linear kernel terms is
|
||||
how well would an \acrshort{svgp} model perform with a linear kernel.
|
||||
Intuition would hint that it should still be able to track the reference
|
||||
temperature, albeit not as precisely due to the correlation that diminishes much
|
||||
slower when the two points are closer together in the \acrshort{se} kernel. This
|
||||
will be further investigated in Section~\ref{sec:svgp_linear}.
|
||||
|
||||
\clearpage
|
||||
|
||||
\subsection{SVGP with one day of starting data}\label{sec:svgp_96pts}

As previously discussed in Section~\ref{sec:SVGP_results}, the \acrshort{svgp}
model is able to properly adapt when given new information, refining its
understanding of the plant's dynamics over time.

Analyzing the results of a simulation started with only one day's worth of
initial identification data (cf.\
Figures~\ref{fig:SVGP_96pts_fullyear_simulation}
and~\ref{fig:SVGP_96pts_abserr}), it is notable that the model performs almost
identically to the one identified in the previous sections. This highlights
one of the practical benefits of the \acrshort{svgp} implementations compared
to the classical \acrlong{gp}: it is possible to start with a rougher
controller trained on less data and refine it over time, reducing the need for
cumbersome and potentially costly initial experiments for gathering data.

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{Plots/6_SVGP_96pts_inf_window_12_averageYear_fullyear.pdf}
\caption{One Day SVGP full year simulation}
\label{fig:SVGP_96pts_fullyear_simulation}
\end{figure}

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{Plots/6_SVGP_96pts_inf_window_12_averageYear_abserr.pdf}
\caption{One Day SVGP Absolute Error}
\label{fig:SVGP_96pts_abserr}
\end{figure}

\clearpage

\subsection{SVGP with a five-day moving window}\label{sec:svgp_window}

This section presents the results of running a different control scheme.
Here, as was the case for the base \acrshort{svgp} model, the model is first
trained on five days' worth of data, with the difference that each new model
is identified using only the last five days' worth of data. This should
provide insight into whether the \acrshort{svgp} model is able to learn the
plant dynamics based only on closed-loop operation.

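The difference between the two update schemes can be stated compactly as
follows, with the notation introduced here purely for illustration. If
$\mathcal{D}_k$ denotes the dataset used to re-train the model at the $k$-th
nightly update and $\mathcal{B}_k$ the batch of measurements collected during
day $k$, the base scheme keeps all of the data,
\[
	\mathcal{D}_k = \mathcal{D}_{k-1} \cup \mathcal{B}_k,
\]
whereas the moving-window scheme of this section only keeps the last five
days,
\[
	\mathcal{D}_k = \mathcal{B}_{k-4} \cup \mathcal{B}_{k-3} \cup \dots \cup
	\mathcal{B}_k.
\]
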
\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{Plots/5_SVGP_480pts_480pts_window_12_averageYear_fullyear.pdf}
\caption{Windowed SVGP full year simulation}
\label{fig:SVGP_480window_fullyear_simulation}
\end{figure}

As can be seen in Figure~\ref{fig:SVGP_480window_fullyear_simulation}, this
model is unable to consistently track the reference temperature. In fact, five
days after the identification, the model forgets all the initial data and
becomes unstable. This instability then generates enough excitation of the
plant for the model to learn its behaviour again. This cycle repeats roughly
every five days, whenever the controller becomes unstable. In the stable
regions, however, the controller is able to track the reference temperature.

\clearpage

\subsection{SVGP with Linear Kernel}\label{sec:svgp_linear}

The last model to be investigated is the \acrshort{svgp} with a Linear Kernel.
As suggested previously, the terms of the originally identified \acrshort{svgp}
model are not very complex, raising the question of whether a purely linear
kernel could suffice to capture the plant's behaviour.

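Assuming the standard form of a linear kernel with one variance per regressor,
the covariance used by such a model would be
\[
	k_{\mathrm{lin}}(x, x') = \sum_{d} \sigma_d^2 \, x_d x_d',
\]
which, unlike the \acrshort{se} kernel, is not a function of the distance
between the two points and corresponds to Bayesian linear regression on the
chosen regressors.
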
Figure~\ref{fig:SVGP_linear_fullyear_simulation} shows the results of the
full-year simulation. While this controller is still able to track the
reference temperature, it shows a much larger variance in the measured values
than the \acrshort{se} kernel \acrshort{svgp} model. This confirms the earlier
suspicion that a purely linear model is not able to capture the more nuanced
details of the CARNOT model dynamics.

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{Plots/10_SVGP_480pts_inf_window_12_averageYear_LinearKernel_fullyear.pdf}
\caption{Linear SVGP full year simulation}
\label{fig:SVGP_linear_fullyear_simulation}
\end{figure}

\clearpage

\subsection{Comparative analysis}

Table~\ref{tab:Model_comparations} presents the Mean Error, Error Variance and
Mean Absolute Error over the full-year simulation for the three stable
\acrshort{svgp} models, as well as for the classical \acrshort{gp} model.

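Assuming the usual definitions, with $e_k$ denoting the deviation of the
measured temperature from its reference at step $k$ and $N$ the number of
simulation steps, the reported metrics are
\[
	\mathrm{ME} = \frac{1}{N}\sum_{k=1}^{N} e_k, \qquad
	\mathrm{Var} = \frac{1}{N}\sum_{k=1}^{N} \left(e_k - \mathrm{ME}\right)^2,
	\qquad
	\mathrm{MAE} = \frac{1}{N}\sum_{k=1}^{N} \left|e_k\right|.
\]
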
\begin{table}[ht]
%\vspace{-8pt}
\centering
\begin{tabular}{||c c c c||}
\hline
Model & Mean Error [$\degree$C] & Error Variance [$\degree$C$^2$] & Mean
Absolute Error [$\degree$C]\\
\hline \hline
GP & 5.08 & 6.88 & 1.330 \\
SVGP (5 days) & -0.06 & 0.25 & 0.055 \\
SVGP (1 day) & -0.04 & 0.24 & 0.050 \\
SVGP (Linear) & -0.03 & 0.29 & 0.093 \\
\hline
\end{tabular}
\caption{Full-year model performance comparison}
\label{tab:Model_comparations}
\end{table}

The worst-performing model, as noted previously, is the \acrshort{gp} model.
The \acrshort{svgp} with a Linear Kernel results in a stable model with a mean
error very close to zero, i.e.\ no constant bias or offset. It does, however,
have the highest error variance of all the identified \acrshort{svgp} models,
which was also noted beforehand from qualitative observations. It is therefore
possible to conclude that a Linear Kernel does not suffice for properly
modeling the dynamics of the CARNOT model.

The two \acrshort{svgp} models with \acrlong{se} kernels perform the best.
They have comparable performance, with very small differences in Mean Absolute
Error and Error Variance. This leads to the conclusion that the \acrshort{svgp}
models can be deployed with less explicit identification data, and they will
continue to improve over the course of the year, as the building passes
through different regions of the state space and more data is collected.

However, these results do not discredit the use of \acrlong{gp} models in a
multi-seasonal setting. As shown before, given the same amount of data and
ignoring the computational cost, they perform better than the alternative
\acrshort{svgp} models. The poor initial performance could be mitigated by
sampling the identification data at different points in time during multiple
experiments, by updating a fixed-size dataset based on the gained information,
as well as by designing the kernel more carefully to include prior
information.

\clearpage
106
Sections/90_Conclusion.tex
Normal file
@ -0,0 +1,106 @@
\section{Conclusion}\label{sec:conclusion}

The aim of this project was to analyze the performance of \acrshort{gp}-based
controllers for use in longer-lasting implementations, where differences in
building behaviour become important compared to the initially available data.

First, the performance of a classical \acrshort{gp} model trained on five
days' worth of experimental data was analyzed. Initially, this model performed
very well both in one-step-ahead prediction and in multi-step-ahead simulation
on new, unseen data. With the change in weather, however, the model shifted
from operating in the interpolated regions of the initial weather data to the
extrapolated ones. In this scenario the model was unable to properly predict
the \pdome\ behaviour and, as a consequence, the \acrshort{mpc} controller
became unstable.

Following that, several \acrshort{svgp} implementations were analyzed. The
initial behaviour exhibited during parameter identification (cf.\
Section~\ref{sec:hyperparameters}) showed that the \acrshort{svgp} model was
less capable of capturing the building dynamics based only on the initial
experimental dataset, possibly due to the \acrshort{elbo} approximation of the
true log likelihood. While the \acrshort{svgp} model remained stable over the
course of the 20-step-ahead simulation, in the later steps it drifted much
further from the real values than the equivalent \acrshort{gp} model.
However, during the full-year simulation, this downside of the \acrshort{svgp}
model was compensated for by adding new data to the model training dataset
each night at midnight. The model performance continuously improved over the
course of the simulation, providing much better results overall.

To better analyze the learning behaviour of the \acrshort{svgp} models, three
variations of the initial \acrshort{svgp} were also simulated. The first
variation consisted of training the initial model on only one day's worth of
experimental data, as opposed to five days in the first case. This model was
then regularly updated every night at midnight, just as in the initial case.
It turned out to provide very comparable results to the initial model, leading
to the conclusion that the \acrshort{svgp} model can be initially deployed
using much less training data, while still being able to correctly capture the
building dynamics through subsequent updates.

The second variation of the \acrshort{svgp} model was re-trained using a
rolling window of five days' worth of data, in order to assess the model's
ability to learn the proper building dynamics based only on closed-loop
operation data. This model turned out to be unstable: the full-year simulation
showed that every time the model was trained using \textit{only} closed-loop
operation data it turned unstable. This prompted a much higher excitation of
the building during the following day, which in turn provided enough
information to train a good model that would last until this information was
too old to be included in the training window, at which point the model would
turn unstable again.

In the last variation, the \acrshort{svgp} model was trained using a linear
kernel. This model turned out to perform worse overall than the \acrshort{se}
kernel model, since it was unable to capture the more nuanced, non-linear
behaviour of the building.

\subsection{Further Research}\label{sec:further_research}

Section~\ref{sec:results} has presented and compared the results of a
full-year simulation for a classical \acrshort{gp} model, as well as for
several incarnations of \acrshort{svgp} models. The results show that the
\acrshort{svgp} models have much better performance, mainly due to the
possibility of updating the model throughout the year. The \acrshort{svgp}
models also present a computational cost advantage both in training and in
evaluation, due to several approximations presented in
Section~\ref{sec:gaussian_processes}.

Focusing on the \acrlong{gp} models, there are several ways of improving their
performance, as noted previously: a more varied identification dataset, as
well as a smart update of a fixed-size data dictionary according to
information gain, could mitigate the present problems.

Using a Sparse \acrshort{gp} that does not replace the exact log marginal
likelihood with the \acrshort{elbo} could improve the performance of the
\acrshort{gp} model at the expense of training time.

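As a reminder, the \acrshort{elbo} used by the \acrshort{svgp} model is, in
its usual form, a lower bound on the log marginal likelihood,
\[
	\log p(y) \;\geq\;
	\sum_{i=1}^{N} \mathrm{E}_{q(f_i)}\!\left[\log p(y_i \mid f_i)\right]
	- \mathrm{KL}\!\left[ q(u) \,\|\, p(u) \right],
\]
where $q(u)$ is the variational distribution over the inducing variables. The
gap between the two objectives is one possible explanation for the weaker
initial fit of the \acrshort{svgp} models observed in this work.
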
An additional change that could be made is to include as much prior
information as possible, by setting up a more refined kernel, as well as by
adding prior information on the model hyperparameters when available. This
approach, however, goes against the `spirit' of black-box approaches, since
significant insight into the physics of the plant is required in order to
properly model and implement this information.

On the \acrshort{svgp} side, several changes could also be proposed which were
not properly addressed in this work. First, the size of the inducing dataset
was chosen experimentally, until it was found to accurately reproduce the
manually collected experimental data. In order to make better use of the
available computational resources, this value could be found programmatically,
in a way that minimizes evaluation time while still providing good
performance. Another possibility is the periodic re-evaluation of this value
as new data comes in, since as more and more data is collected the model
becomes more complex, and in general more inducing locations could be
necessary to properly reproduce the training data.

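As a rough guideline for such a procedure, the computational cost of an
\acrshort{svgp} model with $M$ inducing locations and $N$ data points scales
approximately as
\[
	\mathcal{O}(N M^2 + M^3) \quad \textrm{for training}, \qquad
	\mathcal{O}(M^2) \quad \textrm{for the predictive variance},
\]
compared to $\mathcal{O}(N^3)$ and $\mathcal{O}(N^2)$ respectively for the
exact \acrshort{gp}, which makes the choice of $M$ the main trade-off between
accuracy and evaluation time.
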
Finally, none of the presented controllers take into account occupancy rates
or adapt to possible changes in the real building, such as adding or removing
furniture, deteriorating insulation and so on. The presented update methods
only deal with adding information on behaviour in different state space
regions, i.e.\ \textit{learning}. Their ability to \textit{adapt} to changes
in the actual plant's behaviour should be addressed further.

\section*{Acknowledgements}

I would like to thank Manuel Koch for the great help provided during the
course of the project, starting from the basics of CARNOT modelling to helping
me better compare the performance of different controllers, as well as Prof.\
Colin Jones, whose insights always provided guidance while still allowing me
to discover everything on my own.

\clearpage
42
Sections/90_Further_Research.tex
Normal file
@ -0,0 +1,42 @@
\section{Further Research}

Section~\ref{sec:results} has presented and compared the results of a
full-year simulation for a classical \acrshort{gp} model, as well as for
several incarnations of \acrshort{svgp} models. The results show that the
\acrshort{svgp} models have much better performance, mainly due to the
possibility of updating the model throughout the year. The \acrshort{svgp}
models also present a computational cost advantage both in training and in
evaluation, due to several approximations presented in
Section~\ref{sec:gaussian_processes}.

Focusing on the \acrlong{gp} models, there are several ways of improving their
performance, as noted previously: a more varied identification dataset, as
well as a smart update of a fixed-size data dictionary according to
information gain, could mitigate the present problems.

Using a Sparse \acrshort{gp} that does not replace the exact log marginal
likelihood with the \acrshort{elbo} could improve the performance of the
\acrshort{gp} model at the expense of training time.

An additional change that could be made is to include as much prior
information as possible, by setting up a more refined kernel, as well as by
adding prior information on the model hyperparameters when available. This
approach, however, goes against the `spirit' of black-box approaches, since
significant insight into the physics of the plant is required in order to
properly model and implement this information.

On the \acrshort{svgp} side, several changes could also be proposed which were
not properly addressed in this work. First, the size of the inducing dataset
was chosen experimentally, until it was found to accurately reproduce the
manually collected experimental data. In order to make better use of the
available computational resources, this value could be found programmatically,
in a way that minimizes evaluation time while still providing good
performance. Another possibility is the periodic re-evaluation of this value
as new data comes in, since as more and more data is collected the model
becomes more complex, and in general more inducing locations could be
necessary to properly reproduce the training data.

Finally, none of the presented controllers take into account occupancy rates
or adapt to possible changes in the real building, such as adding or removing
furniture, deteriorating insulation and so on. The presented update methods
only deal with adding information on behaviour in different state space
regions, i.e.\ \textit{learning}, and their ability to \textit{adapt} to
changes in the actual plant's behaviour should be addressed further.
34
Sections/95_Conclusion.tex
Normal file
@ -0,0 +1,34 @@
\section{Conclusion}

The aim of this project was to analyse the performance of \acrshort{gp}-based
controllers for use in longer-lasting implementations, where differences in
building behaviour become important compared to the initially available data.

First, the performance of a classical \acrshort{gp} model trained on five
days' worth of experimental data was analysed. This model turned out to be
unable to correctly extrapolate the building behaviour as the weather changed
throughout the year.

Several \acrshort{svgp} implementations were then analysed. They turned out to
provide important benefits over the classical models, such as the ability to
easily scale when new data is added and the much reduced computational effort
required. They do, however, present some downsides, namely the increased
number of hyperparameters resulting from having to choose the number of
inducing locations, as well as worse performance than the classical
\acrshort{gp} implementation given the same amount of data.

Finally, possible improvements to the current implementations have been
addressed, noting that classical \acrshort{gp} implementations could also be
adapted to the \textit{learning control} paradigm, even if their
implementation could turn out to be much more involved and more
computationally expensive than the \acrshort{svgp} alternative.

\section*{Acknowledgements}

I would like to thank Manuel Pascal Koch for the great help provided during
the course of the project, starting from the basics of CARNOT modelling to
helping me better compare the performance of different controllers, as well as
Professor Jones, whose insights always provided guidance while still allowing
me to discover everything on my own.

\clearpage
78
Sections/99A_GP_hyperparameters_validation.tex
Normal file
@ -0,0 +1,78 @@
\clearpage

\section{Hyperparameters validation for classical GP}\label{apx:hyperparams_gp}

\subsection{\texorpdfstring{\model{1}{1}{3}}{113}}

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{Plots/GP_113_training_performance.pdf}
\caption{Prediction performance of \model{1}{1}{3} on the training dataset}
\label{fig:GP_train_validation}
\end{figure}

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{Plots/GP_113_test_performance.pdf}
\caption{Prediction performance of \model{1}{1}{3} on the test dataset}
\label{fig:GP_test_validation}
\end{figure}

\clearpage

\subsection{\texorpdfstring{\model{2}{1}{3}}{213}}

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{Plots/GP_213_training_performance.pdf}
\caption{Prediction performance of \model{2}{1}{3} on the training dataset}
\label{fig:GP_213_train_validation}
\end{figure}

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{Plots/GP_213_test_performance.pdf}
\caption{Prediction performance of \model{2}{1}{3} on the test dataset}
\label{fig:GP_213_test_validation}
\end{figure}

\clearpage

\subsection{\texorpdfstring{\model{3}{1}{3}}{313}}

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{Plots/GP_313_training_performance.pdf}
\caption{Prediction performance of \model{3}{1}{3} on the training dataset}
\label{fig:GP_313_train_validation}
\end{figure}

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{Plots/GP_313_test_performance.pdf}
\caption{Prediction performance of \model{3}{1}{3} on the test dataset}
\label{fig:GP_313_test_validation}
\end{figure}

\clearpage

\section{Hyperparameters validation for SVGP}\label{apx:hyperparams_svgp}

\subsection{\texorpdfstring{\model{1}{2}{3}}{123}}

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{Plots/SVGP_123_training_performance.pdf}
\caption{Prediction performance of \model{1}{2}{3} on the training dataset}
\label{fig:SVGP_train_validation}
\end{figure}

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{Plots/SVGP_123_test_performance.pdf}
\caption{Prediction performance of \model{1}{2}{3} on the test dataset}
\label{fig:SVGP_test_validation}
\end{figure}

\clearpage
10
Sections/99C_hyperparameters_results.tex
Normal file
@ -0,0 +1,10 @@
\section{SVGP hyperparameters evolution}\label{anx:hyperparams_evol}

\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_evol_hyperparameters.pdf}
\caption{Evolution of SVGP hyperparameters}
\label{fig:SVGP_evol_hyperparameters}
\end{figure}