Fixed inconsistent use of acronyms

This commit is contained in:
Radu C. Martin 2021-07-22 22:13:51 +02:00
parent 1e1cc5acd8
commit 721953642c
7 changed files with 49 additions and 47 deletions


@@ -35,7 +35,7 @@ The idea of using Gaussian Processes as regression models for control of dynamic
 systems is not new, and has already been explored a number of times. A general
 description of their use, along with the necessary theory and some example
 implementations is given in~\cite{kocijanModellingControlDynamic2016}.
-In~\cite{pleweSupervisoryModelPredictive2020}, a \acrlong{gp} Model with a
+In~\cite{pleweSupervisoryModelPredictive2020}, a \acrshort{gp} Model with a
 \acrlong{rq} Kernel is used for temperature set point optimization.
 Gaussian Processes for building control have also been studied before in the
@@ -66,7 +66,7 @@ the original identified model goes further and further into the extrapolated
 regions.
 This project tries to combine the use of online learning control schemes with
-\acrlong{gp} Models through implementing \acrlong{svgp} Models. \acrshort{svgp}s
+\acrshort{gp} Models through implementing \acrfull{svgp} Models. \acrshort{svgp}s
 provide means of extending the use of \acrshort{gp}s to larger datasets, thus
 enabling the periodic re-training of the model to include all the historically
 available data.
@@ -81,7 +81,7 @@ multiple control schemes using both classical \acrshort{gp}s, as well as
 Section~\ref{sec:gaussian_processes} provides the mathematical background for
 understanding \acrshort{gp}s, as well as the definition in very broad strokes of
 \acrshort{svgp}s and their differences from the classical implementation of
-\acrlong{gp}es. This information is later used for comparing their performances
+\acrshort{gp}s. This information is later used for comparing their performances
 and outlining their respective pros and cons.
 Section~\ref{sec:CARNOT} goes into the details of the implementation of the


@@ -144,7 +144,7 @@ choices~\cite{kocijanModellingControlDynamic2016}:
 \subsubsection*{Squared Exponential Kernel}
 This kernel is used when the system to be modelled is assumed to be smooth and
-continuous. The basic version of the \acrshort{se} kernel has the following form:
+continuous. The basic version of the \acrfull{se} kernel has the following form:
 \begin{equation}
 k(\mathbf{x}, \mathbf{x'}) = \sigma^2 \exp{\left(- \frac{1}{2}\frac{\norm{\mathbf{x} -
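As an aside for readers of this hunk, the \acrshort{se} kernel it renames can be sketched in a few lines of NumPy. This is an illustrative sketch only — the function and parameter names are mine, not from the thesis code:

```python
import numpy as np

def se_kernel(x, x2, sigma=1.0, lengthscale=1.0):
    """Squared Exponential kernel:
    k(x, x') = sigma^2 * exp(-0.5 * ||x - x'||^2 / lengthscale^2)."""
    sq_dist = np.sum((np.asarray(x, float) - np.asarray(x2, float)) ** 2)
    return sigma**2 * np.exp(-0.5 * sq_dist / lengthscale**2)

# The kernel equals sigma^2 at zero distance and decays smoothly with distance,
# which is why it encodes the smoothness assumption mentioned above.
print(se_kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0
```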
@@ -182,7 +182,7 @@ value of the hyperparameters. This is the \acrfull{ard} property.
 The \acrfull{rq} Kernel can be interpreted as an infinite sum of \acrshort{se}
 kernels with different lengthscales. It has the same smooth behaviour as the
-\acrlong{se} Kernel, but can take into account the difference in function
+\acrshort{se} Kernel, but can take into account the difference in function
 behaviour for large scale vs small scale variations.
 \begin{equation}
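The "infinite sum of \acrshort{se} kernels" interpretation in this hunk can be checked numerically: in the common parametrization the \acrshort{rq} kernel converges to the \acrshort{se} kernel as its mixture parameter $\alpha$ grows. A sketch under that assumption (names are illustrative, not the thesis code):

```python
import numpy as np

def rq_kernel(x, x2, sigma=1.0, lengthscale=1.0, alpha=1.0):
    """Rational Quadratic kernel: a scale mixture of SE kernels over
    lengthscales, controlled by the mixture parameter alpha."""
    sq_dist = np.sum((np.asarray(x, float) - np.asarray(x2, float)) ** 2)
    return sigma**2 * (1.0 + sq_dist / (2.0 * alpha * lengthscale**2)) ** (-alpha)

def se_kernel(x, x2, sigma=1.0, lengthscale=1.0):
    sq_dist = np.sum((np.asarray(x, float) - np.asarray(x2, float)) ** 2)
    return sigma**2 * np.exp(-0.5 * sq_dist / lengthscale**2)

# As alpha -> infinity, the RQ kernel approaches the SE kernel.
print(abs(rq_kernel([0.0], [1.0], alpha=1e6) - se_kernel([0.0], [1.0])))
```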
@@ -207,11 +207,11 @@ without incurring the penalty of inverting the covariance matrix. An overview
 and comparison of multiple methods is given
 in~\cite{liuUnderstandingComparingScalable2019}.
-For the scope of this project, the choice of using the \acrfull{svgp} models has
-been made, since it provides a very good balance of scalability, capability,
+For the scope of this project, the choice of using the \acrshort{svgp} models
+has been made, since it provides a very good balance of scalability, capability,
 robustness and controllability~\cite{liuUnderstandingComparingScalable2019}.
-The \acrlong{svgp} has been first introduced
+The \acrshort{svgp} was first introduced
 by~\textcite{hensmanGaussianProcessesBig2013} as a way to scale the use of
 \acrshort{gp}s to large datasets. A detailed explanation on the mathematics of
 \acrshort{svgp}s and reasoning behind it is given
@@ -264,7 +264,7 @@ In order to solve this problem, the log likelihood equation
 classical \acrshort{gp} is replaced with an approximate value, that is
 computationally tractable on larger sets of data.
-The following derivation of the \acrshort{elbo} is based on the one presented
+The following derivation of the \acrfull{elbo} is based on the one presented
 in~\cite{yangUnderstandingVariationalLower}.
 Assume $X$ to be the observations, and $Z$ the set of parameters of the
@@ -300,7 +300,7 @@ divergence, which for variational inference takes the following form:
 \end{equation}
 \vspace{5pt}
-where L is the \acrfull{elbo}. Rearranging this equation we get:
+where L is the \acrshort{elbo}. Rearranging this equation we get:
 \begin{equation}
 L = \log{\left(p(X)\right)} - KL\left[q(Z)||p(Z|X)\right]
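For readers following this hunk's derivation, the rearranged equation is the standard evidence decomposition, which in the notation of the surrounding text (with $q(Z)$ the variational distribution) reads:

```latex
\begin{equation}
\log{\left(p(X)\right)}
  = \underbrace{\mathbb{E}_{q(Z)}\!\left[\log \frac{p(X, Z)}{q(Z)}\right]}_{L}
  + \underbrace{KL\left[q(Z)\,||\,p(Z|X)\right]}_{\geq 0}
\end{equation}
```

Since the KL divergence is non-negative, $L$ is indeed a lower bound on the log probability of the observations, as the text states.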
@@ -312,13 +312,13 @@ lower bound of the log probability of observations.
 \subsection{Gaussian Process Models for Dynamical
 Systems}\label{sec:gp_dynamical_system}
-In the context of Dynamical Systems Identification and Control, Gaussian
-Processes are used to represent different model structures, ranging from state
-space and \acrshort{nfir} structures, to the more complex \acrshort{narx},
-\acrshort{noe} and \acrshort{narmax}.
+In the context of Dynamical Systems Identification and Control, \acrshort{gp}s
+are used to represent different model structures, ranging from state
+space and \acrfull{nfir} structures, to the more complex \acrfull{narx},
+\acrfull{noe} and \acrfull{narmax}.
-The general form of an \acrfull{narx} model is as follows:
+The general form of an \acrshort{narx} model is as follows:
 \begin{equation}
 \hat{y}(k) =
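The \acrshort{narx} form introduced in this hunk — $\hat{y}(k)$ as a function of lagged outputs and inputs — amounts to building regressor vectors from past data. A pure-Python sketch of that construction (lag names and layout are my own assumptions, not the thesis implementation):

```python
def narx_regressors(y, u, l_y, l_u):
    """Build NARX regressor rows: each row stacks the l_y past outputs and
    the l_u past inputs used to predict y(k)."""
    rows, targets = [], []
    start = max(l_y, l_u)
    for k in range(start, len(y)):
        past_y = y[k - l_y:k][::-1]   # y(k-1), ..., y(k-l_y)
        past_u = u[k - l_u:k][::-1]   # u(k-1), ..., u(k-l_u)
        rows.append(past_y + past_u)
        targets.append(y[k])
    return rows, targets

X, t = narx_regressors([0, 1, 2, 3, 4], [5, 6, 7, 8, 9], l_y=2, l_u=1)
print(X[0], t[0])  # [1, 0, 6] 2
```

Each row of `X` is then a training input for the \acrshort{gp}, with the corresponding entry of `t` as the target.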


@@ -378,7 +378,7 @@ The unit has a typical \acrlong{eer} (\acrshort{eer}, cooling efficiency) of 4.9
 maximum cooling capacity of 64.2 kW.
 One particularity of this \acrshort{hvac} unit is that during summer, only one
-of the two compressors is running. This results in a higher \acrlong{eer}, in
+of the two compressors is running. This results in a higher \acrshort{eer}, in
 the cases where the full cooling capacity is not required.
 \subsubsection*{Ventilation}
@@ -504,7 +504,7 @@ it will oscillate between using one or two compressors. Lastly, it is possible
 to notice that the \acrshort{hvac} is not turned on during the night, with the
 exception of the external fan, which continues running.
-\subsubsection{The CARNOT WDB weather data format}\label{sec:CARNOT_WDB}
+\subsubsection{The CARNOT Weather Data Bus format}\label{sec:CARNOT_WDB}
 For a correct simulation of the building behaviour, CARNOT requires not only the
 detailed definition of the building blocks/nodes, but also a very detailed set
@@ -514,7 +514,7 @@ sun's position throughout the simulation (zenith and azimuth angles), the
 as well as information on the ambient temperature, humidity, precipitation,
 pressure, wind speed and direction, etc. A detailed overview of each
 measurement necessary for a simulation is given in the CARNOT user
-manual~\cite{CARNOTManual}.
+manual~\cite{CARNOTManual}. This data structure is known as the \acrfull{wdb}.
 In order to compare the CARNOT model's performance to that of the real \pdome,
 it is necessary to simulate the CARNOT model under the same set of conditions as
@@ -532,17 +532,19 @@ are computed using the Python pvlib
 library~\cite{f.holmgrenPvlibPythonPython2018}.
 As opposed to the solar angles, which can be computed exactly from the available
-information, the Solar Radiation Components (DHI and DNI) have to be estimated
-from the available measurements of GHI, zenith angles (Z) and datetime
-information. \textcite{erbsEstimationDiffuseRadiation1982} present an empirical
-relationship between GHI and the diffuse fraction DF and the ratio of GHI to
-extraterrestrial irradiance $K_t$, known as the Erbs model. The DF is then used
-to compute DHI and DNI as follows:
+information, the Solar Radiation Components (\acrshort{dhi} and \acrshort{dni})
+have to be estimated from the available measurements of \acrfull{ghi}, zenith
+angles (Z) and datetime information.
+\textcite{erbsEstimationDiffuseRadiation1982} present an empirical relationship
+between \acrshort{ghi} and the \acrfull{df} and the ratio of \acrshort{ghi} to
+extraterrestrial irradiance $K_t$, known as the Erbs model. The \acrshort{df}
+is then used to compute \acrshort{dhi} and \acrshort{dni} as follows:
 \begin{equation}
 \begin{aligned}
-\text{DHI} &= \text{DF} \times \text{GHI} \\
-\text{DNI} &= \frac{\text{GHI} - \text{DHI}}{\cos{\text{Z}}}
+\text{\acrshort{dhi}} &= \text{DF} \times \text{\acrshort{ghi}} \\
+\text{\acrshort{dni}} &= \frac{\text{\acrshort{ghi}} -
+\text{\acrshort{dhi}}}{\cos{\text{Z}}}
 \end{aligned}
 \end{equation}
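The split in this hunk's equation is straightforward to check numerically. The sketch below assumes the diffuse fraction DF has already been estimated (the Erbs correlation itself, which maps the clearness index $K_t$ to DF, is omitted; the pvlib library cited above also ships an implementation). Function and parameter names are illustrative:

```python
import math

def split_ghi(ghi, diffuse_fraction, zenith_deg):
    """Split GHI into its components: DHI = DF * GHI and
    DNI = (GHI - DHI) / cos(Z), with Z the solar zenith angle."""
    dhi = diffuse_fraction * ghi
    dni = (ghi - dhi) / math.cos(math.radians(zenith_deg))
    return dhi, dni

# 500 W/m^2 of GHI with a 0.3 diffuse fraction at a 60 degree zenith angle:
dhi, dni = split_ghi(ghi=500.0, diffuse_fraction=0.3, zenith_deg=60.0)
print(dhi, dni)  # 150.0 and roughly 700.0 (since cos 60 deg = 0.5)
```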


@@ -19,7 +19,7 @@ consuming computations in the case of a larger number of regressors and more
 complex kernel functions.
 As described in Section~\ref{sec:gp_dynamical_system}, for the purpose of this
-project, the \acrlong{gp} model will be trained using the \acrshort{narx}
+project, the \acrshort{gp} model will be trained using the \acrshort{narx}
 structure. This already presents an important choice in the selection of
 regressors and their respective autoregressive lags.
@@ -185,7 +185,7 @@ $l_u = 1$ and $l_y = 3$ with $l_w$ taking the values of either 1, 2 or 3,
 depending on the results of further analysis.
-As for the case of the \acrlong{svgp}, the results for the classical
+As for the case of the \acrshort{svgp}, the results for the classical
 \acrshort{gp} (cf. Table~\ref{tab:GP_hyperparameters}) are not necessarily
 representative of the relationships between the regressors of the
 \acrshort{svgp} model, due to the fact that the dataset used for training is
@@ -259,8 +259,8 @@ This performance metric is very useful when training a model whose goal is
 solely to minimize the difference between the measured values, and the ones
 predicted by the model.
-A variant of the \acrshort{mse} is the \acrfull{smse}, which normalizes the
-\acrlong{mse} by the variance of the output values of the validation dataset.
+A variant of the \acrfull{mse} is the \acrfull{smse}, which normalizes the
+\acrshort{mse} by the variance of the output values of the validation dataset.
 \begin{equation}\label{eq:smse}
 \text{SMSE} = \frac{1}{N}\frac{\sum_{i=1}^N \left(y_i -
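The \acrshort{smse} described in this hunk — \acrshort{mse} normalized by the output variance of the validation data — is a one-liner in practice. A sketch with illustrative names:

```python
import numpy as np

def smse(y_true, y_pred):
    """Standardized Mean Squared Error: the MSE normalized by the
    variance of the measured (validation) outputs."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

# A trivial model that always predicts the dataset mean scores SMSE = 1,
# so values well below 1 indicate the model explains output variance.
y = [1.0, 2.0, 3.0, 4.0]
print(smse(y, [np.mean(y)] * 4))  # 1.0
```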
@@ -403,7 +403,7 @@ the discrepancies.
 \subsubsection{Conventional Gaussian Process}
 The simulation performance of the three lag combinations chosen for the
-classical \acrlong{gp} models has been analyzed, with the results presented in
+classical \acrshort{gp} models has been analyzed, with the results presented in
 Figures~\ref{fig:GP_113_multistep_validation},~\ref{fig:GP_213_multistep_validation}
 and~\ref{fig:GP_313_multistep_validation}. For reference, the one-step ahead
 predictions for the training and test datasets are presented in


@@ -48,7 +48,7 @@ the correct amount of data for the weather predictions and to properly generate
 the optimization problem, the discrete/continuous transition and vice-versa
 happens on the Simulink side. This simplifies the adjustment of the sampling
 time, with the downside of harder inclusion of meta-data such as hour of the
-day, day of the week, etc.\ in the \acrlong{gp} Model.
+day, day of the week, etc.\ in the \acrshort{gp} Model.
 The weather prediction is done using the information present in the CARNOT
 \acrshort{wdb} object. Since the sampling time and control horizon of the
@@ -66,13 +66,13 @@ evaluating a \acrshort{gp} has an algorithmic complexity of $\mathcal{O}(n^3)$.
 This means that naive implementations can get too expensive in terms of
 computation time very quickly.
-In order to have as smallest of a bottleneck as possible when dealing with
-\acrshort{gp}s, a very fast implementation of \acrlong{gp} Models was used, in
-the form of GPflow~\cite{matthewsGPflowGaussianProcess2017}. It is based on
-TensorFlow~\cite{tensorflow2015-whitepaper}, which has very efficient
-implementation of all the necessary Linear Algebra operations. Another benefit
-of this implementation is the very simple use of any additional computational
-resources, such as a GPU, TPU, etc.
+In order to have as small a bottleneck as possible when dealing with the
+required algebraic operations, a very fast implementation of \acrshort{gp}
+Models was used, in the form of GPflow~\cite{matthewsGPflowGaussianProcess2017}.
+It is based on TensorFlow~\cite{tensorflow2015-whitepaper}, which has a very
+efficient implementation of all the necessary Linear Algebra operations. Another
+benefit of this implementation is the very simple use of any additional
+computational resources, such as a GPU, TPU, etc.
 \subsubsection{Classical Gaussian Process training}
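The $\mathcal{O}(n^3)$ cost mentioned at the top of this hunk comes from solving against the $n \times n$ covariance matrix. A minimal NumPy sketch of naive \acrshort{gp} posterior-mean prediction makes the expensive step visible; this is an illustration only, not the GPflow-based code used in the project, and all names are my own:

```python
import numpy as np

def se_kernel_matrix(A, B, lengthscale=1.0, sigma=1.0):
    """Pairwise Squared Exponential kernel matrix between rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma**2 * np.exp(-0.5 * sq / lengthscale**2)

def gp_posterior_mean(X, y, X_new, noise=1e-6):
    """Naive GP regression mean: K_* (K + noise*I)^{-1} y.
    The linear solve against the n x n matrix K is the O(n^3) step
    that sparse approximations such as SVGP avoid."""
    K = se_kernel_matrix(X, X) + noise * np.eye(len(X))
    K_star = se_kernel_matrix(X_new, X)
    return K_star @ np.linalg.solve(K, y)

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.0])
print(gp_posterior_mean(X, y, np.array([[1.0]])))  # close to [1.0] at a training point
```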
@@ -158,7 +158,7 @@ Let $w_l$, $u_l$, and $y_l$ be the lengths of the state vector components
 $\mathbf{w}$, $\mathbf{u}$, $\mathbf{y}$ (cf. Equation~\ref{eq:components}).
 Also, let X be the matrix of all the system states over the optimization horizon
 and W be the matrix of the predicted disturbances for all the future steps. The
-original \acrlong{ocp} can be rewritten using index notation as:
+original \acrshort{ocp} can be rewritten using index notation as:
 \begin{subequations}\label{eq:sparse_optimal_control_problem}
 \begin{align}


@@ -7,7 +7,7 @@ analyzed in this Section have used a sampling time of 15 minutes and a control
 horizon of 8 steps.
 Section~\ref{sec:GP_results} analyzes the results of a conventional
-\acrlong{gp} Model trained on the first five days of gathered data. The model
+\acrshort{gp} Model trained on the first five days of gathered data. The model
 is then used for the rest of the year, with the goal of tracking the defined
 reference temperature.
@@ -131,7 +131,7 @@ performance, but are more complex in implementation.
 \subsection{Sparse and Variational Gaussian Process}\label{sec:SVGP_results}
-The \acrlong{svgp} models are set up in a similar way as described before. The
+The \acrshort{svgp} models are set up in a similar way as described before. The
 model is first identified using 5 days worth of experimental data collected
 using a \acrshort{pi} controller and a random disturbance signal. The difference
 lies in the fact that the \acrshort{svgp} model gets re-identified every night
@@ -143,7 +143,7 @@ setup performs much better than the initial one. The only large deviations from
 the reference temperature are due to cold weather, when the \acrshort{hvac}'s
 limited heat capacity is unable to maintain the proper temperature.
 Additionally, the \acrshort{svgp} controller takes around 250--300 ms of
-computation time for each simulation time, decreasing the computational cost of
+computation time for each simulation step, decreasing the computational cost of
 the original \acrshort{gp} by a factor of six.
@@ -293,7 +293,7 @@ As seen in Figure~\ref{fig:SVGP_evol_importance}, the variance of the
 signifies the increase in confidence of the model. The hyperparameters
 corresponding to the exogenous inputs --- $w1,1$ and $w1,2$ --- become generally
 less important for future predictions over the course of the year, with the
-importance of $w1,1$, the \acrlong{ghi}, climbing back up over the last, colder
+importance of $w1,1$, the \acrshort{ghi}, climbing back up over the last, colder
 months. This might be due to the fact that during the colder months, the
 \acrshort{ghi} is the only way for the exogenous inputs to \textit{provide}
 additional heat to the system.
@@ -361,7 +361,7 @@ simulation data (cf. Figures~\ref{fig:SVGP_96pts_fullyear_simulation}
 and~\ref{fig:SVGP_96pts_abserr}) it is very notable that the model performs
 almost identically to the one identified in the previous sections. This
 highlights one of the practical benefits of the \acrshort{svgp} implementations
-compared to the classical \acrlong{gp} -- it is possible to start with a rougher
+compared to the classical \acrshort{gp} -- it is possible to start with a rougher
 controller trained on less data and refine it over time, reducing the need for
 cumbersome and potentially costly initial experiments for gathering data.
@@ -473,7 +473,7 @@ models can be deployed with less explicit identification data, but they will
 continue to improve over the course of the year, as the building passes through
 different regions of the state space and more data is collected.
-However, these results do not discredit the use of \acrlong{gp} for employment
+However, these results do not discredit the use of \acrshort{gp}s for employment
 in a multi-seasonal situation. As shown before, given the same amount of data
 and ignoring the computational cost, they perform better than the alternative
 \acrshort{svgp} models. The bad initial performance could be mitigated by


@@ -62,7 +62,7 @@ throughout the year. The \acrshort{svgp} models also present a computational
 cost advantage both in training and in evaluation, due to several approximations
 shown in Section~\ref{sec:gaussian_processes}.
-Focusing on the \acrlong{gp} models, there could be several ways of improving
+Focusing on the \acrshort{gp} models, there could be several ways of improving
 its performance, as noted previously: a more varied identification dataset and
 smart update of a fixed-size data dictionary according to information gain,
 could mitigate the present problems.