Thesis update

2021-06-25 06:22:43 +02:00 · 2021-06-25 06:22:43 +02:00 · c213d3064e
commit c213d3064e
parent 3b1f852876
14 changed files with 678 additions and 131 deletions
--- a/10_Introduction.tex
+++ b/10_Introduction.tex
@ -7,5 +7,3 @@


 % TODO: [Introduction] Big lines previous research and why
-
-\clearpage
--- a/20_Previous_Research.tex
+++ b/20_Previous_Research.tex
@ -1,19 +1,40 @@
 \section{Previous Research}
 With the increase in computational power and availability of data over time, the
-accesibility of data-driven methods for System Identfication and Control has
+accessibility of data-driven methods for System Identification and Control has
 also risen significantly. 

 The idea of using Gaussian Processes as regression models for control of dynamic
 systems is not new, and has already been explored a number of times. A general
 description of their use, along with the necessary theory and some example
-implementations is given in {\color{red} Add citation to the Gaussian Process
-for dynamic models textbook}
+implementations is given in~\cite{kocijanModellingControlDynamic2016}.
+In~\cite{pleweSupervisoryModelPredictive2020} a \acrlong{gp} Model with a
+\acrlong{rq} Kernel is used for temperature set point optimization.

-Gaussian Processes for building control have been studied before in the context
-of Demand Response, {\color{orange} where the buildings are used for their heat
-capacity in order to reduce the stress on energy provides during peak load times}
+Gaussian Processes for building control have also been studied before in the
+context of Demand Response~\cite{nghiemDatadrivenDemandResponse2017,
+jainLearningControlUsing2018}, where the buildings are used for their heat
+capacity in order to reduce the stress on energy providers during peak load
+times.

-% TODO: [Previous Research] Finish with need for adaptive schemes
+There are, however, multiple limitations with these approaches.
+In~\cite{nghiemDatadrivenDemandResponse2017} the model is only identified once,
+ignoring changes in weather or plant parameters which could lead to different
+dynamics. This is addressed in \cite{jainLearningControlUsing2018} by
+re-identifying the model every two weeks using new information. Another
+limitation is that of the scalability of the \acrshort{gp}s, which become
+prohibitively expensive from a computational point of view when too much data is
+added.


+The ability to learn the plant's behaviour in new regions is very helpful in
+maintaining model performance over time as the behaviour of the plant starts
+deviating and the original identified model goes further and further into the
+extrapolated regions.
+
+
+This project will therefore try to combine the use of online learning schemes
+with \acrlong{gp}es by using \acrlong{svgp}es, which provide means of using
+\acrshort{gp} Models on larger datasets, and re-training the models every day at
+midnight to include all the historically available data.
+
 \clearpage
--- a/30_Gaussian_Processes_Background.tex
+++ b/30_Gaussian_Processes_Background.tex
@ -130,7 +130,7 @@ observations and the fixed mean function:
 The choice of the kernel is an important part for any kernel machine class
 algorithm. It serves the purpose of shaping the behaviour of the \acrshort{gp}
 by imposing a desired level of smoothness of the resulting functions, a
-prediodicity, linearity, etc. This extends the use cases of the \acrshort{gp}
+periodicity, linearity, etc. This extends the use cases of the \acrshort{gp}
 models while including any available prior information of the system to be
 modeled.

@ -148,8 +148,7 @@ continuous. The basic version of the \acrshort{se} kernel has the following form
    \mathbf{x'}}^2}{l^2}\right)}
 \end{equation}

-with the parameters $\sigma^2$ (model variance) and $l$ (lengthscale).
-with the model variance $\sigma^2$ and lengthscale $l$ as parameters.
+Here, the model variance $\sigma^2$ and the lengthscale $l$ are the parameters.

 The lengthscale indicates how fast the correlation diminishes as the two points
 get further apart from each other.
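
The effect of the lengthscale can be illustrated with a minimal plain-Python sketch of the kernel. The factor 0.5 in the exponent and the function name `se_kernel` are assumptions made for this illustration and are not taken from the thesis code:

```python
import math


def se_kernel(x, x_prime, variance=1.0, lengthscale=1.0):
    """Squared Exponential kernel between two points (sequences of floats).

    Assumed form: variance * exp(-0.5 * ||x - x'||^2 / lengthscale^2).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return variance * math.exp(-0.5 * sq_dist / lengthscale ** 2)


# Correlation between two points one unit apart, for two lengthscales.
short = se_kernel([0.0], [1.0], lengthscale=0.5)
long = se_kernel([0.0], [1.0], lengthscale=2.0)
```

With the same distance between the points, the larger lengthscale yields the larger kernel value, i.e. the correlation decays more slowly.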
@ -178,7 +177,7 @@ value of the hyperparameters. This is the \acrfull{ard} property.

 \subsubsection*{Rational Quadratic Kernel}

-The \acrfull{rq} Kernel can be intepreted as an infinite sum of \acrshort{se}
+The \acrfull{rq} Kernel can be interpreted as an infinite sum of \acrshort{se}
 kernels with different lengthscales. It has the same smooth behaviour as the
 \acrlong{se} Kernel, but can take into account the difference in function
 behaviour for large scale vs small scale variations.
@ -340,7 +339,7 @@ The \acrshort{noe} structure is therefore a \textit{simulation model}.
 In order to get the best simulation results from a \acrshort{gp} model, the
 \acrshort{noe} structure would have to be employed. Due to the high algorithmic
 complexity of training and evaluating \acrshort{gp} models, this approach is
-computationally untractable. In practice a \acrshort{narx} model will be trained,
+computationally intractable. In practice a \acrshort{narx} model will be trained,
 which will be validated through multi-step ahead prediction.
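
The multi-step ahead validation of a one-step model can be sketched as follows. This is only an illustration under simplifying assumptions: `toy_model` is a hypothetical stand-in for the mean prediction of a trained model, only output lags are fed back, and the predictive variance is ignored:

```python
from collections import deque


def simulate(one_step_model, y_init, inputs, n_steps):
    """Multi-step ahead prediction with a one-step (NARX-type) model.

    one_step_model(past_outputs, u) -> predicted next output
    y_init: the most recent measured outputs, oldest first
    inputs: the input applied at each prediction step
    """
    past = deque(y_init, maxlen=len(y_init))
    predictions = []
    for u in inputs[:n_steps]:
        y_next = one_step_model(list(past), u)
        predictions.append(y_next)
        past.append(y_next)  # the prediction pushes out the oldest output
    return predictions


def toy_model(past_outputs, u):
    """Hypothetical first-order model standing in for a trained GP mean."""
    return 0.9 * past_outputs[-1] + 0.1 * u


trajectory = simulate(toy_model, y_init=[20.0, 20.0, 20.0],
                      inputs=[30.0] * 5, n_steps=5)
```

After the first few steps the regressors consist only of predicted values, which is why prediction errors can accumulate over the horizon.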

 \clearpage
--- a/40_CARNOT_model.tex
+++ b/40_CARNOT_model.tex
@ -8,7 +8,7 @@ of different control schemes over long periods of time.

 The model is designed using the CARNOT
 toolbox~\cite{lohmannEinfuehrungSoftwareMATLAB} for Simulink. It is based on the
-CARNOT default \textit{Room Radiator} model, with the following canges:
+CARNOT default \textit{Room Radiator} model, with the following changes:
 \begin{itemize}
    \item Only one of the two default rooms is used
    \item The outside walls are replaced with windows
@ -188,7 +188,7 @@ wall edge of 25m, we get the approximate volume of the building:

 The value presented in Equation~\ref{eq:numerical_volume} is used directly in
 the \textit{room\_node} of the CARNOT model (cf.
-Figure~\ref{fig:CARNOT_polydome}), as well as the calcualtion of the Air
+Figure~\ref{fig:CARNOT_polydome}), as well as the calculation of the Air
 Exchange Rate, presented in Section~\ref{sec:Air_Exchange_Rate}.

 \subsection{Furniture}
@ -275,7 +275,7 @@ volume by the surface:

 In order to better simulate the behaviour of the real \pdome\ building it is
 necessary to approximate the building materials and their properties as closely as
-possible. This section goes into the detailes and arguments for the choice of
+possible. This section goes into the details and arguments for the choice of
 parameters for each of the CARNOT nodes' properties.

 \subsubsection{Windows}
@ -300,7 +300,7 @@ value for new window installations in the private sector buildings in
 Switzerland is 1.5
 \(\frac{W}{m^2K}\)~\cite{glassforeuropeMinimumPerformanceRequirements2018}.

-Considering the aforementioned values, and the fact the the \pdome\ building was
+Considering the aforementioned values, and the fact that the \pdome\ building was
 built in 1993~\cite{nattererModelingMultilayerBeam2008}, the default U-factor of
 1.8 \(\frac{W}{m^2K}\) has been deemed appropriate.

@ -356,17 +356,17 @@ Table~\ref{tab:material_properties}:

 \subsection{HVAC parameters}\label{sec:HVAC_parameters}

-The \pdome\ is equiped with an \textit{AERMEC RTY-04} HVAC system. According to
+The \pdome\ is equipped with an \textit{AERMEC RTY-04} HVAC system. According to
 the manufacturer's manual~\cite{aermecRoofTopManuelSelection}, this HVAC houses
 two compressors, of power 11.2 kW and 8.4 kW respectively, an external
-ventillator of power 1.67 kW, and a reflow ventillator of power 2 kW. The unit
-has a typical Energy Efficiency Ratio (EER, cooling efficiency) of 4.9 --- 5.1
-and a Coefficient of Performance (COP, heating efficiency) of 5.0, for a maximum
+fan of power 1.67 kW, and a reflow fan of power 2 kW. The unit has
+a typical \acrlong{eer} (\acrshort{eer}, cooling efficiency) of 4.9 --- 5.1 and
+a \acrlong{cop} (\acrshort{cop}, heating efficiency) of 5.0, for a maximum
 cooling capacity of 64.2 kW. 

 One particularity of this HVAC unit is that during summer only one of the two
-compressors are running. This results in a higher EER, in the cases where the
-full cooling capacity is not required.
+compressors is running. This results in a higher \acrlong{eer}, in the cases
+where the full cooling capacity is not required.

 \subsubsection*{Ventilation}

@ -459,13 +459,13 @@ consumption of the HVAC has a baseline of 1.67 kW of power consumption.
 Figure~\ref{fig:Polydome_electricity} also gives an insight into the workings of
 the HVAC when it comes to the combination of the two available compressors. The
 instruction manual of the HVAC~\cite{aermecRoofTopManuelSelection} notes that in
-summer only one of the compressors is running. This allows for a larger EER
-value and thus better performance. We can see that this is the case for most of
-the experiment, where the pwoer consumption caps at around 6 kW. There are,
-however, moments during the first part of the experiment where the power
-momentarily peaks over the 6 kW limit, and goes as high as around 9 kW. This
-most probably happens when the HVAC decides that the difference between the
-setpoint temperature and the actual measured values is too large.
+summer only one of the compressors is running. This allows for a larger
+\acrshort{eer} value and thus better performance. We can see that this is the
+case for most of the experiment, where the power consumption caps at around 6
+kW. There are, however, moments during the first part of the experiment where
+the power momentarily peaks over the 6 kW limit, and goes as high as around 9
+kW. This most probably happens when the HVAC decides that the difference between
+the set point temperature and the actual measured values is too large.

 Figure~\ref{fig:Polydome_exp7_settemp} presents the values of the set point
 temperature and the measured internal temperature. 
@ -482,22 +482,22 @@ compressor is indeed turned on during the first part of the experiment, when the
 set point differs greatly from the measured temperature. Second, for the
 beginning of Experiment 7, as well as the majority of the other experiments, the
 set point temperature is the value that gets changed in order to excite the
-system, and since the HVAC's controller is on during identification, it will
-oscillate between using one or two compressors. Lastly, it is possible to notice
-that the HVAC is not turned on during the night, with the exception of the
-external fan, which runs continuously.
+system, and since the \acrshort{hvac}'s controller is on during identification,
+it will oscillate between using one or two compressors. Lastly, it is possible
+to notice that the HVAC is not turned on during the night, with the exception of
+the external fan, which runs continuously.

 \subsubsection{The CARNOT WDB weather data format}\label{sec:CARNOT_WDB}

-For a corect simulation of the building behaviour, CARNOT requires not only the
+For a correct simulation of the building behaviour, CARNOT requires not only the
 detailed definition of the building blocks/nodes, but also a very detailed set
 of data on the weather conditions. This set includes detailed information on the
-sun's position throughout the simulation (zenith and azimuth angles), the Direct
-Normal Irradiance (DHI) and Direct Horizontal Irradiance (DNI), direct and
-diffuse solar radiation on surface, as well as information on the ambient
-temperature, humidity, precipitation, pressure, wind speed and direction, etc.
-A detailed overview of each measurement necessary for a simulation is given in
-the CARNOT user manual~\cite{CARNOTManual}.
+sun's position throughout the simulation (zenith and azimuth angles), the
+\acrfull{dhi} and \acrfull{dni}, direct and diffuse solar radiation on surface,
+as well as information on the ambient temperature, humidity, precipitation,
+pressure, wind speed and direction, etc.  A detailed overview of each
+measurement necessary for a simulation is given in the CARNOT user
+manual~\cite{CARNOTManual}.

 In order to compare the CARNOT model's performance to that of the real \pdome\
 it is necessary to simulate the CARNOT model under the same set of conditions as
@ -510,8 +510,8 @@ inferred from the available data.
 The information on the zenith and azimuth solar angles can be computed exactly
 if the position and elevation of the building are known. The GPS coordinates and
 elevation information is found using a map~\cite{ElevationFinder}. With that
-information available, the zenith, azimuth angles, as well as the angle of
-incidence (AOI) are computed using the Python pvlib
+information available, the zenith and azimuth angles, as well as the \acrfull{aoi},
+are computed using the Python pvlib
 library~\cite{f.holmgrenPvlibPythonPython2018}.

 As opposed to the solar angles which can be computed exactly from the available
@ -530,7 +530,7 @@ to compute DHI and DNI as follows:
 \end{equation}

 All the other parameters related to solar irradiance, such as the in-plane
-irradiance components, in-plane diffuse irradiances from the sky and the ground
+irradiance components and the in-plane diffuse irradiance from the sky and the ground,
 are computed using the Python pvlib.

 The values that cannot be either calculated or approximated from the available
@ -558,9 +558,9 @@ assumption that the HVAC is in cooling mode whenever the measurements are
 higher than the set point temperature, and is in heating mode otherwise. As it
 can already be seen in Figure~\ref{fig:Polydome_exp7_settemp}, this is a very
 strong assumption, that is not necessarily always correct. It works well when
-the measurements are very different from the sepoint, as is the case in the
+the measurements are very different from the set point, as is the case in the
 first part of the experiment, but this assumption is false for the second part
-of the experiment, where the sepoint temperature remains fixed and it is purely
+of the experiment, where the set point temperature remains fixed and it is purely
 the HVAC's job to regulate the temperature.
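
The mode assumption described above can be written down as a short sketch. The function names and the sign convention (negative power when cooling) are assumptions made for this illustration only:

```python
def hvac_mode(measured_temp, set_point):
    """Return 'cooling' if the measurement is above the set point, else 'heating'."""
    return "cooling" if measured_temp > set_point else "heating"


def signed_power(electrical_power, measured_temp, set_point):
    """Attach a sign to the measured electrical power.

    Assumed convention: negative while cooling, positive while heating.
    """
    sign = -1.0 if hvac_mode(measured_temp, set_point) == "cooling" else 1.0
    return sign * electrical_power
```

The rule depends only on the sign of the difference between measurement and set point, so close to the set point a small measurement change is enough to flip the inferred mode.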

 \begin{figure}[ht]
@ -581,7 +581,7 @@ to an overestimated value of the Air Exchange Rate, underestimated amount of
 furniture in the building, or, more probably, miscalculation of the HVAC's
 heating/cooling mode. Of note is the large difference in behaviour for the
 Experiments 5 and 6. In fact, for these experiments, the values for the
-electical power consumption greatly differ in shape from the ones presented in
+electrical power consumption greatly differ in shape from the ones presented in
 the other datasets, which could potentially mean erroneous measurements, or some
 other underlying problem with the data.

@ -594,5 +594,4 @@ and size of the building, as well as possibly errors in the experimental data
 used for validation. A more detailed analysis of the building parameters would
 have to be done in order to find the reason and eliminate these discrepancies.

-
 \clearpage
--- a/50_Choice_of_Hyperparameters.tex
+++ b/50_Choice_of_Hyperparameters.tex
@ -11,7 +11,7 @@ behaviour.

 The advantage of black-box models lies in the lack of physical parameters to be
 fitted. On the flip side, this versatility of being able to fit much more
-complex models putely on data comes at the cost of having to properly define the
+complex models purely on data comes at the cost of having to properly define the
 model hyperparameters: the number of regressors, the number of autoregressive
 lags for each class of inputs, the shape of the covariance function have to be
 taken into account when designing a \acrshort{gp} model. These choices have
@ -30,7 +30,7 @@ inside} the CARNOT building. This is a suitable choice for the \acrshort{ocp}
 defined in Section~\ref{sec:mpc_problem}, where the goal is tracking as close as
 possible the inside temperature of the building.

-The input of the \acrshort{gp} model conincides with the input of the CARNOT
+The input of the \acrshort{gp} model coincides with the input of the CARNOT
 building, namely the \textit{power} passed to the idealized \acrshort{hvac},
 which is held constant during the complete duration of a step.

@ -73,7 +73,7 @@ properly chosen kernel can impose a prior desired behaviour on the
 \acrshort{gp} such as continuity of the function and its derivatives,
 periodicity, linearity, etc. On the flip side, choosing the wrong kernel can
 make computations more expensive, require more data to learn the proper
-behaviour or outright be numerically instable and/or give erroneous predictions.
+behaviour or outright be numerically unstable and/or give erroneous predictions.

 The \acrlong{se} kernel (cf. Section~\ref{sec:Kernels}) is very versatile,
 theoretically being able to fit any continuous function given enough data. When
@ -88,7 +88,7 @@ Kernel~\cite{jainLearningControlUsing2018}, Squared Exponential Kernel and
 Kernels from the Mat\'ern family~\cite{massagrayThermalBuildingModelling2016}.

 For the purpose of this project the choice has been made to use the
-\textit{\acrlong{se} Kernel}, as it provides a very good balance of versatily
+\textit{\acrlong{se} Kernel}, as it provides a very good balance of versatility
 and computational complexity for the modelling of the CARNOT building.

 \subsection{Lengthscales}\label{sec:lengthscales}
@ -125,10 +125,10 @@ difference the value of relative lengthscale importance is introduced:

 Another indicator of model behaviour is the variance of the identified
 \acrshort{se} kernel. The expected value of the variance is around the variance
-of the inputs. An extremenly high or extremely low value of the variance could
+of the inputs. An extremely high or extremely low value of the variance could
 mean a numerically unstable model.

-Table~\ref{tab:GP_hyperparameters} presents the relative lengthscale imporances
+Table~\ref{tab:GP_hyperparameters} presents the relative lengthscale importances
 and the variance for different combinations of the exogenous input lags ($l_w$),
 the controlled input lags ($l_u$) and the output lags ($l_y$) for a classical
 \acrshort{gp} model. 
@ -168,7 +168,7 @@ the controlled input lags ($l_u$) and the output lags ($l_y$) for a classical
 In general, the results of Table~\ref{tab:GP_hyperparameters} show that the
 past outputs are important when predicting future values. Of importance are also
 the past inputs, with the exception of the models with very high variance, where
-the relative importances stay almost constant accross all the inputs. For the
+the relative importances stay almost constant across all the inputs. For the
 exogenous inputs, the outside temperature ($w2$) is generally more important
 than the solar irradiation ($w1$). In the case of more autoregressive lags for
 the exogenous inputs, the more recent information is usually more important,
@ -220,10 +220,10 @@ presented in Table~\ref{tab:SVGP_hyperparameters}:
 \label{tab:SVGP_hyperparameters}
 \end{table}

-The results of Table~\ref{tab:SVGP_hyperparameters} are not very suprising, even
+The results of Table~\ref{tab:SVGP_hyperparameters} are not very surprising, even
 if very different from the classical \acrshort{gp} case. The kernel variance is
 always of a reasonable value, and the relative importance of the lengthscales is
-relatively constant accross the board. It is certainly harder to interpret these
+relatively constant across the board. It is certainly harder to interpret these
 results as pertaining to the relevance of the chosen regressors. For the
 \acrshort{svgp} model, the choice of the autoregressive lags has been made
 purely on the values of the loss functions, presented in
@ -264,11 +264,11 @@ While the \acrshort{rmse} and the \acrshort{smse} are very good at ensuring the
 predicted mean value of the Gaussian Process is close to the measured values of
 the validation dataset, the confidence of the Gaussian Process prediction is
 completely ignored. In this case two models predicting the same mean values, but
-having very differnt confidence intervals would be equivalent according to these
+having very different confidence intervals would be equivalent according to these
 performance metrics.

 The \acrfull{lpd} is a performance metric which takes into account not only the
-the mean value of the GP prediction, but the entire distribution:
+mean value of the GP prediction, but the entire distribution:

 \begin{equation}
    \text{LPD} = \frac{1}{2} \ln{\left(2\pi\right)} + \frac{1}{2N}
@ -283,7 +283,7 @@ overconfident models get penalized more than the more conservative models for
 the same mean prediction error, leading to models that better represent
 the real system. 
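
The penalty on overconfident models can be checked numerically with a small sketch. The summand used in `lpd` below is the usual Gaussian log-density term and is an assumption of this illustration, since only the beginning of the equation is visible here; the function names are hypothetical:

```python
import math


def rmse(y_true, mean):
    """Root mean squared error of the predicted means."""
    n = len(y_true)
    return math.sqrt(sum((y - m) ** 2 for y, m in zip(y_true, mean)) / n)


def lpd(y_true, mean, var):
    """Log predictive density loss (lower is better).

    Assumed form: 0.5*ln(2*pi) + 1/(2N) * sum(ln(var_i) + (y_i - mean_i)^2 / var_i).
    """
    n = len(y_true)
    total = sum(math.log(v) + (y - m) ** 2 / v
                for y, m, v in zip(y_true, mean, var))
    return 0.5 * math.log(2 * math.pi) + total / (2 * n)


y_true = [1.0, 2.0, 3.0]
mean = [1.5, 2.5, 3.5]  # identical mean error for both models
overconfident = lpd(y_true, mean, [0.01] * 3)
conservative = lpd(y_true, mean, [0.25] * 3)
```

Both models have the same RMSE, but the one reporting the smaller predictive variance receives the larger loss.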

-The \acrfull{msll} is obtained by substacting the loss of the model that
+The \acrfull{msll} is obtained by subtracting the loss of the model that
 predicts using a Gaussian with the mean $E(\boldsymbol{y})$ and variance
 $\sigma_y^2$ of the measured data from the model \acrshort{lpd} and taking the
 mean of the obtained result:
@ -334,19 +334,17 @@ number of different lag combinations give rise to models with very large
 \acrshort{msll}/\acrshort{lpd} values. This might indicate that those models are
 overconfident, either due to the very large kernel variance parameter, or the
 specific lengthscales combinations. The model with the best
-\acrshort{rmse}/\acrshort{smse} metrics $\mathcal{M}$($l_w = 1$, $l_u = 2$, $l_y
-= 3$) had very bad \acrshort{msll} and \acrshort{lpd} metrics, as well as by far
-the largest variance of all the combinations. On the contrary the
-$\mathcal{M}$($l_w = 3$, $l_u = 1$, $l_y = 3$) model has the best
-\acrshort{msll} and \acrshort{lpd} performance, while still maintaining small
-\acrshort{rmse} and \acrshort{smse} values. The inconvenience of this set of
-lags is the large number of regressors, which leads to much more expensive
+\acrshort{rmse}/\acrshort{smse} metrics \model{1}{2}{3} had very bad
+\acrshort{msll} and \acrshort{lpd} metrics, as well as by far the largest
+variance of all the combinations. On the contrary the \model{3}{1}{3} model has
+the best \acrshort{msll} and \acrshort{lpd} performance, while still maintaining
+small \acrshort{rmse} and \acrshort{smse} values. The inconvenience of this set
+of lags is the large number of regressors, which leads to much more expensive
 computations. Other good choices for the combinations of lags are
-$\mathcal{M}$($l_w = 2$, $l_u = 1$, $l_y = 3$) and $\mathcal{M}$($l_w = 1$, $l_u
-= 1$, $l_y = 3$), which have good performance on all four metrics, as well as
-being cheaper from a computational perspective. In order to make a more informed
-choice for the best hyperparamerers, the performance of all three combinations
-has been analysed.
+\model{2}{1}{3} and \model{1}{1}{3}, which have good performance on all four
+metrics, as well as being cheaper from a computational perspective. In order to
+make a more informed choice for the best hyperparameters, the performance of all
+three combinations has been analysed.

 \clearpage

@ -375,18 +373,16 @@ has been analysed.
 \end{table}

 The results for the \acrshort{svgp} model, presented in
-Table~\ref{tab:SVGP_loss_functions} are much less ambiguous. The
-$\mathcal{M}$($l_w = 1$, $l_u = 2$, $l_y = 3$) model has the best performance
-according to all four metrics, with most of the other combinations scoring much
-worse on the \acrshort{msll} and \acrshort{lpd} loss functions. This has
-therefore been chosen as the model for the full year simulations.
+Table~\ref{tab:SVGP_loss_functions} are much less ambiguous. The \model{1}{2}{3}
+model has the best performance according to all four metrics, with most of the
+other combinations scoring much worse on the \acrshort{msll} and \acrshort{lpd}
+loss functions. This has therefore been chosen as the model for the full year
+simulations.


-\subsection{Validation of hyperparameters}
+\subsection{Validation of hyperparameters}\label{sec:validation_hyperparameters}

-% TODO: [Hyperparameters] Validation of hyperparameters
-
-The validation step has the purpose of testing the fiability of the trained
+The validation step has the purpose of testing the reliability of the trained
 models. If choosing a model according to loss function values on a new dataset
 is a way of minimizing the possibility of overfitting the model to the training
 data, validating the model by analyzing its multi-step prediction performance
@ -402,55 +398,103 @@ the discrepancies.

 \subsubsection{Conventional Gaussian Process}

+The simulation performance of the three lag combinations chosen for the
+classical \acrlong{gp} models has been analysed, with the results presented in
+Figures~\ref{fig:GP_113_multistep_validation},~\ref{fig:GP_213_multistep_validation}
+and~\ref{fig:GP_313_multistep_validation}. For reference, the one-step ahead
+predictions for the training and test datasets are presented in
+Appendix~\ref{apx:hyperparams_gp}.
+

 \begin{figure}[ht]
    \centering
    \includegraphics[width =
    \textwidth]{Plots/GP_113_-1pts_test_prediction_20_steps.pdf}
-    \caption{}
-    \label{fig:GP_multistep_validation}
+    \vspace{-25pt}
+    \caption{20-step ahead simulation for \model{1}{1}{3}}
+    \label{fig:GP_113_multistep_validation}
 \end{figure}

+In the case of the simplest model (cf.
+Figure~\ref{fig:GP_113_multistep_validation}), overall the predictions are quite
+good. Large deviations from the true values start to appear at around 15 steps.
+This could impose an additional limit on the size of the control horizon of the
+\acrlong{ocp}.
+
 \begin{figure}[ht]
    \centering
    \includegraphics[width =
    \textwidth]{Plots/GP_213_-1pts_test_prediction_20_steps.pdf}
-    \caption{}
+    \vspace{-25pt}
+    \caption{20-step ahead simulation for \model{2}{1}{3}}
    \label{fig:GP_213_multistep_validation}
 \end{figure}

+The more complex model, presented in
+Figure~\ref{fig:GP_213_multistep_validation}, has a much better prediction
+performance, with only two predictions out of a total of twenty-five diverging
+at the later steps. Except for the late-stage divergence on the two predictions,
+this proves to be the best simulation model.
+
 \begin{figure}[ht]
    \centering
    \includegraphics[width =
    \textwidth]{Plots/GP_313_-1pts_test_prediction_20_steps.pdf}
-    \caption{}
+    \vspace{-25pt}
+    \caption{20-step ahead simulation for \model{3}{1}{3}}
    \label{fig:GP_313_multistep_validation}
 \end{figure}

+Lastly, \model{3}{1}{3} has a much worse simulation performance than the other
+two models. This could hint at overfitting of the model on the training data.
+This is consistent with the results found in Table~\ref{tab:GP_loss_functions}
+for the \acrshort{rmse} and \acrshort{smse}, as well as can be seen in
+Appendix~\ref{apx:hyperparams_gp}, Figure~\ref{fig:GP_313_test_validation},
+where the model has much worse performance on the testing dataset predictions
+than the other two models.
+
+Overall, the performance of the three models in simulation mode is consistent
+with the previously found results. It is of note that neither the model that
+performed the best on the \acrshort{rmse}/\acrshort{smse}, \model{1}{2}{3}, nor
+the one that had the best \acrshort{msll}/\acrshort{lpd}, \model{3}{1}{3},
+performs the best under a simulation scenario. In the case of the former this is
+due to numerical instability, the training/prediction often failing depending on
+the inputs. In the case of the latter, focusing only on the
+\acrshort{msll}/\acrshort{lpd} performance metrics can lead to overfitted
+models that give good and confident one-step ahead predictions, while still
+being unable to fit the true behaviour of the plant.
+
 \clearpage

    \subsubsection{Sparse and Variational Gaussian Process}

-%\begin{figure}[ht]
-%    \centering
-%    \includegraphics[width = \textwidth]{Plots/SVGP_123_training_performance.pdf}
-%    \caption{}
-%    \label{fig:SVGP_train_validation}
-%\end{figure}
-%
-%\begin{figure}[ht]
-%    \centering
-%    \includegraphics[width = \textwidth]{Plots/SVGP_123_test_performance.pdf}
-%    \caption{}
-%    \label{fig:SVGP_test_validation}
-%\end{figure}
+For the \acrshort{svgp} models, only the performance of \model{1}{2}{3} was
+investigated, since it had the best performance according to all four loss
+metrics. 
+
+As a first validation step, it is of note that the \acrshort{svgp} model was
+able to accurately reproduce the training dataset with only 150 inducing
+locations (cf.  Appendix~\ref{apx:hyperparams_svgp}). It also performs about as
+well as the better \acrshort{gp} models for the one-step ahead prediction on the
+testing datasets.
+
+In the case of the simulation performance, presented in
+Figure~\ref{fig:SVGP_multistep_validation}, two things are of particular
+interest. First, all 25 simulations have good overall behaviour: none of them
+starts to exhibit erratic behaviour, which is a good indicator for lack of
+overfitting. This behaviour is indicative of a more conservative model than the
+ones identified for the \acrshort{gp} models. Second, it is possible to conclude
+that, given the same amount of data, the classical \acrshort{gp} models can
+better learn plant behaviour, provided the correct choice of regressors.

 \begin{figure}[ht]
    \centering
    \includegraphics[width =
    \textwidth]{Plots/SVGP_123_test_prediction_20_steps.pdf}
-    \caption{}
+    \caption{20-step ahead simulation for \model{1}{2}{3}}
    \label{fig:SVGP_multistep_validation}
 \end{figure}

+
 \clearpage
--- a/70_Implementation.tex
+++ b/70_Implementation.tex
@ -4,7 +4,7 @@ This section goes into the details of the implementation of the Simulink plant
 and Python controller setup.

 A high-level view of the setup is presented in Figure~\ref{fig:setup_diagram}.
-The Simulink model's main responsability is running the CARNOT simulation. It
+The Simulink model's main responsibility is running the CARNOT simulation. It
 also has the task of providing the \acrshort{mpc} with information on the
 weather forecast, since the weather information for the simulation comes from a
 CARNOT \acrshort{wdb} object. A detailed view of all the information available
@ -62,7 +62,7 @@ starting and ending points, while retaining a simple implementation.
 \subsection{Gaussian Processes}

 As described in Section~\ref{sec:gaussian_processes}, both training and
-evaluating a \acrshort{gp} has an algotirhmic complexity of $\mathcal{O}(n^3)$.
+evaluating a \acrshort{gp} has an algorithmic complexity of $\mathcal{O}(n^3)$.
 This means that naive implementations can get too expensive in terms of
 computation time very quickly.

@ -70,7 +70,7 @@ In order to have as smallest of a bottleneck as possible when dealing with
 \acrshort{gp}s, a very optimized implementation of \acrlong{gp} Models was
 used, in the form of GPflow~\cite{matthewsGPflowGaussianProcess2017}. It is
 based on TensorFlow~\cite{tensorflow2015-whitepaper}, which has very efficient
-imeplentation of all the necessary Linear Algebra operations. Another benefit of
+implementation of all the necessary Linear Algebra operations. Another benefit of
 this implementation is the very simple use of any additional computational
 resources, such as a GPU, TPU, etc.

@ -86,7 +86,7 @@ used for \acrshort{svgp} models.

 \subsubsection{Sparse and Variational Gaussian Process training}

-The \acrshort{svgp} models have a more involved oprimization procedure due to to
+The \acrshort{svgp} models have a more involved optimization procedure due to
 several factors. First, when training an \acrshort{svgp} model, the optimization
 objective is the value of the \acrshort{elbo} (cf. Section~\ref{sec:elbo}).
 After several implementations, the more complex \textit{Adam} optimizer turned
@ -147,7 +147,7 @@ The optimization problem as presented in
 Equation~\ref{eq:optimal_control_problem} becomes very nonlinear quite fast. In
 fact, due to the autoregressive structure of the \acrshort{gp}, the predicted
 temperature at time t is passed as an input to the model at time $t+1$. A simple
-recursive implementation of the Optimization Problem becomes untractable after
+recursive implementation of the Optimization Problem becomes intractable after
 only 3--4 prediction steps.

 In order to solve this problem, a new OCP is introduced. It has a much sparser
@ -197,4 +197,6 @@ For the case of the \acrshort{svgp}, a new model is trained once enough data is
 gathered. The implementations tested were updated once a day, either on the
 whole historical set of data, or on a window of the last five days of data.

+% TODO [Implementation] Add info on scaling
+
 \clearpage
--- a/80_Results.tex
+++ b/80_Results.tex
@ -1,6 +1,47 @@
-\section{Results}
+\section{Results}\label{sec:results}

-\subsection{Conventional Gaussian Processes}
+
+% TODO [Results] Add info on control horizon
+
+This section focuses on the presentation and interpretation of the year-long
+simulation of the control schemes presented previously.
+
+Section~\ref{sec:GP_results} analyses the results of a conventional
+\acrlong{gp} Model trained on the first five days of gathered data. The model
+is then used for the rest of the year, with the goal of tracking the defined
+reference temperature.
+
+Section~\ref{sec:SVGP_results} details the analysis of the learning
+scheme using a \acrshort{svgp} Model. In this scenario, the model is first
+trained on the first five days of data, and is then updated every day at
+midnight with the new information gathered from closed-loop operation.
+
+\subsection{Conventional Gaussian Processes}\label{sec:GP_results}
+
+The first simulation, to be used as a baseline for comparison with the
+\acrshort{svgp} Models developed further on, consists of using a `static'
+\acrshort{gp} model trained on five days' worth of experimental data. This model
+is then employed for the rest of the year.
+
+With a sampling time of 15 minutes, the model is trained on 480 points of data.
+This size of the identification dataset is enough to learn the behaviour of the
+plant, without being too complex to solve from a numerical perspective: the
+current implementation takes roughly 1.5 seconds of computation time per step.
+For reference, identifying a model on 15 days' worth of experimental data (1440
+points) raises the simulation time to approximately 11--14 seconds per step, or
+around eight times slower. This superlinear growth is in line with the poor
+scaling of \acrshort{gp} evaluation, whose cost is bounded by $\mathcal{O}(n^3)$.
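+
+As a rough check of these figures (a sketch only; the symbols $n_5$, $n_{15}$
+and $r$ are introduced here for illustration and are not used elsewhere), the
+dataset sizes and the measured slowdown ratio $r$ are
+\begin{equation*}
+    n_5 = 5 \cdot \frac{24 \cdot 60}{15} = 480, \qquad
+    n_{15} = 15 \cdot \frac{24 \cdot 60}{15} = 1440, \qquad
+    r \approx \frac{11}{1.5} \text{ to } \frac{14}{1.5} \approx 7.3 \text{ to } 9.3 .
+\end{equation*}
+Tripling the dataset would give a factor of $3^2 = 9$ under quadratic scaling
+and $3^3 = 27$ under cubic scaling, so the measured slowdown stays within the
+cubic worst case.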
+
+The results of the simulation are presented in
+Figure~\ref{fig:GP_fullyear_simulation}. Overall, the performance of this model
+is poor. The tracked temperature presents an offset of around 0.5
+$\degree$C in the stable part of the simulation. The offset becomes much larger
+once the reference temperature starts moving from the initial constant value.
+The controller becomes completely unstable around the middle of July, and only
+regains some stability in the middle of October. It can also be noted that
+from mid-October to the end of December the controller performs very similarly
+to the beginning of the year, namely from January to the end of
+February.

 \begin{figure}[ht]
    \centering
@ -10,19 +51,102 @@
    \label{fig:GP_fullyear_simulation}
 \end{figure}

+This very large difference in performance could be explained by the change in
+weather during the year. The winter months at the beginning and end of the
+year exhibit similar performance. The spring months already make the
+controller less stable than at the start of the year, while the drastic
+temperature changes in the summer make the controller completely unstable.
+
+\clearpage
+
+
+Figure~\ref{fig:GP_fullyear_abserr} presents the absolute error measured at each
+step of the simulation over the course of the year. We can note a mean absolute
+error of 1.33 $\degree$C, with the largest deviations occurring in late summer
+where the absolute error can reach extreme values, and the `best' performance
+occurring during the winter months. 
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width =
+    \textwidth]{Plots/4_GP_480pts_12_averageYear_abserr.pdf}
+    \caption{GP full year absolute error}
+    \label{fig:GP_fullyear_abserr}
+\end{figure}
+
+Figure~\ref{fig:GP_first_model_performance} analyses the 20-step-ahead
+simulation performance of the identified model over the course of the year. At
+experimental step 250 the controller is still gathering data. It is therefore
+expected that the identified model will be capable of reproducing this data. At
+step 500, 20 steps after identification, the model correctly steers the internal
+temperature towards the reference temperature. On the flip side, already at
+experimental steps 750 and 1000, roughly 8 and 10 days into the simulation, the
+model is unable to properly simulate the behaviour of the plant, with the maximum
+difference at the end of the simulation reaching 0.75 and 1.5 $\degree$C
+respectively.
+
 \begin{figure}[ht]
    \centering
    \includegraphics[width =
    \textwidth]{Plots/4_GP_480pts_12_averageYear_first_model_performance.pdf}
-    \caption{GP first model performance}
+    \caption{GP model performance}
    \label{fig:GP_first_model_performance}
 \end{figure}

-\clearpage
+This large difference in performance could be explained by the change in outside
+weather (Solar Irradiance and Outside Temperature, i.e.\ the exogenous inputs)
+from the one present during the training phase. It can be seen in
+Figure~\ref{fig:Dataset_outside_temperature} that already at 500 points in the
+simulation both the GHI and the Outside Temperature are outside of the training
+ranges, with the latter exhibiting a much larger variation.

-\subsection{Adaptive scheme with SVGP}

-\subsubsection{RENAME ME- All data}
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width =
+    \textwidth]{Plots/Exogenous_inputs_fullyear.pdf}
+    \caption{Exogenous inputs for the simulation}
+    \label{fig:Dataset_outside_temperature}
+\end{figure}
+
+Finally, it is possible to conclude that this approach does not perform well due
+to several causes:
+
+\begin{itemize}
+    \item The size of the training dataset is limited by the computation budget.
+    \item The model does not correctly extrapolate the information on
+        disturbances.
+    \item The model stays fixed for the duration of the year, being unable to
+        adapt to new weather conditions.
+\end{itemize}
+
+These problems could be solved in several ways, such as periodically
+re-identifying the model to fit the current weather pattern. This approach would
+be quite cumbersome due to the repeated need to disturb the plant in order to
+sufficiently excite it. Another approach would be to keep the whole historical
+dataset of measurements, which quickly renders the problem intractable. More
+complex solutions could dramatically improve model performance, such as keeping
+a fixed-size data dictionary, from which points are deleted when they no longer
+help the predictions and to which new points are added as they are deemed
+useful, or compiling the training dataset from multiple experiments in different
+weather conditions. They are, however, more complex to implement.
+
+
+\subsection{Sparse and Variational Gaussian Process}\label{sec:SVGP_results}
+
+The \acrlong{svgp} models are set up in a similar way as described before. The
+model is first identified using five days' worth of experimental data collected
+using a \acrshort{pi} controller and a random disturbance signal. The difference
+lies in the fact that the \acrshort{svgp} model gets re-identified every night
+at midnight using the newly accumulated data from closed-loop operation.
+
+The results of this setup are presented in
+Figure~\ref{fig:SVGP_fullyear_simulation}. It can already be seen that this
+setup performs much better than the initial one. The only large deviations from
+the reference temperature occur in cold weather, when the \acrshort{hvac}'s
+limited heating capacity is unable to maintain the proper temperature.
+
+% TODO: [Results] Add info on SVGP vs GP computation speed

 \begin{figure}[ht]
    \centering
@ -32,6 +156,40 @@
    \label{fig:SVGP_fullyear_simulation}
 \end{figure}

+\clearpage
+
+Comparing the Absolute Error of the Measured vs Reference temperature for the
+duration of the experiment (cf. Figure~\ref{fig:SVGP_fullyear_abserr}) with the
+one of the original experiment, the average absolute error is reduced from 1.33
+$\degree$C to only 0.05 $\degree$C, with the majority of the values being lower
+than 0.4 $\degree$C.
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width =
+    \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_abserr.pdf}
+    \caption{SVGP full year absolute error}
+    \label{fig:SVGP_fullyear_abserr}
+\end{figure}
+
+Figures~\ref{fig:SVGP_first_model_performance},
+\ref{fig:SVGP_later_model_performance}
+and~\ref{fig:SVGP_last_model_performance} show the 20-step simulation performance of three
+different models, identified at three different stages of the experiment. They
+have all been set to simulate 25 consecutive experimental steps starting at
+steps 250, 500, 10750 and 11000 respectively.
+
+The initial model (cf. Figure~\ref{fig:SVGP_first_model_performance}),
+identified after the first five days, has the worst performance. It is unable to
+correctly simulate even the learning dataset. This behaviour is similar to that
+discovered in Figure~\ref{fig:SVGP_multistep_validation}
+(cf. Section~\ref{sec:validation_hyperparameters}), where the \acrshort{svgp}
+model performed worse than the equivalent \acrshort{gp} trained on the same
+dataset. It also performs worse than the initial \acrshort{gp} model in the rest
+of the simulations, being unable to correctly predict the heating to reference
+at step 500, and having maximum errors of around 10 $\degree$C for the simulations
+starting at steps 10750 and 11000.
+
 \begin{figure}[ht]
    \centering
    \includegraphics[width =
@ -40,6 +198,18 @@
    \label{fig:SVGP_first_model_performance}
 \end{figure}

+\clearpage
+
+Figure~\ref{fig:SVGP_later_model_performance} shows the performance of the 100th
+trained model (i.e.\ the model trained on April 15). This model performs much
+better in all simulations. It is able to correctly simulate the 20-step
+behaviour of the plant over all the experimental steps in the first two cases.
+It still has a noticeable error when predicting the behaviour of the plant on
+new data (i.e.\ simulations starting at steps 10750 and 11000), but it is much
+smaller than before. This suggests that the \acrshort{svgp} model's
+performance improves throughout the year, but that it does require much more data
+than the classical \acrshort{gp} model to capture the building dynamics.
+
 \begin{figure}[ht]
    \centering
    \includegraphics[width =
@ -48,6 +218,13 @@
    \label{fig:SVGP_later_model_performance}
 \end{figure}

+The last model is trained on the whole-year dataset.
+Figure~\ref{fig:SVGP_last_model_performance} shows its performance for the same
+situation described before. The model is able to predict the plant's behaviour
+at steps 250 and 500 even better than before, as well as predict the behaviour
+at steps 10750 and 11000 with a maximum error of 0.6 $\degree$C and 0.1 $\degree$C
+respectively.
+
 \begin{figure}[ht]
    \centering
    \includegraphics[width =
@ -56,9 +233,160 @@
    \label{fig:SVGP_last_model_performance}
 \end{figure}

+The analysis of the model evolution as more data gets gathered already gives
+very good insight into the strengths and weaknesses of this approach. The
+initial model is unable to correctly extrapolate the plant's behaviour in new
+regions of the state space. Also, given the same amount of data, the
+\acrshort{svgp} model is able to capture less information about the plant
+dynamics than the equivalent \acrshort{gp} model. On the flip side, re-training
+the model every day with new information is able to mitigate this by adding the
+data in new regions as it gets discovered while being able to maintain constant
+training and evaluation cost.
+
+A more in-depth analysis of the evolution of the \acrshort{svgp} hyperparameters
+over the duration of the experiment is presented in
+Section~\ref{sec:lengthscales_results}.
+
+A few questions arise naturally after investigating the performance of this
+control scheme: 
+
+\begin{itemize}
+    \item If the model is able to correctly understand data gathered in
+        closed-loop operation, will the performance deteriorate drastically if
+        the first model is trained on less data?
+    \item How much information can the model extract from closed-loop operation?
+        Would a model trained on only the last five days of closed-loop
+        operation data be able to perform correctly?
+\end{itemize}
+
+These questions will be further analysed in Sections~\ref{sec:svgp_96pts}
+and~\ref{sec:svgp_window} respectively.
+
 \clearpage

-\subsubsection{RENAME ME- 480pts window}
+\subsubsection{Lengthscales}\label{sec:lengthscales_results}
+
+Figure~\ref{fig:SVGP_evol_importance} provides a deeper insight into the
+evolution of the relative importance of the \acrshort{svgp} regressors over the
+course of the full-year simulation\footnotemark. A few remarks are immediate:
+the importance of most hyperparameters changes drastically over the first few
+iterations, then changes at a steadier pace until around the month of July,
+after which most of the hyperparameters settle for the rest of the simulation.
+This behaviour could be explained by the model learning new regions of the state
+space (i.e.\ the span of the \acrshort{ghi} and Outside Temperatures) over the
+first months as these values change, and remaining more constant once it has
+already gathered information on these different operating points.
+
+\footnotetext{The evolution of the \textit{hyperparameters} is provided for
+reference in Annex~\ref{anx:hyperparams_evol}.}
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width =
+    \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_evol_importance.pdf}
+    \caption{Evolution of SVGP model parameters}
+    \label{fig:SVGP_evol_importance}
+\end{figure}
+
+As seen in Figure~\ref{fig:SVGP_evol_importance}, the variance of the
+\acrshort{se} kernel steadily decreases until reaching a plateau, which
+signifies the increase in confidence of the model. The hyperparameters
+corresponding to the exogenous inputs, $w1,1$ and $w1,2$, become generally
+less important for future predictions over the course of the year, with the
+importance of $w1,1$, the \acrlong{ghi}, climbing back up over the last, colder
+months. This might be due to the fact that during the colder months, the
+\acrshort{ghi} is the only way for the exogenous inputs to \textit{provide}
+additional heat to the system.
+
+A similar trend can be observed for the evolution of the input's
+hyperparameters, with the exception that the first lag of the controlled input,
+$u1,1$, remains the most important over the course of the year.
+
+For the lags of the measured output it can be seen that, over the course of the
+year, the importance of the first lag decreases, while that of the second and
+third lags increases, until all three reach relatively similar values.
+
+Another interesting comparison is provided by looking at the possible values of
+the \acrshort{se} kernel components. Since all the values are normalized within
+the $-1$ to $1$ range, it is unlikely that any two points will be more than a
+distance of 2 apart. It is possible then to plot the values of the kernel terms
+due to each regressor as a function of their distance. This is done in
+Figure~\ref{fig:SVGP_first_covariance} for the first identified model and in
+Figure~\ref{fig:SVGP_last_covariance} for the last. It is clear that in both
+cases the kernel terms behave mostly linearly, except when two points are
+close to each other, where the correlation remains stronger before it
+starts diminishing.
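+
+This shape can be illustrated with the usual one-dimensional \acrshort{se}
+term (a sketch only; the symbols $\sigma^2$, $\ell$ and $r$ are generic
+notation assumed here, not necessarily the parametrization used in the
+implementation):
+\begin{equation*}
+    k(r) = \sigma^2 \exp\left(-\frac{r^2}{2\ell^2}\right), \qquad
+    \frac{\mathrm{d}k}{\mathrm{d}r} = -\frac{r}{\ell^2}\, k(r),
+\end{equation*}
+where $r \in [0, 2]$ is the distance between two points along one regressor.
+The slope vanishes at $r = 0$, so the term stays flat for nearby points, and
+it is steepest at the inflection point $r = \ell$. Around that point the curve
+is locally close to a straight line, which would match the mostly linear
+appearance if $\ell$ lies within the plotted range.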
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width =
+    \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_first_covariance.pdf}
+    \caption{SVGP model first covariance parameters}
+    \label{fig:SVGP_first_covariance}
+\end{figure}
+
+As for the last model, it can be noted that only the scale of the kernel terms
+changes, with their shape remaining consistent with the first identified model.
+This means that the model does not get much more complex as the data is
+gathered, but instead the same general structure is kept, with further
+refinements being done as data is added to the system.
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width =
+    \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_last_covariance.pdf}
+    \caption{SVGP model last covariance parameters}
+    \label{fig:SVGP_last_covariance}
+\end{figure}
+
+One question that could be addressed given these mostly linear kernel terms is
+how well an \acrshort{svgp} model would perform with a linear kernel.
+Intuition would hint that it should still be able to track the reference
+temperature, albeit not as precisely, since in the \acrshort{se} kernel the
+correlation diminishes much more slowly when two points are close together. This
+will be further investigated in Section~\ref{sec:svgp_linear}.
+
+\clearpage
+
+\subsection{SVGP with one day of starting data}\label{sec:svgp_96pts}
+
+As previously discussed in Section~\ref{sec:SVGP_results}, the \acrshort{svgp}
+model is able to properly adapt given new information, refining its
+understanding of the plant's dynamics over time.
+
+Analyzing the results of a simulation done on only one day's worth of initial
+simulation data (cf. Figures~\ref{fig:SVGP_96pts_fullyear_simulation}
+and~\ref{fig:SVGP_96pts_abserr}), it is notable that the model performs
+almost identically to the one identified in the previous sections. This
+highlights one of the practical benefits of the \acrshort{svgp} implementations
+compared to the classical \acrlong{gp}: it is possible to start with a
+rougher controller trained on less data and refine it over time, reducing the need
+for cumbersome and potentially costly initial experiments for gathering data.
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width =
+    \textwidth]{Plots/6_SVGP_96pts_inf_window_12_averageYear_fullyear.pdf}
+    \caption{One Day SVGP full year simulation}
+    \label{fig:SVGP_96pts_fullyear_simulation}
+\end{figure}
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width =
+    \textwidth]{Plots/6_SVGP_96pts_inf_window_12_averageYear_abserr.pdf}
+    \caption{One Day SVGP Absolute Error}
+    \label{fig:SVGP_96pts_abserr}
+\end{figure}
+
+\subsection{SVGP with a five-day moving window}\label{sec:svgp_window}
+
+This section presents the result of running a different control scheme. Here,
+as with the base \acrshort{svgp} model, the model is first trained on five days'
+worth of data, with the difference being that each new model is only identified
+using the last five days' worth of data. This should provide an insight on whether
+the \acrshort{svgp} model is able to understand the plant dynamics based only on
+closed-loop operation.

 \begin{figure}[ht]
    \centering
@ -68,23 +396,87 @@
    \label{fig:SVGP_480window_fullyear_simulation}
 \end{figure}

-\clearpage
+As can be seen in Figure~\ref{fig:SVGP_480window_fullyear_simulation}, this
+model is unable to properly track the reference temperature. In fact, five days
+after the identification the model forgets all the initial data and becomes
+unstable. This instability then generates enough excitation of the plant for the
+model to again learn its behaviour. This cycle repeats every five days, when the
+controller becomes unstable. In the stable regions, however, the controller is
+able to track the reference temperature. 

-\subsubsection{RENAME ME- 96pts starting data}
+\subsection{SVGP with Linear Kernel}\label{sec:svgp_linear}
+
+The last model to be investigated is the \acrshort{svgp} with Linear Kernel. As
+suggested previously, the kernel terms of the originally identified
+\acrshort{svgp} model are not very complex, leading to the question of whether a
+purely linear kernel could suffice to understand the plant's behaviour.
+
+Figure~\ref{fig:SVGP_linear_fullyear_simulation} shows the results of the
+full-year simulation. While this controller is still able to track the reference
+temperature, it shows a much larger variance in the measured values than the
+\acrshort{se} kernel \acrshort{svgp} model. This confirms the previous
+suspicions that a pure linear model would not be able to capture the more
+nuanced details of the CARNOT model dynamics.

 \begin{figure}[ht]
    \centering
    \includegraphics[width =
-    \textwidth]{Plots/6_SVGP_96pts_inf_window_12_averageYear_fullyear.pdf}
+    \textwidth]{Plots/10_SVGP_480pts_inf_window_12_averageYear_LinearKernel_fullyear.pdf}
    \caption{SVGP full year simulation}
-    \label{fig:SVGP_96pts_fullyear_simulation}
+    \label{fig:SVGP_linear_fullyear_simulation}
 \end{figure}

 \clearpage

-\subsection{Qualitative analysis}
+\subsection{Comparative analysis}

-\subsection{Quantitative analysis}
+This section will compare all the results presented in the previous Sections and
+try to analyze the differences and their origin.
+
+Presented in Table~\ref{tab:Model_comparations} are the Mean Error, Error
+Variance and Mean Absolute Error for the full year simulation for the three
+stable \acrshort{svgp} models, as well as the classical \acrshort{gp} model. 
+
+\begin{table}[ht]
+%\vspace{-8pt}
+\centering
+    \begin{tabular}{||c c c c||}
+        \hline
+        Model & Mean Error [$\degree$C] & Error Variance [$\degree$C$^2$] & Mean
+        Absolute Error [$\degree$C]\\
+        \hline \hline
+        GP & 5.08 & 6.88 & 1.330 \\ 
+        SVGP (5 days) & -0.06 & 0.25 & 0.055 \\ 
+        SVGP (1 day) & -0.04 & 0.24 & 0.050 \\ 
+        SVGP (Linear)& -0.03 & 0.29 & 0.093 \\ 
+        \hline
+    \end{tabular}
+\caption{Full-year model performance comparison}
+\label{tab:Model_comparations}
+\end{table}
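+
+For reference, the three metrics can be written as follows (a sketch under
+the assumption that the error at step $k$ is $e_k = T_k - T_k^{\mathrm{ref}}$,
+i.e.\ measured minus reference temperature, over $N$ simulation steps; the
+symbols and the sign convention are assumed here for illustration):
+\begin{equation*}
+    \mathrm{ME} = \frac{1}{N} \sum_{k=1}^{N} e_k, \qquad
+    \mathrm{Var} = \frac{1}{N} \sum_{k=1}^{N} \left(e_k - \mathrm{ME}\right)^2, \qquad
+    \mathrm{MAE} = \frac{1}{N} \sum_{k=1}^{N} \lvert e_k \rvert .
+\end{equation*}
+Under these definitions $\lvert \mathrm{ME} \rvert \leq \mathrm{MAE}$ always
+holds, which gives a quick consistency check on tabulated values.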
+
+The worst performing model, as noted previously, is the \acrshort{gp} model. The
+\acrshort{svgp} with Linear Kernel results in a stable model with a mean error
+very close to zero, which means no constant bias or offset. This model has the
+highest error variance of all the identified \acrshort{svgp} models, which was
+also noted beforehand from qualitative observations. It is therefore possible to
+conclude that a Linear Kernel does not suffice for properly modeling the
+dynamics of the CARNOT model.
+
+The two \acrshort{svgp} models with \acrlong{se} kernels perform the best. They
+have a comparable performance, with very small differences in Mean Absolute
+Error and Error variance. This leads to the conclusion that the \acrshort{svgp}
+models can be deployed with less explicit identification data, but they will
+continue to improve over the course of the year as the building passes through
+different regions of the state space and more data is collected.
+
+These results do not, however, discredit the use of \acrlong{gp} in a
+multi-seasonal situation. As shown before, given the same amount of data and
+ignoring the computational cost, they perform better than the alternative
+\acrshort{svgp} models. The bad initial performance could be mitigated by
+sampling the identification data at different points in time during multiple
+experiments, updating a fixed-size dataset based on the gained information, as
+well as more cleverly designing the kernel to include prior information.


 \clearpage
--- a/90_Further_Research.tex
+++ b/90_Further_Research.tex
@ -1,4 +1,42 @@
 \section{Further Research}

+Section~\ref{sec:results} has presented and compared the results of a full-year
+simulation for a classical \acrshort{gp} model, as well as a few variants of
+\acrshort{svgp} models. The results show that the \acrshort{svgp} models have much
+better performance, mainly due to the possibility of updating the model
+throughout the year. The \acrshort{svgp} models also present a computational
+cost advantage both in training and in evaluation due to several approximations
+shown in Section~\ref{sec:gaussian_processes}.

-\clearpage
+Focusing on the \acrlong{gp} models, there could be several ways of improving
+their performance, as noted previously: a more varied identification dataset and
+a smart update of a fixed-size data dictionary according to information gain
+could mitigate the present problems.
+
+Using a Sparse \acrshort{gp} without also replacing the maximum log likelihood
+with the \acrshort{elbo} could improve performance of the \acrshort{gp} model at
+the expense of training time.
+
+An additional change that could be made is the inclusion of as much prior
+information as possible through setting a more refined kernel, as well as adding
+prior information on all the model hyperparameters when available. This approach
+however goes against the ``spirit'' of black-box approaches since significant
+insight into the physics of the plant is required in order to properly model and
+implement this information.
+
+On the \acrshort{svgp} side, several changes could also be proposed, which were
+not properly addressed in this work. First, the size of the inducing dataset was
+chosen experimentally until it was found to accurately reproduce the manually
+collected experimental data. In order to better use the available computational
+resources, this value could be found programmatically in a way to minimize
+evaluation time while still providing good performance. Another possibility is
+the periodic re-evaluation of this value when new data comes in, since as more
+and more data is collected the model becomes more complex, and in general more
+inducing locations could be necessary to properly reproduce the training data.
+
+Finally, none of the presented controllers take into account occupancy rates or adapt to
+possible changes in the real building, such as adding or removing furniture,
+deteriorating insulation and so on. The presented update methods only deal with
+adding information on behaviour in different state space regions, i.e.\
+\textit{learning}, and their ability to \textit{adapt} to changes in the actual
+plant's behaviour should be further addressed.
--- a/99A_GP_hyperparameters_validation.tex
+++ b/99A_GP_hyperparameters_validation.tex
@ -1,8 +1,8 @@
 \clearpage

-\section{Hyperparameters validation for classical GP}
+\section{Hyperparameters validation for classical GP}\label{apx:hyperparams_gp}

-\subsection{113}
+\subsection{\texorpdfstring{\model{1}{1}{3}}{113}}

 \begin{figure}[ht]
    \centering
@ -20,7 +20,7 @@

 \clearpage

-\subsection{213}
+\subsection{\texorpdfstring{\model{2}{1}{3}}{213}}

 \begin{figure}[ht]
    \centering
@ -38,7 +38,7 @@

 \clearpage

-\subsection{313}
+\subsection{\texorpdfstring{\model{3}{1}{3}}{313}}

 \begin{figure}[ht]
    \centering
@ -54,4 +54,25 @@
    \label{fig:GP_313_test_validation}
 \end{figure}

+
+\clearpage
+
+\section{Hyperparameters validation for SVGP}\label{apx:hyperparams_svgp}
+
+\subsection{\texorpdfstring{\model{1}{2}{3}}{123}}
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width = \textwidth]{Plots/SVGP_123_training_performance.pdf}
+    \caption{}
+    \label{fig:SVGP_train_validation}
+\end{figure}
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width = \textwidth]{Plots/SVGP_123_test_performance.pdf}
+    \caption{}
+    \label{fig:SVGP_test_validation}
+\end{figure}
+
 \clearpage
--- a/99C_hyperparameters_results.tex
+++ b/99C_hyperparameters_results.tex
@ -0,0 +1,10 @@
+\section{SVGP hyperparameters evolution}\label{anx:hyperparams_evol}
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width =
+    \textwidth]{Plots/1_SVGP_480pts_inf_window_12_averageYear_evol_hyperparameters.pdf}
+    \caption{Evolution of SVGP hyperparameters}
+    \label{fig:SVGP_evol_hyperparameters}
+\end{figure}
+
--- a/Plots/Exogenous_inputs_fullyear.pdf
+++ b/Plots/Exogenous_inputs_fullyear.pdf
--- a/glossaries.tex
+++ b/glossaries.tex
@ -11,6 +11,9 @@

 \newacronym{wdb}{WDB}{Weather Data Bus}

+\newacronym{eer}{EER}{Energy Efficiency Ratio}
+\newacronym{cop}{COP}{Coefficient of Performance}
+
 \newacronym{hvac}{HVAC}{Heating, Ventilation and Air Conditioning}
 \newacronym{dni}{DNI}{Direct Normal Irradiance}
 \newacronym{dhi}{DHI}{Diffuse Horizontal Irradiance}
--- a/main.tex
+++ b/main.tex
@ -112,6 +112,7 @@ temperature control}

 % Define new user commands
 \newcommand{\pdome}{Polyd\^ome}
+\newcommand{\model}[3]{$\mathcal{M}$($l_w = #1$, $l_u = #2$, $l_y = #3$)}
 \DeclarePairedDelimiter{\norm}{\lVert}{\rVert}

 \begin{document}
@ -138,4 +139,5 @@ temperature control}
 \printbibliography
 \appendix
 \input{99A_GP_hyperparameters_validation.tex}
+\input{99C_hyperparameters_results.tex}
 \end{document}
--- a/references.bib
+++ b/references.bib
@ -392,6 +392,24 @@
  number = {2}
 }

+@inproceedings{nghiemDatadrivenDemandResponse2017,
+  title = {Data-Driven Demand Response Modeling and Control of Buildings with {{Gaussian Processes}}},
+  booktitle = {2017 {{American Control Conference}} ({{ACC}})},
+  author = {Nghiem, Truong X. and Jones, Colin N.},
+  date = {2017-05},
+  pages = {2919--2924},
+  publisher = {{IEEE}},
+  location = {{Seattle, WA, USA}},
+  doi = {10.23919/ACC.2017.7963394},
+  url = {http://ieeexplore.ieee.org/document/7963394/},
+  urldate = {2019-06-09},
+  abstract = {This paper presents an approach to provide demand response services with buildings. Each building receives a normalized signal that tells it to increase or decrease its power demand, and the building is free to implement any suitable strategy to follow the command, most likely by changing some of its setpoints. Due to this freedom, the proposed approach lowers the barrier for any buildings equipped with a reasonably functional building management system to participate in the scheme. The response of the buildings to the control signal is modeled by a Gaussian Process, which can predict the power demand of the buildings and also provide a measure of its confidence in the prediction. A battery is included in the system to compensate for this uncertainty and improve the demand response performance of the system. A model predictive controller is developed to optimally control the buildings and the battery, while ensuring their operational constraints with high probability. Our approach is validated by realistic co-simulations between Matlab and the building energy simulator EnergyPlus.},
+  eventtitle = {2017 {{American Control Conference}} ({{ACC}})},
+  file = {/home/radu/Zotero/storage/FHIIJQWW/Nghiem și Jones - 2017 - Data-driven demand response modeling and control o.pdf},
+  isbn = {978-1-5090-5992-8},
+  langid = {english}
+}
+
@online{pinterestSphericalDomeCalculator,
  title = {Spherical {{Dome Calculator}}},
  author = {this on Pinterest, Dave South Share this via Email Share this on Twitter Share this on Facebook Share this on Reddit Share},