diff --git a/30_Gaussian_Processes_Background.tex b/30_Gaussian_Processes_Background.tex
index 71e13ff..74fbd19 100644
--- a/30_Gaussian_Processes_Background.tex
+++ b/30_Gaussian_Processes_Background.tex
@@ -126,7 +126,7 @@ observations and the fixed mean function:
 \sigma_n^2I\right)^{-1}(\mathbf{y} - \mathbf{m}(X)) \\
 \end{equation}
 
-\subsection{Kernels}
+\subsection{Kernels}\label{sec:Kernels}
 
 The choice of the kernel is an important part for any kernel machine class
 algorithm. It serves the purpose of shaping the behaviour of the \acrshort{gp}
 by imposing a desired level of smoothness of the resulting functions, a
diff --git a/50_Choice_of_Hyperparameters.tex b/50_Choice_of_Hyperparameters.tex
index ac6edbe..cdf133b 100644
--- a/50_Choice_of_Hyperparameters.tex
+++ b/50_Choice_of_Hyperparameters.tex
@@ -4,14 +4,231 @@
 This section will discuss and try to validate the choice of all the
 hyperparameters necessary for the training of a \acrshort{gp} model to capture
 the CARNOT building's behaviour.
 
+The class of black-box models is very versatile, being able to capture plant
+behaviour directly from data. This is in contrast to white-box and grey-box
+modelling techniques, which require much more physical insight into the
+plant's behaviour.
+
+The advantage of black-box models lies in the lack of physical parameters to
+be fitted. On the flip side, the versatility of being able to fit much more
+complex models purely from data comes at the cost of having to properly define
+the model hyperparameters: the number of regressors, the number of
+autoregressive lags for each class of inputs, and the shape of the covariance
+function all have to be taken into account when designing a \acrshort{gp}
+model. These choices have a direct influence on the resulting model behaviour
+and on how well it generalizes, as well as an indirect influence in the form
+of more expensive computations when using a larger number of regressors and
+more complex kernel functions.
+
 As described in Section~\ref{sec:gp_dynamical_system}, for the purpose of this
 project the \acrlong{gp} model will be trained using the \acrshort{narx}
-structure.
+structure. This already presents an important choice in the selection of
+regressors and their respective autoregressive lags.
 
-\subsection{Lengthscales}
+The output of the model has been chosen as the \textit{temperature measured
+inside} the CARNOT building. This is a suitable choice for the \acrshort{ocp}
+defined in Section~\ref{sec:mpc_problem}, where the goal is to track the
+inside temperature of the building as closely as possible.
+
+The input of the \acrshort{gp} model coincides with the input of the CARNOT
+building, namely the \textit{power} passed to the idealized \acrshort{hvac},
+which is held constant for the full duration of a step.
+
+As for the exogenous inputs, the choice is more involved. The CARNOT WDB
+format (cf. Section~\ref{sec:CARNOT_WDB}) contains information on all the
+solar angles, the different components of solar radiation, wind speed and
+direction, temperature, precipitation, etc. All of this information is
+required for the proper functioning of CARNOT.
+
+Including all of this information in the \acrshort{gp}'s exogenous inputs
+would come with a few downsides. First, depending on the number of lags chosen
+for the exogenous inputs, the number of regressors could grow very large: an
+exogenous input vector of 10 elements with 2 autoregressive lags already
+yields 20 inputs for the \acrshort{gp} model.
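+To make this dimensionality argument explicit, the \acrshort{narx} regressor
+vector can be sketched as follows (the notation is only illustrative, with
+$\mathbf{w}_k$ collecting all exogenous measurements at step $k$):
+
+\begin{equation*}
+    \mathbf{x}_k = \left[y_{k-1}, \dots, y_{k-l_y},\; u_{k-1}, \dots,
+    u_{k-l_u},\; \mathbf{w}_{k-1}^T, \dots, \mathbf{w}_{k-l_w}^T\right]^T
+\end{equation*}
+
+Its dimension, $l_y + l_u + l_w \cdot \dim(\mathbf{w})$, grows with every
+signal added to $\mathbf{w}$.
+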
+Such a large regressor vector is expensive both when training and when using
+the model: the dominant $\mathcal{O}(n^3)$ cost of \acrshort{gp} training
+scales with the number of data points, and every additional regressor makes
+each kernel evaluation more expensive while enlarging the input space that the
+training data has to cover.
+
+Second, this information may not always be available in experimental
+implementations on real buildings, where legacy equipment might already be
+installed, or where budget restrictions call for simpler equipment. An example
+of this is the set of experimental datasets used for the validation of the
+CARNOT model, where the only available weather information is the
+\acrshort{ghi} and the measurement of the outside temperature. This would also
+be a limitation when obtaining the weather predictions for the next steps
+during real-world experiments.
+
+Last, while very detailed information such as the solar angles and the
+components of the solar radiation is very useful for CARNOT, which simulates
+each node individually, knowing its absolute position, this information would
+not always benefit the \acrshort{gp} model, at least not comparably to the
+additional computational complexity.
+
+For the exogenous inputs the choice has therefore been made to take the
+\textit{Global Horizontal Irradiance} and the \textit{Outside Temperature
+Measurement}.
 
 \subsection{The Kernel}
 
+The covariance function is an important choice when creating the \acrshort{gp}.
+A properly chosen kernel can impose a desired prior behaviour on the
+\acrshort{gp}, such as continuity of the function and its derivatives,
+periodicity, linearity, etc. Conversely, choosing the wrong kernel can make
+computations more expensive, require more data to learn the proper behaviour,
+or outright lead to numerical instability and erroneous predictions.
+
+The \acrlong{se} kernel (cf. Section~\ref{sec:Kernels}) is very versatile,
+theoretically being able to fit any continuous function given enough data.
+When including the \acrshort{ard} behaviour, it also gives an insight into the
+relative importance of each regressor, through their respective lengthscales.
+
+Many different kernels have been used when identifying models for building
+thermal control, such as a pure Rational Quadratic
+Kernel~\cite{pleweSupervisoryModelPredictive2020}, a combination of
+\acrshort{se}, \acrshort{rq} and a Linear
+Kernel~\cite{jainLearningControlUsing2018}, and the Squared Exponential Kernel
+together with kernels from the Mat\'ern
+family~\cite{massagrayThermalBuildingModelling2016}.
+
+For the purpose of this project the choice has been made to use the
+\textit{\acrlong{se} Kernel}, as it provides a very good balance of
+versatility and computational complexity for the modelling of the CARNOT
+building.
+
+\subsection{Lengthscales}\label{sec:lengthscales}
+
+The hyperparameters of the \acrshort{se} kernel can be useful when studying
+the importance of the regressors. The larger the distance between two inputs
+relative to the lengthscale, the less correlated they are. In fact, setting
+the kernel variance $\sigma^2 = 1$, we can compute the correlation of two
+inputs located one, two and three lengthscales apart.
+
+\begin{table}[ht]
+\centering
+    \begin{tabular}{||c c ||}
+    \hline
+    $\norm{\mathbf{x} - \mathbf{x}'}$ &
+    $\exp\left(-\frac{\norm{\mathbf{x} - \mathbf{x}'}^2}{2l^2}\right)$ \\
+    \hline \hline
+    $1l$ & 0.606 \\
+    $2l$ & 0.135 \\
+    $3l$ & 0.011 \\
+    \hline
+    \end{tabular}
+\caption{Correlation of inputs relative to their distance}
+\label{tab:se_correlation}
+\end{table}
+
+From Table~\ref{tab:se_correlation} it can be seen that at three lengthscales
+apart, the inputs are already almost uncorrelated.
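+As a quick check of these values, two inputs located two lengthscales apart
+have a correlation of
+
+\begin{equation*}
+    \exp\left(-\frac{(2l)^2}{2l^2}\right) = e^{-2} \approx 0.135,
+\end{equation*}
+
+which matches the second row of Table~\ref{tab:se_correlation}.
+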
+In order to better visualize these differences, the relative lengthscale
+importance is introduced:
+
+\begin{equation}
+    \lambda = \frac{1}{\sqrt{l}}
+\end{equation}
+
+Since a large lengthscale means that the output varies little along the
+corresponding regressor, a regressor with a small $\lambda$ has little
+influence on the prediction.
+
+Another indicator of model behaviour is the variance of the identified
+\acrshort{se} kernel, whose value is expected to be of the same order of
+magnitude as the variance of the training data. An extremely high or extremely
+low value of the variance could indicate a numerically unstable model.
+
+Table~\ref{tab:GP_hyperparameters} presents the relative lengthscale
+importances and the variance for different combinations of the exogenous input
+lags ($l_w$), the controlled input lags ($l_u$) and the output lags ($l_y$)
+for a classical \acrshort{gp} model.
+
+\begin{table}[ht]
+%\vspace{-8pt}
+\centering
+    \resizebox{\columnwidth}{!}{%
+    \begin{tabular}{||c c c|c|c c c c c c c c c c c||}
+    \hline
+    \multicolumn{3}{||c|}{Lags} & Variance &\multicolumn{11}{c||}{Kernel
+    lengthscales relative importance} \\
+    $l_w$ & $l_u$ & $l_y$ & $\sigma^2$ &$\lambda_{w1,1}$ & $\lambda_{w1,2}$ &
+    $\lambda_{w1,3}$ & $\lambda_{w2,1}$ & $\lambda_{w2,2}$ &
+    $\lambda_{w2,3}$ & $\lambda_{u1,1}$ & $\lambda_{u1,2}$ &
+    $\lambda_{y1,1}$ & $\lambda_{y1,2}$ & $\lambda_{y1,3}$\\
+    \hline \hline
+    1 & 1 & 1 & 0.11 & 0.721 & & & 2.633 & & & 0.569 & & 2.645 & & \\
+    1 & 1 & 2 & 22.68 & 0.222 & & & 0.751 & & & 0.134 & & 3.154 & 3.073 & \\
+    1 & 1 & 3 & 0.29 & 0.294 & & & 1.303 & & & 0.356 & & 2.352 & 1.361 & 2.045 \\
+    1 & 2 & 1 & 7.55 & 0.157 & & & 0.779 & & & 0.180 & 0.188 & 0.538 & & \\
+    1 & 2 & 3 & 22925.40 & 0.018 & & & 0.053 & & & 0.080 & 0.393 & 0.665 & 0.668 & 0.018 \\
+    2 & 1 & 2 & 31.53 & 0.010 & 0.219 & & 0.070 & 0.719 & & 0.123 & & 3.125 & 3.044 & \\
+    2 & 1 & 3 & 0.44 & 0.007 & 0.251 & & 0.279 & 1.229 & & 0.319 & & 2.705 & 1.120 & 2.510 \\
+    3 & 1 & 3 & 0.56 & 0.046 & 0.064 & 0.243 & 0.288 & 1.151 & 0.233 & 0.302 & & 2.809 & 1.086 & 2.689 \\
+    3 & 2 & 2 & 1.65 & 0.512 & 0.074 & 0.201 & 0.161 & 1.225 & 0.141 & 0.231 & 0.331 & 0.684 & 0.064 & \\
+    \hline
+    \end{tabular}%
+    }
+\caption{GP hyperparameter values for different autoregressive lags}
+\label{tab:GP_hyperparameters}
+\end{table}
+
+In general, the results of Table~\ref{tab:GP_hyperparameters} show that the
+past outputs are important when predicting future values. The past inputs are
+also important, with the exception of the models with very high variance,
+where the relative importances stay almost constant across all the inputs. For
+the exogenous inputs, the outside temperature ($w2$) is generally more
+important than the solar irradiation ($w1$). In the case of more
+autoregressive lags for the exogenous inputs, the more recent information is
+usually more important, with a few exceptions: in the $\mathcal{M}$($l_w = 2$,
+$l_u = 1$, $l_y = 3$) model, for instance, the second lag of both exogenous
+inputs carries a larger relative importance than the first, hinting at a
+delayed effect of the weather on the measured inside temperature.
+
+As for the case of the \acrlong{svgp}, the results for the classical
+\acrshort{gp} (cf. Table~\ref{tab:GP_hyperparameters}) are not necessarily
+representative of the relationships between the regressors of the
+\acrshort{svgp} model, since its hyperparameters are fitted on the
+\textit{inducing variables}: these are not real data points, but a set of
+parameters chosen by the training algorithm so as to best reproduce the
+original data.
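+For reference, in the standard sparse variational formulation (the notation
+below is a generic reminder, not a detail specific to this implementation),
+the model learns $m$ inducing inputs $Z$ together with a variational
+distribution $q(\mathbf{u}) = \mathcal{N}(\mathbf{m}_u, S)$ over the
+corresponding function values, and the prediction at a test input
+$\mathbf{x}_*$ becomes
+
+\begin{equation*}
+    q(f(\mathbf{x}_*)) = \mathcal{N}\left(\mathbf{k}_{*Z}K_{ZZ}^{-1}\mathbf{m}_u,\;
+    k_{**} - \mathbf{k}_{*Z}K_{ZZ}^{-1}\left(K_{ZZ} - S\right)K_{ZZ}^{-1}\mathbf{k}_{Z*}\right),
+\end{equation*}
+
+so the kernel hyperparameters are shaped by the learned inducing inputs rather
+than directly by the training set.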
+
+Therefore, to better understand the behaviour of the \acrshort{svgp} models,
+the same computations as in Table~\ref{tab:GP_hyperparameters} have been made,
+presented in Table~\ref{tab:SVGP_hyperparameters}:
+
+\begin{table}[ht]
+%\vspace{-8pt}
+\centering
+    \resizebox{\columnwidth}{!}{%
+    \begin{tabular}{||c c c|c|c c c c c c c c c c c||}
+    \hline
+    \multicolumn{3}{||c|}{Lags} & Variance &\multicolumn{11}{c||}{Kernel
+    lengthscales relative importance} \\
+    $l_w$ & $l_u$ & $l_y$ & $\sigma^2$ &$\lambda_{w1,1}$ & $\lambda_{w1,2}$ &
+    $\lambda_{w1,3}$ & $\lambda_{w2,1}$ & $\lambda_{w2,2}$ &
+    $\lambda_{w2,3}$ & $\lambda_{u1,1}$ & $\lambda_{u1,2}$ &
+    $\lambda_{y1,1}$ & $\lambda_{y1,2}$ & $\lambda_{y1,3}$\\
+    \hline \hline
+    1 & 1 & 1 & 0.2970 & 0.415 & & & 0.748 & & & 0.675 & & 0.680 & & \\
+    1 & 1 & 2 & 0.2717 & 0.430 & & & 0.640 & & & 0.687 & & 0.559 & 0.584 & \\
+    1 & 1 & 3 & 0.2454 & 0.455 & & & 0.589 & & & 0.671 & & 0.522 & 0.512 & 0.529 \\
+    1 & 2 & 1 & 0.2593 & 0.310 & & & 0.344 & & & 0.534 & 0.509 & 0.597 & & \\
+    1 & 2 & 3 & 0.2139 & 0.330 & & & 0.368 & & & 0.537 & 0.447 & 0.563 & 0.410 & 0.363 \\
+    2 & 1 & 2 & 0.2108 & 0.421 & 0.414 & & 0.519 & 0.559 & & 0.680 & & 0.525 & 0.568 & \\
+    2 & 1 & 3 & 0.1795 & 0.456 & 0.390 & & 0.503 & 0.519 & & 0.666 & & 0.508 & 0.496 & 0.516 \\
+    3 & 1 & 3 & 0.1322 & 0.432 & 0.370 & 0.389 & 0.463 & 0.484 & 0.491 & 0.666 & & 0.511 & 0.501 & 0.526 \\
+    3 & 2 & 2 & 0.1228 & 0.329 & 0.317 & 0.325 & 0.334 & 0.337 & 0.331 &
+    0.527 & 0.441 & 0.579 & 0.435 & \\
+    \hline
+    \end{tabular}%
+    }
+\caption{SVGP hyperparameter values for different autoregressive lags}
+\label{tab:SVGP_hyperparameters}
+\end{table}
+
+The results of Table~\ref{tab:SVGP_hyperparameters} are not very surprising,
+even if very different from the classical \acrshort{gp} case. The kernel
+variance always takes a reasonable value, and the relative importances of the
+lengthscales are fairly constant across the board. It is, however, harder to
+interpret these results as pertaining to the relevance of the chosen
+regressors. For the \acrshort{svgp} model, the choice of the autoregressive
+lags has therefore been made purely on the values of the loss functions,
+presented in Table~\ref{tab:SVGP_loss_functions}.
+
 \subsection{Loss functions}
 
 The most important metric for measuring the performance of a model is the value
@@ -84,14 +301,18 @@ mean of the obtained result:
 The \acrshort{msll} is approximately zero for simple models and negative for
 better ones.
 
-% TODO: [Hyperparameters] Explain loss table, difference in lags, etc.
+Table~\ref{tab:GP_loss_functions} and Table~\ref{tab:SVGP_loss_functions}
+present the values of the different loss functions for the same lag
+combinations as the ones analyzed in Section~\ref{sec:lengthscales}, for the
+classical \acrshort{gp} and the \acrshort{svgp} models respectively:
+
 \begin{table}[ht]
 %\vspace{-8pt}
 \centering
     \begin{tabular}{||c c c|c c c c||}
     \hline
     \multicolumn{3}{||c|}{Lags} & \multicolumn{4}{c||}{Loss functions}\\
-    w & u & y & RMSE & SMSE & MSLL & LPD\\
+    $l_w$ & $l_u$ & $l_y$ & RMSE & SMSE & MSLL & LPD\\
     \hline \hline
     1 & 1 & 1 & 0.3464 & 0.36394 & 20.74 & 21.70 \\
     1 & 1 & 2 & 0.1415 & 0.06179 & -9.62 & -8.67 \\
@@ -108,39 +329,24 @@ better ones.
 \label{tab:GP_loss_functions}
 \end{table}
 
-
-\begin{table}[ht]
-%\vspace{-8pt}
-\centering
-    \resizebox{\columnwidth}{!}{%
-    \begin{tabular}{||c c c|c|c c c c c c c c c c c||}
-    \hline
-    \multicolumn{3}{||c|}{Lags} & Variance &\multicolumn{11}{c||}{Kernel
-    lengthscales relative importance} \\
-    w & u & y & $\sigma^2$ &$\lambda_{w1,1}$ & $\lambda_{w1,2}$ &
-    $\lambda_{w1,3}$ & $\lambda_{w2,1}$ & $\lambda_{w2,2}$ &
-    $\lambda_{w2,3}$ & $\lambda_{u1,1}$ & $\lambda_{u1,2}$ &
-    $\lambda_{y1,1}$ & $\lambda_{y1,2}$ & $\lambda_{y1,3}$\\
-    \hline \hline
-    1 & 1 & 1 & 0.11 & 0.721 & & & 2.633 & & & 0.569 & & 2.645 & & \\
-    1 & 1 & 2 & 22.68 & 0.222 & & & 0.751 & & & 0.134 & & 3.154 & 3.073 & \\
-    1 & 1 & 3 & 0.29 & 0.294 & & & 1.303 & & & 0.356 & & 2.352 & 1.361 & 2.045 \\
-    1 & 2 & 1 & 7.55 & 0.157 & & & 0.779 & & & 0.180 & 0.188 & 0.538 & & \\
-    1 & 2 & 3 & 22925.40 & 0.018 & & & 0.053 & & & 0.080 & 0.393 & 0.665 & 0.668 & 0.018 \\
-    2 & 1 & 2 & 31.53 & 0.010 & 0.219 & & 0.070 & 0.719 & & 0.123 & & 3.125 & 3.044 & \\
-    2 & 1 & 3 & 0.44 & 0.007 & 0.251 & & 0.279 & 1.229 & & 0.319 & & 2.705 & 1.120 & 2.510 \\
-    3 & 1 & 3 & 0.56 & 0.046 & 0.064 & 0.243 & 0.288 & 1.151 & 0.233 & 0.302 & & 2.809 & 1.086 & 2.689 \\
-    3 & 2 & 2 & 1.65 & 0.512 & 0.074 & 0.201 & 0.161 & 1.225 & 0.141 & 0.231 & 0.331 & 0.684 & 0.064 & \\
-    \hline
-    \end{tabular}%
-    }
-\caption{GP hyperparameter values for different autoregressive lags}
-\label{tab:GP_hyperparameters}
-\end{table}
-
+For the classical \acrshort{gp} model (cf. Table~\ref{tab:GP_loss_functions})
+a number of different lag combinations give rise to models with very large
+\acrshort{msll}/\acrshort{lpd} values. This might indicate that those models
+are overconfident, either due to the very large kernel variance parameter, or
+to the specific lengthscale combinations. The model with the best
+\acrshort{rmse}/\acrshort{smse} metrics, $\mathcal{M}$($l_w = 1$, $l_u = 2$,
+$l_y = 3$), has very poor \acrshort{msll} and \acrshort{lpd} metrics, as well
+as by far the largest variance of all the combinations. On the contrary, the
+$\mathcal{M}$($l_w = 3$, $l_u = 1$, $l_y = 3$) model has the best
+\acrshort{msll} and \acrshort{lpd} performance, while still maintaining small
+\acrshort{rmse} and \acrshort{smse} values. The drawback of this set of lags
+is the large number of regressors, which leads to much more expensive
+computations. Other good choices for the combinations of lags are
+$\mathcal{M}$($l_w = 2$, $l_u = 1$, $l_y = 3$) and $\mathcal{M}$($l_w = 1$,
+$l_u = 1$, $l_y = 3$), which have good performance on all four metrics, while
+being cheaper from a computational perspective. In order to make a more
+informed choice of the best hyperparameters, the performance of all three
+combinations has been analysed.
 
@@ -148,7 +354,7 @@ better ones.
 \begin{table}[ht]
 %\vspace{-8pt}
 \centering
     \begin{tabular}{||c c c|c c c c||}
     \hline
     \multicolumn{3}{||c|}{Lags} & \multicolumn{4}{c||}{Loss functions}\\
-    w & u & y & RMSE & SMSE & MSLL & LPD\\
+    $l_w$ & $l_u$ & $l_y$ & RMSE & SMSE & MSLL & LPD\\
     \hline \hline
     1 & 1 & 1 & 0.3253 & 0.3203 & 228.0278 & 228.9843 \\
     1 & 1 & 2 & 0.2507 & 0.1903 & 175.5525 & 176.5075 \\
@@ -166,42 +372,21 @@ better ones.
 \label{tab:SVGP_loss_functions}
 \end{table}
 
-\begin{table}[ht]
-%\vspace{-8pt}
-\centering
-    \resizebox{\columnwidth}{!}{%
-    \begin{tabular}{||c c c|c|c c c c c c c c c c c||}
-    \hline
-    \multicolumn{3}{||c|}{Lags} & Variance &\multicolumn{11}{c||}{Kernel
-    lengthscales relative importance} \\
-    w & u & y & $\sigma^2$ &$\lambda_{w1,1}$ & $\lambda_{w1,2}$ &
-    $\lambda_{w1,3}$ & $\lambda_{w2,1}$ & $\lambda_{w2,2}$ &
-    $\lambda_{w2,3}$ & $\lambda_{u1,1}$ & $\lambda_{u1,2}$ &
-    $\lambda_{y1,1}$ & $\lambda_{y1,2}$ & $\lambda_{y1,3}$\\
-    \hline \hline
-    1 & 1 & 1 & 0.2970 & 0.415 & & & 0.748 & & & 0.675 & & 0.680 & & \\
-    1 & 1 & 2 & 0.2717 & 0.430 & & & 0.640 & & & 0.687 & & 0.559 & 0.584 & \\
-    1 & 1 & 3 & 0.2454 & 0.455 & & & 0.589 & & & 0.671 & & 0.522 & 0.512 & 0.529 \\
-    1 & 2 & 1 & 0.2593 & 0.310 & & & 0.344 & & & 0.534 & 0.509 & 0.597 & & \\
-    1 & 2 & 3 & 0.2139 & 0.330 & & & 0.368 & & & 0.537 & 0.447 & 0.563 & 0.410 & 0.363 \\
-    2 & 1 & 2 & 0.2108 & 0.421 & 0.414 & & 0.519 & 0.559 & & 0.680 & & 0.525 & 0.568 & \\
-    2 & 1 & 3 & 0.1795 & 0.456 & 0.390 & & 0.503 & 0.519 & & 0.666 & & 0.508 & 0.496 & 0.516 \\
-    3 & 1 & 3 & 0.1322 & 0.432 & 0.370 & 0.389 & 0.463 & 0.484 & 0.491 & 0.666 & & 0.511 & 0.501 & 0.526 \\
-    3 & 2 & 2 & 0.1228 & 0.329 & 0.317 & 0.325 & 0.334 & 0.337 & 0.331 &
-    0.527 & 0.441 & 0.579 & 0.435 & \\
-    \hline
-    \end{tabular}%
-    }
-\caption{SVGP hyperparameter values for different autoregressive lags}
-\label{tab:SVGP_hyperparameters}
-\end{table}
+The results for the \acrshort{svgp} model, presented in
+Table~\ref{tab:SVGP_loss_functions}, are much less ambiguous. The
+$\mathcal{M}$($l_w = 1$, $l_u = 2$, $l_y = 3$) model has the best performance
+according to all four metrics, with most of the other combinations scoring
+much worse on the \acrshort{msll} and \acrshort{lpd} loss functions. It has
+therefore been chosen as the model for the full-year simulations.
 
-\clearpage
 
 \subsection{Validation of hyperparameters}
 
 % TODO: [Hyperparameters] Validation of hyperparameters
+The validation of the model hyperparameters has the dual purpose of checking
+the quality of the one-step-ahead predictions on both the training and the
+test datasets, and of verifying that the model remains accurate over the
+multi-step prediction horizons required by the \acrshort{ocp}.
+
 
 \subsubsection{Conventional Gaussian Process}
 
 \begin{figure}[ht]
@@ -222,7 +407,7 @@ better ones.
 \begin{figure}[ht]
     \centering
     \includegraphics[width =
-    \textwidth]{Plots/GP_113_-1pts_test_prediction_20_steps.png}
+    \textwidth]{Plots/GP_113_-1pts_test_prediction_20_steps.pdf}
     \caption{}
     \label{fig:GP_multistep_validation}
 \end{figure}
@@ -248,7 +433,7 @@ better ones.
 \begin{figure}[ht]
     \centering
     \includegraphics[width =
-    \textwidth]{Plots/SVGP_123_test_prediction_20_steps.png}
+    \textwidth]{Plots/SVGP_123_test_prediction_20_steps.pdf}
     \caption{}
     \label{fig:SVGP_multistep_validation}
 \end{figure}
diff --git a/60_The_MPC_Problem.tex b/60_The_MPC_Problem.tex
index 525c30a..0ef5128 100644
--- a/60_The_MPC_Problem.tex
+++ b/60_The_MPC_Problem.tex
@@ -1,6 +1,6 @@
-\section{The MPC Problem}
+\section{The MPC Problem}\label{sec:mpc_problem}
 
-The Optimal Control Problem to be solved was chosen in such a way as to make
+The \acrlong{ocp} to be solved was chosen in such a way as to make
 analysis of the models' performances more straightforward. The objective is
 tracking a defined reference temperature as close as possible, while ensuring
 the heat input stays within the HVAC capacity.
 The \textit{zero-variance} method
diff --git a/70_Implementation.tex b/70_Implementation.tex
index 73b3e2c..5e82442 100644
--- a/70_Implementation.tex
+++ b/70_Implementation.tex
@@ -1,5 +1,14 @@
 \section{Implementation}
 
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width = 0.5\textwidth]{Images/setup_diagram.pdf}
+    \caption{Block diagram of the Simulink plant and Python Controller}
+    \label{fig:setup_diagram}
+\end{figure}
+
+
 % TODO: [Implementation] Reference implementation details for CARNOT and WDB
 
 \subsection{Gaussian Processes}
@@ -12,10 +21,7 @@
 
 \subsection{Optimal Control Problem}
 
-% TODO: [Implementation] Cite CasADi
-% TODO: [Implementation] Cite HSL solvers for using MA27
-
-\subsection{Sparse Implementation of the Optimization Problem}
+\subsubsection{Sparse Implementation of the Optimization Problem}
 
 The optimization problem as presented in
 Equation~\ref{eq:optimal_control_problem} becomes very nonlinear quite fast. In
@@ -48,6 +54,11 @@ $\mathbf{w}$, $\mathbf{u}$, $\mathbf{y}$ (cf. Equation~\ref{eq:components}).
 where X is the matrix of all the system states and W is the matrix of the
 disturbances.
 
+\subsubsection{Python Implementation of the Control Problem}
+% TODO: [Implementation] Cite CasADi
+% TODO: [Implementation] Cite HSL solvers for using MA27
+
+
 \subsection{Python server and controller objects}
diff --git a/99A_GP_hyperparameters_validation.tex b/99A_GP_hyperparameters_validation.tex
new file mode 100644
index 0000000..93c61e3
--- /dev/null
+++ b/99A_GP_hyperparameters_validation.tex
@@ -0,0 +1,60 @@
+\clearpage
+
+\section{Hyperparameter Validation for the Classical GP}
+
+\subsection{The $\mathcal{M}$($l_w = 2$, $l_u = 1$, $l_y = 3$) model}
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width = \textwidth]{Plots/GP_213_training_performance.pdf}
+    \caption{Performance of the classical GP $\mathcal{M}$($l_w = 2$, $l_u = 1$,
+    $l_y = 3$) on the training dataset}
+    \label{fig:GP_213_train_validation}
+\end{figure}
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width = \textwidth]{Plots/GP_213_test_performance.pdf}
+    \caption{Performance of the classical GP $\mathcal{M}$($l_w = 2$, $l_u = 1$,
+    $l_y = 3$) on the test dataset}
+    \label{fig:GP_213_test_validation}
+\end{figure}
+
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width =
+    \textwidth]{Plots/GP_213_-1pts_test_prediction_20_steps.pdf}
+    \caption{20-step ahead prediction of the classical GP $\mathcal{M}$($l_w =
+    2$, $l_u = 1$, $l_y = 3$) on the test dataset}
+    \label{fig:GP_213_multistep_validation}
+\end{figure}
+
+
+\clearpage
+
+\subsection{The $\mathcal{M}$($l_w = 3$, $l_u = 1$, $l_y = 3$) model}
+
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width = \textwidth]{Plots/GP_313_training_performance.pdf}
+    \caption{Performance of the classical GP $\mathcal{M}$($l_w = 3$, $l_u = 1$,
+    $l_y = 3$) on the training dataset}
+    \label{fig:GP_313_train_validation}
+\end{figure}
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width = \textwidth]{Plots/GP_313_test_performance.pdf}
+    \caption{Performance of the classical GP $\mathcal{M}$($l_w = 3$, $l_u = 1$,
+    $l_y = 3$) on the test dataset}
+    \label{fig:GP_313_test_validation}
+\end{figure}
+
+
+\begin{figure}[ht]
+    \centering
+    \includegraphics[width =
+    \textwidth]{Plots/GP_313_-1pts_test_prediction_20_steps.pdf}
+    \caption{20-step ahead prediction of the classical GP $\mathcal{M}$($l_w =
+    3$, $l_u = 1$, $l_y = 3$) on the test dataset}
+    \label{fig:GP_313_multistep_validation}
+\end{figure}
+
+
+\clearpage
diff --git a/Images/setup_diagram.pdf b/Images/setup_diagram.pdf
new file mode 100644
index 0000000..2bcd3c0
Binary files /dev/null and b/Images/setup_diagram.pdf differ
diff --git a/Plots/GP_113_-1pts_test_prediction_20_steps.pdf b/Plots/GP_113_-1pts_test_prediction_20_steps.pdf
new file mode 100644
index 0000000..a268f57
Binary files /dev/null and b/Plots/GP_113_-1pts_test_prediction_20_steps.pdf differ
diff --git a/Plots/GP_113_test_performance.pdf b/Plots/GP_113_test_performance.pdf
index 2c8959d..de17f1c 100644
Binary files a/Plots/GP_113_test_performance.pdf and b/Plots/GP_113_test_performance.pdf differ
diff --git a/Plots/GP_113_training_performance.pdf b/Plots/GP_113_training_performance.pdf
index 232ab91..8636ba9 100644
Binary files a/Plots/GP_113_training_performance.pdf and b/Plots/GP_113_training_performance.pdf differ
diff --git a/Plots/GP_213_-1pts_test_prediction_20_steps.pdf b/Plots/GP_213_-1pts_test_prediction_20_steps.pdf
new file mode 100644
index 0000000..6b0db5c
Binary files /dev/null and b/Plots/GP_213_-1pts_test_prediction_20_steps.pdf differ
diff --git a/Plots/GP_213_test_performance.pdf b/Plots/GP_213_test_performance.pdf
new file mode 100644
index 0000000..db9ae42
Binary files /dev/null and b/Plots/GP_213_test_performance.pdf differ
diff --git a/Plots/GP_213_training_performance.pdf b/Plots/GP_213_training_performance.pdf
new file mode 100644
index 0000000..e9b7d8a
Binary files /dev/null and b/Plots/GP_213_training_performance.pdf differ
diff --git a/Plots/GP_313_-1pts_test_prediction_20_steps.pdf b/Plots/GP_313_-1pts_test_prediction_20_steps.pdf
new file mode 100644
index 0000000..62b267b
Binary files /dev/null and b/Plots/GP_313_-1pts_test_prediction_20_steps.pdf differ
diff --git a/Plots/GP_313_test_performance.pdf b/Plots/GP_313_test_performance.pdf
new file mode 100644
index 0000000..13a7559
Binary files /dev/null and b/Plots/GP_313_test_performance.pdf differ
diff --git a/Plots/GP_313_training_performance.pdf b/Plots/GP_313_training_performance.pdf
new file mode 100644
index 0000000..6362139
Binary files /dev/null and b/Plots/GP_313_training_performance.pdf differ
diff --git a/Plots/GP_training_performance.pdf b/Plots/GP_training_performance.pdf
new file mode 100644
index 0000000..7cd92fc
Binary files /dev/null and b/Plots/GP_training_performance.pdf differ
diff --git a/Plots/SVGP_123_test_performance.pdf b/Plots/SVGP_123_test_performance.pdf
index 2c4dc32..0a157fb 100644
Binary files a/Plots/SVGP_123_test_performance.pdf and b/Plots/SVGP_123_test_performance.pdf differ
diff --git a/Plots/SVGP_123_test_prediction_20_steps.pdf b/Plots/SVGP_123_test_prediction_20_steps.pdf
new file mode 100644
index 0000000..f6fac3a
Binary files /dev/null and b/Plots/SVGP_123_test_prediction_20_steps.pdf differ
diff --git a/Plots/SVGP_123_test_prediction_20_steps.png b/Plots/SVGP_123_test_prediction_20_steps.png
deleted file mode 100644
index 5690002..0000000
Binary files a/Plots/SVGP_123_test_prediction_20_steps.png and /dev/null differ
diff --git a/Plots/SVGP_123_training_performance.pdf b/Plots/SVGP_123_training_performance.pdf
index 21966e6..0041b89 100644
Binary files a/Plots/SVGP_123_training_performance.pdf and b/Plots/SVGP_123_training_performance.pdf differ
diff --git a/glossaries.tex b/glossaries.tex
index c909098..85db4cf 100644
--- a/glossaries.tex
+++ b/glossaries.tex
@@ -4,6 +4,7 @@
 
 % Acronyms
 
+\newacronym{hvac}{HVAC}{Heating, Ventilation and Air Conditioning}
 \newacronym{dni}{DNI}{Direct Normal Irradiance}
 \newacronym{dhi}{DHI}{Diffuse Horizontal Irradiance}
 \newacronym{ghi}{GHI}{Global Horizontal Irradiance}
@@ -33,3 +34,5 @@
 \newacronym{noe}{NOE}{Nonlinear output error}
 \newacronym{narmax}{NARMAX}{Nonlinear autoregressive and moving average model
 with exogenous input}
+
+\newacronym{ocp}{OCP}{Optimal Control Problem}
diff --git a/main.tex b/main.tex
index 0ee7807..b444bb7 100644
--- a/main.tex
+++ b/main.tex
@@ -136,4 +136,6 @@
 \input{90_Further_Research.tex}
 \input{95_Conclusion.tex}
 \printbibliography
+\appendix
+\input{99A_GP_hyperparameters_validation.tex}
 \end{document}
diff --git a/references.bib b/references.bib
index 0bf016f..c7a0229 100644
--- a/references.bib
+++ b/references.bib
@@ -290,6 +290,24 @@
   langid = {german}
 }
 
+@article{massagrayThermalBuildingModelling2016,
+  title = {Thermal Building Modelling Using {{Gaussian}} Processes},
+  author = {Massa Gray, Francesco and Schmidt, Michael},
+  date = {2016-05-01},
+  journaltitle = {Energy and Buildings},
+  shortjournal = {Energy and Buildings},
+  volume = {119},
+  pages = {119--128},
+  issn = {0378-7788},
+  doi = {10.1016/j.enbuild.2016.02.004},
+  url = {https://www.sciencedirect.com/science/article/pii/S0378778816300494},
+  urldate = {2021-06-19},
+  abstract = {This paper analyzes the suitability of Gaussian processes for thermal building modelling by comparing the day-ahead prediction error of the internal air temperature with a grey-box model. The reference building is a single-zone office with a hydronic heating system, modelled in TRNSYS and simulated during the winter and spring periods. Using the output data of the reference building, the parameters of a Gaussian process and of a physics-based grey-box model are identified, with training periods ranging from three days to six weeks. After three weeks of training, the Gaussian processes achieve 27\% lower prediction errors during occupied times compared to the grey-box model. During unoccupied times, however, the Gaussian processes perform consistently worse than the grey-box model. This is due to their large generalization error, especially when faced with untrained ambient temperature values. To reduce the impact of changing weather conditions, adaptive training is applied to the Gaussian processes. When re-training the models every 24h, the prediction error is reduced over 21\% during unoccupied times and over 10\% during occupied times compared to the non-adaptive training case. These results show that the proposed Gaussian process model can correctly describe a building's thermal dynamics. However, in its current form the model is limited to applications where the prediction during occupied times is more relevant.},
+  file = {/home/radu/Zotero/storage/EQNPYPBA/Massa Gray and Schmidt - 2016 - Thermal building modelling using Gaussian processe.pdf},
+  keywords = {Building modelling,Gaussian processes,GP,Grey-box,HVAC,Simulation},
+  langid = {english}
+}
+
 @article{matthewsGPflowGaussianProcess2017,
   title = {{{GPflow}}: {{A Gaussian Process Library}} Using {{TensorFlow}}},
   shorttitle = {{{GPflow}}},