\section{Choice of Hyperparameters}
This section discusses and validates the choice of the hyperparameters required to train a \acrshort{gp} model that captures the CARNOT building's behaviour. As described in Section~\ref{sec:gp_dynamical_system}, for the purpose of this project the \acrlong{gp} model is trained using the \acrshort{narx} structure.

\subsection{Lengthscales}

\subsection{The Kernel}

\subsection{Loss functions}
The most important metric for measuring the performance of a model is the value of the loss function, computed on a dataset separate from the one used for training. A number of different loss functions exist, each focusing on a different aspect of a model's performance. A selection of loss functions used in the identification of Gaussian Process models is presented below~\cite{kocijanModellingControlDynamic2016}.

The \acrfull{rmse} is a very commonly used performance measure. As the name suggests, it computes the root of the mean squared error:
\begin{equation}\label{eq:rmse}
    \text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^N \left(y_i - E(\hat{y}_i)\right)^{2}}
\end{equation}
This performance metric is very useful when training a model whose sole goal is to minimize the difference between the measured values and those predicted by the model.

A variant of the \acrshort{mse} is the \acrfull{smse}, which normalizes the \acrlong{mse} by the variance of the output values of the validation dataset:
\begin{equation}\label{eq:smse}
    \text{SMSE} = \frac{1}{N}\frac{\sum_{i=1}^N \left(y_i - E(\hat{y}_i)\right)^{2}}{\sigma_y^2}
\end{equation}

While the \acrshort{rmse} and the \acrshort{smse} are very good at ensuring that the predicted mean value of the Gaussian Process is close to the measured values of the validation dataset, they completely ignore the confidence of the Gaussian Process prediction. Two models predicting the same mean values but with very different confidence intervals would therefore be equivalent according to these performance metrics.

The \acrfull{lpd} is a performance metric which takes into account not only the mean value of the GP prediction, but the entire distribution:
\begin{equation}
    \text{LPD} = \frac{1}{2} \ln{\left(2\pi\right)} + \frac{1}{2N} \sum_{i=1}^N\left(\ln{\left(\sigma_i^2\right)} + \frac{\left(y_i - E(\hat{y}_i)\right)^{2}}{\sigma_i^2}\right)
\end{equation}
where $\sigma_i^2$ is the model's output variance at the \textit{i}-th step. The \acrshort{lpd} scales the error of the mean value prediction $\left(y_i - E(\hat{y}_i)\right)^{2}$ by the variance $\sigma_i^2$. This means that overconfident models are penalized more than conservative ones for the same mean prediction error, leading to models that better represent the real system.

The \acrfull{msll} is obtained by subtracting from the model's \acrshort{lpd} the loss of a trivial model that predicts using a Gaussian with the mean $E(\boldsymbol{y})$ and variance $\sigma_y^2$ of the measured data, and taking the mean of the result:
\begin{equation}
    \text{MSLL} = \frac{1}{2N}\sum_{i=1}^N\left[ \ln{\left(\sigma_i^2\right)} + \frac{\left(y_i - E\left(\hat{y}_i\right)\right)^2}{\sigma_i^2} \right] - \frac{1}{2N}\sum_{i=1}^N\left[ \ln{\left(\sigma_y^2\right)} + \frac{\left(y_i - E\left(\boldsymbol{y}\right)\right)^2}{\sigma_y^2} \right]
\end{equation}
The \acrshort{msll} is approximately zero for simple models and negative for better ones.

% TODO: [Hyperparameters] Explain loss table, difference in lags, etc.
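To make the computation of these metrics concrete, the snippet below is a minimal NumPy sketch that evaluates all four of them from the GP's predictive means and variances on a validation dataset. The function name \texttt{gp\_loss\_metrics} and its interface are purely illustrative and do not correspond to any particular library used in this project.

\begin{verbatim}
import numpy as np

def gp_loss_metrics(y, mu, var):
    """Illustrative computation of RMSE, SMSE, LPD and MSLL.

    y   -- measured outputs of the validation dataset
    mu  -- predictive means E(y_hat_i) of the GP
    var -- predictive variances sigma_i^2 of the GP
    """
    err2 = (y - mu) ** 2          # squared mean prediction errors
    sy2 = np.var(y)               # variance of the measured outputs

    # Root mean squared error and its standardized variant
    rmse = np.sqrt(np.mean(err2))
    smse = np.mean(err2) / sy2

    # Average negative log predictive density of the GP
    nll = 0.5 * np.mean(np.log(var) + err2 / var)
    lpd = 0.5 * np.log(2 * np.pi) + nll

    # Loss of the trivial Gaussian model built from the mean and
    # variance of the measured data, subtracted to obtain the MSLL
    triv = 0.5 * np.mean(np.log(sy2) + (y - np.mean(y)) ** 2 / sy2)
    msll = nll - triv

    return rmse, smse, lpd, msll
\end{verbatim}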
\begin{table}[ht]
    %\vspace{-8pt}
    \centering
    \begin{tabular}{||c c c|c c c c||}
        \hline
        \multicolumn{3}{||c|}{Lags} & \multicolumn{4}{c||}{Loss functions}\\
        w & u & y & RMSE & SMSE & MSLL & LPD\\
        \hline \hline
        1 & 1 & 1 & 0.3464 & 0.36394 & 20.74 & 21.70 \\
        1 & 1 & 2 & 0.1415 & 0.06179 & -9.62 & -8.67 \\
        1 & 1 & 3 & 0.0588 & 0.01066 & -8.99 & -8.03 \\
        1 & 2 & 1 & 0.0076 & 0.00017 & 71.83 & 72.79 \\
        1 & 2 & 3 & \textbf{0.0041} & \textbf{0.00005} & 31.25 & 32.21 \\
        2 & 1 & 2 & 0.1445 & 0.06682 & -9.57 & -8.61 \\
        2 & 1 & 3 & 0.0797 & 0.02033 & -10.94 & -9.99 \\
        3 & 1 & 3 & 0.0830 & 0.02219 & \textbf{-11.48} & \textbf{-10.53} \\
        3 & 2 & 2 & 0.0079 & 0.00019 & 58.30 & 59.26 \\
        \hline
    \end{tabular}
    \caption{GP loss function values for different autoregressive lags}
    \label{tab:GP_loss_functions}
\end{table}

\begin{table}[ht]
    %\vspace{-8pt}
    \centering
    \resizebox{\columnwidth}{!}{%
    \begin{tabular}{||c c c|c|c c c c c c c c c c c||}
        \hline
        \multicolumn{3}{||c|}{Lags} & Variance & \multicolumn{11}{c||}{Kernel lengthscales relative importance} \\
        w & u & y & $\sigma^2$ & $\lambda_{w1,1}$ & $\lambda_{w1,2}$ & $\lambda_{w1,3}$ & $\lambda_{w2,1}$ & $\lambda_{w2,2}$ & $\lambda_{w2,3}$ & $\lambda_{u1,1}$ & $\lambda_{u1,2}$ & $\lambda_{y1,1}$ & $\lambda_{y1,2}$ & $\lambda_{y1,3}$\\
        \hline \hline
        1 & 1 & 1 & 0.11 & 0.721 & & & 2.633 & & & 0.569 & & 2.645 & & \\
        1 & 1 & 2 & 22.68 & 0.222 & & & 0.751 & & & 0.134 & & 3.154 & 3.073 & \\
        1 & 1 & 3 & 0.29 & 0.294 & & & 1.303 & & & 0.356 & & 2.352 & 1.361 & 2.045 \\
        1 & 2 & 1 & 7.55 & 0.157 & & & 0.779 & & & 0.180 & 0.188 & 0.538 & & \\
        1 & 2 & 3 & 22925.40 & 0.018 & & & 0.053 & & & 0.080 & 0.393 & 0.665 & 0.668 & 0.018 \\
        2 & 1 & 2 & 31.53 & 0.010 & 0.219 & & 0.070 & 0.719 & & 0.123 & & 3.125 & 3.044 & \\
        2 & 1 & 3 & 0.44 & 0.007 & 0.251 & & 0.279 & 1.229 & & 0.319 & & 2.705 & 1.120 & 2.510 \\
        3 & 1 & 3 & 0.56 & 0.046 & 0.064 & 0.243 & 0.288 & 1.151 & 0.233 & 0.302 & & 2.809 & 1.086 & 2.689 \\
        3 & 2 & 2 & 1.65 & 0.512 & 0.074 & 0.201 & 0.161 & 1.225 & 0.141 & 0.231 & 0.331 & 0.684 & 0.064 & \\
        \hline
    \end{tabular}%
    }
    \caption{GP hyperparameter values for different autoregressive lags}
    \label{tab:GP_hyperparameters}
\end{table}

\begin{table}[ht]
    %\vspace{-8pt}
    \centering
    \begin{tabular}{||c c c|c c c c||}
        \hline
        \multicolumn{3}{||c|}{Lags} & \multicolumn{4}{c||}{Loss functions}\\
        w & u & y & RMSE & SMSE & MSLL & LPD\\
        \hline \hline
        1 & 1 & 1 & 0.3253 & 0.3203 & 228.0278 & 228.9843 \\
        1 & 1 & 2 & 0.2507 & 0.1903 & 175.5525 & 176.5075 \\
        1 & 1 & 3 & 0.1983 & 0.1192 & 99.7735 & 100.7268 \\
        1 & 2 & 1 & 0.0187 & 0.0012 & -9.5386 & -8.5836 \\
        1 & 2 & 3 & \textbf{0.0182} & \textbf{0.0011} & \textbf{-10.2739} & \textbf{-9.3206} \\
        2 & 1 & 2 & 0.2493 & 0.1884 & 165.0734 & 166.0284 \\
        2 & 1 & 3 & 0.1989 & 0.1200 & 103.6753 & 104.6287 \\
        3 & 1 & 3 & 0.2001 & 0.1214 & 104.4147 & 105.3681 \\
        3 & 2 & 2 & 0.0206 & 0.0014 & -9.9360 & -8.9826 \\
        \hline
    \end{tabular}
    \caption{SVGP loss function values for different autoregressive lags}
    \label{tab:SVGP_loss_functions}
\end{table}

\begin{table}[ht]
    %\vspace{-8pt}
    \centering
    \resizebox{\columnwidth}{!}{%
    \begin{tabular}{||c c c|c|c c c c c c c c c c c||}
        \hline
        \multicolumn{3}{||c|}{Lags} & Variance & \multicolumn{11}{c||}{Kernel lengthscales relative importance} \\
        w & u & y & $\sigma^2$ & $\lambda_{w1,1}$ & $\lambda_{w1,2}$ & $\lambda_{w1,3}$ & $\lambda_{w2,1}$ & $\lambda_{w2,2}$ & $\lambda_{w2,3}$ & $\lambda_{u1,1}$ & $\lambda_{u1,2}$ & $\lambda_{y1,1}$ & $\lambda_{y1,2}$ & $\lambda_{y1,3}$\\
        \hline \hline
        1 & 1 & 1 & 0.2970 & 0.415 & & & 0.748 & & & 0.675 & & 0.680 & & \\
        1 & 1 & 2 & 0.2717 & 0.430 & & & 0.640 & & & 0.687 & & 0.559 & 0.584 & \\
        1 & 1 & 3 & 0.2454 & 0.455 & & & 0.589 & & & 0.671 & & 0.522 & 0.512 & 0.529 \\
        1 & 2 & 1 & 0.2593 & 0.310 & & & 0.344 & & & 0.534 & 0.509 & 0.597 & & \\
        1 & 2 & 3 & 0.2139 & 0.330 & & & 0.368 & & & 0.537 & 0.447 & 0.563 & 0.410 & 0.363 \\
        2 & 1 & 2 & 0.2108 & 0.421 & 0.414 & & 0.519 & 0.559 & & 0.680 & & 0.525 & 0.568 & \\
        2 & 1 & 3 & 0.1795 & 0.456 & 0.390 & & 0.503 & 0.519 & & 0.666 & & 0.508 & 0.496 & 0.516 \\
        3 & 1 & 3 & 0.1322 & 0.432 & 0.370 & 0.389 & 0.463 & 0.484 & 0.491 & 0.666 & & 0.511 & 0.501 & 0.526 \\
        3 & 2 & 2 & 0.1228 & 0.329 & 0.317 & 0.325 & 0.334 & 0.337 & 0.331 & 0.527 & 0.441 & 0.579 & 0.435 & \\
        \hline
    \end{tabular}%
    }
    \caption{SVGP hyperparameter values for different autoregressive lags}
    \label{tab:SVGP_hyperparameters}
\end{table}
\clearpage

\subsection{Validation of hyperparameters}
% TODO: [Hyperparameters] Validation of hyperparameters

\subsubsection{Conventional Gaussian Process}
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/GP_113_training_performance.pdf}
    \caption{Performance of the GP model on the training dataset}
    \label{fig:GP_train_validation}
\end{figure}
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/GP_113_test_performance.pdf}
    \caption{Performance of the GP model on the test dataset}
    \label{fig:GP_test_validation}
\end{figure}
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/GP_113_-1pts_test_prediction_20_steps.png}
    \caption{20-step ahead prediction of the GP model on the test dataset}
    \label{fig:GP_multistep_validation}
\end{figure}
\clearpage

\subsubsection{Sparse and Variational Gaussian Process}
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/SVGP_123_training_performance.pdf}
    \caption{Performance of the SVGP model on the training dataset}
    \label{fig:SVGP_train_validation}
\end{figure}
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/SVGP_123_test_performance.pdf}
    \caption{Performance of the SVGP model on the test dataset}
    \label{fig:SVGP_test_validation}
\end{figure}
\begin{figure}[ht]
    \centering
    \includegraphics[width = \textwidth]{Plots/SVGP_123_test_prediction_20_steps.png}
    \caption{20-step ahead prediction of the SVGP model on the test dataset}
    \label{fig:SVGP_multistep_validation}
\end{figure}
\clearpage
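The multi-step predictions shown in Figures~\ref{fig:GP_multistep_validation} and~\ref{fig:SVGP_multistep_validation} require simulating the model over the prediction horizon; a common way to do this with a one-step-ahead \acrshort{narx} model is to apply it recursively. The snippet below is a minimal sketch of this naive recursion, in which only the predictive mean is fed back into the regressor and the uncertainty of the fed-back outputs is not propagated. The \texttt{predict} wrapper and the pre-assembled \texttt{exog} matrix of lagged weather and input values are assumptions made purely for illustration.

\begin{verbatim}
import numpy as np

def narx_multistep(predict, exog, y_init, n_y, n_steps=20):
    """Naive recursive multi-step prediction with a NARX GP model.

    predict -- wrapper around the trained GP, mapping one regressor
               vector to a (mean, variance) pair (illustrative only)
    exog    -- array of shape (n_steps, n_exog); row k holds the
               lagged weather and input values for step k
    y_init  -- the last n_y measured outputs, seeding the recursion
    n_y     -- autoregressive output lag
    """
    y_hist = list(y_init[-n_y:])
    means, variances = [], []
    for k in range(n_steps):
        # NARX regressor: known exogenous lags followed by the
        # previously predicted outputs
        x = np.concatenate([exog[k], y_hist[-n_y:]])
        mu, var = predict(x)
        means.append(float(mu))
        variances.append(float(var))
        # feed the predicted mean back as the newest lagged output;
        # its uncertainty is ignored in this simple scheme
        y_hist.append(float(mu))
    return np.array(means), np.array(variances)
\end{verbatim}

More elaborate schemes propagate the predictive variance through the recursion at the cost of additional approximations; the simple mean-substitution above is shown only to illustrate how the \acrshort{narx} regressor is assembled at each step.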