In [1]:

```
import yfinance as yf
import pandas as pd
from scipy import stats
import numpy as np
import plotly.express as px
from scipy.optimize import minimize
import plotly.graph_objects as go
import plotly.io as pio
import pathlib
import sys
utils_path = pathlib.Path().absolute().parent.parent
sys.path.append(str(utils_path))
import utils.layout as lay
```

In [2]:

```
pio.templates.default = 'simple_white+blog_mra'
```

In finance, the term *volatility* $\sigma$ is a measure of dispersion defined as the annualized standard deviation of the returns of an investment. Since the standard deviation of monthly returns is not comparable in scale with the standard deviation of daily returns, both measures must be annualized before they can be compared.
In fact, the *square root of time rule* states that the standard deviation of the *h*-period log return is $\sqrt{h}$ times the standard deviation of the one-period log return, so the annualization is

\begin{equation} \sigma_{year} = \sqrt{k}\, \sigma_{period} \end{equation}

where $k$ is the annualization factor, e.g. $k$ = 250, 52, 12 for daily, weekly and monthly returns, respectively.

For that to be true, we need to assume that the one-period return generating process is i.i.d., which implies a constant volatility assumption. Although this requirement is not realistic for most financial time series, calculating asset risk by annualizing the standard deviation has become a market convention among finance practitioners.
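A quick sketch of the convention, using simulated i.i.d. daily returns (the figures are purely illustrative): a series with a 1% daily standard deviation annualizes to roughly $0.01 \cdot \sqrt{250} \approx 16\%$ volatility.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated i.i.d. daily log returns with a 1% daily standard deviation
daily_rets = rng.normal(loc=0.0, scale=0.01, size=250)

# Square-root-of-time rule with k = 250 trading days per year
vol_daily = daily_rets.std(ddof=1)
vol_annualized = vol_daily * np.sqrt(250)

print("Daily std dev:  {:.4f}".format(vol_daily))
print("Annualized vol: {:.4f}".format(vol_annualized))
```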

To deviate from the classic case, we can drop the i.i.d. assumption and allow positive or negative autocorrelation in the returns series. We now assume the returns follow an AR(1) autoregressive process of the form

\begin{equation} r_t = \theta r_{t-1} + \varepsilon_t \end{equation}

where $\theta$ is the autocorrelation coefficient and $\varepsilon_t$ is a white-noise error term.

In this setup, the derived scaling factor takes the form

\begin{equation} \hat{k} = \left(h+2\frac{\theta}{(1-\theta)^2}\left[(h-1)(1-\theta)-\theta(1-\theta^{h-1})\right]\right)^{1/2} \end{equation}

and the annualization of the standard deviation can be done, as before, as

\begin{equation} \sigma_{year} = \hat{k}\, \sigma_{day} \end{equation}

with $h$ = 250.

Note that the second term inside the outer parentheses of the scale factor is positive if and only if $\theta$ is positive. In other words, positive serial correlation leads to a larger volatility estimate and negative serial correlation leads to a lower volatility estimate, compared with the i.i.d. case.

- Volatility is unobservable; we can only estimate and forecast it. There is no absolute "true" volatility measure.
- Volatility captures only the dispersion of the returns distribution (its second moment); hence it does not give a full description of the risk taken by an investment unless returns are assumed to be normally distributed.
- The ‘true’ variance and covariance depend on the model. As a result, there is a considerable degree of model risk inherent in the construction of a covariance or correlation matrix: very different results can be obtained using two different statistical models even when they are based on exactly the same data.
- The estimates of the true covariance matrix are subject to sampling error. Even when two analysts use the same model to estimate a covariance matrix, their estimates will differ if they use different data. Both changing the sample period and changing the frequency of the observations will affect the covariance matrix estimate.
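The sampling-error point can be sketched with a minimal simulation (the `true_cov` matrix below is an arbitrary assumption, chosen only for illustration): two samples of equal length from the very same process already yield visibly different covariance estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# An assumed 'true' bivariate normal returns process (illustrative only)
true_cov = np.array([[0.04, 0.01],
                     [0.01, 0.02]])

# Two analysts estimate the matrix from different 250-day samples
# drawn from the same underlying process
sample_1 = rng.multivariate_normal([0, 0], true_cov, size=250)
sample_2 = rng.multivariate_normal([0, 0], true_cov, size=250)

cov_1 = np.cov(sample_1, rowvar=False)
cov_2 = np.cov(sample_2, rowvar=False)

print("Estimate from sample 1:\n", np.round(cov_1, 4))
print("Estimate from sample 2:\n", np.round(cov_2, 4))
```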

(a) The variance of daily returns is 0.001. Assuming 250 risk days per year, what is the
volatility?

(b) The volatility is 36%. What is the standard deviation of weekly returns?

In [3]:

```
sol_a = np.sqrt(0.001*250)
sol_b = 0.36/np.sqrt(52)
print("Solution A -> Volatility = {}".format(sol_a))
print("Solution B -> Std Dev = {}".format(round(sol_b, 2)))
```

Solution A -> Volatility = 0.5 Solution B -> Std Dev = 0.05

In [4]:

```
# AR(1) scale factor; note it returns k-hat squared, so take the
# square root before annualizing the standard deviation
def ar1_scale_factor(n, theta):
    """
    n = number of returns
    theta = autocorrelation coefficient
    """
    return n + 2 * ((theta / (1 - theta) ** 2)
                    * ((n - 1) * (1 - theta) - theta * (1 - theta ** (n - 1))))
```

In [5]:

```
std_monthly_rets = 0.05
n_rets = 12
volatility_iid = std_monthly_rets*np.sqrt(n_rets)
autocorr = 0.25
scaling_factor = ar1_scale_factor(n_rets, autocorr)
volatility_ar1 = std_monthly_rets*np.sqrt(scaling_factor)
print("Volatility IID: {}".format(round(volatility_iid, 4)))
print("Volatility AR1: {}".format(round(volatility_ar1,4)))
```

Volatility IID: 0.1732 Volatility AR1: 0.2186
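To illustrate the sign effect noted earlier, we can evaluate the same scale factor at a negative autocorrelation as well (the function is repeated here so the snippet is self-contained):

```python
import numpy as np

# Same AR(1) scale factor as above; it returns k-hat squared,
# hence the square root below
def ar1_scale_factor(n, theta):
    return n + 2 * ((theta / (1 - theta) ** 2)
                    * ((n - 1) * (1 - theta) - theta * (1 - theta ** (n - 1))))

std_monthly_rets = 0.05
n_rets = 12
results = {}
for theta in (-0.25, 0.0, 0.25):
    results[theta] = std_monthly_rets * np.sqrt(ar1_scale_factor(n_rets, theta))
    print("theta = {:+.2f} -> volatility = {:.4f}".format(theta, results[theta]))
```

With $\theta = 0$ the factor collapses to the i.i.d. value $0.05\sqrt{12} \approx 0.1732$, while $\theta = -0.25$ gives roughly 0.1371, below the i.i.d. figure.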

The *covariance* between two returns is a second-order cross central moment of their joint density function, and it can take any real value.

Since covariances vary in scale with the size of the returns (rather than with time, as volatility does), we can obtain a standardized measure by dividing the covariance of two returns by the product of their standard deviations. This standardized measure, namely the *correlation*, lies between -1 and +1.
Note that zero correlation implies independence only if the two returns have a bivariate normal distribution.

Finance practitioners have always used Pearson's correlation to measure dependency. However, the metric is only appropriate when the two returns have an elliptical joint distribution. Moreover, the assumption of multivariate normal i.i.d. returns is not empirically justified, which makes the results inaccurate. In such cases, using a copula function as the measure of dependency substantially improves the accuracy of the results.
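A full copula treatment is beyond the scope of this note, but a quick sketch (on simulated, illustrative data) shows why Pearson's correlation can mislead: a rank-based measure such as Spearman's rho depends only on the ranks of the data, i.e. on the copula, so it is invariant under monotone transforms, while Pearson's correlation is not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# y is a monotone but strongly non-linear transform of x,
# so the two variables are perfectly dependent
x = rng.normal(size=5000)
y = np.exp(2 * x)

# Pearson's correlation is damped by the non-linearity ...
pearson = np.corrcoef(x, y)[0, 1]
# ... while Spearman's rho sees only the ranks (the copula)
# and detects the perfect monotone dependence
spearman = stats.spearmanr(x, y)[0]

print("Pearson:  {:.3f}".format(pearson))
print("Spearman: {:.3f}".format(spearman))
```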

$$\begin{array}{llll}
\hline
Asset 1\ volatility & 0.2 & Asset 1 - Asset 2\ Correlation & 0.8 \\
Asset 2\ volatility & 0.1 & Asset 1 - Asset 3\ Correlation & 0.5 \\
Asset 3\ volatility & 0.15 & Asset 2 - Asset 3\ Correlation & 0.3 \\
\hline
\end{array}
$$

Portfolio weights: $$ w = (\frac{1}{6}, \frac{2}{6}, \frac{3}{6}) $$

Portfolio variance:

$$ V(R) = w'Vw $$

In [6]:

```
w = np.array([1/6, 2/6, 3/6])
V = np.array([[0.04, 0.016, 0.015], [0.016, 0.01, 0.0045], [0.015, 0.0045, 0.0225]])
Ptf_var = w.T.dot(V).dot(w)
print("Portfolio Variance: {}".format(Ptf_var))
print("Portfolio Volatility: {}".format(round(np.sqrt(Ptf_var), 5)))
```

Portfolio Variance: 0.013625 Portfolio Volatility: 0.11673

An $h$-day covariance matrix $V_{h}$ is the matrix of variances and covariances of $h$-day returns, e.g. $h$ = 1 corresponds to daily returns. If the returns are i.i.d. and their joint distribution is multivariate normal, then variances and covariances scale with time as

\begin{equation} V_{h} = hV_{1} \end{equation}

where $V_{1}$ is the covariance matrix of one-day returns.

In [7]:

```
def create_correlation_matrix(cross_corr):
    """
    Create a correlation matrix with 1s on the diagonal, filling the
    upper and lower triangular parts with the given cross-correlations.
    Args:
        cross_corr (numpy.ndarray): Column vector of pairwise correlations
            in row-major order of the upper triangle, e.g.
            [rho_12, rho_13, rho_23] for three assets.
    Returns:
        numpy.ndarray: The correlation matrix.
    """
    # n assets give n(n-1)/2 pairwise correlations
    n = int((1 + np.sqrt(1 + 8 * cross_corr.shape[0])) / 2)
    corr_matrix = np.eye(n)
    k = 0  # index into the vector of pairwise correlations
    for i in range(n):
        for j in range(i + 1, n):
            corr_matrix[i, j] = cross_corr[k, 0]
            corr_matrix[j, i] = cross_corr[k, 0]
            k += 1
    return corr_matrix
```

In [8]:

```
h = 10
vols = np.array([0.2, 0.1, 0.15])
D = np.diag(vols)
correlations = np.array([[0.8], [0.5], [0.3]])
C = create_correlation_matrix(correlations)
V_annual = D.dot(C).dot(D)
```

$$ V = DCD $$

$$ 250-day\ Covariance\ Matrix $$

$$
\begin{bmatrix}
0.0400 & 0.0160 & 0.0150 \\
0.0160 & 0.0100 & 0.0045 \\
0.0150 & 0.0045 & 0.0225 \\
\end{bmatrix} $$

In [9]:

```
V_scaled = V_annual / (250/h)
```

$$ 10-day\ Covariance\ Matrix $$

$$
\begin{bmatrix}
16.00 & 6.40 & 6.00 \\
6.40 & 4.00 & 1.80 \\
6.00 & 1.80 & 9.00 \\
\end{bmatrix} \cdot 10^{-4}$$

Despite its wide adoption among finance practitioners, Pearson's correlation comes with numerous limitations. Embrechts et al. (2002) explain correlation pitfalls in their famous article "Correlation And Dependence In Risk Management: Properties And Pitfalls". Here we list some of them:

- It is only a linear measure of association, not flexible enough to capture non-linear dependencies -> $Cov(X, X^2) = 0$ for a symmetrically distributed $X$
- Correlation is not invariant under transformation of variables -> $Corr(X, Y) \neq Corr(\ln{(X)}, \ln{(Y)})$
- Feasible values for correlation depend on the marginal distributions
- Perfect positive dependence does not imply a correlation of one: with lognormal variables, perfect positive dependence can imply a correlation of only two-thirds and perfect negative dependence a correlation of only −0.09

- Zero correlation does not imply independence
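Several of these pitfalls can be checked numerically. The sketch below assumes the textbook lognormal pair $X = e^{Z}$, $Y = e^{\pm 2Z}$ with $Z \sim N(0,1)$, which reproduces the two-thirds and −0.09 figures quoted above:

```python
import numpy as np

rng = np.random.default_rng(7)
z = rng.normal(size=200_000)

# Zero correlation without independence: Z and Z^2 are perfectly
# dependent, yet uncorrelated because Z is symmetric around zero
corr_xx2 = np.corrcoef(z, z ** 2)[0, 1]
print("Corr(Z, Z^2)                = {:+.3f}".format(corr_xx2))

# Perfect dependence without |correlation| = 1, for the assumed
# lognormal pair: e^{2Z} is an increasing function of e^Z,
# yet their correlation stays well below one
corr_max = np.corrcoef(np.exp(z), np.exp(2 * z))[0, 1]
corr_min = np.corrcoef(np.exp(z), np.exp(-2 * z))[0, 1]
print("Comonotonic lognormals      = {:+.3f}".format(corr_max))
print("Countermonotonic lognormals = {:+.3f}".format(corr_min))

# Analytic attainable bounds: (e^{+-2} - 1) / sqrt((e - 1)(e^4 - 1))
def corr_bound(s):
    return (np.e ** s - 1) / np.sqrt((np.e - 1) * (np.e ** 4 - 1))

print("Analytic bounds: {:+.3f}, {:+.3f}".format(corr_bound(2.0), corr_bound(-2.0)))
```

The analytic bounds evaluate to about +0.666 and −0.090; the sample estimates are noisier because the lognormal tails are heavy.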