## Introduction

Over the past decades, the hedge fund sector has experienced significant expansion. Despite a decrease in both the number of hedge funds and their average leverage during the 2007–2008 credit crisis, the current aggregate investment in hedge funds stands at $5.1 trillion (Source: BarclayHedge). Concurrently, with the rapid growth of the hedge fund industry, there has been a heightened investor demand for products that can deliver hedge fund-like returns at a reduced cost and without the typical associated risks, such as illiquidity, lack of transparency, and management-specific risks. In response to this demand, investment banks and asset management firms have introduced investment products, commonly referred to as ‘clones,’ designed to replicate hedge fund returns through the utilization of statistical models or algorithmic trading strategies.

In this context, researchers have adopted two main approaches to replication: distribution matching and the factor-based approach. The former aims to recreate the distributional properties of the replicated instrument through constrained optimization (Harris and Mazibas 2011), whereas the latter uses linear and, sometimes, non-linear approximations to minimise the tracking error between the hedge fund returns and the weighted average return of the factors. The two approaches have distinct strengths and weaknesses. Factor-based models tend to fit well in-sample but perform relatively poorly out-of-sample. Distribution-matching techniques, in contrast, excel at replicating the higher moments of hedge fund return distributions (volatility, skewness, and kurtosis), though they may struggle to track the first moment, i.e., the positive average return of hedge funds.

In this article, our focus will be on the distribution-matching approach. We will conduct a replication exercise for the Fund of Hedge Funds index (FOF), a commonly employed benchmark by investors. Additionally, we will delve into the nuances of this method to provide a comprehensive understanding of its application and effectiveness in replicating hedge fund returns.

## The Model

In the paper “Factor-Based Hedge Fund Replication with Risk Constraints” by Harris and Mazibas (2011), the authors propose a method for replicating the monthly returns of hedge fund indices. This approach involves a factor-based model supplemented with a set of risk and return constraints, implicitly targeting all moments of the hedge fund return distribution. The authors incorporate the linear component from the factor-based approach while imposing various constraints to ensure that the replicating portfolio aligns with several risk measures of the hedge fund, including the Conditional Value at Risk, Conditional Drawdown at Risk, and the partial moments of returns. We will limit ourselves to the Value at Risk (VaR) case; however, the optimization can be extended to other metrics using a similar approach.

The starting point of the analysis is the definition of an objective function. In our case, the goal is to minimize the tracking error between the FOF index and our replicating portfolio under a set of risk and return constraints:

$$ \min_{x} f(x) = \operatorname{Var}(r_{hf,t} - r_{p,t}) $$

*subject to:*

\begin{equation}
\begin{aligned}
\sum_{i=1}^{m} x_{i} &= 1 \\
\sum_{i=1}^{m} \bar{r}_{i} x_{i} &= \bar{r}_{hf} \\
x_{i} &\geq 0, \quad i = 1, \dots, m \\
\mathrm{VaR}_{p} &= \mathrm{VaR}_{hf}
\end{aligned}
\end{equation}

where VaR is calculated as the 5th percentile of the return distribution.
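As a quick illustration of this definition, the sketch below computes a historical 5% VaR as the empirical 5th percentile of a return series. The function name `historical_var` and the sample returns are hypothetical and serve only as an example.

```python
import numpy as np

def historical_var(returns, level=5):
    """Empirical VaR: the `level`-th percentile of the return distribution."""
    return np.percentile(np.asarray(returns, dtype=float), level)

# Hypothetical monthly returns in decimal form, for illustration only
sample_returns = np.array([-0.04, -0.02, -0.01, 0.00, 0.01,
                           0.01, 0.02, 0.02, 0.03, 0.05])
var_5 = historical_var(sample_returns)
```

Under this convention, VaR is expressed as a return (typically negative) rather than as a positive loss figure.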

The first two constraints (weights summing to one and equal mean returns) are imposed in every case. We then test the following combinations of the remaining two:

- Matching VaR and no short selling (VaRnoSS)
- No short selling (noSS)
- Matching VaR and short selling (VaRandSS)
- Short selling (SS)
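The four cases differ only in which optional constraints are switched on. The following is a minimal sketch of how the constraint lists could be assembled in the dictionary format accepted by `scipy.optimize.minimize`. The helper `build_constraints`, its flags, and the synthetic data are hypothetical and are not taken from the original code.

```python
import numpy as np

def build_constraints(r_hf, r_p, match_var, allow_short):
    """Assemble constraint dicts for one estimation window.

    r_hf: 1-D array of hedge fund index returns
    r_p:  2-D array of factor returns (rows = months, columns = factors)
    """
    cons = [
        # Always imposed: weights sum to one
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
        # Always imposed: clone mean return equals index mean return
        {"type": "eq", "fun": lambda w: r_hf.mean() - (r_p @ w).mean()},
    ]
    if not allow_short:
        # Inequality constraints are satisfied when the value is non-negative
        cons.append({"type": "ineq", "fun": lambda w: w})
    if match_var:
        # Match the 5th percentile of clone and index returns
        cons.append({"type": "eq",
                     "fun": lambda w: np.percentile(r_hf, 5)
                                      - np.percentile(r_p @ w, 5)})
    return cons

# Flags for the four cases (labels follow the list above)
CASES = {
    "VaRnoSS":  {"match_var": True,  "allow_short": False},
    "noSS":     {"match_var": False, "allow_short": False},
    "VaRandSS": {"match_var": True,  "allow_short": True},
    "SS":       {"match_var": False, "allow_short": True},
}

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
r_p = rng.normal(0.005, 0.03, size=(120, 3))
true_w = np.array([0.5, 0.3, 0.2])
r_hf = r_p @ true_w

constraint_sets = {name: build_constraints(r_hf, r_p, **flags)
                   for name, flags in CASES.items()}
```

Each entry of `constraint_sets` can then be passed as the `constraints` argument of the optimizer, so the same objective function is reused across all four cases.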

In this application, we optimize on a rolling window of 120 months with a one-month out-of-sample forecast. The window size is chosen to provide sufficient data for the optimization and to avoid excessive volatility of the weights over time.
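The rolling scheme can be sketched as a generator of estimation windows and forecast positions. The function `rolling_windows` is a hypothetical helper written for illustration; it only produces index positions and does not run any optimization.

```python
def rolling_windows(n_obs, window=120):
    """Yield (estimation_slice, forecast_position) pairs for a rolling scheme."""
    for row in range(window, n_obs):
        # Estimate on the `window` months before `row`, forecast month `row`
        yield slice(row - window, row), row

splits = list(rolling_windows(n_obs=394, window=120))
first_train, first_test = splits[0]
last_train, last_test = splits[-1]
```

With the 394 monthly observations used in this article and a 120-month window, this scheme yields 274 one-month out-of-sample forecasts.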

## An example for Fund of Hedge Funds

### Data

Our analysis covers the period from January 1990 to September 2022. We collect monthly data for our target variable, the HFRI Fund of Hedge Funds index (see Figure 1), for a total of 394 observations, and we consider the aggregate HFRI index as a benchmark. Although little consensus exists on which and how many factors best capture hedge funds’ return exposures, we follow the approach of Tupitsyn (2014), who proposed an initial set of 14 factors, 6 of which are in common with Hasanhodzic and Lo (2007). The complete set of factors is shown in Table 1. The selection is consistent with the past literature and is also based on economic sense and on the availability of liquid tradable securities in the market.

### Optimization

The following code blocks implement the optimization, which we run four times, once for each set of constraints described above. For each optimization, we save the resulting weights and the one-month out-of-sample prediction at each date, as shown in Table 2.

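```
import numpy as np
import pandas as pd
from scipy.optimize import minimize

def sum_weights(guess):
    # Equality constraint: weights sum to one
    return np.sum(guess) - 1

def no_short(guess):
    # Inequality constraint: every weight is non-negative
    return guess

def equal_return(guess, r_hf, r_p):
    # Equality constraint: same total (hence mean) return over the window
    return r_hf.sum() - r_p.dot(guess).sum()

def VaR(guess, r_hf, r_p):
    # Equality constraint: same 5th percentile of returns
    return np.percentile(r_hf, 5) - np.percentile(r_p.dot(guess), 5)

def obj_fun(guess, r_hf, r_p):
    # Objective: variance of the tracking error
    return np.var(r_hf - r_p.dot(guess))

def make_cons(r_hf, r_p):
    # Constraint set for the VaRnoSS case, evaluated on one estimation window
    return [{'type': 'eq', 'fun': sum_weights},
            {'type': 'ineq', 'fun': no_short},
            {'type': 'eq', 'fun': equal_return, 'args': (r_hf, r_p)},
            {'type': 'eq', 'fun': VaR, 'args': (r_hf, r_p)}]
```

```
info_pred = []
weight_rows = []
window = 120
# df_all (monthly returns) and sel_fct (selected factor columns) are assumed to be loaded beforehand
x = df_all[sel_fct]
y = df_all["FOF"]
# Initial guess: equal weights (an assumption; the original starting point is not shown)
guess = np.full(x.shape[1], 1 / x.shape[1])

for row in range(window, len(x)):
    # Estimation window: the 120 months preceding the forecast date
    X_roll = x.iloc[row - window:row, :].to_numpy()
    Y_roll = y.iloc[row - window:row].to_numpy()
    date = x.index.values[row]
    # Constraints are rebuilt on each window so they use the same data as the objective
    res = minimize(obj_fun, x0=guess, args=(Y_roll, X_roll),
                   constraints=make_cons(Y_roll, X_roll))
    # One-month out-of-sample prediction and its error
    pred = x.iloc[row, :].dot(res.x)
    e = y.iloc[row] - pred
    info_pred.append([date, y.iloc[row], pred, e])
    weight_rows.append(list(res.x) + [date])

weights = pd.DataFrame(weight_rows, columns=list(x.columns) + ["Date"])
df_pred = pd.DataFrame(info_pred, columns=["Date", "FOF", "Pred", "Error"]).set_index("Date")
```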

Table 2 shows the output of the rolling optimization: on the one hand, the time series of one-month predictions and the associated prediction errors; on the other, the rolling portfolio weights.

## Results

Table 4 displays the summary statistics for all combinations of imposed constraints. Although the annualized volatilities (second column from the right) are all higher than that of the FOF index, the difference is not substantial. Given that the average yearly returns of the clones are almost double that of the index, the results are encouraging.
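For reference, annualized figures of this kind can be derived from monthly returns as sketched below. The convention used here (arithmetic mean times 12, sample standard deviation times the square root of 12) is an assumption, since the exact annualization method behind the table is not stated; `annualized_stats` and the sample data are hypothetical.

```python
import numpy as np

def annualized_stats(monthly_returns):
    """Return (annualized mean return, annualized volatility) from monthly returns."""
    r = np.asarray(monthly_returns, dtype=float)
    ann_return = r.mean() * 12                # arithmetic annualization (assumed)
    ann_vol = r.std(ddof=1) * np.sqrt(12)     # square-root-of-time scaling (assumed)
    return ann_return, ann_vol

# Hypothetical monthly returns, for illustration only
example = np.array([0.01, -0.02, 0.03, 0.00])
ann_ret, ann_vol = annualized_stats(example)
```

Applying the same function to the index and to each clone's out-of-sample predictions keeps the comparison on a like-for-like basis.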

Figure 4 illustrates how the portfolio weights change over time, a crucial check for assessing rebalancing costs. Here, we observe that when we impose the equal VaR condition, the weights are more volatile, with more frequent sudden jumps.
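One way to put a number on this weight instability is portfolio turnover, sketched below as half the sum of absolute weight changes between consecutive rebalancing dates. This metric is not reported in the article; the helper `turnover` and the sample weights are hypothetical, and the calculation ignores the drift of weights between rebalances.

```python
import numpy as np

def turnover(weight_matrix):
    """One-way turnover per rebalance: half the sum of absolute weight changes."""
    w = np.asarray(weight_matrix, dtype=float)
    return 0.5 * np.abs(np.diff(w, axis=0)).sum(axis=1)

# Hypothetical weights: rows are rebalancing dates, columns are factors
example_weights = np.array([[0.5, 0.5, 0.0],
                            [0.4, 0.5, 0.1],
                            [0.0, 0.5, 0.5]])
to = turnover(example_weights)
```

A strategy whose weights jump more often produces larger turnover values, which translates into higher trading costs for the same transaction cost per unit traded.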

Finally, we show the backtest results for the four replicating strategies. With this optimization methodology, we achieve interesting results: slightly higher out-of-sample volatility, but also higher average returns for our predictions.

## Conclusion

In this article, we explored a hedge fund replication strategy that integrates the factor-based methodology with a set of risk and performance constraints. Employing this approach, we aimed to emulate the monthly returns of the Fund of Hedge Funds index by using long-only and long-short positions in a diverse range of equity, interest rate, exchange rate, and commodity indices. All these indices are tradable using liquid and investible instruments such as futures, options, and ETFs. Through out-of-sample tests, we showed that this approach produces replicating portfolios with the potential to mimic both the risk-adjusted performance and the distributional characteristics of the index they are designed to replicate. In the next article, we will focus on factor-based replication models.

*References:*

Full code available here
