
Transformer covariance for ETFs: the right target, the missing evidence

November 25, 2024 · 4 min read

covariance · transformers · asset-allocation

Covariance estimation is the weak link in mean-variance optimization. This working paper aims a transformer at it, forecasting the covariance and semi-covariance matrices that drive a minimum-variance ETF allocation. The target is exactly right: the matrix the optimizer trusts is the thing that is hardest to estimate and easiest to get wrong. The evidence behind the claim is where a quant has to slow down.

The idea worth taking seriously is the semi-covariance focus. A standard covariance matrix treats upside and downside co-movement the same. A semi-covariance matrix counts only the co-movements below the mean, which is the risk an investor actually fears. Forecasting that matrix, and allocating on it, targets downside risk directly rather than as a byproduct of total variance.
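To make the distinction concrete, here is a minimal sketch of a downside semi-covariance matrix next to the ordinary one. It assumes the common "below the per-asset mean" definition, where deviations above the mean are zeroed out before taking cross-products; the paper may use a different benchmark or normalization, and the synthetic returns are purely illustrative.

```python
import numpy as np

def semicovariance(returns: np.ndarray) -> np.ndarray:
    """Downside semi-covariance, using each asset's sample mean as the benchmark.

    returns: array of shape (T, N), one row per period, one column per asset.
    This is one common definition, assumed here rather than taken from the paper.
    """
    demeaned = returns - returns.mean(axis=0)
    # Keep only below-mean deviations; above-mean deviations become zero.
    downside = np.minimum(demeaned, 0.0)
    return downside.T @ downside / returns.shape[0]

# Synthetic daily returns for four hypothetical ETFs.
rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.01, size=(250, 4))

semi = semicovariance(r)
full = np.cov(r, rowvar=False, bias=True)  # same 1/T normalization for comparison
```

Under this definition each diagonal entry of `semi` is a downside semi-variance, so it can never exceed the matching full variance in `full`.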

The mechanism is more clever than feeding returns straight to a transformer. The paper computes the covariance and semi-covariance matrices first, flattens each one’s lower triangle into a vector, and trains the transformer to forecast the next period’s vector from the history of past ones. It compares four architectures (vanilla Transformer, Autoformer, Informer, and Reformer), then forces the predicted matrix to be positive-semidefinite so the optimizer can use it. The forecast matrix feeds a minimum-variance optimization. Autoformer, with its trend-and-seasonal decomposition, came out ahead.
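The flatten-and-repair steps can be sketched in a few lines. The helper names `vech`, `unvech`, and `nearest_psd` are mine, and eigenvalue clipping is only one common way to enforce positive-semidefiniteness; I am assuming it for illustration, not claiming it is the paper's fix.

```python
import numpy as np

def vech(matrix: np.ndarray) -> np.ndarray:
    """Flatten the lower triangle (diagonal included) into a vector."""
    rows, cols = np.tril_indices(matrix.shape[0])
    return matrix[rows, cols]

def unvech(vector: np.ndarray, n: int) -> np.ndarray:
    """Rebuild a symmetric n x n matrix from its lower-triangle vector."""
    rows, cols = np.tril_indices(n)
    lower = np.zeros((n, n))
    lower[rows, cols] = vector
    # Mirror the lower triangle; subtract the diagonal so it is not counted twice.
    return lower + lower.T - np.diag(np.diag(lower))

def nearest_psd(matrix: np.ndarray, floor: float = 0.0) -> np.ndarray:
    """Clip negative eigenvalues to `floor` (assumed repair, not the paper's)."""
    sym = (matrix + matrix.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sym)
    clipped = np.clip(eigvals, floor, None)
    return (eigvecs * clipped) @ eigvecs.T

cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
target_vector = vech(cov)  # what the model would be trained to predict

# A made-up forecast vector whose matrix has a negative eigenvalue.
raw_forecast = unvech(np.array([1.0, 2.0, 1.0]), 2)
fixed = nearest_psd(raw_forecast)
```

Nothing in the forecasting step guarantees that an arbitrary predicted vector maps back to a valid covariance matrix, which is why the repair is needed before the optimizer sees it.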

Forecasting the risk matrix, then optimizing on it

The transformer forecasts the next matrix from the history of past matrices rather than from raw returns. A positive-semidefinite fix makes the forecast usable by the optimizer. Autoformer scored best over the one-month test.

Why the optimizer makes this hard

Mean-variance optimization is notoriously sensitive to its covariance input. Small errors in the estimated matrix produce large, confident, wrong shifts in the weights, because the optimizer treats the estimate as truth and pushes to the corners. This is why decades of quant research went into taming the matrix, not forecasting it harder. Ledoit-Wolf shrinkage, factor-model covariance, and related methods all exist to reduce estimation error, because reducing error beats chasing a sharper forecast that the optimizer will amplify.
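A two-asset toy case shows the sensitivity. This sketch uses the unconstrained, fully invested minimum-variance solution, with weights proportional to the inverse covariance matrix times a vector of ones. The numbers are invented, and the paper's optimizer may add constraints such as long-only that this ignores.

```python
import numpy as np

def min_variance_weights(cov: np.ndarray) -> np.ndarray:
    """Unconstrained minimum-variance weights that sum to one."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # proportional to inverse(cov) @ ones
    return raw / raw.sum()

# Two highly correlated assets with identical variance (correlation 0.95).
base = np.array([[0.040, 0.038],
                 [0.038, 0.040]])

# Same matrix, with the second asset's variance nudged up by 2.5%.
bumped = np.array([[0.040, 0.038],
                   [0.038, 0.041]])

w_base = min_variance_weights(base)      # equal split: 0.5 / 0.5
w_bumped = min_variance_weights(bumped)  # moves to 0.6 / 0.4
```

A 2.5% change in a single variance estimate moves ten percentage points of capital. The higher the correlation between assets, the larger that amplification becomes.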

A transformer that forecasts the matrix is betting it can predict co-movements well enough that the optimizer’s sensitivity becomes an advantage rather than a trap. That is a strong bet. It can pay off. It can just as easily feed the optimizer a confident forecast that is wrong in a new way, producing turnover and drawdown that a shrinkage estimator would have avoided.

What the evidence does not yet show

This is a working paper, marked under revision. The results are the kind that should make a quant cautious rather than excited. The paper does report full tables. Semi-covariance portfolios beat covariance ones across the board, and Autoformer posts the strongest numbers: a 7.23% return and a Sortino above 12 on the covariance variant. Those figures look spectacular. They also come from a test period of one month.

That one-month window is the deepest problem. A month of out-of-sample data is an anecdote rather than evidence. A Sortino above 12 over four weeks is exactly the kind of number that vanishes on a longer sample. The other gaps compound it. The baseline is a naive sample method, last month’s matrix carried forward, rather than the shrinkage estimator a desk actually competes against. There is no transaction-cost model. A forecast that shifts weights aggressively can generate turnover that eats the entire edge once costs are counted. There is no turnover analysis at all. A downside-aware matrix that rebalances violently is not downside-aware in practice.
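The missing turnover and cost accounting is not hard to add. A minimal sketch, assuming a flat proportional cost per unit of notional traded; the `cost_bps` parameter and the example weights are hypothetical, since the paper supplies no cost model to copy.

```python
def turnover(old_weights, new_weights):
    """One-way turnover: half the sum of absolute weight changes."""
    return 0.5 * sum(abs(new - old) for old, new in zip(old_weights, new_weights))

def net_return(gross_return, old_weights, new_weights, cost_bps=10.0):
    """Gross period return minus a proportional cost on every unit traded.

    cost_bps is an assumed flat cost in basis points; real ETF costs vary
    with spread, size, and venue.
    """
    traded = sum(abs(new - old) for old, new in zip(old_weights, new_weights))
    return gross_return - traded * cost_bps / 10_000.0

old = [0.5, 0.3, 0.2]
new = [0.2, 0.3, 0.5]

one_way = turnover(old, new)   # 0.3, i.e. 30% of the book changes hands
net = net_return(0.01, old, new)  # 1.00% gross becomes 0.94% net at 10 bps
```

Reported per rebalance, these two numbers show whether a forecast's edge survives its own trading.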

Until those gaps are closed, the result is a promising idea resting on a month of data. The semi-covariance framing is genuinely good. The engineering around it, the matrix forecasting and the positive-semidefinite fix, is sound. The proof that any of it survives a real out-of-sample stretch, against shrinkage, after costs, is not here yet.

How I would test it

Against the incumbent, with costs, out of sample. The experiment that would convince a desk is a walk-forward backtest of transformer semi-covariance against Ledoit-Wolf shrinkage and a factor-model covariance, on the same ETF universe, with a realistic cost model and reported turnover. Judge it on risk-adjusted return net of costs rather than on a one-month Sortino. If the transformer forecast still wins that comparison, it is a real contribution. If it does not, it is one more case of a sharper forecast that the optimizer turned into noise. None of this is a reason to dismiss the paper. The semi-covariance target is the right one. The authors are aiming at a real weakness rather than a fashionable architecture for its own sake. It is, rather, a reason to withhold the verdict until the backtest that matters has been run.
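A skeleton of that walk-forward comparison might look like the following. Everything here is an assumption for illustration: the returns are synthetic, `shrunk_cov` is a fixed-intensity shrinkage toward a scaled identity standing in for a proper Ledoit-Wolf estimator, the cost is a flat 10 bps, and weights are held at target within each period rather than allowed to drift. A transformer forecaster would plug in as one more `estimator` callable.

```python
import numpy as np

def min_variance_weights(cov):
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()

def sample_cov(window):
    """Naive baseline: the sample covariance of the lookback window."""
    return np.cov(window, rowvar=False)

def shrunk_cov(window, intensity=0.2):
    """Fixed-intensity shrinkage toward a scaled identity.

    A stand-in only: Ledoit-Wolf estimates the intensity from the data.
    """
    s = np.cov(window, rowvar=False)
    target = np.eye(s.shape[0]) * np.trace(s) / s.shape[0]
    return (1.0 - intensity) * s + intensity * target

def walk_forward(returns, estimator, lookback=60, hold=20, cost_bps=10.0):
    """Re-estimate, rebalance, and record net returns and one-way turnover."""
    n_assets = returns.shape[1]
    weights = np.full(n_assets, 1.0 / n_assets)  # start from equal weight
    net, turnovers = [], []
    for start in range(lookback, returns.shape[0] - hold + 1, hold):
        window = returns[start - lookback:start]  # strictly past data
        new_weights = min_variance_weights(estimator(window))
        traded = np.abs(new_weights - weights).sum()
        turnovers.append(traded / 2.0)
        period = returns[start:start + hold] @ new_weights
        period[0] -= traded * cost_bps / 10_000.0  # charge costs at rebalance
        net.extend(period)
        weights = new_weights
    return np.array(net), np.array(turnovers)

def sortino(net_returns, periods_per_year=252):
    """Annualized Sortino ratio with a zero target return."""
    downside_dev = np.sqrt(np.mean(np.minimum(net_returns, 0.0) ** 2))
    return np.sqrt(periods_per_year) * net_returns.mean() / downside_dev

rng = np.random.default_rng(1)
returns = rng.normal(0.0003, 0.01, size=(500, 5))  # synthetic, not ETF data

net_sample, to_sample = walk_forward(returns, sample_cov)
net_shrunk, to_shrunk = walk_forward(returns, shrunk_cov)
report = {
    "sample": (sortino(net_sample), to_sample.mean()),
    "shrunk": (sortino(net_shrunk), to_shrunk.mean()),
}
```

The point of the structure is that every estimator faces the same windows, the same costs, and the same metric, so the comparison is net of trading and spans many rebalances rather than one month.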

Semi-covariance is the right matrix to forecast for downside risk. Whether a transformer forecast of it beats shrinkage once you count turnover and costs is the question this working paper has not yet answered.
