# 15+ Non-Parametric Models in Finance & Trading

Non-parametric models in finance are valuable for their flexibility and adaptability to various types of financial data.

They’re particularly useful in scenarios where the underlying data doesn’t conform to standard distributional assumptions (e.g., normal distribution).

We’ll cover many examples of non-parametric mathematical models in finance.

## Key Takeaways – Non-Parametric Models in Finance & Trading

- Non-parametric models in finance provide flexibility and adaptability.
- They don’t assume a pre-existing functional form or distribution (i.e., they don’t assume any particular form for the relationship between input variables and the output).
- They effectively handle data that defies standard distributional assumptions.
- These models, including LOESS/LOWESS, historical VaR, and kernel density estimation (among many others), excel in capturing non-linear trends, tail risks, and complex relationships in financial datasets.
- Non-parametric methods require substantial data and computational resources to be effective.

## Local Regression Models (LOESS/LOWESS)

LOESS (locally estimated scatterplot smoothing) and LOWESS (locally weighted scatterplot smoothing) are forms of local polynomial regression: they smooth time-series data by fitting simple models (like linear or quadratic) to localized subsets of the data.

This approach is effective for capturing the non-linear trends and patterns in financial time series without assuming a global functional form.
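To make the idea concrete, here is a minimal pure-Python sketch of local linear smoothing with tricube weights. The function names and the `frac` default are assumptions made for illustration, and the sketch leaves out the robustness iterations that full LOWESS implementations typically add.

```python
def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling smoothly to 0 at |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_linear_smooth(xs, ys, frac=0.5):
    """Smooth ys by fitting a weighted least-squares line around each x.

    frac is the share of the sample used in each local fit (illustrative default).
    """
    n = len(xs)
    k = min(n, max(3, round(frac * n)))  # neighbours used per local fit
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point.
        h = max(sorted(abs(x - x0) for x in xs)[k - 1], 1e-12)
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw  # weighted mean of x
        my = sum(wi * y for wi, y in zip(w, ys)) / sw  # weighted mean of y
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # local line evaluated at x0
    return fitted
```

A smaller `frac` follows the data more closely, while a larger one gives a smoother curve.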

## Non-Parametric Risk Measures

Techniques like historical Value at Risk (VaR) and Expected Shortfall (ES) use actual historical data to estimate the risk of financial instruments.

These models do not assume a parametric distribution and are directly based on the *empirical* distribution of returns.

This makes them suitable for better capturing tail risks in financial markets.

Tail risk is where the standard assumption of normally distributed returns tends to break down.

Because the normal distribution is thin-tailed relative to most financial returns, it generally underestimates tail risk, so better models are needed: either fatter-tailed parametric models (e.g., the Lévy alpha-stable distribution) or non-parametric models that can be calibrated to the actual data.
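As a minimal sketch, historical VaR and ES can be read straight off the sorted losses. The function name and the tail-size convention (rounding the number of tail observations) are assumptions; practitioners use several slightly different quantile conventions.

```python
def historical_var_es(returns, level=0.95):
    """Return (VaR, ES) as positive loss numbers from an empirical return sample."""
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    n_tail = max(1, round(len(losses) * (1 - level)))     # observations in the tail
    tail = losses[:n_tail]
    var = tail[-1]               # smallest loss that still falls in the tail
    es = sum(tail) / len(tail)   # average loss, given that we are in the tail
    return var, es
```

With 20 observations and `level=0.9`, the tail holds the two worst losses: VaR is the smaller of the two and ES is their average.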

## Density Estimation Techniques

This approach involves estimating the probability density function of a financial variable, like asset returns, directly from the data.

Methods such as **kernel density estimation** are used to create a smooth estimate of the density function, providing a more accurate reflection of the underlying data distribution.

These are most reliable when:

- the data doesn’t fit a standard parametric model
- you have good reason to believe that future data will resemble the past/current data
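A Gaussian kernel density estimate places a small bell curve on each observation and averages them. The sketch below is a pure-Python illustration; the function names are assumptions, and the bandwidth helper uses the common normal-reference rule of thumb, which is only one of several possible choices.

```python
import math
import statistics


def normal_reference_bandwidth(sample):
    """Rule-of-thumb bandwidth: 1.06 * sample std * n^(-1/5)."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)


def gaussian_kde(sample, bandwidth):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density
```

The bandwidth controls the trade-off: too small and the estimate is spiky, too large and genuine features such as fat tails get smoothed away.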

## Decision Trees and Random Forests

These machine learning techniques are used for classification and regression tasks in finance.

Random forests, which combine multiple decision trees, are useful for capturing complex relationships in financial datasets.

## Non-Parametric Tests

Tests like the **Kolmogorov-Smirnov test**, **Mann-Whitney U test**, and **Kruskal-Wallis test** are used to compare samples or test hypotheses without assuming a specific distribution.

These are valuable in financial research for:

- testing for anomalies
- testing market efficiency
- comparing return distributions across different instruments
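For instance, the two-sample Kolmogorov-Smirnov statistic is simply the largest vertical gap between two empirical cumulative distribution functions. The sketch below computes only the statistic (not a p-value), and the function name is an assumption.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the maximum gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)

    def ecdf(sorted_sample, x):
        # bisect_right counts the observations that are <= x.
        return bisect_right(sorted_sample, x) / len(sorted_sample)

    # The gap can only change at observed values, so checking those is enough.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)
```

A value near 0 suggests the two return samples look alike, while a value near 1 means they barely overlap.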

## Non-Parametric Portfolio Optimization

Traditional portfolio optimization methods often rely on assumptions about the distribution of returns.

(For example, using standard deviation as the only measure of risk implicitly treats returns as roughly normal, or at least symmetric, because it ignores skewness and fat tails.)

Non-parametric techniques, however, use the actual return distribution – either through historical data or resampling techniques – to optimize portfolios.

This provides a more realistic assessment of risks and returns.

What are these non-parametric portfolio optimization models?

Let’s categorize them:

### Historical Simulation

Uses actual historical return data to model portfolio risk and return.

Parametric approaches, by contrast, make assumptions about the distribution of returns.

### Resampling Techniques

Involves repeatedly sampling from historical return data to construct a range of possible portfolio outcomes for optimization.

### Bootstrap Methods

Generates numerous resamples of historical return data to estimate the distribution of portfolio returns and risks.
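A minimal sketch of a percentile bootstrap for the mean return is shown below. The function name, defaults, and seed are assumptions, and the plain resampling used here treats returns as independent, which ignores autocorrelation and volatility clustering (block-based variants are often used to address that).

```python
import random


def bootstrap_mean_interval(returns, n_resamples=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for the mean of a return sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(returns)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(returns, k=n)  # draw n returns with replacement
        means.append(sum(resample) / n)
    means.sort()
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]
```

The same loop works for other statistics (volatility, drawdown, Sharpe ratio) by swapping out the mean.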

### Monte Carlo Simulation

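Simulates a wide range of possible return scenarios for assets in the portfolio (this is non-parametric only when the scenarios are drawn from the empirical data rather than from an assumed distribution).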

Then optimizes allocation based on risk and return (and any other objectives).

### Mean-CVaR Optimization

Trades off expected return against Conditional Value at Risk (CVaR), a measure of the average loss in the extreme tail of the distribution.
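As a simplified illustration, the sketch below grid-searches the weight between two assets that minimizes historical CVaR over a set of return scenarios. It leaves out the expected-return side of the trade-off, and the function names, scenario data, and grid size are all hypothetical.

```python
def cvar(returns, level=0.75):
    """Average of the worst (1 - level) share of losses in a scenario set."""
    losses = sorted((-r for r in returns), reverse=True)
    n_tail = max(1, round(len(losses) * (1 - level)))
    return sum(losses[:n_tail]) / n_tail


def min_cvar_weight(asset_a, asset_b, level=0.75, steps=100):
    """Grid-search the weight in asset A (rest in B) with the lowest CVaR."""
    best_w, best_cvar = None, float("inf")
    for i in range(steps + 1):
        w = i / steps
        # Portfolio return in each scenario for this weight.
        portfolio = [w * ra + (1 - w) * rb for ra, rb in zip(asset_a, asset_b)]
        c = cvar(portfolio, level)
        if c < best_cvar:
            best_w, best_cvar = w, c
    return best_w, best_cvar


# Hypothetical scenario returns for two assets.
asset_a = [-0.10, 0.05, 0.02, 0.03]
asset_b = [0.04, -0.08, 0.01, 0.02]
weight_a, tail_loss = min_cvar_weight(asset_a, asset_b)
```

Because the two assets suffer their worst losses in different scenarios, the minimum-CVaR mix sits between them rather than at either extreme.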

### Robust Optimization

Aims to create portfolios that perform well under various market conditions, including worst-case scenarios (again, without relying on specific return distribution assumptions).

### Empirical Bayes Methods

Combines historical data with Bayesian statistics to estimate return distributions, allowing for dynamic updating of beliefs based on new data.

### K-Nearest Neighbors (KNN) in Portfolio Construction

Covered in its own section below, the KNN algorithm identifies assets with similar return characteristics, which can help with tasks such as time-series forecasting, diversification, and risk management.

### Quantile Regression for Asset Allocation

Employs quantile regression to understand the relationships between assets under different market conditions.

Helps in tailoring portfolios that are resilient to extreme market movements.

### Decision Trees for Asset Selection

Uses decision tree algorithms to select assets based on a variety of financial indicators.

For example, decision trees might categorize stocks into risk profiles based on indicators like P/E ratio, market cap, and sector performance.

## K-Nearest Neighbors (KNN)

This algorithm is used for both classification and regression in finance.

It predicts the output for a new data point based on the outputs of the “K” closest points in the training set.

This method is useful for predicting financial time series or for things like credit scoring in the banking sector.
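A bare-bones KNN regression takes only a few lines. In the sketch below the function name and `k` default are assumptions, and in practice features would usually be scaled first so that no single one dominates the distance.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    # Pair each Euclidean distance with its target, then sort by distance.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = [y for _, y in by_distance[:k]]
    return sum(nearest) / len(nearest)
```

For classification, the average would be replaced with a majority vote among the `k` neighbors.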

## Wavelet Transform Analysis

This technique decomposes financial time series data into different frequency components.

It provides a more nuanced understanding of data characteristics over time.

It helps detect hidden patterns, trends, and abrupt changes in financial markets.

Wavelet transform analysis is applied in analyzing and forecasting volatile financial markets, such as stock prices or currency exchange rates, where data characteristics can change rapidly over time.
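The simplest wavelet is the Haar wavelet, which splits a series into pairwise averages (the slow-moving part) and pairwise differences (the fast-moving part). The sketch below shows one level of that decomposition and its inverse; the function names are assumptions.

```python
import math


def haar_step(signal):
    """One level of the Haar transform: (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / s for a, b in pairs]  # low-frequency component
    detail = [(a - b) / s for a, b in pairs]  # high-frequency component
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the original signal from one level of Haar coefficients."""
    s = math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / s, (a - d) / s]
    return out
```

Applying `haar_step` repeatedly to the approximation coefficients yields progressively coarser views of the series, and large detail coefficients flag abrupt changes.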

## Artificial Neural Networks (ANNs)

ANNs are a set of algorithms modeled loosely after the human brain.

They’re designed to recognize patterns.

They interpret sensory data through machine perception, labeling, and clustering raw input.

In finance, ANNs are used for predictive modeling in stock market analysis, credit scoring, and algorithmic trading.

They help identify complex non-linear relationships between variables.

## Support Vector Machines (SVM)

SVMs are supervised learning models that analyze data for classification and regression analysis.

They’re effective in high-dimensional spaces (i.e., many factors affecting the output) and in cases where the number of dimensions exceeds the number of samples.

SVMs are employed in financial modeling for predicting asset trends and credit risk analysis.

## Splines and Generalized Additive Models (GAMs)

Splines are a series of polynomial segments strung together, ensuring smoothness at each point where the segments meet.

GAMs extend linear models by allowing non-linear functions of predictor variables while maintaining interpretability.

In simple terms, they’re useful when you want to fit a smooth curve to data but aren’t sure of the underlying structure of the data points.

They help make predictions or analyze trends.

These models are used in yield curve analysis, for fitting non-linear patterns in asset prices, and in modeling interest rate changes.

## Quantile Regression

This technique estimates the median or other quantiles of the response variable’s conditional distribution as a function of the predictor variables.

It gives a more comprehensive analysis of the relationship between variables.

It’s particularly useful in financial risk management – where understanding the tails of the distribution (extreme losses or gains) might be viewed as more important than the mean or median of the distribution.
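Quantile regression replaces squared error with the asymmetric "pinball" loss. The sketch below is a brute-force version for a single predictor: it relies on the property that an optimal quantile line passes through at least two data points, so it tries every such line. The function names are assumptions, and this approach is only practical for small samples.

```python
from itertools import combinations


def pinball_loss(residual, tau):
    """Asymmetric loss: under-predictions cost tau, over-predictions 1 - tau."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def quantile_line(xs, ys, tau=0.5):
    """Return (intercept, slope) of the line with the lowest total pinball loss."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line is not a valid regression fit
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(
            pinball_loss(y - (intercept + slope * x), tau)
            for x, y in zip(xs, ys)
        )
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]
```

With `tau=0.5` the fit is a median regression and shrugs off a single extreme observation, while a high `tau` such as 0.9 is pulled toward the upper tail.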

## Multivariate Adaptive Regression Splines (MARS)

MARS is a non-parametric regression technique that automatically models non-linearities and interactions between variables.

It does so by fitting piecewise linear regressions, which makes it very flexible.

MARS is used for complex financial modeling tasks like predicting stock returns, bond ratings, and economic indicators where relationships between variables are not linear or well-defined.

## Non-Parametric Volatility Models

These models, such as realized volatility computed from high-frequency data, don’t assume a specific functional form for the volatility of asset returns.

Because they measure volatility directly from observed returns, they can reflect volatility clustering and leverage effects without a parametric specification having to build those features in.

Such models can be used in derivative pricing, risk management, and in constructing volatility indices in financial markets.
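Realized volatility is about as simple as a volatility estimator gets: sum the squared log returns over the period and take the square root. In the sketch below the function names are assumptions, and the 252-day annualization factor is a common convention rather than a rule.

```python
import math


def realized_volatility(prices):
    """Square root of the sum of squared log returns over the sampled prices."""
    log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
    return math.sqrt(sum(r * r for r in log_returns))


def annualize(daily_vol, trading_days=252):
    """Scale a one-day volatility to a yearly figure."""
    return daily_vol * math.sqrt(trading_days)
```

Feeding in one day of intraday prices gives that day’s realized volatility, which `annualize` then converts to the familiar yearly scale.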

## Dynamic Time Warping (DTW)

DTW is an algorithm for measuring similarity between two temporal sequences that may vary in speed.

It’s a flexible method that allows for stretching and compressing of the time series to find a match.

In finance, DTW can be used for analyzing and comparing financial time series, like stock price movements, for pattern recognition and anomaly detection.

For example, how does a stock’s movement vary with the overall index (e.g., GOOG and the S&P 500)?

How do two similar stocks (e.g., Ford and GM) co-vary?

What can be learned from that and can it be traded?
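The core of DTW is a small dynamic program. The sketch below uses absolute difference as the point-to-point cost and places no limit on how far the alignment can stretch; both are assumptions, as is the function name.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays (b's point is reused)
                cost[i][j - 1],      # b advances, a stays (a's point is reused)
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]
```

Price series are usually normalized (for example, converted to returns or z-scores) before being compared, so that the distance reflects shape rather than price level.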

## Isotonic Regression

This is a regression technique that fits a non-decreasing function to data.

This is useful when you know the response should be a non-decreasing function of some predictor variables.

In simpler terms, it involves creating a model that follows a rule: as one variable increases, the other variable does not decrease.

This approach is particularly helpful when you already know or assume that, as one quantity grows or progresses, the other should either stay the same or grow as well, but never fall.

To use a non-finance example, imagine you are looking at the relationship between the amount of time spent studying and exam scores.

Intuitively, as study time increases, you would expect exam scores to either increase or at least not decrease. Isotonic regression would be a suitable method to model this kind of relationship.

It’s particularly useful in financial modeling where a natural order is expected, such as yield curves assumed to slope upward (which isn’t always the case) or cumulative return profiles of investment portfolios (e.g., the expectation that more time in the market means higher returns).
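Isotonic regression is typically solved with the pool-adjacent-violators algorithm: whenever a value breaks the ordering, it is averaged with its neighbors until the sequence is non-decreasing again. The sketch below assumes equal weights and that `ys` is already ordered by the predictor; the function name is an assumption.

```python
def isotonic_fit(ys):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for y in ys:
        blocks.append([y, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted += [total / count] * count  # every point in a block gets its mean
    return fitted
```

For example, `[1, 3, 2, 4]` becomes `[1, 2.5, 2.5, 4]`: the out-of-order pair is pooled into its average and everything else is left alone.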

## Ensemble Methods

Ensemble methods combine multiple techniques to improve on the results of using just one.

Techniques like Bagging and Boosting combine the decisions from multiple models to improve the overall performance.

When built on non-parametric base learners such as decision trees, they’re also non-parametric: they don’t assume a specific form for the model but build it from the data.

These methods are used in risk modeling, credit scoring, and algorithmic trading (where robustness and accuracy are most important).

## Conclusion

Non-parametric models and techniques offer a more nuanced and flexible approach to financial data analysis – particularly in situations where the underlying data is complex or doesn’t fit well with traditional parametric models.

They’re used in risk management, portfolio optimization, market analysis, and predictive modeling.

Nonetheless, they can be data-intensive and computationally demanding, which are important considerations in their application.

Also, be wary of building models using historical data with the intention of using them on a forward basis.

In markets, the past isn’t necessarily a reliable indicator of the future.

Focusing on the underlying cause-effect relationships is best.