Last Updated on 10 February, 2024 by Rejaul Karim
In Sample and Out of Sample Testing are critical concepts in statistical modeling and machine learning that are used to evaluate the accuracy of a model’s predictions. In simple terms, in-sample testing involves evaluating a model using the data it was trained on, while out-of-sample testing involves evaluating the model on data it has never seen before.
The purpose of this article is to provide a comprehensive overview of these concepts, including their definition, importance, best practices, limitations, and real-world applications. Whether you are a data scientist, financial analyst, or just someone interested in understanding these concepts better, this article will serve as an essential resource.
Short Definition: In sample and out of sample testing is a method where the data is split into two sets: one used for developing and testing a strategy, and the other reserved for validation.
If traders were left with the option of using only ONE robustness testing method, most would not hesitate a second to choose in sample and out of sample testing. In fact, this method is so useful that it has been tweaked into at least two other stand-alone concepts: walk forward testing and incubation. In this article, you will learn what in sample and out of sample testing is and why it works. Let us begin!
This article is an extension of our article on curve fitting. If you have not read it, we recommend you do so here.
Some words on backtesting
When backtesting an idea, we like to use a lot of market data. The more of it we have access to, the better we can assess the robustness of our strategies. In general, we want somewhere between 5 and 20 years of data to work with, both to ensure that the changing character of the market is represented in the results and to provide a good sample size.
What most beginning traders do is test their idea on ALL the data available to them, in the belief that large quantities of data are enough to ensure the validity of their observations. Those who know the concept of curve fitting will understand that this is incorrect. Most likely, what they have done is fit an idea to market noise, resulting in immediate failure once traded live.
In sample and out of sample testing
This is where in sample and out of sample testing comes into play as a great method to discover curve fit strategies BEFORE putting money at risk. It is all very simple:
1) Divide all data into two pieces.
2) Do all testing on one of the data pieces.
3) Once done testing, verify your findings on the other data piece.
The piece of data used for testing is called in sample and the piece used for validation is called out of sample. Hence, “In sample and out of sample testing”.
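The three steps above can be sketched in a few lines of code. This is a minimal illustration, assuming daily bars are stored in a pandas DataFrame with a DatetimeIndex; the split date and the synthetic prices are arbitrary stand-ins for your own data.

```python
import numpy as np
import pandas as pd

def split_in_out(df: pd.DataFrame, split_date: str):
    """Divide the data into an in-sample piece (for testing and tweaking)
    and an out-of-sample piece (touched only once, for validation)."""
    in_sample = df.loc[df.index < split_date]
    out_sample = df.loc[df.index >= split_date]
    return in_sample, out_sample

# Example: ten years of synthetic daily closes, split roughly 8/2.
dates = pd.bdate_range("2009-01-01", "2018-12-31")
rng = np.random.default_rng(42)
prices = pd.DataFrame(
    {"close": 100 + rng.normal(0, 1, len(dates)).cumsum()}, index=dates
)

in_sample, out_sample = split_in_out(prices, "2017-01-01")
```

The important discipline is not in the code but in how you use the two pieces: every test and tweak happens on `in_sample`, and `out_sample` is looked at exactly once.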
In order to better understand in sample and out of sample testing, we will backtest an idea using this very method. Our backtest will be carried out on the soybean meal futures market in the following steps:
- Setting data ranges for the in sample and out of sample period.
- Testing our idea on the in-sample data.
- Tweaking our strategy until we feel done.
- Validating our edge on the out of sample.
This is the data we will use:
Bar size: Daily
Data range: 2009-2019
In sample: 2009-2017 (8 years)
Out of sample: 2017-2019 (2 years)
1. Setting data ranges for in sample.
In the picture above, you can see how I have set up Tradestation. At this stage, data between 2017 and 2019 must be excluded; otherwise, you will not be able to use it later.
2. Testing our idea on the in-sample data.
Now that we have loaded all the data, it is time to test our idea. In this demonstration, we will investigate what happens if you buy when the RSI2 indicator crosses over 50 and sell after 5 days.
After inserting the strategy and loading our results, we get the following equity curve.
This looks quite alright, but we want something better, so we try to tweak it a little bit by running an optimization and see what values work best.
We find that it is better to wait a little longer before selling, so we will now sell after 10 days instead of 5.
This is starting to look quite alright. However, we want better performance, so we will try to add a filter!
After trying many different indicators and setups, we find that applying the RSI2 indicator to the day prior to the signal, and requiring it to be over 15, works well.
This looks much better!
At this stage, we are satisfied with the performance, and decide to leave it here. The strategy is ready for the out of sample verification.
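The rules arrived at above are Tradestation strategies in the original walkthrough; as a rough, hypothetical Python sketch they could look like the following. The RSI implementation uses standard Wilder smoothing and may differ in detail from Tradestation's built-in RSI function, and the demo prices are synthetic stand-ins for real soybean meal closes.

```python
import numpy as np
import pandas as pd

def rsi(close: pd.Series, period: int = 2) -> pd.Series:
    """RSI with Wilder's smoothing."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)
    loss = -delta.clip(upper=0.0)
    avg_gain = gain.ewm(alpha=1.0 / period, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, min_periods=period).mean()
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

def entries(close: pd.Series) -> pd.Series:
    """Final in-sample rules: buy when RSI2 crosses over 50,
    provided yesterday's RSI2 was above 15."""
    r = rsi(close, 2)
    return (r > 50) & (r.shift(1) <= 50) & (r.shift(1) > 15)

# Synthetic demo closes -- replace with real market data.
dates = pd.bdate_range("2009-01-01", periods=500)
rng = np.random.default_rng(7)
close = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)

signal = entries(close)
# Exit after 10 bars: each trade's return is the 10-day forward return.
trade_returns = (close.shift(-10) / close - 1.0)[signal]
```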
3. Validating on out of sample data.
This was why we saved some of the data as out of sample. Below you can see how our strategy performed on the out of sample data.
As you can see, it does not fail miserably but makes no new equity highs in the out of sample data. If the market has not changed between our in sample and out of sample periods, the only thing we did was to curve fit our strategy to market noise.
It is clear that this strategy failed, which might not be fun to realize, especially if you have put hours of hard work into developing it. Nevertheless, it is much better than losing money trading it live!
Why does this work?
The main premise of out of sample testing is that true market behavior will be consistent throughout both data sets, while random market noise will not. Therefore, an edge fit to random market noise will not work in the out of sample period, while the opposite is true for edges based on true market behavior. However, no method is foolproof, and this applies to out of sample testing as well: a curve fit edge could very well pass on nothing but luck!
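The point can be illustrated with a toy experiment: on a pure random walk there is no edge at all, yet if we search the in-sample data for the "best" weekday to buy, we will always find one that looks profitable there. This is a hedged sketch on synthetic data, not a real strategy.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.bdate_range("2009-01-01", "2018-12-31")
returns = pd.Series(rng.normal(0, 0.01, len(dates)), index=dates)  # pure noise

in_s = returns[returns.index < "2017-01-01"]
out_s = returns[returns.index >= "2017-01-01"]

# "Optimize": pick the weekday with the best average in-sample return.
by_day_in = in_s.groupby(in_s.index.dayofweek).mean()
best_day = by_day_in.idxmax()

# By construction the chosen rule looks good in sample...
# ...but on out-of-sample noise there is no reason for it to hold up.
out_edge = out_s.groupby(out_s.index.dayofweek).mean()[best_day]
```

Because the data is noise, the in-sample "edge" is guaranteed to look good (we picked the winner after the fact), while the out-of-sample figure is just another random draw. That asymmetry is exactly what out of sample validation exploits.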
Dangers and drawbacks
Although in sample and out of sample testing can be a great tool for discerning curve fit edges from true ones, it can be misused. The most common mistake, and one that should be avoided, is converting out of sample data into in sample data without realizing it.
What often happens, is that traders validate their idea on out of sample data, only to find that it has failed. Upon that realization, they return to the in sample data, tweak their strategy, and test it again on the out of sample data.
Effectively, what they have done is to convert their out of sample data to in sample data. Out of sample data needs to be unseen not to lose its value!
Another important point to keep in mind is that every trader who performs many backtests will soon memorize what the market has done at certain times, and will become biased in his strategy creation. If this knowledge is used to fit the edge to the out of sample portion of the data before viewing it, the out of sample period could lose its value as validation without you realizing it.
Explanation of the terms “in-sample” and “out-of-sample” with examples
In statistical modeling, the term “in-sample” refers to data that is used to fit or train a model. On the other hand, “out-of-sample” data refers to data that is held back from the model training process and used to evaluate the performance of the model. For example, in a study analyzing the returns of a stock, the data for the past 3 years can be used as in-sample data to train a stock price prediction model, while the data for the next year can be used as out-of-sample data to evaluate the model’s performance.
Comparison of in-sample and out-of-sample evaluation in statistical modeling
In-sample evaluation is used to determine how well a model fits the training data, while out-of-sample evaluation is used to assess the model’s ability to make accurate predictions on new, unseen data. The main difference between the two is that in-sample evaluation is subject to overfitting, while out-of-sample evaluation is a more reliable indicator of a model’s true performance.
Importance of out-of-sample evaluation in financial modeling and portfolio optimization
In financial modeling and portfolio optimization, out-of-sample evaluation is critical because it helps to assess the risk and potential return of investment strategies. By evaluating a model’s performance on unseen data, financial practitioners can determine whether a strategy will perform well in the future and make informed investment decisions.
Limitations of in-sample evaluation and why out-of-sample evaluation is more reliable
In-sample evaluation has several limitations, including overfitting, selection bias, and a lack of generalizability. Overfitting occurs when a model is too complex and fits the training data too closely, leading to poor performance on new data. Selection bias arises when the sample is not representative of the population, leading to incorrect conclusions. Out-of-sample evaluation is more reliable because it avoids these limitations by testing the model on unseen data, providing a more accurate estimate of the model’s performance.
Best practices for splitting data into in-sample and out-of-sample sets
To ensure accurate evaluation of a statistical model, it is important to split the data into in-sample and out-of-sample sets in a representative manner: randomly for independent observations, or chronologically for time-ordered data such as market prices, to avoid look-ahead bias. One common practice is to use a ratio of 80/20 or 70/30, where 80% or 70% of the data is used for training and the remaining 20% or 30% is used for testing. It is also important to ensure that the in-sample and out-of-sample sets are similar in terms of distribution, variance, and other characteristics to avoid selection bias.
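One way to sanity-check the "similar characteristics" requirement is to compare simple summary statistics of the two pieces before trusting the split. A minimal sketch with NumPy; the 80/20 ratio follows the common practice above, and the synthetic returns are a placeholder for your own data.

```python
import numpy as np

def split_report(data: np.ndarray, train_frac: float = 0.8):
    """Split chronologically 80/20 and report basic stats of each piece."""
    cut = int(len(data) * train_frac)
    train, test = data[:cut], data[cut:]
    stats = {
        "train": {"n": len(train), "mean": float(np.mean(train)),
                  "std": float(np.std(train))},
        "test": {"n": len(test), "mean": float(np.mean(test)),
                 "std": float(np.std(test))},
    }
    return train, test, stats

rng = np.random.default_rng(3)
returns = rng.normal(0.0005, 0.01, 1000)  # synthetic daily returns
train, test, stats = split_report(returns)
```

If the mean or standard deviation of the two pieces differ wildly, the out-of-sample result may say more about a regime change than about your model.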
The impact of sample size on in-sample and out-of-sample accuracy
Sample size has a significant impact on the accuracy of in-sample and out-of-sample evaluations. A larger sample size generally leads to more accurate results, as it provides a more representative picture of the population. Note, however, that more data does not by itself prevent overfitting: a sufficiently complex model can still fit noise, so model complexity must be kept in proportion to the amount of data available.
Real-world applications of in-sample and out-of-sample evaluations
In-sample and out-of-sample evaluations are widely used in various fields, including data science and machine learning. For example, in a machine learning project, the in-sample data can be used to train a model to recognize handwritten digits, while the out-of-sample data can be used to evaluate the model’s accuracy in recognizing new, unseen images.
What are the best practices for in-sample and out-of-sample testing?
Best practices include splitting data into in-sample and out-of-sample sets randomly and ensuring they are representative of the overall dataset. It’s also important to maintain similarity in distribution, variance, and other characteristics between the two sets to avoid bias.
What is the impact of sample size on in-sample and out-of-sample accuracy?
Sample size significantly affects the accuracy of evaluations. A larger sample size generally leads to more reliable results, though overfitting remains possible if model complexity is allowed to grow with the data. Thus, a balance between sample size and model complexity is crucial.
How can in-sample and out-of-sample testing be applied in real-world scenarios?
In-sample and out-of-sample testing are widely used in fields like data science and machine learning for various applications, such as predicting stock prices or recognizing patterns in data.
Out of sample and in sample testing is one of the best methods available to discern curve fit strategies from real ones. However, traders can easily be tempted to alternate between in sample and out of sample data while testing, which could be devastating. Nonetheless, as long as the dangers covered in this article are taken into consideration, out of sample testing will be an invaluable tool in all forms of strategy creation and testing on market data.
Here you can read more about algotrading in our archives.