In sample and out of sample testing is when data is split into two sets, one of which is used for testing and the other for validation.
This article is an extension of our article on curve fitting. If you have not read it, we recommend that you do so first.
If traders were left with the option of using only ONE robustness testing method, most would not hesitate for a second to choose in sample and out of sample testing. In fact, this method is so useful that it has been adapted into at least two other stand-alone concepts: walk forward testing and incubation. In this article you will learn what in sample and out of sample testing is, and why it works. Let us begin!
Some words on backtesting
When backtesting an idea, we like to use a lot of market data. The more of it we have access to, the better we can judge the robustness of our strategies. In general, we want somewhere between 5 and 20 years of data to work with, to ensure that the changing character of the market is represented in the results, as well as to provide a good sample size.
What most beginning traders do is test their idea on ALL the data available to them, in the belief that a large quantity of data is enough to ensure the validity of their observation. Those who know the concept of curve fitting will understand that this is incorrect. Most likely, what they have done is fit an idea to market noise, resulting in immediate failure once traded live.
In sample and out of sample testing
This is where in sample and out of sample testing comes into play as a great method to discover curve fit strategies BEFORE putting money at risk. It is all very simple:
1) Divide all data into two pieces.
2) Do all testing on one of the data pieces.
3) Once done testing, verify your findings on the other data piece.
The piece of data used for testing is called in sample and the piece used for validation is called out of sample. Hence, “In sample and out of sample testing”.
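The three steps above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `split_in_out_of_sample`, the 80/20 default and the price list are assumptions made up for the example. It presumes the data is in chronological order, so that the out of sample piece is the most recent part.

```python
def split_in_out_of_sample(bars, in_sample_fraction=0.8):
    """Split chronologically ordered bars into (in_sample, out_of_sample)."""
    if not 0.0 < in_sample_fraction < 1.0:
        raise ValueError("in_sample_fraction must be strictly between 0 and 1")
    cut = int(len(bars) * in_sample_fraction)
    return bars[:cut], bars[cut:]


# Made-up closing prices, oldest first (illustrative only).
closes = [100, 101, 99, 102, 104, 103, 105, 107, 106, 108]
in_sample, out_of_sample = split_in_out_of_sample(closes)

# Step 2: develop and tweak the strategy using `in_sample` only.
# Step 3: when done, run the finished strategy on `out_of_sample`.
```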
To better understand what in sample and out of sample testing is, we will backtest an idea using this very method. Our backtest will be carried out on the soybean meal futures market in the following steps:
- Setting data ranges for the in sample and out of sample period.
- Testing our idea on the in-sample data.
- Tweaking our strategy until we feel done.
- Validating our edge on the out of sample.
This is the data we will use:
Bar size: Daily
Data range: 2009-2019
In sample: 2009-2017 (8 years)
Out of sample: 2017-2019 (2 years)
1. Setting data ranges for in sample.
In the picture above, you can see how I have set up TradeStation. At this stage, the data between 2017 and 2019 must be excluded. Otherwise it will no longer be unseen, and you will not be able to use it for validation later.
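Outside of a trading platform, the same date ranges could be applied with a simple filter. The sketch below is an assumption-laden example, not the platform setup itself: the exact boundary dates (1 January of each year), the function name `split_by_date` and the sample prices are all made up for illustration.

```python
from datetime import date

# Assumed boundaries; the article only gives the years.
IN_SAMPLE_START = date(2009, 1, 1)
OUT_OF_SAMPLE_START = date(2017, 1, 1)
DATA_END = date(2019, 1, 1)  # exclusive


def split_by_date(bars):
    """bars is a list of (date, close) tuples; returns (in_sample, out_of_sample)."""
    in_sample = [b for b in bars if IN_SAMPLE_START <= b[0] < OUT_OF_SAMPLE_START]
    out_of_sample = [b for b in bars if OUT_OF_SAMPLE_START <= b[0] < DATA_END]
    return in_sample, out_of_sample


# Made-up daily bars (not real soybean meal prices).
bars = [
    (date(2016, 12, 30), 310.0),
    (date(2017, 1, 3), 315.0),
    (date(2018, 12, 31), 309.0),
    (date(2019, 1, 2), 312.0),  # outside the data range, dropped
]
in_sample, out_of_sample = split_by_date(bars)
```

Using a half-open range for each period means that no bar can end up in both pieces.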
2. Testing our idea on the in-sample data.
Now that we have loaded the data, it is time to test our idea. In this demonstration we will investigate what happens if you buy when the 2-period RSI (RSI2) crosses over 50 and sell after 5 days.
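For readers who want to reproduce the rule outside a trading platform, here is a rough Python sketch. Several details are assumptions, since the article does not specify them: Wilder's smoothing is used for the RSI (a platform's built-in RSI may differ in detail), entries and exits happen at the close, and only one position is held at a time. The function names and the price series are made up for the example.

```python
def _to_rsi(avg_gain, avg_loss):
    if avg_loss == 0.0:
        return 100.0  # no losses in the window (convention assumed here)
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def rsi(closes, period=2):
    """Wilder-smoothed RSI, aligned with closes (None until enough bars exist)."""
    values = [None] * len(closes)
    if len(closes) <= period:
        return values
    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    avg_gain = sum(max(c, 0.0) for c in changes[:period]) / period
    avg_loss = sum(max(-c, 0.0) for c in changes[:period]) / period
    values[period] = _to_rsi(avg_gain, avg_loss)
    for i in range(period, len(changes)):
        avg_gain = (avg_gain * (period - 1) + max(changes[i], 0.0)) / period
        avg_loss = (avg_loss * (period - 1) + max(-changes[i], 0.0)) / period
        values[i + 1] = _to_rsi(avg_gain, avg_loss)
    return values


def backtest_rsi_cross(closes, period=2, threshold=50.0, hold_days=5):
    """Buy at the close when RSI crosses above threshold, sell hold_days bars later.

    Returns the list of per-trade returns.
    """
    r = rsi(closes, period)
    trades = []
    i = 1
    while i < len(closes) - hold_days:
        prev, cur = r[i - 1], r[i]
        if prev is not None and cur is not None and prev <= threshold < cur:
            exit_i = i + hold_days
            trades.append(closes[exit_i] / closes[i] - 1.0)
            i = exit_i  # flat again from the exit bar onwards
        else:
            i += 1
    return trades


# Made-up closes, oldest first (illustrative only).
closes = [10, 9, 8, 9, 10, 11, 12, 13, 14, 15]
trades = backtest_rsi_cross(closes)
```

In the workflow of this article, `closes` would be the in sample data only.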
After inserting the strategy and loading our results, we get the following equity curve.
This looks quite alright, but we want something better, so we tweak it a little by running an optimization to see what values work best.
We find that it was better to wait a little longer before selling, so we will now sell after 10 days instead of 5.
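An optimization of this kind is essentially a search over candidate values, scored on the in sample data only. The sketch below shows the idea with a deliberately simple stand-in strategy (buy after a down close, hold for a fixed number of bars), not the RSI strategy from this article. The function names and the price series are made up for illustration.

```python
def net_profit_buy_dips(closes, hold_days):
    """Stand-in strategy: buy after a down close, sell hold_days bars later."""
    profit = 0.0
    i = 1
    while i + hold_days < len(closes):
        if closes[i] < closes[i - 1]:
            profit += closes[i + hold_days] - closes[i]
            i += hold_days  # one position at a time
        else:
            i += 1
    return profit


def optimize_parameter(backtest, in_sample_closes, candidates):
    """Return (best_value, best_score), scored on in sample data only."""
    best = None
    for value in candidates:
        score = backtest(in_sample_closes, value)
        if best is None or score > best[1]:
            best = (value, score)
    return best


# Made-up in sample closes (illustrative only).
in_sample_closes = [10, 9, 10, 11, 12, 11, 10, 11, 13, 14, 13, 15]
best = optimize_parameter(net_profit_buy_dips, in_sample_closes, [1, 2, 3])
```

Note that the out of sample data never enters this search. Every extra parameter that is optimized gives the strategy another chance to fit noise, which is exactly what the out of sample check is meant to reveal.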
This is starting to look quite alright. However, we want better performance, so we will try to add a filter!
After having tried many different indicators and setups, we find that applying the RSI2 indicator to the day prior to the signal, and requiring it to be over 15, works well.
This looks much better!
At this stage, we are satisfied with the performance, and decide to leave it here. The strategy is ready for the out of sample verification.
3. Validating on out of sample data.
This is why we saved some of the data as out of sample. Below you can see how our strategy performed on the out of sample data.
As you can see, it does not fail miserably, but it makes no new equity highs in the out of sample data. If the market has not changed between our in sample and out of sample periods, the only thing we did was curve fit our strategy to market noise.
It is clear that this strategy failed, which is not fun to realize, especially if you have put hours of hard work into developing it. Nevertheless, it is much better than losing money trading it live!
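The check we did by eye on the equity curve, whether the strategy makes a new equity high in the out of sample period, can also be written down as a small function. This is a crude criterion chosen to mirror the observation above, not a complete validation procedure, and the function name and trade results are made up for the example.

```python
from itertools import accumulate


def makes_new_equity_high(in_sample_pnl, out_of_sample_pnl):
    """True if cumulative equity exceeds its in sample peak during the out of sample trades."""
    equity = list(accumulate(list(in_sample_pnl) + list(out_of_sample_pnl)))
    split = len(in_sample_pnl)
    # Starting equity of 0 counts as the peak if the in sample never went positive.
    in_sample_peak = max([0.0] + equity[:split])
    return any(e > in_sample_peak for e in equity[split:])


# Made-up per-trade profits (illustrative only).
in_sample_pnl = [1.0, 2.0, -1.0, 3.0]       # equity peaks at 5.0
out_of_sample_pnl = [-1.0, 1.0, -2.0, 1.0]  # equity only gets back to 5.0

passed = makes_new_equity_high(in_sample_pnl, out_of_sample_pnl)
```

With these numbers the strategy drifts sideways after the split, much like the example in this article, so `passed` is false.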
Why does this work?
The main premise of out of sample testing is that true market behavior will be consistent throughout both data sets, while random market noise will not. Therefore, an edge fit to random market noise will not work in the out of sample data, while an edge based on true market behavior will. However, no method is foolproof, and that applies to out of sample testing as well. A curve fit edge could very well pass through nothing but luck!
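A small simulation can make this premise tangible. In the sketch below the "market" is pure noise and every "strategy" is a random sequence of long or flat days, so there is no true behavior to find. We pick the strategy that looks best in sample and then check it out of sample. Everything here (the names, the 800/200 split, the 200 candidates, the seed) is an assumption chosen for illustration.

```python
import random

N_DAYS = 1000
SPLIT = 800          # first 800 days in sample, last 200 out of sample
N_STRATEGIES = 200

rng = random.Random(42)

# Daily returns that are pure noise with zero mean.
returns = [rng.gauss(0.0, 1.0) for _ in range(N_DAYS)]

# Each "strategy" is long (1) or flat (0) on each day, purely at random.
rules = [[rng.randint(0, 1) for _ in range(N_DAYS)] for _ in range(N_STRATEGIES)]


def profit(rule, rets):
    """Sum of the returns on the days the rule is long."""
    return sum(position * r for position, r in zip(rule, rets))


# "Curve fitting": keep whichever rule happened to do best in sample.
best = max(rules, key=lambda rule: profit(rule[:SPLIT], returns[:SPLIT]))

in_sample_profit = profit(best[:SPLIT], returns[:SPLIT])
out_of_sample_profit = profit(best[SPLIT:], returns[SPLIT:])

print(f"in sample: {in_sample_profit:.1f}, out of sample: {out_of_sample_profit:.1f}")
```

The in sample figure is the best of 200 attempts, so it tends to look attractive. The out of sample figure has an expected value of zero, since the positions are independent of the returns. In any single run it can still come out positive, which is the luck mentioned above.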
Dangers and drawbacks
Even though in sample and out of sample testing can be a great tool for telling curve fit edges from true ones, it can be misused. The most common mistake, and one that should be avoided, is converting out of sample data into in sample data without realizing it.
What often happens is that traders validate their idea on out of sample data, only to find that it has failed. Upon that realization, they return to the in sample data, tweak their strategy, and test it again on the out of sample data.
Effectively, what they have done is convert their out of sample data into in sample data. Out of sample data must remain unseen to keep its value!
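If you run your own backtests in code, one way to enforce this discipline is to wrap the out of sample data so that it can only be evaluated once. The class below is a hypothetical sketch of that idea. The name `OutOfSampleVault` and its method are made up for this example and do not come from any library.

```python
class OutOfSampleVault:
    """Holds out of sample data and allows exactly one evaluation."""

    def __init__(self, data):
        self._data = list(data)
        self._used = False

    def evaluate_once(self, strategy):
        """Run strategy(data) once; any later call raises RuntimeError."""
        if self._used:
            raise RuntimeError(
                "Out of sample data has already been used; "
                "testing on it again would turn it into in sample data."
            )
        self._used = True
        return strategy(self._data)


def net_profit(trade_results):
    return sum(trade_results)


# Made-up out of sample trade results (illustrative only).
vault = OutOfSampleVault([1.0, -0.5, 2.0])
result = vault.evaluate_once(net_profit)

# A second vault.evaluate_once(...) call on the same vault raises RuntimeError.
```

This does not stop anyone from rebuilding the vault, of course. It only turns an easy, silent mistake into a deliberate decision.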
Another important point to keep in mind is that anyone who performs many backtests will soon memorize what the market has done at certain times, and will become biased in strategy creation. If this knowledge is used to fit the edge to the out of sample portion of the data before testing on it, the out of sample data could lose its value as validation without you realizing it.
Out of sample and in sample testing is one of the best methods available to discern curve fit strategies from real ones. However, traders can easily be tempted to alternate between in sample and out of sample data while testing, which could be devastating. Nonetheless, as long as the dangers covered in this article are taken into consideration, out of sample testing will be an invaluable tool in all forms of strategy creation and testing on market data.