
The Importance of Good Data Sets When Backtesting Strategies (Garbage In Equals Garbage Out)

Last Updated on 17 February, 2024 by Trading System

Good data is important in trading. After ten years of day trading, I have learned the importance of good data the expensive way. The famous saying “garbage in, garbage out” is indeed true.

Your backtest is only as good as the data you are testing on. Make sure you are backtesting on reliable and “clean” data. In the long run, it pays off to spend money on a good data source for backtesting.

I have probably lost tens of thousands of dollars on trading strategies that are based on “garbage”. Sad but true.

(Before we go on, we’d like to mention that we have a backtesting course that covers all aspects of how to backtest.)

Yahoo! Finance is often wrong

The problem is, there is not much you can do about it. Or is there? Through this blog, I’ve been contacted by several people. Yesterday, one of them sent me SPY data that he had downloaded himself from Interactive Brokers (IB). I’ll test this dataset to see how it differs from the end-of-day (EOD) data from Yahoo! Finance. No doubt the dataset downloaded from IB is better than what you get from many providers.

First, I’ll show you two errors in the Yahoo! Finance SPY data that I still remember:

[Chart: SPY daily bar from Yahoo! Finance, 30 November 2011]

This example is from the 30th of November 2011.

It’s correct that it was a big gap-up opening, but the low is completely wrong. Even a paid data feed such as IQFeed.net includes this low price (in its EOD data, not its intraday data). If a strategy relies on the low of the day to hit its profit targets, this day turns out to be a huge winner. In reality, no trade ever took place at that low: the actual low of the day was only some 20 cents below the open, not 2 dollars as shown here!
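To see why a single bad LOW matters, here is a minimal Python sketch of how a daily-bar backtest typically decides whether a profit target was reached. It assumes a short trade (for example, fading the gap-up) with a target below the open, and that the target counts as filled whenever the bar’s LOW is at or below it. The `Bar` class, the function name and all prices are hypothetical; only the rough distances from the open (about 20 cents versus about 2 dollars) come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    """One daily (EOD) OHLC bar."""
    open: float
    high: float
    low: float
    close: float


def short_target_hit(bar: Bar, target: float) -> bool:
    """Hypothetical fill rule: a short's profit target below the open
    counts as filled if the bar's LOW traded at or below it."""
    return bar.low <= target


# Hypothetical prices: only the distance of each LOW from the open follows
# the 30 November 2011 example (about 2.00 on the bad bar, about 0.20 in reality).
open_price = 100.00
target = open_price - 1.00  # hypothetical target 1 dollar below the open

bad_bar = Bar(open=open_price, high=100.30, low=open_price - 2.00, close=99.90)
true_bar = Bar(open=open_price, high=100.30, low=open_price - 0.20, close=99.90)

print(short_target_hit(bad_bar, target))   # True: a winner that never happened
print(short_target_hit(true_bar, target))  # False: the target was never reached
```

The two bars are identical except for the LOW, yet the backtest books a win on one and no fill on the other.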

Here is the second example:

[Chart: SPY daily bar from Yahoo! Finance, 9 April 2012]

This one is from the 9th of April 2012. The chart shows the gap being filled, but the fill is fake. The high of the day was only 75 cents above the open, not close to 2 dollars as shown in the chart! A strategy that fades the gap records this day as a huge winner that never happened.
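The same effect applies to the HIGH. The minimal sketch below assumes a gap down faded with a long trade: buy at the open, exit at the prior close if the daily HIGH reached it, otherwise exit at the close. The function name and all prices are hypothetical; only the open-to-high distances (close to 2 dollars on the bad bar, 75 cents in reality) follow the 9 April 2012 example.

```python
def gap_fade_long_pnl(open_price: float, prev_close: float,
                      day_high: float, day_close: float) -> float:
    """P&L per share of fading a gap down on a daily bar (hypothetical rule).

    Buy at the open; if the HIGH reached the prior close, assume the gap
    filled and exit there, otherwise exit at the day's close.
    """
    if day_high >= prev_close:
        return prev_close - open_price  # gap "filled": capture the whole gap
    return day_close - open_price       # gap not filled: exit at the close


# Hypothetical prices for illustration only.
open_price, prev_close, day_close = 100.00, 101.50, 100.40

bad_high = open_price + 1.90   # close to 2 dollars above the open, as in the chart
true_high = open_price + 0.75  # the real high, 75 cents above the open

print(round(gap_fade_long_pnl(open_price, prev_close, bad_high, day_close), 2))   # 1.5
print(round(gap_fade_long_pnl(open_price, prev_close, true_high, day_close), 2))  # 0.4
```

With the bad HIGH the backtest takes the full gap; with the real HIGH the trade only earns what the close gives it.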

It is worth noting that the CLOSE is basically 100% correct, and the OPEN is also reasonably accurate. It is the HIGH and LOW of the day that are sometimes (very) wrong.

A comparison between two data providers: Yahoo! Finance and Interactive Brokers

Below is a comparison between the quotes I manually downloaded from IB and the EOD quotes from Yahoo! Finance. It shows the difference, in percent, in the OPEN-to-HIGH and OPEN-to-LOW moves (for example, the OPEN-to-HIGH from IB is subtracted from the OPEN-to-HIGH from Yahoo! Finance).

The first chart shows that many of the highs in Yahoo! Finance are well above IB’s. The second chart shows the same pattern for the lows: many of the lows in Yahoo! Finance are well below IB’s.

[Charts: OPEN-to-HIGH and OPEN-to-LOW differences in percent, Yahoo! Finance minus IB]
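The comparison described above can be sketched in a few lines of Python. This is a minimal sketch, assuming both sources are already loaded as daily (open, high, low, close) tuples keyed by date; the function names, the sample bars and the 0.5 percentage-point cut-off for flagging a day are hypothetical, not the author’s actual procedure. As in the text, each difference is Yahoo! Finance minus IB.

```python
# A daily bar as (open, high, low, close).
Bar = tuple[float, float, float, float]


def open_to_high_pct(bar: Bar) -> float:
    """OPEN-to-HIGH move of one bar, in percent."""
    o, h, _, _ = bar
    return (h / o - 1.0) * 100.0


def open_to_low_pct(bar: Bar) -> float:
    """OPEN-to-LOW move of one bar, in percent (zero or negative)."""
    o, _, low, _ = bar
    return (low / o - 1.0) * 100.0


def compare_sources(yahoo: dict[str, Bar],
                    ib: dict[str, Bar]) -> dict[str, tuple[float, float]]:
    """For each date present in both sources, return
    (Yahoo minus IB OPEN-to-HIGH, Yahoo minus IB OPEN-to-LOW) in percentage points."""
    diffs = {}
    for date in sorted(yahoo.keys() & ib.keys()):
        y, b = yahoo[date], ib[date]
        diffs[date] = (
            open_to_high_pct(y) - open_to_high_pct(b),
            open_to_low_pct(y) - open_to_low_pct(b),
        )
    return diffs


def suspicious_dates(diffs: dict[str, tuple[float, float]],
                     threshold_pct: float = 0.5) -> list[str]:
    """Dates where either difference exceeds a hypothetical cut-off."""
    return [d for d, (dh, dl) in diffs.items()
            if abs(dh) > threshold_pct or abs(dl) > threshold_pct]


# Hypothetical sample bars: "day-1" has an inflated HIGH and a too-low LOW
# in the first source, "day-2" agrees exactly.
yahoo = {"day-1": (100.0, 102.0, 98.0, 101.0), "day-2": (101.0, 101.5, 100.5, 101.2)}
ib = {"day-1": (100.0, 100.75, 99.8, 101.0), "day-2": (101.0, 101.5, 100.5, 101.2)}

print(suspicious_dates(compare_sources(yahoo, ib)))  # ['day-1']
```

Note that the CLOSE plays no part in the check: in line with the observations above, it is the HIGH and LOW that drive the differences.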

The question is: are these differences so large that they make a theoretically good strategy useless?

Yesterday I wrote about opening gaps in SPY. And yes, the results are a lot worse. This morning I tested all three options: EOD data from Yahoo! Finance, intraday data collected from IB, and intraday data from IQFeed.net. All three yield significantly different numbers! When using EOD data from IQFeed, I get basically the same result as with Yahoo! Finance.

Conclusion about good data sets when backtesting:

So the conclusion must be: if you’re testing only on OPEN and CLOSE data, you’re (mostly) on solid ground no matter your data provider.

If you’re using the HIGH and LOW from EOD data, you must be careful. Always verify the strategy before trusting it: paper trade it on the quotes you actually see in real time, or trade it as small as you can for a period.


What does “garbage in, garbage out” mean in trading?

“Garbage in, garbage out” refers to the principle that the output of a system is only as good as the quality of the input. In trading, if you use poor-quality data for backtesting, the results and conclusions drawn from the test are likely to be inaccurate and unreliable.

How can I ensure good data for backtesting?

Good data is crucial in trading because the accuracy of your backtest results depends on the quality of the data you are testing on. To ensure good data for backtesting, invest in a reliable and clean data source. Spending money on a reputable data provider is a long-term investment that pays off by providing accurate information for testing trading strategies.

Is Yahoo! Finance a reliable data source for backtesting?

While Yahoo! Finance is a popular financial data platform, its daily (EOD) data is not always accurate, especially for the high and low prices. In the SPY examples above, the daily low and high were off by close to 2 dollars, which highlights the importance of verifying data from such sources.

How do discrepancies in high and low prices affect trading strategy results?

Discrepancies in high and low prices can significantly impact the results of a trading strategy, especially if the strategy relies on these prices for setting profit targets or identifying trends. It’s crucial to be cautious when using high and low prices in backtesting.

What is the impact of data differences on backtested strategies?

Data differences, especially in high and low prices, can lead to variations in backtested strategy results. The impact may make a theoretically good strategy less effective in real-world trading. It’s essential to validate strategies through paper trading and by using accurate data.

