Most SEOs consider Google Search Console (GSC) their source of truth and trust that its data is accurate. But what if I told you that GSC doesn’t show you all the keywords you’re getting traffic from? In fact, the tool doesn’t show a term for nearly half of all clicks.
These hidden terms account for 46.08% of all clicks in our study, which covers one month of data from 146,741 websites and nearly 9 billion total clicks.
Let’s dive in.
First, a big thank-you to Mauricio Fernandez from our backend team for helping me extract this data.
Below is a scatter chart where each dot represents one of the 146,741 websites. It plots each site’s missing click rate against its overall traffic.
As you can see, some sites have no terms associated with their clicks at all, while others have all of their data. Every site is different, and the amount of missing data varies widely across the dataset.
There are a couple of points I want to call out because of what they show. One site (1) with 100 million clicks is missing terms for 90.3% of them. Another site (2) with 63 million clicks is missing terms for only 2.27% of its clicks. As you can see, the data varies a lot!
Another way to show the variation in missing click data is to look at how the amount of missing data is distributed across the dataset. There are many sites in every single bucket, so you’d have a hard time guessing how much data is missing from any given site.
You see lots of sites in between, plus a big spike of sites missing 95%–100% of their clicks. Many sites are missing about half of their data, but a large number are missing most of it.
Another interesting angle is to group sites by the amount of traffic they receive. In the box plot below, you’ll see that both low-traffic and high-traffic sites tend to be missing more data, while sites in the intermediate buckets tend to have less missing data.
Data generally improves with more traffic. But after around 10 million clicks, the data starts to deteriorate noticeably.
In case you are seeing box plots for the first time, here’s how you should read them:
The small lines at the edges represent the minimum and maximum values, 50% of all values fall within the highlighted area, and the line inside that area is the median.
At this point, you might think we made a mistake with the data: that we only totaled the 1,000 exportable rows shown in the GSC interface, which is why so much is missing.
But that’s not the case. We extracted this data via the API, which lets us pull all the data Google makes available, and a lot of it is still missing!
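For the curious, here’s a minimal Python sketch of that kind of check (not the exact script we used for the study). It assumes you’ve already set up OAuth credentials for the Search Console API; the property URL, date range, and token.json path are placeholders. The idea is simple: compare total clicks (queried with no dimensions) against the sum of clicks attributed to individual queries.

```python
# Minimal sketch: compare total clicks vs. clicks attributed to queries in GSC.
# Assumes you already have an OAuth token with the webmasters.readonly scope.
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

creds = Credentials.from_authorized_user_file(
    "token.json", scopes=["https://www.googleapis.com/auth/webmasters.readonly"]
)
service = build("searchconsole", "v1", credentials=creds)

site = "https://www.example.com/"  # hypothetical property URL
base = {"startDate": "2021-06-01", "endDate": "2021-06-30"}  # example date range

# 1) Total clicks for the period, with no dimensions applied.
totals = service.searchanalytics().query(siteUrl=site, body=base).execute()
total_clicks = totals["rows"][0]["clicks"] if totals.get("rows") else 0

# 2) Clicks that are attributed to a query, paginated 25,000 rows at a time.
query_clicks, start_row = 0, 0
while True:
    body = dict(base, dimensions=["query"], rowLimit=25000, startRow=start_row)
    resp = service.searchanalytics().query(siteUrl=site, body=body).execute()
    rows = resp.get("rows", [])
    query_clicks += sum(row["clicks"] for row in rows)
    if len(rows) < 25000:
        break
    start_row += 25000

# 3) The difference is the share of clicks with no visible term.
if total_clicks:
    missing = total_clicks - query_clicks
    print(f"Total clicks: {total_clicks:,}")
    print(f"Clicks with a query: {query_clicks:,}")
    print(f"Missing: {missing:,} ({missing / total_clicks:.2%})")
```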
I know everyone’s primary concern will be the amount of missing data from their site, so I want to provide you with a way to check that. The easiest way to see how many clicks go to terms that Google doesn’t show you is to use the GSC connector in Google Data Studio.
I’ve created a Data Studio report that you can copy to check for missing data for your website. This uses data from the last 12 months. About half of the data is missing for my personal site at the time of writing.
Create your copy of the report and add your GSC data as a source. Here’s how:
- At the top right, click on the three dots and then click “Make a copy”.
- In the “New Data Source” drop-down menu, select the GSC data source for the site you are interested in.
- If the site is not available, select “Create data source”. Search for “Search Console” and click on it.
- Click the GSC property you want to use > click “Site Impressions” > click “Web”. Then, in the upper-right corner, click “Connect”.
- In the upper right corner, click “Add to report”.
- Click “Copy Report”.
I’d like to gather some self-reported user data on this. If you want to share, tweet your “Grand Total” numbers from no. 1 and no. 2 to @patrickstox and @ahrefs. Or just DM me on Twitter, and I’ll aggregate the self-reported data to share here later. I suspect most of the user-reported data will corroborate the study and show that the amount of missing data varies between sites.
Google provides some reasons for this discrepancy:
To protect user privacy, the performance report does not show all data. For example, we may not keep track of some queries that are made a very limited number of times or those that contain personal or sensitive information.
I don’t believe for a second that nearly half of the searches across all of these sites were private. That leaves queries made a limited number of times, often called long-tail keywords, and “some” may be a bit of an understatement on Google’s part. Either way, the missing 46.08% is much higher than I expected.
We know that 15% of all Google searches have never been seen before. I’m sure Google stores these queries; otherwise, it wouldn’t be able to report that statistic.
However, I assume the team behind GSC has limited resources and has to make trade-offs about what data it stores and exposes. It’s just that the amount of missing data is surprising to me, and it may come as a shock to you.
Final thoughts
You can understand the types of terms that drive traffic to a page by using the Performance report in GSC or by checking the Organic keywords report in Ahrefs’ Site Explorer. The hidden data in GSC likely includes terms similar to the ones listed in those reports.
For example, Google is missing data on 35% of clicks for our keyword research post. In the United States, there are 327 terms listed in GSC and 426 in Ahrefs.
In all, 178 terms appear in both datasets, which leaves 149 terms unique to GSC and 248 unique to Ahrefs. While we can’t say for sure what the missing terms are, they’re likely similar to the terms included in these reports.
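If you want to run the same comparison for one of your own pages, here’s a small sketch of the overlap math. The file names gsc_terms.txt and ahrefs_terms.txt are hypothetical exports with one keyword per line.

```python
# Compare two keyword exports and count overlapping vs. unique terms.
# gsc_terms.txt and ahrefs_terms.txt are hypothetical one-term-per-line files.
def load_terms(path):
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

gsc = load_terms("gsc_terms.txt")        # e.g., 327 terms in the example above
ahrefs = load_terms("ahrefs_terms.txt")  # e.g., 426 terms in the example above

overlap = gsc & ahrefs       # terms found in both lists (178 in the example)
only_gsc = gsc - ahrefs      # 327 - 178 = 149 terms only in GSC
only_ahrefs = ahrefs - gsc   # 426 - 178 = 248 terms only in Ahrefs

print(f"In both: {len(overlap)}, only GSC: {len(only_gsc)}, only Ahrefs: {len(only_ahrefs)}")
```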
Message me on Twitter if you have any questions.