The Marketer’s Guide to Google Index Issues & Statistics

An overview of indexation in Google search results.

A lot of SEO content is focused on ranking well in desired search results. That makes sense, as you can’t start to drive traffic to your site from organic search without ranking well in search results, and you can’t start to drive leads and sales for your business without driving organic traffic. If you’re a marketing executive, director, manager, etc., that last bit is likely what you are (and should be) primarily concerned with.

But that all skips a step.

In order to rank for desired terms, you first have to get into Google’s index. Once upon a time, getting your site into the index was somewhat difficult: “site submission” was a common service, and there were a number of popular tools that helped you let Google know you had a website.

Google has gotten very good at finding new sites. If you’ve tweeted about your site, sent out a press release, or gotten a link from virtually anyone, Google probably knows you exist. But if you’ve tried to drive search traffic, you’ve likely run into some variety of Google indexation issues as your site grows. For non-SEOs (and frequently for SEOs as well) a lot of these issues can be very confusing and frustrating. Virtually every time I walk through an SEO audit with a client, there’s at least some confusion about indexation issues, duplicate content, the best way to remove pages from Google’s index, etc.

In this article I’ll try to help a marketing generalist (someone with a basic understanding of SEO who is responsible for driving more traffic to their website but may not be knee-deep in Screaming Frog crawls and link analysis on a day-to-day basis) understand:

  • How Google indexing works
  • How to interpret different index stats or “counts” from Google.com and your Google Search Console account
  • How to actually fix common indexation issues (such as not having pages indexed, or having pages you don’t want indexed leaking into the index)

Let’s start at the beginning.

How Does Google Indexing Work?

Google’s search engine is very complex. An in-depth look at how Google finds, stores, and prioritizes pages is well beyond the scope of this article.

At a high level, Google is working to find (or crawl) as many useful pages as possible, to store (or index) those pages so it can return them for relevant searches, and then to return the proper pages in the way that best satisfies a searcher’s query (and maybe in a way that helps Alphabet’s bottom line as well, but that’s another discussion).

Again, at a high (and oversimplified) level: you want to get good stuff (the pages on your site that are high quality, useful for searchers, and likely to drive desirable actions for your business) into the index and keep bad stuff (low-value, thin, or duplicated pages that will hurt you more than help you in driving relevant traffic to your site) out of the index.

I’ll focus the rest of this article on analyzing which pages from your site are in the index, as well as what you – as a marketer – can do to take control and better optimize what is and isn’t indexed, but there are a number of resources for learning more about how Google indexing works, including:

Understanding Index Stats

The index statistics you see about your own site are often pretty confusing. I’ll walk through the two most common ways you’ll typically see or be shown statistics about how many pages on your site are included in Google’s index.

Index Stats on Google.com

One of the most common ways to see how many and which of your pages are indexed in Google is to actually go to Google.com and type in site:yoursite.com. Let’s look at what Google is showing in the index for a site that I’ve written for in the past, SearchEngineLand.com:

Google index stats in search results

About 30,700 results – that’s quite a few! If this were your site and you were checking your index stats for the first time, you might be either pretty excited (great, lots of my posts are being indexed!) or pretty alarmed (wait a second, I don’t have that many pages on my site!).

As you start to dig through the pages that are indexed and click through to additional pages of results, something pretty odd happens. (Ten results per page is the default, and far fewer organic results show up when ads and other featured Google content dominate a SERP; since I’m frequently digging through SERPs, I like to change my results per page to 100.) With my settings at 100 results per page, when I scroll to the bottom of the results for SEL’s site: search, I see 8 pagination links:

A screenshot of pagination in Google search results

8 x 100 is clearly not quite “About 30,700.” Odd: I thought there were over 30,000 results. If I click the link to the 8th page of results things get even more confusing:

Screenshot of the last page of Google search results

And if I click the “repeat the search with the omitted results included” link and click back through to the last page I see something similar.

But Search Engine Land is an extremely trusted site that posts several new pieces of content a day and has for several years, so that can’t be all of the pages Google has indexed, right?

It’s definitely not. Google has actually been seen testing dropping this count altogether, and it has explicitly said for several years that these numbers are not to be completely trusted (that video is from 2010!). And this problem gets worse the larger your site gets:

So where else can we look to get indexation counts and to understand which pages on our site are indexed?

Index Stats in Google Search Console (Formerly Webmaster Tools)

Within your site’s Google Search Console account there is additional data about how your site is being indexed (if you don’t have a GSC account you can see how to set one up here).

Obviously we don’t have access to Search Engine Land’s GSC account, but let’s look at the account for a site that my company owns.

There’s an entire sub-section of Google Search Console that’s dedicated to Google Index data.

Index Status in Google Search Console

Index status in GSC

This data tends to be a lot more accurate, and gives you some trend data as well. But what if you come to this screen and see a number of indexed pages that seems way too low (you’d likely also see Search Analytics stats that are dramatically under-reported)?

A common issue has to do with how Google Search Console deals with subdomains, www and non-www versions of your site, and http and https versions of your site.

If your Google Search Console numbers are extremely low, check the very specific URL associated with the site in the top navigation:

GSC website

If you have moved your site over to https or if you have a subdomain (e.g., http://info.measuredsem.com), you need to add those as separate sites. If you’ve switched from http://www.measuredsem.com to http://measuredsem.com, or you support both, you need to set your preferred domain within your account.
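If you aren't sure which versions of your site are in play, one quick check is to list every variant and tally URLs you already have (say, from a crawl or an analytics landing-page export) by scheme and host. In the setup described above, each scheme-and-host combination is a separate site. Here's a minimal Python sketch. `property_variants` and `count_by_property` are hypothetical helper names, and measuredsem.com is just the example domain from above:

```python
from collections import Counter
from urllib.parse import urlsplit


def property_variants(domain, subdomains=()):
    """List the http/https x www/non-www versions of a domain, plus any subdomains.

    Each entry is a separate site you may need to add in Search Console.
    """
    bare = domain.lower().removeprefix("www.")  # str.removeprefix needs Python 3.9+
    hosts = [bare, f"www.{bare}"] + [f"{sub}.{bare}" for sub in subdomains]
    return [f"{scheme}://{host}/" for host in hosts for scheme in ("http", "https")]


def count_by_property(urls):
    """Count URLs by scheme + host, i.e. by the site version they belong to."""
    return Counter(f"{parts.scheme}://{parts.hostname}/" for parts in map(urlsplit, urls))


print(property_variants("measuredsem.com", subdomains=("info",)))
print(count_by_property([
    "https://www.measuredsem.com/blog/",
    "http://measuredsem.com/contact",
    "https://www.measuredsem.com/about",
]))
```

Suppose most of your URLs land under a version you haven't added, for example https://www when you only verified http://. That alone can explain suspiciously low numbers.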

Additionally there’s another place you can get information about how your site is indexed within Google Search Console.

Sitemaps in Google Search Console

Within the sitemap section of Google Search Console you can submit an XML sitemap for your site, and then get a sense of how many of the pages you’ve submitted are actually in the index (as well as how that number changes over time):

Sitemaps and indexation in Google Search Console

The challenge here is that while you can look within your XML sitemap to see which pages you’ve submitted, you don’t necessarily have the level of detail you may want to answer specific questions (like whether large swaths of pages are or aren’t indexed).
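One way to get more detail is to pull the submitted URLs out of the sitemap file yourself, so you can check them against a crawl or an index check. Here's a minimal Python sketch using only the standard library. It assumes a single `<urlset>` sitemap; a sitemap index file that points to other sitemaps would need one more step:

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a <urlset> sitemap, in file order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()  # skip empty <loc> elements
    ]


example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/pricing </loc></url>
</urlset>"""
print(sitemap_urls(example))
```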

Which leads us to the first of our actionable Google indexation tips.

5 Actionable Google Indexation Tips

So now you know a bit more about how Google’s index works and some of the tools you can use for a quick check as to how your site is doing within the Google index. What about actually solving specific index-related issues? Based on my work with clients and some research around the topic, here are the five biggest questions / issues I’ve found in relation to Google indexation:

1. How to Tell Which Specific Pages Are NOT Indexed

Pages that are not in Google’s index won’t show up in search results, so one of the first things you may want to figure out is “which pages on my site aren’t indexed?” Unfortunately, most of the methods listed above, while helpful for seeing which pages are indexed, don’t do a great job of telling you which pages aren’t. For a very small site this may be pretty easy to spot, but even an active blog likely gives you enough pages that “eyeballing” which pages are missing from your site: operator search isn’t a reasonable option. I like to use two tools for this process:

Step One: Crawl Your Site with Screaming Frog

Screaming Frog is generally one of my most-used apps in any SEO site audit, and here it can give you a picture of what pages are present on your site:

Get a list of URLs from Screaming Frog

Screaming Frog is a super useful / powerful SEO tool, but for our purposes here we just want to crawl the site and filter for HTML pages, then filter that list for any of the pages on our site that we want to see in the index. We’ll get to dealing with pages we don’t want in a minute.
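If you'd rather do that filtering outside the app, here's a minimal Python sketch that works on a CSV export of the crawl. The column names (`Address`, `Content Type`, `Status Code`) are assumptions about what your export contains. `exclude` is a hypothetical list of URL fragments for sections you already know you don't want indexed:

```python
import csv
import io


def indexable_html_urls(csv_text, exclude=()):
    """Return 200-status HTML page URLs from a crawl export, minus excluded sections."""
    urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        is_html = row.get("Content Type", "").startswith("text/html")
        is_ok = row.get("Status Code", "").strip() == "200"
        url = row.get("Address", "")
        # Drop URLs containing any excluded fragment, e.g. "/tag/".
        if is_html and is_ok and not any(part in url for part in exclude):
            urls.append(url)
    return urls
```

To use it, read the export with `open(path, newline="", encoding="utf-8-sig").read()` and pass the text in. The file name depends on your crawler, and `utf-8-sig` simply tolerates a byte-order mark if one is present.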

From there, I’ll use another tool that’s incredibly helpful in any technical SEO audit: URL Profiler.

URL Profiler is another extremely powerful SEO tool, but again here we’re going to use it for a pretty narrow purpose: finding out which of the URLs that are on our site (which we just exported from our crawl) are actually indexed:

Google index check with URL Profiler

If you have a larger site, this may require you to get some proxies to check indexation. If you’re not overly technical, that can sound a bit intimidating, but it’s incredibly easy: it’ll take you a few minutes and requires no technical expertise whatsoever (beyond copy / paste skills).

You sometimes have to run these URLs through a couple of times and should leave some time for larger crawls, but what you’ll eventually end up with is a list of all of the pages that are not indexed on your site.
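Under the hood, that final list is just a comparison: every URL you crawled minus every URL that came back as indexed. If you'd like to do that step yourself, here's a minimal Python sketch. It assumes you've already reduced both exports to plain lists of URLs. The normalization rules (ignore `#fragments` and trailing slashes) are also assumptions you may need to adjust for your site:

```python
def normalize(url):
    """Reduce trivial differences so the same page compares equal."""
    url = url.strip().split("#", 1)[0]  # a #fragment points at the same page
    return url.rstrip("/")  # treat /page and /page/ as the same (assumption)


def not_indexed(crawled, indexed):
    """Return crawled URLs (in crawl order) that don't appear in the indexed list."""
    indexed_keys = {normalize(url) for url in indexed}
    return [url for url in crawled if normalize(url) not in indexed_keys]


crawled = [
    "https://example.com/",
    "https://example.com/features/",
    "https://example.com/blog/new-post",
    "https://example.com/pricing#plans",
]
indexed = ["https://example.com", "https://example.com/features", "https://example.com/pricing"]
print(not_indexed(crawled, indexed))
```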

OK, so now you know which pages aren’t indexed on your site. How do you go about getting those pages indexed?

2. How to Get Something (Your Whole Site, a New Page, an Existing Page That’s Not Indexed) Indexed

Getting a new site indexed used to be an industry unto itself:

These days, if you have a legit website and business, your home page and your overall domain should be indexed very quickly. You can do some simple things like send out a tweet, get a link from another site, or just submit your URL to Google for free. Many sites with no content and no external links, tweets, etc. will get indexed without any effort (recently my company bought 50 domains and put up very simple placeholder pages on each, and 28 of them were indexed before we did any kind of promotion at all).

If you have an existing site with a set of pages you want indexed that aren’t currently (for instance, if you went through the above process and now have a list of high-value pages that you’d like searchers to be able to find but that aren’t indexed), there are a few options at your disposal.

A. Fetch & Submit to Index via Google Search Console

If you have a small number of URLs, you can submit each of them to Google Search Console for indexation. This is a fairly simple process: start by logging into your Google Search Console account, then look in the left navigation under Crawl and click into Fetch as Google:

Fetch as google in GSC

Next, you can enter the URL you want to submit, click fetch, and you’ll be given the option to request indexing:

Screenshot of requesting indexation in GSC

Then you can submit either the URL itself or the URL and the pages it links to. For our purposes, since we have a specific set of URLs we’d like to see indexed (and because submissions are limited to 500 single URLs and only 10 multiple-URL submissions per month), we’ll submit just the URL to the index:

GSC final indexing request screen

Lastly, you should see confirmation that your indexing request went through:

GSC request indexing confirmation

 

If you’re working through a list of URLs, you can then give it a couple of days and run the same list through URL Profiler again and see how your efforts impacted indexation.
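Your list of unindexed pages may be longer than the monthly allowance. In that case it can help to prioritize the list, split it into batches, and measure each batch against the next index check. Here's a minimal Python sketch. The 500-per-month default mirrors the single-URL limit mentioned above, and both function names are hypothetical:

```python
def monthly_batches(urls, per_month=500):
    """Split a prioritized URL list into consecutive batches of at most per_month URLs."""
    if per_month < 1:
        raise ValueError("per_month must be at least 1")
    return [urls[i:i + per_month] for i in range(0, len(urls), per_month)]


def indexation_progress(submitted, indexed_now):
    """Return (newly indexed URLs, share of the batch now indexed) after a re-check."""
    indexed_set = set(indexed_now)
    newly = [url for url in submitted if url in indexed_set]
    share = len(newly) / len(submitted) if submitted else 0.0
    return newly, share
```

Put your highest-value pages first, since the first batch is the one that goes in this month.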

B. Share Your URLs Socially

Another means of improving indexation on key pages is to share your content via social networks. If you have a segment of pages that are valuable to your audience generally, you may be able to find a way to share them socially so that they’re discovered by search engines (particularly if you have popular social accounts).

For instance, if you have a swath of product pages on your site detailing specific features of your product, you could queue up one tweet a week (or every couple of days) that shares a specific feature page: “Did you know {product} could help with {thing feature helps with}? {link}”
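If you have more than a handful of these pages, filling in that template by hand gets tedious. Here's a minimal Python sketch that pairs each feature page with a post date and the filled-in text. The product name, feature list, and URLs are made-up examples. Actually scheduling the posts is left to whatever social tool you use:

```python
from datetime import date, timedelta

TEMPLATE = "Did you know {product} could help with {feature}? {link}"


def tweet_queue(product, feature_pages, start, every_days=7):
    """Return (post_date, text) pairs, one feature page every `every_days` days."""
    return [
        (start + timedelta(days=i * every_days),
         TEMPLATE.format(product=product, feature=feature, link=link))
        for i, (feature, link) in enumerate(feature_pages)
    ]


pages = [
    ("lead scoring", "https://example.com/features/lead-scoring"),
    ("email sync", "https://example.com/features/email-sync"),
]
for post_date, text in tweet_queue("Acme CRM", pages, date(2024, 1, 1), every_days=3):
    print(post_date.isoformat(), text)
```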

C. Fix the Underlying Issues!

If you work through the above steps and still have large volumes of pages that you want indexed not showing up in Google’s index, you likely have a foundational SEO issue with your site. You’ll want to investigate:

  • Link Equity – Do you have more pages on your site than your link equity (the number and authority of links pointed to your site) can support? This may mean that deeper pages aren’t going to be crawled and indexed until you find ways to build links to your domain (and potentially find ways to get links and shares for your deeper pages).
  • Site Architecture – Your site’s information architecture is a topic that’s beyond the scope of this article, but you may have pages that are several clicks from your site’s home page and are difficult for search engines to reach. Again, this is something to investigate (and/or potentially hire an experienced SEO to investigate).
  • Sitemap – Finally, if you haven’t already, you may want to submit a dynamic XML sitemap to Google Search Console to help drive better indexation of your site.
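“Dynamic” here just means the sitemap is regenerated from your site's current list of pages (for example, whenever you publish) rather than maintained by hand. Many CMSs and plugins do this for you. If yours doesn't, here's a minimal Python sketch of the generation step. The `(url, last_modified)` input is an assumption about what your CMS can give you:

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML (as UTF-8 bytes) from (url, last_modified_date) pairs.

    ElementTree escapes characters such as "&" in URLs automatically.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = modified.isoformat()  # YYYY-MM-DD
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


xml_bytes = build_sitemap([
    ("https://example.com/", date(2024, 5, 1)),
    ("https://example.com/shoes?color=red&size=9", date(2024, 4, 18)),
])
print(xml_bytes.decode("utf-8"))
```

Write the bytes to a file such as sitemap.xml and submit that URL in the Sitemaps section of Search Console.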

3. How to Keep Pages You Don’t Want Indexed Out of the Index

Another common issue for marketers is wanting to keep a specific page out of Google’s index. Maybe it’s a duplicate of an existing page, a very thin page that is useful to users but not to searchers, or something containing private information you don’t want in Google’s index.

Whatever the reason, there are a few core methods for keeping content out of Google’s index.

1. Meta Noindex Tag

In many instances the preferred method of keeping a page out of Google’s index is to add a meta noindex tag. From Google’s documentation on the subject:

A meta no index tag example

The great thing about the noindex tag is that it instructs Google to remove pages from the index, so if you have a swath of content that is already indexed, this is likely your preferred method of deindexing it. The robots.txt disallow directive, by contrast, will keep Google from crawling the page, but will not necessarily remove it from the index if it’s already there.
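For reference, the tag goes in the page's `<head>` and looks like `<meta name="robots" content="noindex">`. The same directive can also be sent as an `X-Robots-Tag: noindex` HTTP header. If you're applying it across a swath of pages, it's worth spot-checking that it actually made it onto them. Here's a minimal Python sketch using the standard library's HTML parser. It works on page HTML and a header value you've already fetched. It also deliberately ignores user-agent-specific header forms such as `googlebot: noindex`:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # HTMLParser already lowercases tag and attribute names
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindexed(html, x_robots_tag=""):
    """True if the meta robots tag or X-Robots-Tag header says noindex (or "none")."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {d.strip().lower() for d in x_robots_tag.split(",")}
    blocking = {"noindex", "none"}  # "none" is shorthand for "noindex, nofollow"
    return bool(blocking & (parser.directives | header))
```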

There is a catch, though, as Google engineer Gary Illyes points out:

Google needs to be able to crawl your page to remove it from the index via this method, so the page has to be accessible, and you may have to wait until it’s crawled (or use Fetch as Google to request that it be crawled / reconsidered).

2. Robots.txt Disallow

If you have a section of your site that is new and hasn’t yet been indexed (like a staging site, a subdomain that is under construction but not ready for primetime, etc.) or resources that you don’t want Google to crawl, you can use the robots disallow directive.

Again, adding this directive will not necessarily cause your content to be removed from the index if it’s already appearing there. In fact, it can lead to a result that’s indexed but shows a suboptimal description.

An important warning with disallow: be sure not to block more than you intended, and be careful not to block subsections of your site that may hold valuable content you want searchers to be able to access. You can test changes to your robots.txt file with the robots.txt Tester tool in Google Search Console.
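Alongside the Search Console tester, you can sanity-check a draft robots.txt before you publish it, using a list of URLs you know you want crawled. Here's a minimal Python sketch using the standard library's `urllib.robotparser`. This parser has three limitations:

* It only does simple prefix matching.
* It applies the first matching rule rather than Google's most-specific-rule logic.
* It doesn't understand `*` or `$` wildcards.

Treat a clean result as a first pass rather than a guarantee:

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that robots_txt would stop user_agent from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Meant to block the staging area and a /private section, but "/p" is too broad.
draft = """User-agent: *
Disallow: /staging/
Disallow: /p
"""
must_stay_crawlable = [
    "https://example.com/products/widget",
    "https://example.com/blog/post-1",
]
print(blocked_urls(draft, must_stay_crawlable))
```

Here the overly short `Disallow: /p` rule also catches /products/, which is exactly the kind of accidental overblocking described above.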

3. Removing URLs & Excluding URL Parameters via Search Console

Finally, your site’s content management system may be adding parameters to URLs (maybe because of filtered search results, pagination, or similar). If you’d like those parameterized URLs removed from the index, you can also give Google more information about those parameters or request that specific URLs be removed. You can do this by removing URLs temporarily:

Remove URLs in GSC

Or by identifying a specific parameter:

Exclude parameters in GSC

And then giving Google more information about it:

Add a parameter - additional info in GSC

Google’s John Mueller has said that this functions similarly to “permanently” noindexing content, so this may be a good option for one-off URLs. Ideally, though, rather than leaning on a temporary removal, in most cases you’ll want to dig in and address the core issues. What is it about the technical structure of your site that’s creating the need for pages to be deindexed? Why are you (if you are) suffering from “index bloat” in the first place?
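A good first diagnostic is to see which parameters are actually multiplying your URLs. Here's a minimal Python sketch that takes a list of URLs (a crawl export, or URLs you've found in the index) and counts how many carry each query-string parameter. The example URLs and parameter names are made up:

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def parameter_counts(urls):
    """Count how many URLs carry each query-string parameter name."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # A set, so a parameter repeated within one URL is only counted once.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


urls = [
    "https://example.com/shoes?color=red&sort=price",
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes?page=2&sort=name",
    "https://example.com/shoes",
]
print(parameter_counts(urls).most_common())
```

Watch for a parameter like `sort` that shows up on a large share of URLs without changing the content. That is a typical candidate for the fixes above.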

4. What Is “Index Bloat” and How Do I Fix It?

Index bloat is when Google has indexed unnecessary pages from your site that are unlikely to drive relevant traffic in response to users’ queries. This is an issue because it forces search engines to spend time crawling and indexing low-value pages (which could use up your “crawl budget”). It may also cause you to serve low-value pages in some search results, leading to a poor user experience and poor engagement metrics. Having a lot of thin or largely duplicated content with terrible engagement metrics can make your site appear low quality in Google’s eyes.

You can use the tools and processes above to analyze which pages are and aren’t currently in the index, and then use more of them to remove lower-quality, lower-value pages from the index. In addition, here are two great resources on the topic:

Here again, an important note is not to “cut too deep.” Before you start to whack large sections of your site from Google’s index, look in Analytics (or grab the URLs and run them through URL Profiler) to make sure you’re not cutting off traffic and leads / sales from these pages.
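Here's a minimal Python sketch of that check. It assumes you've built a `sessions_by_url` mapping from an Analytics landing-page export covering a reasonable date range. The mapping must be keyed the same way as your candidate URLs; paths vs. full URLs is an easy mismatch to miss. You could swap in goal completions or sales for sessions:

```python
def split_by_traffic(candidates, sessions_by_url, min_sessions=1):
    """Split deindexing candidates into (review_first, likely_safe) by recorded sessions."""
    review_first, likely_safe = [], []
    for url in candidates:
        if sessions_by_url.get(url, 0) >= min_sessions:
            review_first.append(url)  # still earning visits: look before cutting
        else:
            likely_safe.append(url)
    return review_first, likely_safe


candidates = ["/tag/seo-tips", "/tag/misc", "/blog/page/37"]
sessions_by_url = {"/tag/seo-tips": 40, "/blog/page/37": 0}
print(split_by_traffic(candidates, sessions_by_url))
```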

5. What Tools Can Help with Monitoring Indexation (i.e., What Are the Best “Google Index Checkers”)?

As I’ve mentioned ad nauseam here, my personal preference is to use URL Profiler as a Google index checker, but here are some additional options:

BONUS: Mobile Indexation Resources

Mobile and app indexation can be slightly different beasts from traditional indexation, so if you’re experiencing issues there, here are some additional mobile-focused indexation resources:

What did we miss? What other Google indexation issues have you seen / what tips can you share?
