Google doesn’t index all your content.
Do you know which pages on your blog are not indexed on Google?
Do you know which pages on your website are indexed but shouldn’t be?
In this post, we take you through the process of figuring out which posts Google has not indexed. There is no simple way to do this, but if you follow the process, you can get there. I’m geeking out a bit because it’s quite technical!
1. Check your sitemap
The sitemap tells Google which pages you want indexed. Google may crawl your site and find extra pages to index, but a good starting point is to find out what you are asking it to index.
If you’re using a WordPress plugin like Yoast WordPress SEO, it will build the sitemap for you. You’ll probably have one sitemap for your pages and another for your posts.
The following example shows that we’re telling Google there are 216 posts to index.
Highlight all this information and copy it into a Google spreadsheet. Once it’s in the spreadsheet, remove any columns that don’t contain the URLs of your posts.
Note: If all your blog content is under /blog, it’s easy to filter out just the blog posts. If your blog posts and pages are all in one directory, you’re better off comparing all pages and posts to see which ones are not indexed.
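If you’d rather not copy the sitemap by hand, a short script can pull the URLs out for you. The following is a minimal sketch in Python, assuming your sitemap is a standard sitemaps.org-format `<urlset>` file; the function names and the `/blog/` prefix are illustrative and not part of any plugin. A sitemap index (a file that lists other sitemaps rather than pages) uses `<sitemap>` entries instead of `<url>` entries, so this sketch should be pointed at the individual post or page sitemap.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol for <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def fetch_sitemap(sitemap_url):
    """Download a sitemap and return its raw bytes (the URL is yours to supply)."""
    with urllib.request.urlopen(sitemap_url) as response:
        return response.read()


def urls_from_sitemap(xml_data, path_prefix=None):
    """Return the <loc> URL of every <url> entry in a sitemap.

    xml_data may be str or bytes. If path_prefix is given (e.g. "/blog/"),
    only URLs whose path starts with that prefix are kept.
    """
    root = ET.fromstring(xml_data)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if not url:
            continue  # skip empty <loc> elements
        if path_prefix is None or urlsplit(url).path.startswith(path_prefix):
            urls.append(url)
    return urls
```

For example, `urls_from_sitemap(fetch_sitemap("https://www.example.com/sitemap.xml"), "/blog/")` would return just the blog-post URLs, ready to paste into a spreadsheet column; `www.example.com/sitemap.xml` is a placeholder for your own sitemap address.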
2. Check Google Webmaster Tools
Google Webmaster Tools shows you how many of the posts submitted in your sitemap were indexed by Google. I’ve never seen this at 100%, but you do want to see the majority of your posts indexed.
This example shows a gap, where a good proportion of posts are not indexed.
3. Get Your Google Listings
Now go to Google and find out what it has indexed. The ‘site’ command in Google lists the posts that Google has indexed. It’s not 100% accurate, so some indexed posts may be missing from that list, but it will be pretty close to the truth.
Before you run the site command, change your Google settings so that it displays 100 results per page instead of 10. We are going to extract the contents of the Google results, and it’s easier to do that in batches of 100 than 10 at a time.
To temporarily display 100 results per page, select ‘Settings’ at the very bottom right of the Google.com page, then select ‘Search settings’.
On this screen, move the slider so that Google displays 100 results at a time instead of 10.
Next, install a bookmarklet in your browser that extracts just the web addresses from the results. Go to the bookmarklet page (here) and drag the bookmarklet to your browser’s bookmarks bar.
In Google, type site:URL of your website, with no quotation marks (e.g. site:www.razorsocial.com). This shows up to 100 results for pages indexed on your site. When you have that listing, click the bookmarklet you installed. It will extract only the web addresses from the Google listing!
When you have this listing, copy it into a second column of the same spreadsheet that holds the list of pages from your sitemap. Repeat this for each page of Google results until you have all your indexed pages in the spreadsheet.
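Because the bookmarklet gives you one batch of up to 100 addresses per results page, you end up pasting several batches. As a minimal sketch, assuming each batch is pasted as plain text with one URL per line, the Python below merges the batches, drops blank lines and exact duplicates, and lays out the sitemap list and the Google list as two CSV columns you can import into a spreadsheet. The function names are hypothetical.

```python
import csv
import io
from itertools import zip_longest


def merge_batches(batches):
    """Merge pasted batches of URLs (one per line) into a single list.

    Blank lines and exact duplicates are dropped; first-seen order is kept.
    """
    seen = set()
    merged = []
    for batch in batches:
        for line in batch.splitlines():
            url = line.strip()
            if url and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


def to_two_column_csv(sitemap_urls, google_urls):
    """Return CSV text with sitemap URLs in column A and Google's in column B.

    The shorter list is padded with empty cells so every row has two columns.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["sitemap", "google"])
    for row in zip_longest(sitemap_urls, google_urls, fillvalue=""):
        writer.writerow(row)
    return buffer.getvalue()
```

Write the returned text to a `.csv` file (opening it with `newline=""` keeps the line endings the `csv` module produced) and import that file into your spreadsheet.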
4. Compare the Results
You should now have two listings of web addresses in your spreadsheet. The first column is what you tell Google to index and the second is what Google has actually indexed!
Go through the list and pick out the posts that are in the sitemap but not in the Google listings.
Then search Google for each of these posts individually; even if they don’t show up with the ‘site’ command, they may still be indexed.
If you have a very long list of posts, you won’t be able to compare the lists manually. Instead, you’ll need a spreadsheet formula that picks out the pages in one column that have no direct match in the other. If anyone wants to share how to do this in the comments, feel free.
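For long lists, a short script is one alternative to a spreadsheet formula. This is a minimal sketch in Python, not a definitive method: it assumes you already have the two lists of addresses, and it treats two addresses as the same page if they differ only in `http`/`https`, a leading `www.`, host capitalization, a `#fragment`, or a trailing slash. Those matching rules are an assumption about what should count as a match; adjust them for your site. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path (+ query) for comparison.

    Ignores the scheme, a leading "www.", host capitalization, any
    #fragment, and a trailing slash. Query strings are kept, because
    some sites (e.g. ?p=123 permalinks) use them to identify posts.
    """
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # lets addresses pasted without a scheme parse
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def missing_from(first_urls, second_urls):
    """Return URLs in first_urls whose normalized form is not in second_urls.

    The order of first_urls is preserved.
    """
    present = {normalize(u) for u in second_urls}
    return [u for u in first_urls if normalize(u) not in present]
```

`missing_from(sitemap_urls, google_urls)` gives the posts to check by hand in Google. Swapping the arguments, `missing_from(google_urls, sitemap_urls)`, lists pages Google shows that aren’t in your sitemap, which can help you spot pages that are indexed but shouldn’t be. If you’d rather stay in the spreadsheet, a formula such as `=COUNTIF(B:B, A2)=0` next to each sitemap URL returns TRUE when that address doesn’t appear in column B; it works in both Excel and Google Sheets, but only catches addresses written identically in both columns.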
What to do with Your Results
When you have a list of pages that are not indexed, there are a couple of things to consider:
a) If it’s a poor-quality post that offers no value, delete it.
b) If it’s a good post that should be indexed, then link to it from other posts on your site. This will help Google pick it up.
I had hoped there was an easier way to figure out what Google hasn’t indexed, but this was the simplest solution I found. If you know of another way, please share!
By going through the process above, you do learn more about your site and you’ll probably identify other issues that are worth considering.
After I tidied up my sitemap, only three posts are now not indexed!
I would love to hear your thoughts/comments below!