How to Find Out Which of Your Blog Posts are Not Indexed by Google

Google doesn’t index all your content.

Do you know which pages on your blog are not indexed on Google?

Do you know which pages on your website are indexed but shouldn’t be?

In this post, I take you through the process of figuring out which posts are not indexed by Google. There is no simple way of doing this but, if you follow the steps, you can do it. I’m geeking out a bit because it’s quite technical!

 


 

1. Check Your Sitemap

The sitemap tells Google which pages you want indexed. Google may crawl through your site and find extra pages to index, but a good starting point is to find out what you are asking it to index.

If you’re using a WordPress plugin like Yoast WordPress SEO, it will build the sitemap for you. You’ll probably have one sitemap for your pages and another for your posts.

The following example shows that we’re telling Google there are 216 posts to index.

 

[Image: This shows the blog posts you are telling Google to index]

 

Highlight all this information and copy it into a Google spreadsheet. When you have it in the spreadsheet, remove every column except the one containing the URLs of your posts.
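If copying and pasting gets tedious, the URLs can also be pulled straight out of the sitemap’s XML source. The following is only a minimal sketch: it assumes your sitemap follows the standard sitemaps.org format, and the sample XML and the `urls_from_sitemap` helper name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset>, <url> and <loc>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the text of every sitemap <loc> element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample; replace it with the XML source of your own post sitemap.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/blog/first-post/</loc><lastmod>2013-08-01</lastmod></url>
  <url><loc>https://www.example.com/blog/second-post/</loc></url>
</urlset>"""

sitemap_urls = urls_from_sitemap(SAMPLE_SITEMAP)
for url in sitemap_urls:
    print(url)  # one URL per line, ready to paste into a spreadsheet column
```

Only `<loc>` elements in the sitemap namespace are read, so other columns such as the last-modified date are left out automatically.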

Note:  If all your blog content is in /blog, then it’s going to be easy to filter out just the blog posts.  If your blog posts and pages are all in one directory then you’re better off comparing all pages/posts to see which ones are not indexed.
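As a rough illustration of that filtering, here is a small Python sketch. It assumes your posts live under a /blog/ directory; the example URLs and the `is_blog_post` helper are hypothetical.

```python
from urllib.parse import urlparse


def is_blog_post(url: str, prefix: str = "/blog/") -> bool:
    """True when the URL's path sits under the given directory prefix."""
    return urlparse(url).path.startswith(prefix)


# Hypothetical mix of pages and posts taken from a sitemap.
all_urls = [
    "https://www.example.com/about/",
    "https://www.example.com/blog/first-post/",
    "https://www.example.com/blog/second-post/",
]

blog_urls = [u for u in all_urls if is_blog_post(u)]
print(blog_urls)
```

Matching on "/blog/" with the trailing slash avoids picking up unrelated pages whose path merely starts with the same letters, such as /blogging-tips/.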

2. Check Google Webmaster Tools

Google Webmaster Tools shows you how many of the posts in your sitemap have been indexed by Google. I’ve never seen this at 100%, but you do want to see the majority of your posts indexed.

This example shows a gap: a good proportion of the submitted posts are not indexed.

 

[Image: This shows the number of posts submitted in the sitemap and the number actually indexed]

3. Get Your Google Listings

Now you want to go to Google and find out what it has indexed.  The ‘site’ command in Google can list out the posts that are indexed by Google.  It’s not 100% accurate, so there could be other posts that are indexed that are not on that list, but it will be pretty close to the truth.

Before you run the site command, you need to change the settings in Google so it displays 100 results on a page at a time instead of 10.  We are going to extract the contents of Google search and it’s easier to extract them in bigger batches rather than 10 at a time.

To temporarily change your results so it displays a listing of 100 web pages at a time, select the ‘settings’ option at the very bottom right of the Google.com page.  Then, select the ‘search settings’.

 

[Image: Adjust the search settings]

 

On this screen, adjust the dial so that Google will display 100 results at a time instead of 10.

 

[Image: Adjust this setting to 100]

Next, you need to install a bookmarklet in your browser that will help you extract just the web addresses from the results. Go to this page and drag the bookmarklet to your browser’s bookmarks bar -> Here

In Google, type site: followed by the address of your website (e.g. site:www.razorsocial.com). This will show you up to 100 of the pages on your site that Google has indexed. When you have that listing, click on the bookmarklet you installed. This will extract only the web addresses from the Google listing!

 

[Image: This will extract just the web address]

 

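When you get this listing, copy it into a second column of the same spreadsheet where you have the list of pages from the sitemap. If Google lists more than 100 pages for your site, move to the next page of results and repeat the above until you have all your pages in the spreadsheet.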
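If you end up with several batches of results, it helps to merge them and drop duplicates before comparing. The sketch below is one possible way to do that in Python. It assumes the same page can show up with or without www, with http or https, and with or without a trailing slash, and that you want those treated as a single page; if that is not true for your site, adjust `normalize`. Both function names and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so small formatting differences don't hide a match."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # pasted results may come without a scheme
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: www and non-www are the same page
    return host + parts.path.rstrip("/")


def merge_batches(batches):
    """Combine batches of pasted URLs, skipping blank lines and duplicates."""
    seen = set()
    merged = []
    for batch in batches:
        for raw in batch:
            if not raw.strip():
                continue
            key = normalize(raw)
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Hypothetical batches copied from two pages of Google results.
batch_1 = ["http://www.example.com/blog/first-post/", "www.example.com/blog/second-post"]
batch_2 = ["https://example.com/blog/second-post/", "", "http://www.example.com/blog/third-post/"]

google_urls = merge_batches([batch_1, batch_2])
print(google_urls)
```

The merged list keeps the order in which pages first appeared, which makes it easier to check against what you saw in Google.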

4. Compare the Results

You should now have two listings of web addresses in your spreadsheet.  The first column is what you tell Google to index and the second is what Google has actually indexed!

Go through the list and pick out the posts that are in the sitemap but not the Google listings.

Then search Google for each of the posts on this shortlist individually; even if they are not found using the ‘site’ command, they may still be indexed.

If you have a very long list of posts, you won’t be able to compare the list manually.  Because of this, you’ll need to figure out a good Excel formula that extracts pages where there isn’t a direct match in both columns.  If anyone wants to share in the comments how to do this, feel free.
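If a spreadsheet formula feels awkward, the same comparison can be sketched in a few lines of Python. This is just one possible approach, not the only one: it assumes you have pasted each spreadsheet column into a list, and it ignores surrounding spaces and trailing slashes when matching. The `missing_from_google` name and the example URLs are made up for illustration.

```python
def missing_from_google(sitemap_urls, google_urls):
    """Return sitemap URLs that have no match in the Google list, in sitemap order.

    Matching ignores surrounding whitespace and a trailing slash, nothing else.
    """
    indexed = {u.strip().rstrip("/") for u in google_urls}
    return [u for u in sitemap_urls if u.strip().rstrip("/") not in indexed]


# Hypothetical columns copied out of the spreadsheet.
sitemap_column = [
    "http://www.example.com/blog/first-post/",
    "http://www.example.com/blog/second-post/",
    "http://www.example.com/blog/third-post/",
]
google_column = [
    "http://www.example.com/blog/first-post/",
    "http://www.example.com/blog/third-post",
]

not_indexed = missing_from_google(sitemap_column, google_column)
print(not_indexed)
```

Swapping the two arguments gives the opposite list: pages Google has indexed that are not in your sitemap.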

What to do with Your Results

When you have a list of pages that are not indexed, there are a couple of things to consider:

a) If it is a poor-quality post that is not offering any value, then delete it.

b) If it’s a good post that should be indexed, then link to it from other posts on your site.  This will help Google pick it up.

Summary

I had hoped there was an easier way to figure out what is not indexed by Google, but this was the simplest solution. If you know of another way, please share!

By going through the process above, you do learn more about your site and you’ll probably identify other issues that are worth considering.

After tidying up my sitemap, there are now only three posts not indexed!

[Image: Very close to 100% indexing]

 

 

I would love to hear your thoughts/comments below!

  • Poo@beautyandmakeupmatters

    Great post Ian.and thanks for all the details n pictorials.. Truly helpful.. I will plan to clean up sitemap this way !”

    Kudos to u for ur effort n this post .. Have a great week ahead :-)

    • http://www.razorsocial.com/ Ian Cleary

      Thank you so much, glad it was useful!

  • Brendan McCoy

    Great post Ian!
    Can you check the Bookmarklets link?
    I’m not sure which is the correct tool on the OnlineSales site.

    • http://www.razorsocial.com/ Ian Cleary

      Hi Brendan, you drag the button to your browser and then you can click on it to see the simple results (the button says ‘simple google results). I hope all is great Brendan!.

  • http://90DayEntrepreneur.com/ Brandon Schaefer

    Nice insider info… going to give it a try later today!

    • http://www.razorsocial.com/ Ian Cleary

      Thanks Brandon!

  • http://nickyjameson.com/ Nicky Jameson

    Thanks for the useful article. FYI – it is Labour Day in Canada as well.

    • http://www.razorsocial.com/ Ian Cleary

      Thanks Nicky, happy Labour day!

  • http://term.li/1eFEesN Puru @ Terminusapp.com

    That’s a very useful bookmarklet. I’ve used a similar script in the past, but this looks much better.

    It’s very useful in many cases. e.g. if you are interested in finding your potential customers, search for “your industry keyword” and put all those results in a spreadsheet. This can then be sent to a VA for pruning with clear instructions on what you are looking for. Same for finding blogs, guest post opportunities, etc.

    • http://www.razorsocial.com/ Ian Cleary

      Thanks Puru!

  • http://inforisticblog.com Toby XtremelyFavored Allen

    I really got so much information from this article. Thanks Ian

    • http://www.razorsocial.com/ Ian Cleary

      Delighted to hear that Toby!

  • http://www.selectstrategies.com Emer O’Donnell

    Cheers Ian – extremely useful post

    • http://www.razorsocial.com/ Ian Cleary

      Thanks Emer, I hope all is great!

  • http://www.buzzintown.com/ buzzintown

    Hi Ian,

    I tried changing Google search result settings but i could not. The dial could not be moved so that Google will display 100 results at a time. I was signed in in Google.

    • http://frantic-naturalist.com/ Vernon Swanepoel

      You need to turn off “Google Instant predictions”

  • Praveen Prabhakaran

    Very informative article Ian. Actually, I had this issue with pages indexed and submitted mismatch and wondered what was I doing wrong. Your article cleared my doubt that 100% pages don’t get indexed first time and has to be done manually. Thanks for explaining how to get it done. Shared and Bookmarked!

    • http://www.razorsocial.com/ Ian Cleary

      Thank you so much Praveen, glad it was helpful

  • http://frantic-naturalist.com/ Vernon Swanepoel

    Nice post, Ian. I learned a lot with the Google excercise. I’ve never really looked beyond the first 10 pages returned and doing so taught me a lot. I can see a few things that need a bit of working on.

    One key thing I discovered that Google is pointing to the www version of half of my pages (despite a rel canonical telling it not to). So I did a rewrite in my .htaccess. (maybe showing my ignorance, but I’m not a real web developer).

    The point being, just looking at the first 100 pages google gives you back for your site is a useful exercise in more ways than one.

    • http://www.razorsocial.com/ Ian Cleary

      Thank you Vernon, too often we don’t analyze our own results and there’s always room for some tidying up!!

  • user256

    Is it not easier to just use scrapebox?

    • http://www.razorsocial.com/ Ian Cleary

      Yes you could use this but it’s a paid for tool.

      • user256

        Well fair point I appreciate that it’s a paid tool but so is screaming frog, so is your preferred backlink monitoring tool and like these they’re pretty much essential. Plus if you use the BHW discount at http://www.scrapebox.com/bhw you can get a lifetime license for less than £50 and imo if you’re serious about seo you’ll use it constantly, PR checks, keyword research, link building….. anyway just my two cents.

        • http://www.razorsocial.com/ Ian Cleary

          Great point, thank you!

  • Marie McCooey

    Thanks Ian for a great article filled with useful information!

    • http://www.razorsocial.com/ Ian Cleary

      Thanks Marie!

  • Jill Holtz

    Ian, very useful as usual. Can I ask where you wrote “link to it from other posts on your site. This will help Google pick it up.” – if you’ve already done this and Google hasn’t indexed it is there anything else you can do to make sure it is indexed? As that seemed to be what you were saying would solve the lack of indexing?

    • http://www.razorsocial.com/ Ian Cleary

      Hi Jill, if Google is picking up the first article it should follow the links to index the second article. Try linking it to it from another article. Or, write a guest post on a site that’s higher in authority and link to it from there. Finally make sure that it’s mentioned in the sitemap! Ian

  • http://www.orglamix.com Cheri @Orglamix

    Excellent post. I was looking for a solution to find out which of my post were published. This did it. Thx.

    • http://www.razorsocial.com/ Ian Cleary

      Thank you!

  • rainer

    thx ian, great article. I am going to translate it in german and post it in a new wp-blog for Q&A.

    • http://www.razorsocial.com/ Ian Cleary

      Thanks!