Here is the second in my series of blog posts giving quick tips on places to start looking at your SEO. Although these checks are quick, they may uncover additional things on your site that need addressing, so the five minutes may lead to a longer investigation and/or fix!
The second tip I have for you is to check your indexed pages. I can’t tell you how many SEO problems I’ve found over the years by just looking at the pages that are indexed.
So for those of you who are not sure how to start this amazing information-seeking process, you need to enter site:yourdomain.co.uk into Google. It works on Bing as well, but in the UK Google is the engine that really matters, so I’m concentrating on problem finding there.
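A few variations of the operator are worth knowing (example.co.uk is a placeholder here; substitute your own domain):

```
site:example.co.uk              all indexed pages on the domain
site:example.co.uk/blog/        indexed pages within one section
site:example.co.uk "widget"     indexed pages mentioning a keyword
```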
What you will get is a list of pages on your site that Google has indexed. Note that for some of the checks below to work properly, you will need to be logged out of Google, or you will get personalised results.
Things to look for:
- The number of indexed pages
You should have at least a rough idea of the number of pages you think you have on your site. If this number differs wildly from that reported by Google then you have a problem. Under-indexation may indicate that Google doesn’t like parts of your site, but more commonly you will see a page count that is vastly higher than your expected number and this is often an indication of duplicate content.
- What appears on the first pages
Whilst it isn’t an exact science, the pages in a site: search appear roughly in order of importance to Google. Check the first page or two to make sure that the pages listed are ones that you would expect to be ranking well for your site. On the whole, expect to find your top level navigation and any popular pages (such as blogs) that have been extensively shared/linked to.
- What appears at the end
Page through to the very end of the results. You may see Google’s omitted-results message, along the lines of ‘In order to show you the most relevant results, we have omitted some entries very similar to those already displayed’.
This means that you have some pages that Google considers a bit irrelevant/duplicate/low value. Do not panic if you see this message; it isn’t necessarily an indication of a major problem and is common on large sites. If you do see it, click the link to repeat the search with the omitted results included, then page through again. What you want to look at on these last pages is what is being indexed. You will often find duplicate content using this method; common culprits are print versions of pages, blog tag pages, indexed search results pages, pages with query strings (e.g. ?somevalue=somethingmeaningless), orphaned pages, pages with non-unique title tags, hacked pages, PDFs that shouldn’t be indexed, and spider traps.
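If you do turn up low-value patterns like these, one common remedy is to stop Google crawling them. A minimal robots.txt sketch, assuming hypothetical /print/ and /search paths (yours will differ):

```
User-agent: *
Disallow: /print/          # print versions of pages
Disallow: /search          # internal search results pages
Disallow: /*?somevalue=    # URLs with a meaningless query string
```

Note that the * wildcard in paths is supported by Google but is not part of the original robots.txt standard, and blocking crawling does not by itself remove URLs that are already indexed.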
- Duplicate/unoptimised title tags and meta descriptions
For some sites, a site: search will show fairly quickly that there are issues you need to look at regarding the optimisation of title tags and meta descriptions. How easy they are to spot with this method will depend on how much on-page SEO you have already done (and for larger sites, there are better options!)
- Disallowed pages
With Google’s new, slightly clearer notification of pages blocked in robots.txt, if you have any such pages indexed that Google thinks are important, it will show the message ‘A description for this result is not available because of this site’s robots.txt – learn more’. Paging through a site: search (at least for smaller sites) can show up these messages.
- Rich Snippets
If you utilise rich snippets or other microformats on your site, these may show up in a site: search; things like review ratings and counts should appear if set up properly. You may also see other information that Google pulls off the page, such as author photos or blog publish dates.
- Canonical issues
If you search for your site without the www, you may find issues with non-www pages being indexed, or even secure pages (https). You may also find development or test versions of your site being indexed. All of these are things I have found on live sites in the past.
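The usual fixes for www/non-www duplication are a site-wide 301 redirect to your preferred hostname, plus a rel=canonical tag on each page. A minimal sketch for an Apache server using mod_rewrite (example.co.uk is a placeholder domain):

```
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.co\.uk$ [NC]
RewriteRule ^(.*)$ http://www.example.co.uk/$1 [R=301,L]
```

The canonical tag goes in each page’s head, pointing at the preferred URL, e.g. `<link rel="canonical" href="http://www.example.co.uk/some-page/">`. If your server isn’t Apache, the equivalent redirect is configured differently, but the principle is the same.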
I recommend that you run this search quite often, so that you become familiar with the way your site appears and can identify problems more easily. Of course, if you do find problems that you subsequently resolve, you’ll want to come back and make sure that Google is processing the changes. If you ban low-value pages using noindex or robots.txt, it may be a long time before Google removes them, as Googlebot may not revisit those pages for quite a while. You might have to look into ways to highlight this content to get it revisited.
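For reference, the noindex option above is a meta robots tag in the page’s head:

```
<meta name="robots" content="noindex">
```

One caveat worth knowing: a page carrying this tag must remain crawlable. If you also block it in robots.txt, Googlebot never fetches the page, never sees the tag, and the URL can linger in the index.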