On every website there are sections you don’t want indexed: your admin page, for example, or duplicate pages created only for marketing campaigns. Some of these pages can harm how Google sees your site, so you want to make sure Google can’t crawl them.
There are ways of making sure these pages aren’t indexed, but to do this reliably you need to know what Google can see and use all the tools available to protect your site.
A well-written robots.txt file is an important part of making sure Google indexes the pages you want indexed, and only those.
Why use a robots.txt file?
Even if there is very little on your site which needs to be blocked, you should still use a robots.txt file. It is also where you point search engines to your sitemap.xml file, with a line at the bottom such as:
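A minimal robots.txt that allows everything and declares a sitemap might look like this (the domain here is a placeholder for your own):

```text
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
```

The empty Disallow line permits crawling of the whole site; the Sitemap line tells crawlers where to find your sitemap.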
You can also use this file to make sure Google doesn’t crawl the pages of your site which you don’t want indexed.
Blocking pages or folders is as easy as adding them to your robots.txt file, so your admin panel, or any other page, doesn’t end up in the search engine results.
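For instance, to block a hypothetical admin folder and a single campaign page (both paths here are placeholders, not paths from this article):

```text
User-agent: *
Disallow: /admin/
Disallow: /campaign-landing-page.html
```

Disallow rules are prefix matches, so /admin/ blocks everything beneath that folder.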
If you have a development site (or another duplicate, such as a secure https version of your site), a separate robots.txt file on that site can block the whole thing so it doesn’t create duplication in the results.
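A robots.txt for the development site that blocks every crawler from the entire site would simply read:

```text
User-agent: *
Disallow: /
```

The single forward slash disallows the whole site for any crawler that obeys robots.txt.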
Why a robots.txt file isn’t enough
Often you will notice in the search engine results that a page you have blocked is still being indexed, despite being listed in your robots.txt file. This happens because robots.txt stops Google from crawling a page, not from indexing it: a page that was indexed before it was blocked, or that is linked to from elsewhere, can remain in the results. These results show no meta description; instead the listing reads: ‘A description for this result is not available because of this site’s robots.txt – learn more’.
This means that you have successfully blocked Google from crawling the page, but the page has not been removed from the index.
Using Google Webmaster Tools to block pages
The next step when this happens is to use Google Webmaster Tools to remove your page. Under the Optimisation menu there is a Remove URLs option where you can submit a request to remove a page from the search results.
Before you request removal of a page, you should already have marked it in your robots.txt file as one you don’t want Google to see.
Generally when you submit one of these requests the page is removed within a couple of hours.
To keep unwanted pages out of the search results, make sure you are using Google Webmaster Tools and your robots.txt file together. That way, Google will only show the pages you want to be visible.