Today, the major search engines, Google, Yahoo and Live, have joined forces to support a new specification that allows webmasters to identify the “master”, or “canonical”, version of a page.
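The tag itself is a single link element placed in the head of each duplicate page, pointing at the preferred URL. A minimal sketch (the domain and path below are illustrative, not from any of the announcements):

```html
<!-- Placed in the <head> of each duplicate page; href gives the preferred URL -->
<link rel="canonical" href="http://www.example.com/products/widget" />
```

The engines then treat the pointed-to URL as the version to index and to consolidate link signals onto.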
Years ago, search engines used to boast about how many pages on the internet they had indexed, building crawling technologies designed to discover every page on the web. In fact, they are still at it; only a few months ago I reported on how Google was crawling through forms and drop-downs in a bid to discover additional pages.
This thirst for pages has resulted in the indexing of so many variants of web pages that the search engines have had to come up with ways of organising that content, recognising when the same page is being returned under different URL strings. In general this has been highly successful, but it has also been the downfall of some sites whose pages are similar, but not quite identical, and so get caught in this duplication filter. This is the “duplicate content penalty” (a bit of a misnomer, as described by Google here): it is really a filter, but to site owners it often feels like a penalty, which is why the term persists.
Yahoo’s Site Explorer has the best features for this, allowing webmasters to strip spurious parts from URL strings. The new tag provides another way to convey the same information, specified directly on the website itself.
The Google blog entry includes a few Q&As about specific uses. However, the comments on that post indicate there are still open questions about how the tag will be detected and utilised, particularly in unusual cases.
The Live blog gives the clearest explanation of how you can use this to resolve canonicalisation problems caused by www and non-www versions of a site, and by duplicate versions of your home page.
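As a sketch of that case: if the home page resolves at several addresses, each variant can carry a canonical link pointing at the single preferred version (example.com here is a placeholder domain):

```html
<!-- Served on http://example.com/, http://www.example.com/index.html, etc. -->
<head>
  <link rel="canonical" href="http://www.example.com/" />
</head>
```

Every duplicate then declares the same preferred home page URL, so the engines can collapse the variants onto it.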
From an SEO agency point of view, we know that there are always a few oddball sites where we can’t remove every single instance of duplicate pages with different URLs. Sometimes these are a result of poor original site or CMS design and sometimes they are due to tracking issues created by marketing requirements.
But should this tag be used as a general rule? Well, all the engines say they treat it only as a “hint” to guide them, so if you get things horribly wrong (e.g. by “saving as” from a previous page to create a new page and leaving the tag intact) they should be able to sort it out. Even so, it should be used with caution. It is still a great tool: it should be excellent for eliminating extraneous attribute information, such as affiliate entry points or Google Analytics tracking codes, that gets crawled. And if you have some pesky session IDs that you can’t eliminate, this should put paid to them for good.
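For the tracking-code and session ID cases, a sketch of how the tag would be deployed (the parameter names below are illustrative examples, not a fixed list):

```html
<!-- Served at e.g. http://www.example.com/page.html?utm_source=affiliate&sessionid=ABC123 -->
<!-- The canonical link discards the tracking and session parameters -->
<link rel="canonical" href="http://www.example.com/page.html" />
```

The tracked URLs keep working for analytics and affiliate attribution, while the engines are told to index only the clean version.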