Stephan Spencer's Scatterings

The Scattered Wisdom of a scientist turned web marketing virtuoso

Canonical Tag Not Yet Reliable

I'm a big fan of the new canonical tag (er, element, to be more technically correct). It's a powerful tool for dealing with duplicate content. But it's not exactly reliable yet. Google wants us to use it as if it were. Unquestionably, it's a signal. But it can be ignored, even when it should clearly be obeyed.

Case in point: Northernsafety.com. Many thousands of non-canonical URLs are indexed. For example, click on some of the listings on this SERP and compare the URLs Google led you to with the canonical URL listed in the HTML source of those pages. You'll see that the parameters OPC and PFM are present in the URLs in the search listings but are not present in the canonical link element. Hmmm.
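
That comparison can be done by script rather than by eye. Here is a minimal sketch, assuming the page HTML has already been fetched; the page markup and the example.com URLs are made up for illustration, and only the parameter names OPC and PFM come from the case above. It pulls the canonical link element out of the HTML and lists the query parameters that appear in the indexed URL but not in the canonical one.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, in any letter case
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def extra_params(indexed_url, canonical_url):
    """Names of query parameters in the indexed URL that the canonical URL lacks."""
    indexed = parse_qs(urlsplit(indexed_url).query, keep_blank_values=True)
    canonical = parse_qs(urlsplit(canonical_url).query, keep_blank_values=True)
    return sorted(set(indexed) - set(canonical))


# Hypothetical page source and indexed URL, for illustration only.
page = (
    '<html><head>'
    '<link rel="canonical" href="http://www.example.com/product.cfm?id=42">'
    '</head><body></body></html>'
)
indexed_url = "http://www.example.com/product.cfm?id=42&OPC=abc&PFM=search"

canonical_url = find_canonical(page)
print(canonical_url)
print(extra_params(indexed_url, canonical_url))
```

Any non-empty list from `extra_params` means the indexed URL carries parameters that the page's own canonical element says should not be there.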

I know Google treats the element as a strong hint rather than an absolute directive. However, Matt's video made it sound like it's about as strong a hint as a 301 redirect. If that were the case, I wouldn't have expected to see this behavior. The example I found doesn't look to me like an "edge case," and I don't see any reason why Google shouldn't trust or adhere to the canonical tag in this particular situation. So what gives?

If you're thinking that perhaps the canonical tags were just added and haven't had time to kick in yet, take a look at the Cached links on some of those search listings. Some of these pages were cached way back in March, and the canonical tag is already present in the Cached version. Surely 2+ months is ample time for Google to canonicalize these pages?

I like canonical tags and I use them. But I always prefer 301 redirects over canonical tags, as 301s are pretty much *always* obeyed.
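
For a case like the one above, the 301 approach boils down to computing a clean URL and redirecting to it. A minimal sketch, assuming OPC and PFM are the only parameters that need stripping (that set, the function name, and the example.com URL are illustrative assumptions, not Northern Safety's actual setup):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: these are the parameters that create duplicate URLs.
TRACKING_PARAMS = {"opc", "pfm"}


def redirect_target(url):
    """Return the clean URL to 301 to, or None if the URL is already clean."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k.lower() not in TRACKING_PARAMS]
    if len(kept) == len(pairs):
        return None  # nothing to strip, serve the page normally
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical URL, for illustration only.
print(redirect_target("http://www.example.com/product.cfm?id=42&OPC=abc&PFM=search"))
print(redirect_target("http://www.example.com/product.cfm?id=42"))
```

A server-side handler would answer with a 301 status and a Location header whenever `redirect_target` returns a URL. If those parameters carry tracking data, it would need to be recorded before the redirect is issued.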

The lesson here: I wouldn't bet my business on the canonical tag being obeyed by Google.

Posted by Stephan Spencer on 05/19/2009 | Permalink

Filed under: Search Engines

11 comments

  1. Yea 301 is still the best in the fight against duplicates.

    Comment by Maxcy [Visitor] · http://www.indenty.nl — 05/20/09 @ 04:39


  2. Stephan have you seen an example of a URL using the canonical tag that has been removed from the index in the same way that a 301'd URL would be?

    Comment by Andrew Shotland [Visitor] · http://www.localseoguide.com — 05/20/09 @ 22:16


  3. Hey Stephan,

    I have noticed the same thing even on very trusted sites / places where the canonical tag should be trusted - I see the top search result for podcasting:
    http://www.google.co.uk/search?q=podcasting

    being the wikipedia page which is not the canonical version according to the canonical tag (the 'podcast' page is).

    I discovered this while researching my SMX London "give it up" presentation - it's very interesting to see them talking as though it is implemented when it very clearly appears not to be.

    Comment by Will Critchlow [Visitor] Email · http://www.distilled.co.uk/blog/ — 05/21/09 @ 03:25


  4. One of the problems with using SEO commands in Google (such as site: and inurl:) together is that you don't often get a real representation of what is showing up in the SERPs, Stephan. In this case, a search for "electrical arc protective clothing" http://www.google.com/search?q=electrical+arc+protective+clothing&hl=en&rlz=1T4GGLL_en&start=10&sa=N shows that the top northernsafety.com page is the canonical-defined version. What Google shows with these hacks is useful information regarding what they still have in their indexes, but an actual search without SEO commands seems to deliver the right URL, no?

    Comment by chris boggs [Visitor] · http://www.rosetta.com — 05/21/09 @ 10:54


  5. @Chris - my wikipedia example shows it happening 'in the wild' as it were...

    Comment by Will Critchlow [Visitor] Email · http://www.distilled.co.uk/blog — 05/21/09 @ 14:35


  6. I feel the same as Chris. I think true unreliability would be a URL showing up for a real query when it has a canonical href pointing elsewhere. If/when that happens, I'll be worried.

    Having them show up in site: queries isn't particularly comforting, but for me it doesn't count them out yet. Good post, as always.

    Comment by Erik Dafforn [Visitor] Email · http://www.google.com/profiles/edafforn — 05/21/09 @ 19:19


  7. @Will,

    I appreciate your example. From what I can see, the cached version of /wiki/Podcasting (http://72.14.205.104/search?q=cache%3Ahttp%3A%2F%2Fen.wikipedia.org%2Fwiki%2FPodcasting&strip=1) from May 15 does not have the canonical element inside it; but notice that while my URL asks for the cache of /wiki/Podcasting, the Google box at the top of the page says it's giving the cache of /wiki/Podcast.

    But if you click through to the cached version of /wiki/Podcasting from the SERP you gave, the cache *does* show the canonical element.

    That, and Wiki's own weird internal redirection (e.g., when you're on /wiki/Podcasting it says you've been "redirected from Podcasting" -- meaning you're seeing /wiki/Podcast content on the /wiki/Podcasting URL) makes me think there are more variables at play here than just the canonical element's failure, although I'm not sure exactly what those variables are.

    Comment by Erik Dafforn [Visitor] Email · http://www.google.com/profiles/edafforn — 05/21/09 @ 23:00


  8. It's interesting to see others' perceptions via the post here. We are certainly running some tests on a live site at present, and I will be happy to share any learnings that come about as a result.

    As regards the canonical 'tag', where it differs is that it isn't a 301 redirect, which by its nature is a redirect and thus means users have to adhere. That isn't always realistic in commercial spheres, particularly with large organisations (no matter how hard you try), and a page-level element that does not affect user experience gives us a potentially useful tool (obviously, if it works as Matt Cutts described).

    Given that it has been agreed to by the big 3, one can't help but think it's only a matter of time before things sort themselves out, surely!

    Comment by Peter Young [Visitor] · http://holisticsearch.co.uk — 05/22/09 @ 02:48


  9. @Erik,

    There's a simple explanation as to why you can't see a canonical tag in the cache URL that you supplied. Your URL includes &strip=1 for the Text Only version of the cache. To make the page text-only, Google strips all <link> tags from the HTML as well as <img> tags.

    So Will's example of nonfunctioning canonical tags inside Wikipedia still stands as valid.

    Comment by Stephan Spencer [Member] Email — 06/02/09 @ 01:52


  10. Thanks Stephan, that makes sense. I still think there's something funky going on with Wikipedia's "predirects", but what does concern me is the query for [northern safety], which should be a pretty cut-and-dried usage of the canonical URL, but isn't. (I'm showing the root in the SERP)


    UPDATE: I've been running the [northern safety] query on and off today, and I've just now seen /index.cfm show up in the SERP for the first time today. Before that it was /. So I'm not sure what that means either.

    Regardless of the outcome, I appreciate you starting and hosting this conversation.

    Comment by Erik Dafforn [Visitor] · http://www.google.com/profiles/edafforn — 06/02/09 @ 13:09


  11. As always, Stephan's post is right on.

    I made a comment over on SEOmoz's post (tinyurl. com/l67rdf) which contained the top 5 SEO requests to dev teams.

    Every SEO keeps praising the canonical tag. It works, but not as well as it could.

    We have 37+ market based subdomains. Each market creates their own unique content but we also produce some "national" content that is available for them to use if they choose.

    We use the canonical tag on national content to specify a market that should be seen as the content originator. This is an amazing resolution to a potential duplicate content issue, right? Well, not exactly: since Google only uses the tag as a suggestion, it tends to be hit or miss. After much tracking and evaluation, Google still seems to rely more heavily on internal linking.

    Fortunately, 99% of our URLs are canonicalized already and we don’t have to worry about tracking tags, or other DUST issues. I would imagine that the canonical tag works better for these situations.

    So it’s cool, but not that cool.

    Comment by Stroseo [Visitor] Email · http://stroseo.com — 07/23/09 @ 15:15

