Canonicalization tells search engines which version of a page I want Google to rank and show in search results when my site has multiple duplicate pages. Businesses sometimes have several versions of the same page on their site.
For example, if I’m an eCommerce shop that sells all kinds of drums, I might give users a way to filter down to the exact type of drum they are looking for. They might drill down and find the perfect maple snare drum, with a Remo head and silver rims, that they want to purchase.
After drilling down with filters to find the exact product, the URL ends up with a string of filter parameters on the tail end, like this: ?color=maple&rims=silver&head=remo
If you cut off that tail end of the URL, you have the exact same page, which could raise an SEO red flag for duplicate content.
Or maybe you have an exclusive promotional button on your site with an analytics tracking parameter, and when a customer clicks it they land on: www.SlickDrums.com/snare-drum.html?utm_campaign=10
Again, this could create a duplicate content issue.
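To make the idea concrete, here is a rough Python sketch of how a site could work out the definitive URL from one of these parameter-laden versions. The function name, the list of filter parameters, and the http:// scheme are my own assumptions for illustration (the example also assumes the filtered page lives at the same snare-drum.html path), so treat it as a starting point and not a finished tool.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that only filter or track and never change the page.
FILTER_PARAMS = {"color", "rims", "head"}
TRACKING_PREFIXES = ("utm_",)


def canonical_url(url: str) -> str:
    """Return the URL with filter and tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FILTER_PARAMS and not key.startswith(TRACKING_PREFIXES)
    ]
    # Host names are case-insensitive, so lowercase the host for consistency.
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )
```

With this sketch, both the filtered URL and the utm_campaign URL collapse to the same snare-drum.html address, while any parameter that is not on the list (a page number, say) is left alone.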
Luckily, Google is quite smart at detecting which version of a page you want to show up in search engines. But has there ever been a time when a URL showed up in search that wasn’t the version you intended? For example, maybe slickdrums.com/index.php showed up instead of the intended version, www.slickdrums.com.
To avoid the hassle of fixing these funky URLs when they show up in search, you should implement a rel=canonical tag on every page that points to the definitive version of the page. Google has some great pointers on rel=canonical tags.
Basically, having a rel=canonical tag on a page suggests to Google which page it should be ranking, indexing, and passing link juice (which is a factor in determining search engine rankings) to. One way to implement these rel=canonical tags is to place them automatically in the <head> of the definitive version of the page as soon as the page is created. This way, if other versions of the exact same page appear down the road, the developer or webmaster will always know which version of the page you want indexed.
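For reference, the tag itself is a single link element that goes in the page's head. Here is a minimal Python sketch of a helper a page template could call when a page is created. The function name is hypothetical, and how you wire it into your own CMS or templates will vary.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the rel=canonical link element for a page's <head>."""
    # Escape the URL so characters like & and " cannot break the attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


print(canonical_link_tag("http://www.slickdrums.com/snare-drum.html"))
# prints: <link rel="canonical" href="http://www.slickdrums.com/snare-drum.html">
```

Every version of the page, including the definitive one, would carry this same tag with the same href.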
Another important reason to have a rel=canonical tag on every page pointing to the definitive version is that if you are gaining links to different versions of the page, the link juice will all flow to the definitive version.
By implementing these tags automatically when you first create the page, you help the webmaster know which page he or she needs to point the rel=canonical tag to, without having to run around trying to find this information. This could also potentially help you stave off people who are looking to scrape your content. Theoretically, these canonical tags will tell Google to rank your page and not the page that scraped your content. Search engines are pretty smart about knowing which page was the original source of the content, but it is always best to be safe.
One concern people might have about putting a rel=canonical tag on a page that points to itself is that they don’t want to confuse, manipulate, or spam search engines. I confirmed with Matt Cutts, the head of Google’s Webspam team, that this is an okay process.
Bonus Tip: This one is from Dr. Pete at SEOmoz: make sure that all of your internal links are pointing to the same version of the URL that is your canonical URL. You don’t want to have internal links pointing to one version and your canonical tag pointing to a different URL. Make them consistent.
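If you want to spot-check this tip, something like the following Python sketch could flag internal links that don't match your canonical URLs. The class and function names, the set of site hosts, and the list of canonical URLs are all assumptions you would supply yourself. It only looks at absolute links, so relative links are skipped.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def non_canonical_internal_links(page_html, canonical_urls, site_hosts):
    """Return internal links whose URL is not one of the canonical URLs."""
    collector = AnchorCollector()
    collector.feed(page_html)
    flagged = []
    for href in collector.hrefs:
        host = urlsplit(href).netloc.lower()
        # Relative links have no host and are skipped in this sketch.
        if host in site_hosts and href not in canonical_urls:
            flagged.append(href)
    return flagged
```

For example, if your canonical home page is http://www.slickdrums.com/ and a page links to http://slickdrums.com/index.php, that link would come back in the flagged list so you can update it.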
Have you ever had a situation where the unintended version of the page was showing up in search and you took actions to make sure the intended version was indexed? How did you do that?
(photo credit: http://www.flickr.com/photos/barjack/5991491036/)