Append Tracking Information Without Creating Duplicate Content
Towards the end of my Search Engine Land article about redirects, I mentioned that you can use the hash or pound symbol (#) in a URL to append tracking information.
Why do this? Because it prevents duplicate content (i.e. the same page at multiple URLs that look unique to the engines), and it aggregates all link juice to the one canonical URL.
The # in a URL is usually used for sending visitors to an anchor within the page they are on (e.g. “Jump to top of page” or “Jump to Table of Contents”).
Appending tracking information to URLs with a # works from an SEO perspective because search engines ignore the # and everything after it. This effectively collapses the tracked URLs together.
Let’s take a look at a concrete example to see how this plays out. Imagine you linked to your “About Us” page from your blog and that link pointed to:
www.mythicalcompany.com/aboutus.php#blog
and from your site-wide footer on your ecommerce site you linked to:
www.mythicalcompany.com/aboutus.php#footer
Both URLs would be interpreted by Google and the other engines as:
www.mythicalcompany.com/aboutus.php
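To make the collapsing concrete, here is a minimal sketch of what "ignoring the # and everything after it" amounts to. The helper name stripFragment is my own invention for illustration; it only models the behavior described above and is not how any engine actually implements it.

```typescript
// Hypothetical helper: drop the "#" and everything after it,
// modeling how the tracked URLs collapse into one.
function stripFragment(url: string): string {
  const hashIndex = url.indexOf("#");
  return hashIndex === -1 ? url : url.slice(0, hashIndex);
}

// The two tracked links from the example above.
const trackedLinks: string[] = [
  "www.mythicalcompany.com/aboutus.php#blog",
  "www.mythicalcompany.com/aboutus.php#footer",
];

// Strip the fragments, then keep only the first occurrence of each URL.
const collapsedUrls: string[] = trackedLinks
  .map(stripFragment)
  .filter((url, index, all) => all.indexOf(url) === index);

console.log(collapsedUrls);
```

Both tracked links reduce to the same single URL, which is the whole point of the technique.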
Yet the full URL (www.mythicalcompany.com/aboutus.php#footer) is available to any client-side JavaScript. So you could write a script that pulls what’s after the # and inserts it into a cookie, or otherwise sends it to your server and/or web analytics.
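Such a script could look roughly like the sketch below. Treat it as one possible approach under my own assumptions: the function names, the cookie name link_source, and the 30-minute lifetime are all made up for illustration. The browser-only lines are left as comments so the helpers can be tried outside a page.

```typescript
// Hypothetical helper: return the text after the "#", or null if there is none.
function extractTrackingTag(url: string): string | null {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1 || hashIndex === url.length - 1) {
    return null; // no "#" at all, or nothing after it
  }
  return url.slice(hashIndex + 1);
}

// Hypothetical helper: build a cookie string for the tag.
// The cookie name "link_source" is an assumption, not a standard.
function buildTrackingCookie(tag: string, maxAgeSeconds: number): string {
  return (
    "link_source=" + encodeURIComponent(tag) +
    "; path=/; max-age=" + maxAgeSeconds
  );
}

// In a real page you would wire it up along these lines:
//   const tag = extractTrackingTag(window.location.href);
//   if (tag !== null) {
//     document.cookie = buildTrackingCookie(tag, 1800);
//   }
```

One caveat with this approach: a real "jump to" anchor on the page would be picked up as a tracking tag too, so you may want to check the tag against a list of values you actually use.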
Note that the full URL will NOT show up in your log files, because web browsers only use what’s after the # to jump to the anchor within the page, and that’s done locally within the browser. In other words, the browser doesn’t send the full URL, so the anchor information (i.e. any text after the #) is not stored within environment variables like REQUEST_URI. Thus you cannot use a hash to pass parameters in your URL to your PHP (or ASP or whatever) scripts (at least not directly).
If your stats package relies on log file analysis, the anchor from a hash-containing URL will never reach your server logs. A workaround is to write and then include a client-side script that sends a ping via a URL with the necessary tracking appended via a query string. That ping URL would have the info appended, but any content returned from that URL would be ignored by your script. That way the stats package can pick up the tracking info from query string parameters as normal, but through the second URL requested by your script, not the first one originally requested by the web browser. Make sense?
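Here is a rough sketch of that ping workaround. Everything named in it is an assumption on my part: the buildPingUrl helper, the /ping.gif endpoint, and the source and page parameter names are placeholders you would swap for whatever your stats package expects.

```typescript
// Hypothetical helper: turn a hash-tagged page URL into a ping URL that
// carries the tag in the query string, where log analysis can see it.
function buildPingUrl(pageUrl: string, pingPath: string): string | null {
  const hashIndex = pageUrl.indexOf("#");
  if (hashIndex === -1 || hashIndex === pageUrl.length - 1) {
    return null; // nothing to report
  }
  const page = pageUrl.slice(0, hashIndex);
  const tag = pageUrl.slice(hashIndex + 1);
  // "source" and "page" are made-up parameter names.
  return (
    pingPath +
    "?source=" + encodeURIComponent(tag) +
    "&page=" + encodeURIComponent(page)
  );
}

// In a real page, requesting the ping as an image is one common way to
// fire it and ignore whatever comes back:
//   const pingUrl = buildPingUrl(window.location.href, "/ping.gif");
//   if (pingUrl !== null) {
//     new Image().src = pingUrl;
//   }
```

The request for the ping URL lands in the server log with its query string intact, so the log analyzer sees the tracking value even though the original page request never carried it.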
Arrrgh… Google Still Isn’t Recognizing Underscores as Word Separators in URLs
Although it isn’t a primary “signal” like the title tag or anchor text, keywords in your URLs can help with your Google rankings. But ONLY if Google can see the actual words in the URL. It turns out that separating the words in a URL with hyphens lets Google see the individual words, but using underscores does not. And this, unfortunately, continues to be the case today.
Not quite two years ago at WordCamp, Matt Cutts said that Google would soon be treating underscores as word separators:
“The interesting thing is we used to treat underscores as if they were like word A underscore word B, we would glom that together and we would index that as A underscore B, so if you just searched for the word A, we wouldn’t return your post. Ah… We’re in the process of changing that. We might have already changed that. So dashes and underscores are almost exactly the same.”
You can hear the above statement for yourself in this video of Matt’s talk, at around the 17 minute mark.
I excitedly wrote about it in a post for the News.com Blog, since historically keywords separated by underscores didn’t look like separate words to Google, and this would save a lot of folks a lot of time if they were embarking on a URL rewriting project to fix their underscore problem.
Unfortunately I jumped the gun a bit, because Google still has not made the switch to recognizing underscores as word separators, as it does with hyphens.
Your next question might be “But are you sure??” Yup. When I spoke to Matt in February at SMX West, he confirmed that underscores were NOT treated as word separators. According to Matt, this change is still in their queue but unlikely to happen before summer. My interpretation: don’t hold your breath, it’s between summer and never.
Why didn’t they roll out that change? Clearly it’s not a priority. Google engineers are focused on improving relevancy and improving the searcher’s user experience. I would guess that this particular tweak to their algorithm isn’t going to do much for their users.
So, in your URLs, keep favoring hyphens over underscores for the foreseeable future.
And here’s one gotcha to be aware of: don’t use an underscore to separate a lookup ID from hyphenated keywords. For example, a URL like http://www.example.com/1234_nike-pegasus-running-shoes.html may at first glance appear to be search engine optimal, but the keyword “nike” is not visible to Google as a separate word. The keyword is actually understood by Google to be “1234_nike”, not “nike”.
By the way, although I favor the hyphen, there are other word separators accepted by Google, such as the dot (.), the plus sign (+), and the “escaped” space character (%20).
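If you want to sanity-check your own URLs, a toy model of the behavior described in this post can help. To be clear, this is only my illustration, not Google's actual tokenizer: the visibleWords helper is hypothetical, and it simply splits on the separators listed above (hyphen, dot, plus sign, and %20) while leaving underscores glued together. Because the dot counts as a separator here, pass in the slug without its file extension.

```typescript
// Toy model (an assumption, not Google's real logic): split a URL slug on
// the separators named in this post and treat underscores as part of a word.
function visibleWords(slug: string): string[] {
  return slug
    .split(/-|\.|\+|%20/)
    .filter((word) => word.length > 0);
}

// The gotcha from above: the ID stays glued to the first keyword.
console.log(visibleWords("1234_nike-pegasus-running-shoes"));

// Using a hyphen after the ID exposes "nike" as its own word.
console.log(visibleWords("1234-nike-pegasus-running-shoes"));

// An all-underscore slug comes out as a single word.
console.log(visibleWords("nike_pegasus_running_shoes"));
```

Under this model the first slug yields "1234_nike" rather than "nike", which matches the problem described above.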




