Stephan Spencer's Scatterings

The Scattered Wisdom of a scientist turned web marketing virtuoso

The Problem with Embedding Tracking Codes in your URLs

The problem with embedding a tracking code in your URLs to track referrals from particular marketing campaigns or partners is that those URLs inevitably end up in other places, such as the search engines' indices. Your referral numbers then become inflated.

Case in point: Google's "Inside AdSense" Blog. A couple of days ago I searched Google for [inside adsense] and was surprised to find that the #1 result was not http://adsense.blogspot.com. It was that URL with a utm_source and some other parameters appended (i.e. the URL was something like http://adsense.blogspot.com/?utm_source=aso&utm_campaign=ww-en_US-et-asfe&medium=et). Unfortunately I didn't record the exact URL at the time, and today Google is back to returning what it should for the top result: http://adsense.blogspot.com (without any utm_source or query string). I bet the Analytics folks at Google will be scratching their heads at the spike in popularity of the "ASO" (or whatever it was) referral source when they look back at the month of May (unless of course they've read this blog post!).

Example #2: CBS News. Run the query [site:www.cbsnews.com inurl:source=rss] on Google. Google returns 27,900 pages, all of which have source=RSS in the URL. Even though I don't believe Google's result counts to be even remotely accurate, that is still a heck of a lot of pages, and those pages are bringing in some amount of traffic from Google searchers. When those searchers click through, the visit is wrongly attributed to the site's RSS feed. I wonder if CBS News realizes this? Probably not.

So, if you must use the URL's query string to track your referral sources, then at least make sure that you aren't ever serving those links to search engine spiders. Drop the referral source from all links when spiders come to visit. Don't worry; the search engines say this sort of "cloaking" is totally okay.

That will ensure your own site isn't providing source-coded links for the spiders to explore. But what about other sites that link to you? I suggest that you 301 redirect all requests for URLs with tracking codes to the corresponding URL without the tracking code. Your source-coded pages in the search engines' indices should then drop away to nothing over time (or at least get relegated to "supplemental hell").
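
Here is a minimal sketch of that redirect decision in Python, using only the standard library. The parameter names in TRACKING_PARAMS and the function name tracking_redirect are my own assumptions for illustration; substitute whatever your campaigns actually use.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed tracking parameter names (hypothetical; adjust to your own campaigns).
TRACKING_PARAMS = {"utm_source", "utm_campaign", "utm_medium", "source"}


def tracking_redirect(url):
    """Return (301, clean_url) if the URL carries a tracking parameter, else None."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Keep every name-value pair that is not a tracking parameter.
    kept = [(name, value) for name, value in pairs
            if name.lower() not in TRACKING_PARAMS]
    if len(kept) == len(pairs):
        return None  # nothing to strip, so no redirect is needed
    clean_url = urlunsplit(parts._replace(query=urlencode(kept)))
    return 301, clean_url


if __name__ == "__main__":
    print(tracking_redirect("http://www.example.com/story?page=2&source=RSS"))
```

A real handler would presumably record the tracking code in its own logs before issuing the redirect, so the campaign still gets credit for the visit.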

Posted by Stephan Spencer on 05/08/2006 | Permalink

Comments (3) | Comments RSS | Filed under: Search Engines, Web Analytics

What should be your corporate blog's URL?

A reader emailed me with the following question:

I was wondering if you have a POV, on if a blog should live on a corporate domain name (ex. company.com) or if it would be better to have the domain name be different from the corp. (ex. companyblog.com)?

That's a great question.

My answer is this: if the blog will get more links by being at an arm's length from the corporate site, then I'd have it on a totally separate domain.

Let me supply a hypothetical example... If a life insurance company has a blog about health and wellness and it's at www.stayinghealthy.com, that will garner many more links than one at blog.lifeinsuranceco.com, IMHO.

This may seem like an oversimplification, since I haven't discussed the branding implications, but I believe the "link-ability" of the blog is what will give the blog a long productive life in the blogosphere. Anything else is peripheral.

Posted by Stephan Spencer on 03/19/2006 | Permalink

Comments (5) | Comments RSS | Filed under: Search Engines, Blogging

Are you a member of the Invisible Web Club?

Despite the increasing use of search engine friendly URLs, custom meta tags and cleaner navigation, there's a lot of web content out there that is inaccessible to the search engines.

Perhaps it is in the "too hard basket" for many designers. In which case, perhaps you need a new designer! After all (repeat after me) You Are the Customer.

There's a nice article from SearchEnginePosition.com that discusses the topic: The Invisible Web Still Exists

Too many websites are not search engine friendly or are not properly optimized; in other words, are active members of the Invisible Web Club.

How do you get out of this club? I'm glad you asked:

  1. Rewrite "dynamic" URLs to remove question marks, ampersands, and equals signs from them
  2. Remove frames if you have them
  3. Don't have links that rely on JavaScript to function
  4. Don't embed navigation elements in Flash or Java
  5. Provide alternate link-based navigation to content that is behind search boxes or fill-in forms (that includes JavaScripted pulldown lists!)

Posted by Stephan Spencer on 02/08/2006 | Permalink

Comments (3) | Comments RSS | Filed under: Usability, Search Engines

Blog SEO Tip #2: Your URLs

Dynamic URLs can impede the search engine spiders from fully spidering and indexing your blog. Err on the side of caution and use "rewritten" URLs. The excellent (and free!) blogging software WordPress supports URL rewriting, so you can have nice, search engine friendly URLs. Better still, WordPress URLs separate words with hyphens rather than underscores (which TypePad uses); Google does not consider underscores to be word separators.
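
To illustrate the hyphen convention, here is a small Python sketch that turns a post title into a hyphen-separated URL slug. This is not WordPress's actual algorithm; slugify is a hypothetical helper that only shows the idea.

```python
import re


def slugify(title):
    """Lowercase the title and join its alphanumeric words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


if __name__ == "__main__":
    print(slugify("Blog SEO Tip #2: Your URLs"))
```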

If you ever switch blog platforms, it's imperative that the old permalink URLs still work. That's because there will be numerous deep links into specific post pages in your blog from other bloggers, and that provides your blog with that all-important "link gain" (e.g. Google's PageRank). You wouldn't want to lose that!

Recently we assisted BusinessBlogConsulting.com with the conversion from TypePad to WordPress 2.0. As part of the migration, we ensured that the new WordPress permalink URLs were consistent with the old TypePad permalink URLs. That means that the old posts still have underscores in them. However, for new posts, the permalink URLs will contain full words and hyphens, not underscores.

If for some reason you have to change the URLs, then at least redirect the old URLs to the new ones, and make sure you do it as a permanent (301 style) redirect. That way the link gain passes on to the new URL.

Also make sure you 301 redirect requests for pages from your domain without the www (e.g. http://businessblogconsulting.com/category/adverblogs/) to the corresponding page on your www URL (e.g. http://www.businessblogconsulting.com/category/adverblogs/). This will eliminate duplicate pages in the search engine indices and consolidate link gain. Otherwise when people link to http://businessblogconsulting.com without the www it creates another site for the search engines to visit and explore.
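
As a rough sketch of the www redirect, the Python function below decides whether a request should be 301 redirected to the www host. CANONICAL_HOST and www_redirect are names I made up for this example; on a real site you would more likely configure this in the web server itself.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed canonical host for this example.
CANONICAL_HOST = "www.businessblogconsulting.com"


def www_redirect(url):
    """Return (301, www_url) for a request to the bare domain, else None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Only redirect when adding "www." to the host yields the canonical host.
    if host != CANONICAL_HOST and "www." + host == CANONICAL_HOST:
        return 301, urlunsplit(parts._replace(netloc=CANONICAL_HOST))
    return None


if __name__ == "__main__":
    print(www_redirect("http://businessblogconsulting.com/category/adverblogs/"))
```

The path and query string are carried over unchanged, so each non-www page maps to its corresponding www page rather than to the home page.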

Posted by Stephan Spencer on 01/10/2006 | Permalink

Comments (0) | Comments RSS | Filed under: Search Engines, Blogging

Googlebot, parameters and dynamic sites

I previously mentioned that Matt Cutts from Google gave some advice to webmasters of dynamic (database driven) web sites.

For one thing, Matt advised that if you have a dynamic web site, you should minimize the number of parameters in the URL. You’re very safe if you have fewer than 2 parameters. Keep the values of those parameters to fewer than 5 digits. And don’t name a parameter id. That's because Google is suspicious of that parameter being a session ID or something other than a key field. Even if it's the only parameter in your URLs, try not to use it. Particularly if that variable's value is long (like 5 digits or more). sid would be a bad choice too because it could stand for session ID as much as it could stand for a key field like story ID. It doesn't mean that your pages won't be indexed if you use this parameter name; it just means those pages would be at a greater risk of not being included. You should be fine though if your pages are all already in Google.
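
If you want to check your own URLs against that advice, something like the following Python sketch could do it. The thresholds are taken straight from the paragraph above; the function name url_warnings, the RISKY_NAMES set, and my reading of "digits" as applying to purely numeric values are all assumptions.

```python
from urllib.parse import urlsplit, parse_qsl

# Parameter names the advice above singles out as risky.
RISKY_NAMES = {"id", "sid"}


def url_warnings(url):
    """Return a list of warnings for a URL, based on the advice above."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    warnings = []
    if len(pairs) >= 2:
        warnings.append(f"{len(pairs)} parameters (advice: fewer than 2)")
    for name, value in pairs:
        if name.lower() in RISKY_NAMES:
            warnings.append(f"parameter '{name}' may look like a session ID")
        # Assumption: the digit limit applies to purely numeric values.
        if value.isdigit() and len(value) >= 5:
            warnings.append(f"'{name}' has a {len(value)}-digit value (advice: fewer than 5)")
    return warnings


if __name__ == "__main__":
    for warning in url_warnings("http://www.example.com/story.php?id=123456"):
        print(warning)
```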

Matt also mentioned something that should be a bit alarming to anyone with a dynamic site: Googlebot sometimes tries variations of URLs by dropping parameters. That is, Googlebot may experiment with removing name-value pairs from the query string portion of your URLs (i.e. the part of the URL that follows the question mark) and seeing if the pages still load. As I understand it, if these variant pages still show the same content as the page at the original URL, that tells Googlebot the omitted parameters are superfluous. So for example, a URL such as this:

www.bigyellow.com/cgi-bin/php/cities/unitedstates/mtg_detail.php?xsrc=&PID=36575&S=NY&T=&MTG=PR

might be shortened by Googlebot to:

www.bigyellow.com/cgi-bin/php/cities/unitedstates/mtg_detail.php?xsrc=&PID=36575&S=NY&T=

and

www.bigyellow.com/cgi-bin/php/cities/unitedstates/mtg_detail.php?xsrc=&PID=36575

and

www.bigyellow.com/cgi-bin/php/cities/unitedstates/mtg_detail.php?S=NY&T=&MTG=PR

etc.

These URL variations would then get spidered and compared with each other. I've heard of big websites getting hit by this, and of it causing them big problems. Don't get all worried about Googlebot doing this to your site if you're not a big and important site. Matt stated that Google only does this deep-level analysis on big, quality sites. Anyone been subjected to this? And if so, what damage or inconvenience did it inflict on you?
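
To see which variants of one of your own URLs this kind of parameter dropping could produce, here is a Python sketch that enumerates them. It is only my guess at the general idea, not a description of how Googlebot actually chooses its variations, and dropped_param_variants is a hypothetical name.

```python
from itertools import combinations
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def dropped_param_variants(url):
    """Return the URLs obtained by dropping some (but not all) query parameters."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    variants = []
    # Try keeping n-1 parameters, then n-2, and so on down to a single one.
    for keep in range(len(pairs) - 1, 0, -1):
        for subset in combinations(pairs, keep):
            query = urlencode(list(subset))
            variants.append(urlunsplit(parts._replace(query=query)))
    return variants


if __name__ == "__main__":
    for variant in dropped_param_variants("http://www.example.com/p.php?a=1&b=2&c=3"):
        print(variant)
```

You could request each variant on a test copy of your site and check whether it returns the same content, an error, or something unexpected.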

Posted by Stephan Spencer on 08/27/2005 | Permalink

Comments (0) | Comments RSS | Filed under: Search Engines

Spiders like Googlebot choke on Session IDs

Many ecommerce sites have session IDs or user IDs in the URLs of their pages. This tends to cause the pages either to not get indexed by search engines like Google, or to get included over and over, clogging up the index with duplicates (this phenomenon is called a "spider trap"). Furthermore, having all these duplicates in the index causes the site's importance score, known as PageRank, to be spread out across all these duplicates (this phenomenon is called "PageRank dilution").

Ironically, Googlebot regularly gets caught in a spider trap while spidering one of its own sites - the Google Store (where they sell branded caps, shirts, umbrellas, etc.). The URLs of the store are not very search engine friendly: they are overly complex and include session IDs. This has resulted in 3,440 duplicate copies of the Accessories page and 3,420 copies of the Office page, for example.

If you have a dynamic, database-driven website and you want to avoid your own site becoming a spider trap, you'll need to keep your URLs simple. Try to avoid having any ?, &, or = characters in the URLs. And try to keep the number of "parameters" to a minimum. With URLs and search engine friendliness, less is more.
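
As one hedged illustration of what a "simple" URL scheme might look like, the Python sketch below builds and parses paths that carry the same information as a query string without any ?, &, or = characters. The /store/ prefix, the category and item fields, and both function names are invented for this example.

```python
import re

# Hypothetical route: /store/<category>/<item number>
ROUTE = re.compile(r"^/store/(?P<category>[a-z0-9-]+)/(?P<item>[0-9]+)/?$")


def friendly_path(category, item):
    """Build a path with no ?, &, or = characters."""
    return f"/store/{category}/{item}"


def parse_friendly_path(path):
    """Recover the parameters from a friendly path, or None if it does not match."""
    match = ROUTE.match(path)
    return match.groupdict() if match else None


if __name__ == "__main__":
    path = friendly_path("accessories", 42)
    print(path, parse_friendly_path(path))
```

Note that nothing session-specific appears in the path; session state would have to live somewhere else, such as a cookie.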

Posted by Stephan Spencer on 06/25/2004 | Permalink

Comments (1) | Comments RSS | Filed under: General