Stephan Spencer's Scatterings

The Scattered Wisdom of a scientist turned web marketing virtuoso

The New Age of Computational Engines

I have to say, I am impressed with Wolfram Alpha. I think it's a game changer. It provides a powerful new way of interacting with the large repositories of data available on the Web. For instance, instead of googling for "number of google employees" (incidentally, it isn't until the 4th result down that you get the answer), then googling for "number of yahoo employees", then doing the math to compute the ratio, you would simply input into Wolfram Alpha "google/yahoo employees". (The answer is 1.487:1, if you're curious.)

Welcome to the brave new world of computational engines.

What's a computational engine, you ask? The best definition I can think of is: an online data mining and analysis tool.

What can a computational engine do? A lot. It can segment the population by gender ("u.s. male population, u.s. female population"). It can tell you what that ratio is ("u.s. male population / u.s. female population"). It can graph the growth of the U.S. population over the last several decades ("population u.s."). And it can calculate population density in the U.S. ("population density u.s.").

It's a simple matter to do head-to-head comparisons and generate comparative charts. Just separate the terms with commas. For example, type in "google, yahoo" and you'll get a bunch of charts and graphs comparing the two companies' financials, stock performance and price history.

And wow can you drill down into the data easily. For example, start with the query "google.com" and you'll see all sorts of pertinent facts about the site and the company. To see a report of all the subdomains of google.com, click on the "Subdomains" link. From there you can click on "More subdomains" to get a more exhaustive list:
Subdomains of Google.com

I just wish I could have typed "subdomains of google.com" or "google.com subdomains" to get to the answer. Neither of those queries works.

Wolfram Alpha can even tell you how long you'll live. I queried "life expectancy age 38 male u.s." and it returned 77.54 years. Then I queried my birthdate and learned that was 38.45 years ago. Then "77.54 - 38.45 years" returned not only 39.09 years, but also 14,268 days -- which feels a lot longer to me! Finally, "39.09 years from now" gives the time and date of my demise: 5:31:06 pm CDT on Thursday June 25, 2048. I'm loading that in my iPhone's calendar with an alarm 10 minutes beforehand, so at least I won't get caught off guard. ;)

I also tried "(77.54 - 38.34) years from now" but Wolfram Alpha choked on that one. However "now + (77.54-38.34) years" did work.
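For anyone who wants to check the arithmetic behind the first set of queries by hand, here is a minimal sketch. The only assumption is a flat 365-day year, which is what reproduces the day count Wolfram Alpha reported; how Wolfram Alpha actually converts years to days is not something I know.

```python
# Figures reported by Wolfram Alpha in the queries above.
life_expectancy_years = 77.54  # "life expectancy age 38 male u.s."
current_age_years = 38.45      # time elapsed since the birthdate

# Round to two decimals to hide floating-point noise in the subtraction.
remaining_years = round(life_expectancy_years - current_age_years, 2)

# Assumption: a flat 365-day year (no leap days).
remaining_days = round(remaining_years * 365)

print(remaining_years)  # 39.09
print(remaining_days)   # 14268
```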

If you're curious which countries have the longest life expectancy (or shortest), type in "life expectancy". Here's the answer:
Top countries by life expectancy

Perhaps I can buy myself a bit of extra time by moving to Macau? Exactly how much time is anybody's guess. Oh wait, Wolfram Alpha can answer this too!

Not only is the output interesting, the presentation of it is really slick, with great-looking charts and graphs. Note that the charts are rendered as images, not as text. If you want to copy and paste the data within the chart, simply click on it and a "Copyable plaintext" popup box will display.

I find the overly critical comparisons with Google unfair. Remember, Wolfram Alpha is a computational engine, not a search engine. Comparing Wolfram Alpha to Google is like comparing a cell phone to a TV remote. Sure a cell phone and TV remote may both be about the same size and they both have buttons, but the functions they perform are vastly different.

And it's very early days. We need to cut them some slack. Yes it is frustrating to get so many "Wolfram Alpha isn't sure what to do with your input" messages, but when Google debuted in 1997 it was pretty rough too, right?

Watch Stephen Wolfram's screencast demonstration before trying to use the engine. Otherwise it'll frustrate you when you get so many failed queries.

One piece of feedback I would offer to the engineers at Wolfram Alpha is to provide segmentation options to users. In other words, suggest the various ways the requested data can be sliced and diced. For example, the following queries all work properly:

  • life expectancy male
  • life expectancy age 38
  • life expectancy u.s.
  • life expectancy u.s. age 38 male
  • ldl 100 nonsmoker age 38 male u.s.

but these queries do not:

  • life expectancy nonsmoker
  • life expectancy ldl 150
  • life expectancy wisconsin
  • life expectancy wisconsin age 38 ldl 150 male nonsmoker

even though Wolfram Alpha is properly interpreting the syntax of the query and its components ("life expectancy", "male", "age 38", "u.s.", "nonsmoker", "wisconsin", and "ldl 100"). I kept running into trouble when I attempted to further refine the life expectancies from U.S. residents to Wisconsin residents, from males to non-smoking males with slightly high cholesterol levels. Teasing out subgroups within a population could be facilitated by an intuitive visual interface for viewing and selecting from the available segmentation properties. Or by better error messages, like: "Life expectancy data is not available segmented by state, only by country. Please try a broader query, like life expectancy u.s.".

Posted by Stephan Spencer on 05/23/2009 | Permalink

Comments (4)| Comments RSS | Filed under: Search Engines            

Talking Like a Google Insider

Using Google engineers' terminology will help you look like a search industry insider. For example, talk about "signals" rather than SEO "factors". Describe weak, undifferentiated content as "thin" (as in a "thin affiliate"). Work "canonicalization" into a sentence at least once every 5 minutes. Share your enthusiasm for "shingles" (yeah, NOT the disease). Speak in TLAs (three letter acronyms) like QDD (query deserves diversity) and QDF (query deserves freshness). And so on.

At Google's Searchology conference this month, some new buzzwords were bandied about. Here are a few pulled from this post and this post by Matt Cutts:

  • Chameleon = internal Google codename for the algo that does mid-page suggestions (like search for "labor" and get in the middle of the SERPs "See results for labor and delivery")
  • Spellmeleon = internal Google codename for the algo that preempts the first natural result with 2 results from what Google believes is the correct spelling of your query (like search for "ipodd" and get "Did you mean: ipod Top 2 results shown")
  • Google Squared = a not yet launched Google Labs project that returns search results in a structured format (i.e. as a spreadsheet). Search for "small dogs" and get a matrix with breeds, descriptions, sizes, weights, origins, etc.
  • Rich snippets = search listings with additional info in the snippet, such as star rating and number of reviews. Google gets this extra data from hReview and hCard microformats - simply put, it's semantic, agreed-upon markup in your HTML pages. Kinda reminiscent of Yahoo's SearchMonkey. More about it here. (Incidentally, Dries - of Drupal fame - has an interesting take on what this could mean for SEO.)
Posted by Stephan Spencer on 05/21/2009 | Permalink

Comments (1)| Comments RSS | Filed under: Search Engines            

Canonical Tag Not Yet Reliable

I'm a big fan of the new canonical tag (er, element, to be more technically correct). It's a powerful tool for dealing with duplicate content. But it's not exactly reliable yet. Google wants us to use it as if it were. Unquestionably, it's a signal. But it can be ignored, even when it should clearly be obeyed.

Case in point: Northernsafety.com. Many thousands of non-canonical URLs are indexed. For example, click on some of the listings on this SERP and compare the URLs you were led to by Google to what's listed as the canonical URL in the HTML source of these pages. You'll see that the parameters OPC and PFM are present in the URLs in the search listings but are not present in the canonical link element. Hmmm.
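If you'd rather not eyeball the HTML source, the comparison can be scripted. Here's a rough sketch using only Python's standard library: it pulls the canonical link element out of a page's HTML and reports which query parameters appear in the indexed URL but not in the canonical URL. The helper names and the example.com URLs are hypothetical stand-ins, not the actual Northernsafety.com pages.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def extra_parameters(indexed_url, html):
    """Query parameter names in indexed_url that the canonical URL lacks.

    Returns None when the page declares no canonical link element.
    """
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    indexed = set(parse_qs(urlsplit(indexed_url).query, keep_blank_values=True))
    canonical = set(parse_qs(urlsplit(finder.canonical).query, keep_blank_values=True))
    return sorted(indexed - canonical)


# Hypothetical page and indexed URL, for illustration only.
page = '<html><head><link rel="canonical" href="http://www.example.com/item?id=42"></head></html>'
url = "http://www.example.com/item?id=42&OPC=1&PFM=2"
print(extra_parameters(url, page))  # ['OPC', 'PFM']
```

A non-empty result means the indexed URL carries parameters the site itself considers non-canonical, which is the mismatch described above.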

I know Google uses the element as a strong hint rather than an absolute directive; however, it sounded from Matt's video like it's about as strong a hint as a 301 redirect. If that were the case, I wouldn't have expected to see this behavior. This example I found doesn't look to me to be an "edge case," and I don't see any reason why Google shouldn't trust or adhere to the canonical tag in this particular situation. So what gives?

If you're thinking that perhaps the canonical tags were just added and didn't have time to kick in yet, take a look at the Cached links on some of those search listings. Some of these pages were cached way back in March and yet still have the canonical tag present in the Cached version. Certainly 2+ months is ample time for Google to canonicalize these pages??

I like canonical tags and I use them. But I always prefer 301 redirects over canonical tags, as 301s are pretty much *always* obeyed.

The lesson here: I wouldn't bet my business on the canonical tag being obeyed by Google.

Posted by Stephan Spencer on 05/19/2009 | Permalink

Comments (10)| Comments RSS | Filed under: Search Engines            

Arrrgh... Google Still Isn't Recognizing Underscores as Word Separators in URLs

Although it isn't a primary "signal" like the title tag or anchor text, keywords in your URLs can help with your Google rankings. But ONLY if Google can see the actual words in the URL. Turns out that separating the words in a URL with hyphens allowed Google to see the individual words, but using underscores did not. And this, unfortunately, continues to be the case today.

Not quite two years ago at WordCamp, Matt Cutts made the following statement that Google was imminently going to be treating underscores as word separators:

The interesting thing is we used to treat underscores as if they were like word A underscore word B, we would glom that together and we would index that as A underscore B, so if you just searched for the word A, we wouldn't return your post. Ah... We're in the process of changing that. We might have already changed that. So dashes and underscores are almost exactly the same.

You can hear the above statement for yourself in this video of Matt's talk, at around the 17 minute mark.

I excitedly wrote about it in a post for the News.com Blog, since historically keywords separated by underscores didn't look like separate words to Google, and this would save a lot of folks a lot of time if they were embarking on a URL rewriting project to fix their underscore problem.

Unfortunately I jumped the gun a bit, because Google still has not made the switch to recognizing underscores as word separators like they do with hyphens.

Your next question might be "But are you sure??" Yup. When I spoke to Matt in February at SMX West, he confirmed that underscores were NOT treated as word separators. According to Matt, this change is still in their queue but unlikely to happen before summer. My interpretation: don't hold your breath, it's between summer and never. ;)

Why didn't they roll out that change? Certainly it's clear it's not a priority. Google engineers are focused on improving relevancy and improving the searcher's user experience. I would guess that this particular tweak to their algorithm isn't going to do much for their users.

So, in your URLs, keep favoring hyphens over underscores for the foreseeable future.

And here's one gotcha to be aware of: don't use an underscore to separate a lookup ID from hyphenated keywords. For example, a URL like http://www.example.com/1234_nike-pegasus-running-shoes.html may at first glance appear to be search engine optimal, but the keyword "nike" is not visible to Google as a separate word. The keyword is actually understood by Google to be "1234_nike", not "nike".

By the way, although I favor the hyphen, there are other word separators accepted by Google, such as the dot (.), the plus sign (+), and the "escaped" space character (%20).
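To make the gotcha concrete, here is a toy tokenizer. It is only a model of the behavior described in this post, not Google's actual algorithm. The assumptions: words are split on the hyphen, dot, plus sign and %20 (plus the slash between path segments, which I've added for convenience), and the underscore is never a separator. The function name is made up.

```python
import re

# Assumption: separators per this post (hyphen, dot, plus, escaped space),
# plus "/" between path segments. The underscore is deliberately absent.
SEPARATORS = re.compile(r"%20|[-.+/]")


def visible_keywords(url_path: str) -> list[str]:
    """Split a URL path into the words a hyphen-aware tokenizer would see."""
    return [token.lower() for token in SEPARATORS.split(url_path) if token]


print(visible_keywords("/1234_nike-pegasus-running-shoes.html"))
# ['1234_nike', 'pegasus', 'running', 'shoes', 'html']

print(visible_keywords("/1234-nike-pegasus-running-shoes.html"))
# ['1234', 'nike', 'pegasus', 'running', 'shoes', 'html']
```

In the first URL "nike" never shows up as its own token; swapping the underscore for a hyphen fixes that.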

Posted by Stephan Spencer on 04/19/2009 | Permalink

Comments (5)| Comments RSS | Filed under: Search Engines            

"Thin Slicing", a Powerful SEO Tactic

In my Search Engine Land column last week, I describe a powerful SEO tactic that we at Netconcepts call "thin slicing". The term originally comes from Malcolm Gladwell (as used in his best seller Blink) and has no origins in the online world.

Gladwell uses the term in the context of "rapid cognition", where one makes snap judgments in one's field of expertise. Surprisingly, those snap judgments are oftentimes more accurate than considered opinion, i.e. assessments that have been labored over. The important caveat: it only holds true for experts, not for amateurs.

We've co-opted the term and applied it to SEO. In that context, thin slicing is a tactic referring to mass optimization across a large number of pages, done quickly, and confined to just one or more high value elements (such as title tags). It relies on the gut-level instinct of the search engine marketer. Spare the in-depth keyword research and analysis and just take a guess, then move on. When you have a daunting number of pages to get through, deciding on synonyms, verb tenses and word order should rely on your intuition. Trying to optimize every element on every page perfectly is not scalable and will only sap your energy. "Thin slicing" could be done on title tags, keyword URLs, H1 headings, or meta descriptions. You'd monitor for impact, and then refine based on those results.

There are two approaches to thin slicing, and which one you use depends very much on your web site's infrastructure and what it supports.

  • One is through a forms-based web interface in your admin. We refer to this as "mass edit" capability. WordPress supports mass editing of title tags and URLs ("post slugs", more accurately) - IF you have our free SEO Title Tag plugin installed. Through its mass edit screen, you can optimize all title tags across your blog - all your posts, category pages, tag pages etc., without having to go to each post's Edit screen individually.

    One feature we found invaluable when using web forms for thin slicing was to make the number of rows displayed per page user-configurable. Some users will want to display hundreds of records per screen, others will want much fewer, as too big of a web page will cause their web browser to crash or time out.

  • The other approach is "bulk uploading", where you import an updated list of title tags (or H1s or whatever) into your website's underlying database. You start with a database export in CSV (comma separated values) format of your current title tags -- along with the corresponding item ID numbers for each record, of course. Load the CSV file into Microsoft Excel and do your title tag optimization in the spreadsheet. Then upload the optimized title tags back into the database.

    Note that if your database does not have a field for the title tag, you'll have to create it and re-code your site to override the programmatic title with the contents of this new field when it is populated with data.

    Rather than having to maneuver through phpMyAdmin or rely on your database administrator, have a CSV file upload function built into the admin interface of your content management system (CMS).
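The "bulk upload" round trip can be sketched in a few lines. This is a minimal illustration, not how GravityStream or any particular CMS does it: the table name `items`, the columns `item_id` and `title_tag`, and the function name are all hypothetical, and SQLite stands in for whatever database your site runs on. Blank title cells are skipped so the programmatic title stays in effect for those records.

```python
import csv
import io
import sqlite3


def bulk_update_titles(conn, csv_file):
    """Apply title tags from a CSV with 'item_id' and 'title_tag' columns.

    Returns the number of rows changed. Table and column names are
    hypothetical; adjust them to your own schema.
    """
    reader = csv.DictReader(csv_file)
    updated = 0
    with conn:  # commits on success, rolls back if an error is raised
        for row in reader:
            title = row["title_tag"].strip()
            if not title:
                continue  # blank cell: leave the programmatic title alone
            cur = conn.execute(
                "UPDATE items SET title_tag = ? WHERE item_id = ?",
                (title, int(row["item_id"])),
            )
            updated += cur.rowcount
    return updated


# Usage, with an in-memory database standing in for the CMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, title_tag TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "Home"), (2, "Products")])

upload = io.StringIO("item_id,title_tag\n1,Blue Widgets | Example Store\n2,\n")
print(bulk_update_titles(conn, upload))  # 1
```

Keying every update on the item ID is what makes the spreadsheet safe to sort and filter while you work on it.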

When we added the "bulk upload" capability to our GravityStream proxy admin, our optimizers and those at our clients and partner resellers experienced a nice boost in productivity. So we can attest to the fact that "thin slicing" works.

Whether you prefer working in Excel or within a "mass edit" view in your CMS' admin interface, "thin slicing" is a great tactic to add to your SEO toolchest.

Posted by Stephan Spencer on 01/30/2009 | Permalink

Comments (16)| Comments RSS | Filed under: Search Engines            

SEO workarounds for Country Selectors as the Home Page

On my first visit to EMC.com last week, I thought to myself "Uh oh, that's not going to be good for their SEO". It was a country selector. The only content on the page was a long list of countries. No keyword-rich copy. No keyword-rich links.

EMC.com Global Country Selector

But then I took a deeper look. I did a Google search for "cache:www.emc.com" and was pleased to see the EMC US site's home page, not the Country Selector page! EMC had done their homework on SEO and were detecting the bots and waving them on. Googlebot doesn't have to select a country. Good for you, EMC!

Contrast that approach to Lenovo's global country selector. A Google search for "cache:www.lenovo.com" reveals, um, nothing. Yikes, no home page indexed! Nothing for "cache:lenovo.com" either. Then I visited the site masquerading as Googlebot, using lwp-request (one of my trusty power user command-line tools):

lwp-request -H "User-Agent: Googlebot/2.1 (+http://www.googlebot.com/bot.html)" -S lenovo.com

I saw the reason for Lenovo.com not having a home page in Google: bots were being directed to the Country Selector page using the wrong kind of redirect -- a 302 instead of a 301. Not only were bots getting forced through a cookies-based country selector (mistake #1), made worse by the 302 (mistake #2), but the URLs were also not being canonicalized (i.e. there was no www present in the URL "http://lenovo.com/planetwide/select/selector.htm"). Indeed, none of the site is canonicalized. "http://lenovo.com/us/en/index.html" should 301 to "http://www.lenovo.com/us/en/index.html". Or vice versa if you prefer your site's URLs sans www.

What would I do differently if I were the sysadmin at Lenovo? I'd detect for Googlebot and send Googlebot directly to the U.S. site via a 301 redirect. Or alternatively, I'd make the home page URL ("http://www.lenovo.com/") respond with the country selector for humans and the US home page for bots without doing a redirect at all. That would mean the US home page would live at "/" (rather than "/us/en/index.html") for everyone except for humans who have no cookie set with their country preference, and of course, crawlers. Those visitors to / with the cookie set to another country would get redirected to the previously chosen country, which would not live on lenovo.com but on the corresponding country code TLD (such as lenovo.co.uk, lenovo.fr, lenovo.com.au). And I'd 301 non-www URLs to their www counterparts (more on this here).
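The second option above (no redirect for bots or first-time visitors) boils down to a small decision table. Here's a sketch of it as a pure function, so it isn't tied to any web server. Several things in it are my assumptions rather than anything Lenovo does: the cookie values, the naive user-agent substring check, the page names returned, and the use of a 302 for the cookie-based redirect (it reflects a per-visitor preference, so I wouldn't make it permanent).

```python
from urllib.parse import urlsplit

CANONICAL_HOST = "www.lenovo.com"

# Hypothetical cookie values mapped to the country code TLD sites.
COUNTRY_SITES = {
    "uk": "http://www.lenovo.co.uk/",
    "fr": "http://www.lenovo.fr/",
    "au": "http://www.lenovo.com.au/",
}

BOT_TOKENS = ("googlebot",)  # assumption: simple substring match on the user agent


def route(url, user_agent, country_cookie=None):
    """Return (status, target): a redirect URL for 3xx, else a page name or path."""
    parts = urlsplit(url)
    path = parts.path or "/"

    # 301 non-www URLs to their www counterparts, keeping path and query.
    if parts.netloc.lower() != CANONICAL_HOST:
        query = f"?{parts.query}" if parts.query else ""
        return 301, f"http://{CANONICAL_HOST}{path}{query}"

    if path != "/":
        return 200, path  # everything but the home page is served as-is

    # Crawlers get the US home page at "/" with no redirect at all.
    if any(token in user_agent.lower() for token in BOT_TOKENS):
        return 200, "us-home-page"

    # Humans with no (or an unrecognized) preference see the selector.
    if country_cookie == "us":
        return 200, "us-home-page"
    if country_cookie in COUNTRY_SITES:
        return 302, COUNTRY_SITES[country_cookie]
    return 200, "country-selector"


print(route("http://lenovo.com/us/en/index.html", "Mozilla/5.0"))
# (301, 'http://www.lenovo.com/us/en/index.html')
print(route("http://www.lenovo.com/", "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))
# (200, 'us-home-page')
```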

Posted by Stephan Spencer on 01/08/2009 | Permalink

Comments (8)| Comments RSS | Filed under: Search Engines            

2 Days of SEO Training from Yours Truly!

Yes, you "heard" right! Two FULL days of SEO training from yours truly, coming soon to a city near you -- or not, if you don't live near Las Vegas, Chicago or Washington DC ;).

This is truly a first. In my 14 years since founding Netconcepts, I have never run a public SEO workshop this long or this in-depth. Until now!

Brought to you by the American Marketing Association, as part of their excellent "Training Series".

It will hit Las Vegas February 23rd & 24th, Chicago March 10th & 11th, and Washington DC April 21st & 22nd.

In the two days I intend to cover the following topics in some depth:

  • Anatomy of a Search Engine -- Spiders, Indices & Algorithms, Market Share & Trends
  • Inside the Head of the Searcher -- Searcher Behavior & Intent
  • Hands-on Keyword Research & Keyword Portfolio Management
  • SEO Copywriting -- Optimizing Your Content
  • HTML Optimization -- Make Your HTML "Sing"
  • Search Friendly Site Architecture, Design, Navigation & Internal Hierarchical Linking Structures
  • Technical Optimization -- URLs, Redirects, Tracking Parameters, Flash, JavaScript/AJAX and more
  • Link Building -- Tools & Tactics for Acquiring Valuable, Relevant Links Sustainably
  • Social Media Marketing -- Leveraging Online Communities to Create Links & Buzz
  • Paid Search Fundamentals & Achieving Synergies with SEO
  • Search Analytics -- Metrics that Drive ROI
  • Tools of the Trade -- The Essential Tools & Resources for your SEO & Paid Search Toolkit
  • Vertical Search -- Local Search, News Search, Product Search, Image Search, Video Search, Blog Search, Mobile Search
  • Worst Practices -- Beyond the “Best Practices” to the Dark Side of “Black Hat” Spam & Other Deadly Mistakes
  • Site Clinic & Interactive Site Reviews -- Apply Your Knowledge by Auditing Fellow Attendees' Websites

Want more details or to register? Head over to MarketingPower.com. Or download the PDF brochure.

Posted by Stephan Spencer on 01/07/2009 | Permalink

Comments (4)| Comments RSS | Filed under: Search Engines, Shameless Self-Promotion            

My 30 minute WordPress SEO Training Video

If I had the time I'd prepare a professionally produced training video on search engine optimizing your WordPress blog/site, but I don't, so instead I'll just direct you to the video that John Pozadzides recorded of my 30 minute long presentation at WordCamp San Francisco from several months ago here, or simply watch the embedded video below:

Enjoy! Feedback is welcome.

Posted by Stephan Spencer on 10/31/2008 | Permalink

Comments (16)| Comments RSS | Filed under: Search Engines, Blogging            

My SEO Session at Startonomics: Watch the Video

I spoke today at the Startonomics conference on the topic of SEO. I only had a half hour to speak, so I had to cram a lot of critical info into a small amount of time. Here's the video of my session:

Some folks came up afterwards to ask for clarification on several of the many recommendations I made during the course of my half-hour:

One question was on nofollows. Why would you nofollow some of your internal links? The answer: because your PageRank gets divvied up amongst all your links and thus having fewer links that receive PageRank means that those followed links will get a larger share of PageRank. So on the dogster.com home page, the latest blog posts of the moment don't necessarily deserve as much PageRank as the dog breed pages do.

Another clarification: when improving a URL iteratively, be sure to 301 the previous URLs to the latest iteration, because you don't want to lose the PageRank you've "earned" from bloggers etc. through the course of your URL tests.
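One way to keep track of this is to record every URL a page has lived at, oldest first, and generate the redirect map from that history so each old URL points straight at the newest one rather than hopping through the intermediate versions. A minimal sketch follows; the function name and the example URLs are hypothetical.

```python
def flatten_redirects(history):
    """Map every earlier URL directly to the latest one.

    `history` lists a page's URLs oldest-first; the last entry is current.
    """
    if not history:
        return {}
    latest = history[-1]
    return {old: latest for old in history[:-1] if old != latest}


# Hypothetical URL iterations for a single page.
history = ["/product.asp?productID=123", "/widgets/blue", "/blue-widgets"]
for old, new in flatten_redirects(history).items():
    print(f"301 {old} -> {new}")
# 301 /product.asp?productID=123 -> /blue-widgets
# 301 /widgets/blue -> /blue-widgets
```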

Startonomics live streamed the entire conference on ustream.tv. So you could have watched it all live for free. What a great model for running a conference! They also put all the presentation decks onto Slideshare.net. Each session had a blog post on the Startonomics blog dedicated to it, with the ustream video and Slideshare slide deck embedded along with some key points summarized in the blog post text. So it's possible by poring over all the videos and Powerpoints and blog post summaries to get more education out of this conference than by attending it in person! Of course you miss out on the networking, which is also hugely valuable. One of the panelists on the last panel gave out this money saving tip: hang out as a non-registered person (loiterer?) in the halls at key events like the Web 2.0 Summit. You'll make great networking connections without the conference fee investment.

Posted by Stephan Spencer on 10/02/2008 | Permalink

Comments (8)| Comments RSS | Filed under: Search Engines

Should You Follow Google's New Recommendations on Dynamic URLs? Probably Not

You may have already seen my article on Search Engine Land "Making Sense of Google’s New Dynamic URL Recommendations", but if you haven't, I'll recap some key points about Google's new recommendations on dynamic URLs and URL rewriting and why I don't advise you follow these recommendations.

As much as I'd love to believe that Google no longer needs webmasters to clean up their URLs for Googlebot, the hard truth of the matter is that Googlebot STILL stumbles across the same content at varying URLs and mistakenly indexes all copies -- even returning one version of these URLs with some queries and other versions with other queries. In fact that's the whole premise of my colleague Brian Klais' Search Engine Land article from earlier this month, that guided navigation systems create numerous URL pathways to the same content, and Googlebot isn't very good at detecting this and compensating for the duplication and PageRank dilution effects. What it all boils down to is this: what's confusing for Googlebot ultimately becomes confusing for searchers, thus leading to a lose-lose-lose -- for Google, for its users, and for you the site owner.

Given this, I dispute the assertion in the aforementioned post from the Google Webmaster Central Blog, that webmasters should "feel free to serve us [Google] your standard dynamic URL and we will automatically find the parameters which are unnecessary." That's gambling with your rankings, and personally I don't like the odds.

Let's have a look at a concrete example to prove my point. Just last month I spoke at the Shop.org Annual Summit, on a site clinic session where I gave impromptu critiques of sites volunteered by audience members. One such site was MEC.ca. A great site for users, not so great for Googlebot. It didn't take long for me to spot the duplicate content and PageRank dilution issues. Digging through site:www.mec.ca results revealed pages with jsessionid and bmUID parameters. Indeed, 102000 results (estimated) for and 96400 results (estimated) for site:www.mec.ca inurl:jsessionid!

Let's focus in on a specific page of MEC.ca: the "Biodegradable Shopping Bag" page, of which there are 15 copies in Google's index. Clearly Googlebot is confused.

This confusion is further evidenced by the fact that a search on "biodegradable shopping bag" returns a different mec.ca URL (on page 1) than a search on "biodegradable shopping bags" (page 4 of the SERPs) -- yet they are both the same (duplicate) page of content.

I would counsel MEC.ca that maintaining status quo and leaving things in the hands of Googlebot to eventually (maybe) sort out is not a viable solution.

Let's review some pertinent facts about dynamic URLs, along with my evidence:

FACT: URLs with session IDs or user IDs don't always get properly identified by Google, resulting in duplicate content and PageRank dilution.

EVIDENCE: The above-mentioned example from mec.ca.

FACT: URLs with keywords in them rank better in the SERPs than those with product IDs. So a rewritten URL like www.domain.com/blue-widgets will outperform www.domain.com/product.asp?productID=123 for a search on "blue widgets" -- all else being equal. This is true not just in Google, but in other engines as well.

EVIDENCE: We've conducted numerous experiments for clients to prove the rankings benefit to ourselves, but we can't publish these tests unfortunately (we are restricted due to client confidentiality). I encourage you to conduct your own tests. A Microsoft engineer just last month confirmed that keyword URLs provide a boost in Live Search.

FACT: Short URLs have a better clickthrough rate in Google SERPs than long URLs.

EVIDENCE: This effect was found through user testing that was commissioned by MarketingSherpa. MarketingSherpa found that short URLs get clicked on twice as often as long URLs (given that the position rank is equal).

FACT: Keyword URLs are more user-friendly, and thus probably better at enticing clicks in the SERPs by searchers.

EVIDENCE: Keywords within a URL that match the search query are bolded, providing additional emphasis to the search listing.

So, given the above facts, would you rewrite your complex dynamic URLs to look static and keyword-rich? I sure would!

Then what are Googlers Juliane Stiller and Kaspar Szymanski trying to accomplish with the aforementioned blog post? My hunch is that Google is finding an alarmingly large number of improperly implemented URL rewrites that are confusing Googlebot even more and exacerbating the duplicate content situation. If superfluous parameters -- e.g. session IDs, user IDs, flags that don't substantially affect the content displayed, tracking parameters -- get mistakenly embedded into the filename/filepath, then Googlebot will have an even harder time identifying those superfluous parameters and aggregating the duplicates. And what if parameters are embedded in the filepath in inconsistent order (e.g. www.example.com/c-clothing/shirts-mens/ and www.example.com/shirts-mens/c-clothing/)? That's another nightmare scenario for Googlebot. On top of all that, when Googlebot still finds links to the old (non-rewritten) URLs, your well-intentioned URL rewriting actually presents Google with yet another duplicate to deal with. It can be a real mess. The lesson here is to hire a professional when embarking on a URL rewriting project, NOT to leave your URLs dynamic and your website in the hands of fate.
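As a starting point for that kind of cleanup, here's a rough sketch of URL normalization: drop the superfluous parameters and put the remaining ones in a fixed order, so that every variant of a page collapses to a single URL. The list of parameters to strip is an assumption based on the MEC.ca example (jsessionid and bmUID); a real site would need its own list. Note that this sketch only handles parameters in the query string, not session IDs embedded in the path.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: parameters that don't change the content displayed.
SUPERFLUOUS = {"jsessionid", "bmuid"}


def canonicalize(url):
    """Strip superfluous query parameters and sort the rest."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SUPERFLUOUS
    ]
    kept.sort()  # consistent parameter order regardless of how the link was built
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


# Hypothetical duplicate URLs for the same page.
print(canonicalize("http://www.example.com/product?bmUID=abc&id=5&color=blue"))
# http://www.example.com/product?color=blue&id=5
print(canonicalize("http://www.example.com/product?id=5&color=blue&jsessionid=XYZ"))
# http://www.example.com/product?color=blue&id=5
```

Whatever URL this produces is the one to link to internally, and the one the non-canonical variants should 301 to.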

Posted by Stephan Spencer on 10/02/2008 | Permalink

Comments (0)| Comments RSS | Filed under: General, Search Engines