Stephan Spencer's Scatterings

The Scattered Wisdom of a scientist turned web marketing virtuoso


Tricks for viewing cloaked content

There are two types of cloaking: user-agent based and IP based (also known by the euphemism "IP delivery"). Cloakers try to cover their tracks by making it difficult to examine the version meant only for spiders. They do this with a "noarchive" directive embedded within the page's meta tags. Googlebot obeys that directive and does not archive the page, which causes the "Cached" link in that page's search listing to disappear.
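If you already have a page's HTML in hand, you can check for that directive yourself. Here's a minimal sketch using only Python's standard library. The class and function names are my own for illustration, and it assumes the directive sits in a meta tag named "robots" or "googlebot":

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # The content attribute is a comma-separated list of directives.
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def has_noarchive(html):
    """Return True if the HTML asks spiders not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives
```

A "noarchive" directive on its own proves nothing, since plenty of legitimate sites use it. It only tells you the "Cached" link won't be there to help.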

So getting a view behind the curtain to see what is being served to the spider can be a bit tricky. If the type of cloaking is solely user-agent based, you can use the User Agent Switcher extension for Firefox. Just create the following user-agent under Tools > User Agent Switcher > Options > Options > User Agents:

Description: Googlebot
User Agent: Googlebot/2.1 (+http://www.googlebot.com/bot.html)

Then switch to that user agent by selecting Googlebot under Tools > User Agent Switcher.
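If you'd rather script the same check than click through a browser, here's a rough sketch in Python using the standard library's urllib. The function names are hypothetical, and the comparison is deliberately naive: dynamic pages (ads, timestamps, session tokens) can differ between any two requests, so a mismatch is a hint to look closer, not proof of cloaking.

```python
import urllib.request

# The same user-agent string entered into User Agent Switcher above.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


def build_request(url, user_agent=None):
    """Build a request, optionally overriding the User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch(url, user_agent=None, timeout=10):
    """Fetch the raw bytes of a page (performs a real network request)."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read()


def looks_cloaked(url):
    """Naive check: does the page differ when we claim to be Googlebot?"""
    return fetch(url) != fetch(url, GOOGLEBOT_UA)
```

Calling looks_cloaked("http://www.example.com/") fetches the page twice, once with Python's default user agent and once as Googlebot. Like the Firefox extension, this only catches user-agent based cloaking.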

But that won't work if the cloaker is doing IP delivery. If there's no "Cached" link in the SERPs, you might think you're out of luck. But you may not be!

A lot of the time, Google's "Translate This Page" functionality can be used to view the cloaked content, because many cloakers don't bother to differentiate between Google coming in to translate a page and Google coming in to crawl it. Either way, the request comes from the same range of Google IP addresses. Thus, a cloaker doing IP delivery tends to serve up the Googlebot-only version of the page to the Translate tool. This loophole can be plugged, but many cloakers miss it.

And I bet you didn't know that you can actually set the Translation language to English even if the source document is in English! You simply set it in the URL, like so:

http://translate.google.com/translate?hl=en&sl=en&u=URL&sa=X&oi=translate&resnum=9&ct=result

(Above, replace URL with the actual URL of the page you want to view)
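One catch: if the page you want to view has its own query string, paste it in raw and its parameters will get mixed up with Translate's. Here's a small Python sketch that builds the link with the target URL properly encoded. The function name is my own, and the parameters are simply the ones from the URL above, which Google may well change:

```python
from urllib.parse import urlencode


def translate_url(page_url):
    """Build an English-to-English Translate link for the given page."""
    params = [
        ("hl", "en"),      # interface language
        ("sl", "en"),      # source language, forced to English
        ("u", page_url),   # the page to view, percent-encoded by urlencode
        ("sa", "X"),
        ("oi", "translate"),
        ("resnum", "9"),
        ("ct", "result"),
    ]
    return "http://translate.google.com/translate?" + urlencode(params)


print(translate_url("http://example.com/page?id=1"))
```

The target URL's own "?" and "=" come out encoded as %3F and %3D, so they stay inside the u parameter.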

That way, when you are reviewing someone's cloaked page, you can see the page in English instead of having to see the page in a foreign language. 

You can also sometimes use this trick to view paid content, i.e. if you're too cheap to pay for content from sites like WebmasterWorld, where that content has been placed behind a registration wall and removed from Google's cache.

Example

Do pay for WebmasterWorld, though. Do right by Brett.

Posted by Stephan Spencer on 02/07/2007 | Permalink

Comments (4) | Comments RSS | Filed under: Search Engines

Good cloaking: straight from the search engines' mouths

I'm here at Search Engine Strategies Chicago, and today at the "Meet the Crawlers" session I asked the distinguished panel of representatives from the four major search engines the question:

What is your current official position on simplifying the URLs selectively for bots like Googlebot, Yahoo Slurp, etc. by user-agent detection in order to drop session IDs and other superfluous parameters from the URL? Do you consider it cloaking? And if so, is it good cloaking or bad cloaking?

The panel, which included Ramez Naam from MSN Search, Tim Mayer from Yahoo!, Charles Martin from Google, and Kaushal Kurapati from Ask Jeeves, gave me and the audience their definitive answer. But before they did, Ramez from MSN Search asked for clarification:

Will the same page content display to the user if that user types into their browser the URL that was given to the bot?

I responded with a "Yes," and then all four search engines confirmed individually:

No problem.

Then Charles Martin from Google jumped in again with:

Please do that!

So there you have it. Whether or not you call this technique cloaking, the search engines don't mind it, and in fact encourage it!
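For the curious, here's roughly what that kind of URL simplification might look like in Python. This is only a sketch: the bot tokens and the list of "superfluous" parameter names are assumptions of mine, not anything the panel specified, and a real site would tune both. The condition Ramez raised still applies: the simplified URL has to show the same page content to a human who types it in.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed substrings of spider user agents; adjust for the bots you care about.
BOT_TOKENS = ("googlebot", "slurp", "msnbot", "teoma")

# Hypothetical parameter names that carry session state rather than content.
SUPERFLUOUS_PARAMS = {"sessionid", "sid", "phpsessid"}


def is_known_bot(user_agent):
    """User-agent detection: does the string contain a known spider token?"""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)


def simplify_url(url):
    """Drop session IDs and other superfluous parameters from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SUPERFLUOUS_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def url_for(user_agent, url):
    """Give spiders the simplified URL; leave everyone else's links alone."""
    return simplify_url(url) if is_known_bot(user_agent) else url
```

With this, a link such as http://example.com/product?id=42&sessionid=abc123 is handed to Googlebot as http://example.com/product?id=42, while a regular browser keeps its session ID.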

Posted by Stephan Spencer on 12/08/2005 | Permalink

Comments (7) | Comments RSS | Filed under: Search Engines