Stephan Spencer's Scatterings

The Scattered Wisdom of a scientist turned web marketing virtuoso

Ask Jeeves wants your Robots.txt!

David Naylor from Bronco, who was one of the speakers at the Organic Listings Forum session at the Search Engine Strategies conference, advised site owners to have a robots.txt file, even if it's just an empty file, because Ask Jeeves' spider seems to favor web sites that have one.

Has anyone noticed an improvement in their presence in Ask Jeeves after creating a robots.txt file?

Of course there's also the side benefit that you'll eliminate all those "File Not Found" error messages for robots.txt in your server error log, which tend to overwhelm the error log, making it harder to spot more concerning error messages. That assumes of course that you actually examine your error log on occasion. ;-)
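If you're wondering what an empty robots.txt actually tells a spider, here's a minimal sketch using Python's standard-library robots.txt parser. The user agent name "ExampleBot" and the example.com URLs are made up for illustration; I'm not claiming this is how Ask Jeeves' spider reads the file, only how a standards-following parser treats it.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() works on the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Three variants: a zero-byte file, an explicit "allow everything",
# and a file that blocks one directory.
EMPTY_FILE = ""
EXPLICIT_ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"

# "ExampleBot" is a hypothetical crawler name, not a real spider.
for label, text in [
    ("empty file", EMPTY_FILE),
    ("explicit allow-all", EXPLICIT_ALLOW_ALL),
    ("blocks /private/", BLOCK_PRIVATE),
]:
    allowed = can_crawl(text, "ExampleBot", "http://www.example.com/private/page.html")
    print(f"{label}: /private/page.html allowed = {allowed}")
```

So an empty file blocks nothing; it just answers the spider's request with something other than a 404.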

Posted by Stephan Spencer on 12/10/2005 | Permalink

Comments (2) | Comments RSS | Filed under: Search Engines

Underdog, Teoma, does it differently; authorities, hubs, and topical relevance

From an SEO standpoint, there is consensus among experts: Google, Yahoo and MSN are it. However, there's a yappy little underdog called Teoma which, despite its size, is a good contender in the search engine stakes. Teoma, which means "expert" in Gaelic, is owned by Ask Jeeves and powers the algorithmic search results on their properties (like Ask.com and Excite.com).

Yes, there's a big technology difference between Teoma and the "big three": Teoma does it differently with its localized approach. As Ammon Johns explained it at the MarketingProfs Thought Leaders Summit on SEO:

PageRank and link popularity is a bit like going out into the street and asking everyone who the best scientist is — you are going to get the obvious names: Albert Einstein, Stephen Hawking. They’re popular answers.

Teoma looks within the topic. It finds the authority sites within the topic related to "scientists" and then asks "Who is the best scientist?" Chances are, Teoma is going to come up with names you have never heard of before, but are actually much better answers. It gives you the specialist answer instead of the popular answer.

Another difference with Teoma is that it is keyword-dependent. So when you type "blue widgets" into that search box, it pulls the community together and conducts a local search which refines and finds the authoritative sites on that particular subject.

This model organizes the Web into topical communities and pinpoints the authorities (pages that have garnered a lot of inbound links from reputable, topically-relevant pages) and the hubs (pages that link to a lot of reputable, topically-relevant pages). It's an important model to grasp (read more about it in Mike Grehan's paper on topic distillation), because I predict that all the major engines will become keyword-dependent over time. If you grasp this concept now and are picky about the sites you garner links from and link to, then you'll be doing a lot to futureproof your SEO.
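To make the hubs and authorities idea concrete, here's a rough sketch of the classic iterative hub/authority scoring (often called HITS) on a tiny made-up link graph. This is the textbook version, not Teoma's actual algorithm, which isn't public; the page names are hypothetical, and I'm assuming the graph has already been narrowed down to pages matching the searcher's keywords.

```python
from math import sqrt


def normalize(scores: dict) -> dict:
    """Scale scores so their squares sum to 1 (leave all-zero scores alone)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hubs_and_authorities(links: dict, iterations: int = 50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                new_auth[target] += hub[source]
        # A page's hub score is the sum of the authority scores it links to.
        new_hub = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                new_hub[source] += new_auth[target]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical "blue widgets" community: two link pages and two expert sites.
community = {
    "widget-directory": ["expert-a", "expert-b"],
    "widget-blogroll": ["expert-a"],
    "expert-a": [],
    "expert-b": [],
}
hub_scores, authority_scores = hubs_and_authorities(community)
print("best hub:", max(hub_scores, key=hub_scores.get))
print("best authority:", max(authority_scores, key=authority_scores.get))
```

Notice that the scores feed each other: good hubs are the ones pointing at good authorities, and good authorities are the ones good hubs point at. That's why being picky on both sides of your linking matters.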

Coverage of SES San Jose: Search Engine Q&A On Links

I'm a bit behind on my conference session blogging. Waaay too many parties going on; doesn't leave much time for blogging. The Google Dance last night. Yahoo! party at Great America the night before. And tonight I've got another party to go to. Yesterday I spoke on RSS. I'll post a recap on that session later.

I just attended "Search Engine Q&A On Links", which was great. Lots of useful advice from Google and Yahoo! about linking (nobody seemed to want to ask poor Ask Jeeves any questions). It was funny how diametrically opposed the engines' advice was to that of the immediately preceding session, "Buying and Selling Links". It's hard to reconcile the two sets of advice. In the hallway before this session, Matt was adamant: "Don't buy links!"

Anyways, without any further ado, here's the session recap:

Kaushal Kurapati from Ask Jeeves:
Be cautious of: reciprocal links and purchasing links
Avoid: link farms, cloaking pages, invisible or hidden links that trick the crawler
Become an authority on a subject
Focus on your business and content. The rest will follow. [I say: "yeah, right..."]
Teoma uses subject-specific popularity (the hubs and authorities model): garner respect in your industry; subject-specific, text-based links can be understood.

Tim Mayer from Yahoo!:
Here's some important news!! Yahoo! has just launched a brand new service: Site Explorer from Yahoo! Search. Stop scraping the Yahoo site for backlink results and use Site Explorer instead. Access via an API is offered too. And you can export as a CSV file.
Yahoo has 19.2 billion web objects in its index. Over 20 billion objects, when you include the audio and video.
Plans to use community to improve search quality. Social search = within a trusted network, where someone within your network vouches for a site.
Create natural linking strategies. When things start to look unnatural is when you'll start getting into trouble. We look at intent (linking to plasma TVs, diamonds, and Viagra all on the same page) and extent (i.e. what looks normal; having everything on the page as links, or 200 links on the page, is too much!)
Yahoo! offers a much more comprehensive sample of backlinks than Google, but not a complete set of backlinks. New system (Site Explorer) will be reasonably comprehensive, in his opinion the most comprehensive out there.
It's unnatural to link to sitemap-1 sitemap-2 sitemap-3 sitemap-4 sitemap-5. If you are doing this, you're headed in the wrong direction.
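For a quick self-check on the "extent" side of Tim's advice, here's a small sketch that counts the links on a page using Python's built-in HTML parser. The 200 figure comes from his remark above; treating it as a hard cutoff is my own simplification, and Yahoo! didn't say how it actually measures this.

```python
from html.parser import HTMLParser

# Figure mentioned in the session; using it as a hard threshold is an assumption.
MAX_LINKS = 200


class LinkCounter(HTMLParser):
    """Collects the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def count_links(html: str) -> int:
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    return len(counter.hrefs)


def looks_link_heavy(html: str, max_links: int = MAX_LINKS) -> bool:
    """True if the page carries at least max_links links."""
    return count_links(html) >= max_links


# Hypothetical pages for illustration.
normal_page = '<p>See <a href="/about">about us</a> and <a href="/contact">contact</a>.</p>'
link_dump = "".join(f'<a href="/sitemap-{n}">sitemap-{n}</a>' for n in range(1, 251))

print(count_links(normal_page), looks_link_heavy(normal_page))
print(count_links(link_dump), looks_link_heavy(link_dump))
```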

Matt Cutts from Google:
Good links are earned links, links that are based on editorial discretion.
Create services that are really useful, e.g. newsletters, an article a day, syndication through RSS (attribute my article and give me a link). Start a blog.
Matt launched his blog today: mattcutts.com
Think outside the box.
Only SEOs and librarians do backlink searches. Historically we decided to dedicate a subset of our servers to backlinks. Only a sampling of backlinks would be displayed, and only for pages of PageRank 4 or higher. A suggestion was made to show backlinks for lower PageRank pages too. We liked that idea, so we now show a random sampling of backlinks, including those from low PageRank pages. We show twice as many backlinks as before, but it's still only a sampling of the backlinks.
In graph theory, a clique, where every node in the graph links to every other node, is very unnatural. So don't link to every single node in your network of sites; it'll get flagged.
For dynamic sites, you're very safe if you have fewer than 2 parameters; keep the values of those parameters to fewer than 5 digits, and don't name a parameter "id". Googlebot sometimes tries variations of URLs by dropping parameters, but we only do that deep level analysis on big, quality sites.
Another good approach that alltheweb came up with: spider would always go 1 dynamic page deep from a static page.
Search engines only grab the first 100k or 200k or 500k of a page, so be careful about loading up a huge page with a lot of links.
PageRank isn't as important as SOME people make it out to be. BUT it's NOT like "PageRank? Oh yeah let's shuffle that one under the rug! That was sooo 4 years ago!"
"BO" = backlink obsession
We export PageRank only once every 3 months or so.
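Matt's rule of thumb for dynamic URLs is easy to turn into a quick check. Here's a sketch that takes his numbers literally (fewer than 2 parameters, values of fewer than 5 digits, no parameter named "id"). The function name and the example.com URLs are mine, and this is just a reading of what he said in the session, not anything Google has published as a spec.

```python
from urllib.parse import parse_qsl, urlsplit


def dynamic_url_warnings(url: str) -> list:
    """Return a list of ways the URL strays from the 'very safe' rule of thumb."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    warnings = []
    # "Very safe" was described as fewer than 2 parameters.
    if len(params) >= 2:
        warnings.append(f"{len(params)} parameters (fewer than 2 is safest)")
    for name, value in params:
        # A parameter literally named "id" was called out as one to avoid.
        if name.lower() == "id":
            warnings.append('parameter is named "id"')
        # Numeric values should stay under 5 digits.
        if value.isdigit() and len(value) >= 5:
            warnings.append(f"{name} has a {len(value)}-digit value")
    return warnings


# Hypothetical URLs for illustration.
print(dynamic_url_warnings("http://www.example.com/product?item=42"))
print(dynamic_url_warnings("http://www.example.com/product?id=1234567&cat=3"))
```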

Technorati tag: Search Engine Strategies

When will Google, Yahoo, MSN, and Ask Jeeves start indexing RSS feeds properly?

I find it a bit unbelievable that the major search engines (Google, Yahoo!, MSN Search, and Ask Jeeves) still don't offer RSS feed searching combined with RSS search results feeds as part of their Web search. Specialized RSS feed search engines like Feedster, PubSub and Technorati have risen to the occasion, filling the void left by the major engines' inaction. Bloglines, the Ask Jeeves-owned company, has announced a blog/RSS search engine service that'll compete with Feedster, PubSub, and Technorati, but that's still a far cry from embedding RSS search right into the Web search box.

Here's how each of the majors handles RSS feeds:

Google:
screenshot of search listing of an RSS feed in Google
another screenshot of search listing of an RSS feed in Google

  • has URLs of valid RSS feeds in its index (due to links that point to those feeds)
  • doesn't recognize the XML file format of RSS feeds (as you can read on the excerpted screenshots above)
  • only rarely indexes the feed. I base that not just on the fact that nearly all RSS feeds are shown in Google results with no title or snippet, as in the first screenshot above, but also on a test search: out of 64,000 RSS feed files hosted by feeds.feedburner.com, only 19 are shown to contain the word "cheese", and the last 2 of those show up only because "cheese" appears in links pointing to the feed. The same search on Yahoo! shows over 400, so clearly a lot of files that should have matched are missing from the Google search results.
  • only rarely caches the XML (see example) with most caches being blank (like this)
  • associates words in links pointing to the page (as demonstrated with this search)
  • doesn't allow refining of your query with the operators — filetype:rss, filetype:xml, or filetype:rdf

Yahoo:
screenshot of search listing of an RSS feed in Yahoo!

  • has URLs of valid RSS feeds in its index
  • indexes the feed (Evidenced by above screenshot, which was a match for a search on text contained within the feed. Also, ResearchBuzz found this to be the case too.)
  • caches the XML (see example)
  • doesn't display the "Add to My Yahoo!" link for RSS feed listings (this is a disappointing omission, as Yahoo! displays this link on listings for HTML pages that have an associated RSS feed but not for the listing of the RSS feed itself)
  • associates words in links pointing to the page
  • doesn't allow refining of your query with the operators — filetype:rss, filetype:xml, or filetype:rdf

MSN Search:

  • doesn't have URLs of valid RSS feeds in its index (Evidence of this: not a single feed out of 64,000 feeds at feeds.feedburner.com is displayed, even though there are links that point to those feeds. Note that the couple of feeds that are displayed are not valid feeds but error pages output in HTML.)
  • doesn’t recognize the XML file format of RSS feeds (file type is displayed in the search listing after Cached link when it's a recognized non-HTML file type)
  • doesn't index the feed
  • doesn't cache the XML
  • doesn't allow refining of your query with the operators — filetype:rss, filetype:xml, or filetype:rdf

Teoma (Ask Jeeves):
screenshot of search listing of an RSS feed in Teoma

  • has URLs of valid RSS feeds in its index
  • indexes the feed
  • (View Cached feature not supported by Teoma)
  • associates words in links pointing to the page
  • (filetype: operator not supported by Teoma)

As you can see from my little comparison, MSN Search is the farthest behind when it comes to RSS feed indexing. Hopefully Scoble will read this and tell the MSN Search team to get on the ball. ;-)

Even though the major engines have been slow to make RSS an integral part of their indices, I predict that the engines will, within the next year or so, wake from their slumber and overtake and even acquire their specialized RSS feed search engine competitors.

What that will mean for web marketers is that search engine optimizing RSS feeds will become a science unto itself (currently it's limited mainly to optimizing the item titles for purposes of link text on syndicating sites) and that the feeds that are not optimized will get drowned out by those that are.
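To show why item titles matter so much, here's a small sketch of what a syndicating site typically does with your feed: it pulls out each item's title and link and turns the title into link text. The sample feed is made up, and real syndicators will differ in the details; the point is only that whatever you put in the item title is what ends up inside the anchor tag.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical RSS 2.0 feed for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Cheese Shop</title>
    <link>http://www.example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>Aged Cheddar Cheese Buying Guide</title>
      <link>http://www.example.com/cheddar</link>
    </item>
    <item>
      <title>Update #12</title>
      <link>http://www.example.com/update-12</link>
    </item>
  </channel>
</rss>"""


def syndicated_links(feed_xml: str) -> list:
    """Render each feed item the way a syndicating page might: title as link text."""
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if title and link:
            links.append(f'<a href="{escape(link, quote=True)}">{escape(title)}</a>')
    return links


for anchor in syndicated_links(SAMPLE_FEED):
    print(anchor)
```

The first item hands the linked page some keyword-rich anchor text; the second ("Update #12") hands it nothing useful.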

Posted by Stephan Spencer on 06/17/2005 | Permalink

Comments (7) | Comments RSS | Filed under: Search Engines, RSS Marketing