19th March 2007

Do as I say…

Posted in Google, Webmastering

Pot:Kettle is black…not what I do.

Google made an announcement on its webmaster blog, spawning a lot of speculation about the death of scraped content.

Notably from the post:

…These techniques are usually accomplished by abusing qlweb style catalogues or by scraping content from sources known for good, valid content, like Wikipedia or the Open Directory Project.

These methods violate Google’s webmaster guidelines. Purely scraped content, even from high quality sources, does not provide any added value to your users. It’s worthwhile to take the time to create original content that sets your site apart. This will keep your visitors coming back and will provide useful search results. …

I take exception to this post purely on the basis that in the second paragraph they seem to infer that wiki is a “high quality source”. Just because a site ranks in the top 5 for just about every query in Google doesn’t mean it’s high quality. It does mean that they’ve successfully designed their site to get a ton of Google-Love, but as many, many people have discussed, these wiki results are ruining Google’s quality.

I am, however, hoping that this is just a shot across the bow of webmasters, a warning that changes are coming. Hopefully they are going to be true to their word and start removing these RSS-feed-generated sites, wiki clones, affiliate sites, shopping comparison junk, etc.

I’ll believe it when I see some of the biggest offenders, like Google Directory and Google News, deindexed. Currently the Google Directory enjoys 783,000 indexed pages and Google News has 13,200 pages.

Other than slapping up a view of the PageRank and reducing the competition by removing links to the other search engines, can you tell me the value that Google has added to these pages?

DMOZ Original

Google’s Scraped Version

This entry was posted on Monday, March 19th, 2007 at 12:51 pm and is filed under Google, Webmastering.

There are currently 5 responses to “Do as I say…”

  1. On March 20th, 2007, Halfdeck said:

    Sorry JLH, but I’m one of those pro-Wikipedia drones so I’m gonna have to disagree :)

    “I take exception to this post purely on the basis that in the second paragraph they seem to infer that wiki is a “high quality source””

    No man, read again: “sources known for good, valid content, like Wikipedia.”

    People bitch about Wikipedia stub pages and inaccuracies, but let’s face it, all websites are error prone and every guy and gal that’s bitching would love to be in Wikipedia’s shoes right now.

    Wikipedia is the new SPAM …

    aka Site Positioned Above Mine.

    Let’s take “coffee” for example, where Wikipedia ranks #1 and Starbucks ranks #2. Is that so wrong?

    On the Wikipedia page, you got: 1) History of coffee (apparently stemming from the highlands of Ethiopia), 2) Where the word “coffee” comes from (Kaffa, Ethiopia), 3) Coffee seed types (coffea arabica, coffea canephora) … Shall I go on? Processing and roasting, Coffee in art and religion…

    Here’s a snippet, just to drive home my point:

    Coffee is a widely consumed beverage prepared from the roasted seeds—commonly referred to as beans—of the coffee plant. It is usually served hot but can also be served cold. A typical 7 fluid ounce (ca. 207 mL) cup of coffee contains 80–140 milligrams of caffeine, depending on the bean and method of roasting and preparation.[1] Some people drink coffee “black” (plain)

    What about Starbucks?

    http://www.starbucks.com/

    Can we say 99.9% flash?

    Look at the rest of the sites on page one. None of them talk about history of coffee or anything else about coffee in much detail. They’re all trying to sell their damn products =D

    http: //www. peets.com/Default.asp?rdir=1&
    http: //www. nationalgeographic.com/coffee/
    http: //www. coffeereview.com/ (finally, some on-page text)
    http: //www. coffeegeek.com/ (alotta links but no info on the page)
    http: //www. gevalia.com/Gevalia/ (flash)
    http: //www. cariboucoffee.com/
    http: // coffeeuniverse.com/ (directory)
    https: //www. dunkindonuts.com/ (flash)

    I say Wikipedia deserves its #1 spot for “coffee” IF people are looking for information instead of coffee beans or coffee makers to buy.

    Like Rand posted yesterday about catering to the “linkerati”, Wikipedia gets that right to a T. Bloggers link to it like it’s an addiction. Starbucks, on the other hand, markets to consumers, so it’s much less of a link magnet.

    3/20/07 JLH mod: I broke the links so that Halfdeck’s post would appear (his link of course is still good!)

  2. On March 20th, 2007, JLH said:

    Well, I wouldn’t go so far as to say that ALL of wiki is crap, just that it dominates the SERPs and is something I filter out because I find it generally worthless. Now if someone just got here from Mars and is wondering what coffee is, then I’d guess it’s a great resource. I find it about as valuable as using the dictionary: it will help define the term, but if you want some real meat you’re going to have to read a book about the subject.

    As far as SPAM (Sites Positioned Above Mine) goes, none of my commercial adventures have been adversely affected by the wiki, so I’m not in that category. The internet as a pure resource was doomed the day someone made $1 off of it, since then commercial pressures will always push those types of sites to the top; they have the motivation.

    As you said, this is the one site people bitch about the most lately; you are right. We’ve gone through this with other sites as well that seemed to dominate Google’s results, such as About.com, Answers.com, Amazon, eBay, etc. People get blind to the same results after a while and start going to the next one, which, I believe, eventually is reflected in the results. I’d suspect wiki’s world dominance won’t last forever.

    I still take to heart their decision to NOFOLLOW every external reference they have (and I’m not one of them so I don’t have a dog in this fight) as a declaration that they don’t trust their own sources. If they fix that, then my view of the site may change a bit.

    What about the main point of the article? Google is scraping content on their own, presenting it, and indexing it without much modification, contrary to the blog post I referenced.

  3. On March 20th, 2007, Halfdeck said:

    “What about the main point of the article?”

    I doubt Google needs to scrape to maintain its directory (instead I think it has access to DMOZ’s DB), but I agree Google should discontinue its directory. You’re right, it’s a dupe and I’ve never used it.

    “if you want some real meat you’re going to have to read a book about the subject.”

    Sure, but compare the amount of information on the Wikipedia page to the Starbucks home page (or any other page in the top 10), and it’s not surprising where Wikipedia ends up. Maybe coffee wasn’t the best example, but do you really have the time to sit and read a book? :) Wikipedia IMO is a starting point, not a destination, and it’s as good a starting point as any. Besides, if Wikipedia was completely crap, 2,122,992 URLs wouldn’t be linking in.

    “since then commercial pressures will always push those types of sites to the top”

    Wikipedia, YouTube, and Myspace are all counterexamples of that idea.

    As for Wikipedia’s nofollow, I agree with Rand Fishkin. I’m biased though; I had only one link in there generating a measly ~800 uniques/month. But I also think Wikipedia’s nofollow policy is like Google assuming all links on the web are paid links. Hopefully, they’ll figure out a way to remove some of their nofollow over time.

  4. On March 20th, 2007, JLH said:

    I’ll have to take you at your word, which is pretty good, about what Rand said. I can’t/won’t read his stuff anymore, as he has a regular commenter/buddy/friend/associate/whatever who comments on there that I will not subject myself to reading or even seeing reference to. And no, I will not elaborate.

    I’d say wiki competes well with MFA “information” sites, which is a good thing.

  5. On March 20th, 2007, Halfdeck said:

    I’m sorry to hear that, John. Since they redesigned the site, a ton of people have been leaving comments on every post, so I don’t even bother commenting there because my comments just get buried. The posts are harder to read too because each one is a mile long unless I just read the RSS feed. I’m gonna drop the Wikipedia issue; I was just looking for a provocative discussion, but I know it can be a sensitive subject. That WMW link you posted is like over a month old and still ticking.
