Forum Moderators: Robert Charlton & goodroi
We need to keep this thread focused on the following:
- Changes in your own site's ranking in the SERPs (lost or gained positions, or disappearance of the site).
- Changes you have noticed in the new SERPs (both google.com and your local Google site), especially with regard to the nature of the top 10 or 20 ranking sites.
- Stability of the SERPs, i.e. do you get the same SERPs when you run the same query within the same day or over 2-3 successive days (both google.com and your local Google site)?
- Effective, ethical measures to deal with the above-mentioned changes.
Thanks.
There are a LOT more of the former. In fact I'm beginning to think there may be a staggering number, which helps explain why Google has trouble swatting them down.
Outland - TheBear's right about following the money - it's got to be VERY profitable to steal content if it gets indexed, especially above the originator. In one case a site took our very original state descriptions and now outranks us. Their biz up, ours down.
But part of the big problem is that NO content is 100% original. We use our own writers plus database plus public domain. Some would consider it fine quality and some might call it pulp non-fiction.
Well, the following might be 100% original:
3uldjfe.,jdjdi..sh92, &%#$@!, &%^$#!
I beg to differ with you on this point. I know I have content that is totally original in my case studies as well as writings on several topics - you couldn't find the subject words anywhere on the internet until they got swiped from my site.
As for what the benefit is for copying content - in my experience, someone hires a person to set up a website and that person searches G on the topic to get text to put on the page - this has happened DOZENS of times to my site just in the last few months.
So not only are these folks copying my content, but they are competitors who are new to the field, and now they rank above me.
The scraper sites search G and copy the first 10 results. As I've mentioned before, I had the pleasure of being in the top 10 for many thousands of search phrases, so I'm in many thousands of directory scrapers.
"NO content is 100% original"
SailorJ - we probably agree on this; I was being philosophical and meant the 100% literally. In a James Joyce novel you'll find word combinations from other works, therefore it's not "100% original" even though he'd be considered BY FAR one of the most original English-speaking authors.
This is not a trivial issue because Google needs to make those determinations via the algo. MikeNo's question is very important to these "anti spam" updates. I've been assuming duplication is determined by a percentage of page content and I'd (wildly) guess they are ratcheting down the percentage, triggering more and more dupe content filters and penalties.
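Just to make my guess concrete, here's the kind of crude percentage check I have in mind. This is a toy sketch I made up; it is obviously not Google's actual method, and the function names, sample text and shingle size are all invented:

# Toy sketch of a "percentage of page content" style dupe check.
import re

def shingles(text, n=5):
    # break text into overlapping n-word chunks
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_percentage(page_a, page_b, n=5):
    # percentage of page A's shingles that also appear in page B
    a, b = shingles(page_a, n), shingles(page_b, n)
    if not a:
        return 0.0
    return 100.0 * len(a & b) / len(a)

original = "Our hand written state descriptions cover history, climate and travel tips."
scraped = "Our hand written state descriptions cover history, climate and travel tips, says some site."

print(overlap_percentage(scraped, original))  # 70.0 here; ratchet the cutoff down and more pages get flagged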
And another thing I'm now concerned about is that I re-wrote some paragraphs in order to escape the thieves. I re-worded sentences, changed from 3rd person to 1st, and swapped in alternate thesaurus entries for some words.
Does Google only look at exact phrase matches, or does it go beyond that and use LSI or some other method to look at dup content more broadly?
GG: can you give any input on this - like is it beyond exact phrase match?
Joe
So what do people say constitutes copying, sufficient for a dupe penalty?
Definitely, when someone copies your whole page it's duplicate content. Also, if you have an article, chart, graphic, anything like that duplicated by someone else without your permission, it would be duplicate content. I see it kind of like copyright law. It's OK to quote a snippet from a work if it is used as a review, critique, reference or such. So even when scraper sites copy a snippet, they are within their rights, as they are using it to tell people about our site, whether we like it or not.
Now that's my definition, and it's pretty close to copyright guidelines. But I don't know if that is what Google means. Are they looking more for duplicated content on the same site or across a network of sites? Or is it that plus my definition as well?
I diversify. I build more than one site per niche and tweak each one differently. If I get more than one in the top 3, then that's gravy. That is my advice to you all: diversify. And stop acting like you are on Google welfare. They don't owe you anything. They don't have any "OBLIGATIONS" to do anything. They are a public company whose only obligation is to their shareholders, not to you guys. And this is coming from a die-hard Google hater...
Someone please put this thread out of its misery.. I can see this thread going on and then three months later, someone posts, I'm noticing update "Charlie"....
I do it for promotion and for the public good, as do many others. Apparently, if an AP story is circulated to 100 or more websites, which is often the case, those sites, typically newspaper-type sites, are NOT penalized because they are paying for copyrighted material?
It would be nice to know these things up front. I've been doing this for many months on political topics and for years on sports, so NOW I get penalized?
Personally, I think I should be able to post MY articles wherever they benefit me and the most people. Google needs to find another way to eliminate spammers and scrapers, IMO.
Sailor seems to indicate he thinks that just the snippet from G is sufficient. If that were the case, I think whole sections of our site (weekly, unique, hand-written columns that share a template containing the title, a motto, and some writer bio info common to all of them) would be enough to be considered "duplicates" of each other by this definition.
What about pages on different sites which all happen to use a very common expression like "Today is the first day of the rest of your life"? Are they copying or being copied?
More questions:
- Can a page be a duplicate of another page on the same site just by copying templates as in my example above? Without consistency pages look like cr@p and visitors get confused.
- Can G detect and differentiate if a single page is copying from MORE THAN ONE other site, like a scraper does?
- What does this mean for educational sites which include PhD theses with embedded, permitted quotes from multiple famous works? Are they or the originals penalized? Or newspaper stories which directly quote a current public figure in today's news?
You are right - the scrapers who copy a snippet are not the issue, unless they 302 to you and thereby possibly cause a problem.
There are other scrapers who somehow go after the first paragraph on a page - I don't know if they do it manually or automatically. I have a lot of 'thin' pages in the programming examples area, so I think I am at risk on those pages.
Don't panic!
If you can somehow get out word of your location, we will send the police to stop the person who is holding the gun to your head and forcing you to read this thread.
I think SteveB was talking about me having four different versions of my home page: site.com, www.site.com, www.site.com/index.html, www.site.com/index.shtml. I did not create these. I write to one page, www.site.com/index.html. I link to www.site.com internally.
Steve said this is suicide, but I didn't know I was committing it, really. If I had known I would have done whatever I needed to do to keep the site straight.
Steve also said Google has created four PageRanks and four caches for me. First, I never pay any attention to PageRank. It just never seemed to be that important. I always managed to get good rankings (I was #1 for a number of keywords and key phrases before Bourbon) without it.
On the cache, what is a cache but a snapshot or copy of my page(s)? And who gave Google the right to make that copy? I didn't, and I own the content, not Google. So, essentially, they are taking my content and penalizing me for it. Nice!
Like I said, I'm not a techie, and I didn't know you needed a complete education in webmastering to compete in this mess. And once again, I rank fine in Y and MSN, so why does G have to be so difficult? I thought they were better? In my estimation, they are not better. They are a PITA.
I 2nd that emotion.
I don't buy the big deal of www/nonwww/index.htm/noindex.htm
I believe the most important thing is that your internal linking consistently points to the same form of each URL.
I have about 35 index.htm's for subdirectories. All the non-www versions have no PR/cache/backlinks, only the www versions do. Maybe that's a problem, but it isn't a problem that would wipe out an entire site.
I presently don't have any control over it anyway.
I'm sure 95% of folks with websites don't have a clue what the heck you guys are talking about and a lot of them rank just fine.
I'll go back under the bridge now.
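For what it's worth, if anyone wants to see where their own home page stands, a throwaway script along these lines will print the raw status each common variant returns without following redirects. The URLs here use example.com purely as a placeholder for your own domain:

# Quick check of which home page variants a server answers for.
import urllib.error
import urllib.request

VARIANTS = [
    "http://example.com/",
    "http://www.example.com/",
    "http://www.example.com/index.html",
    "http://www.example.com/index.shtml",
]

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # stop urllib from following redirects so we see the raw status code
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoRedirect)

for url in VARIANTS:
    try:
        status = opener.open(url, timeout=10).status
    except urllib.error.HTTPError as e:
        status = e.code  # 301/302/404 etc. show up here
    except urllib.error.URLError as e:
        status = "error: %s" % e.reason
    print(url, "->", status)

# Ideally one variant answers 200 and the rest 301 to it; four separate
# 200s means a bot sees four "different" copies of the same page.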
Someone please put this thread out of its misery..
Dude - arguably the most important post regarding Bourbon, by GoogleGuy, was *today*. You may also fail to realize that due to a New Orleans Voodoo Curse if this thread dies everybody who has posted in it will expire as well, so we are all in this together.
My original question posed to GG was:
>With this update I am finding more and more content thieves hijacking large chunks of my web pages for Adsense and other things. How much is this duplication affecting my rankings and others? To me it seems a large portion of the Google algo would be penalizing for duplication.<
My opinion is this problem is well out of control, and GG knows it. And the money trails lead more and more to Adsense scrapers that engage in this. But it's doubtful I'm going to get an answer like "yeah, Adsense scrapers can wreck your rankings with duplicate content." I would bet on no answer at all unless it was a definite no. And if it were a no, that would be like saying I can build 100 domains all with the same content. Doesn't stand to reason.
Look, the days are over when you can sloppily throw something on the Internet and magically have someone else sort it out for you. Now more than ever *optimization* is important, and that includes taking the time to both learn and do, including constructing your website(s) consistently and sensibly. Instead you can choose to put four near duplicate copies of pages on the Internet on the same domain, link to them all and confuse the hell out of an easily confused bot, and then rant about how "they" ruined your site or business. No, you "ruined" it by not giving it enough loving care and attention.
"I didn't know you needed a complete education in webmastering to compete in this mess."
You need one before you should be allowed to wildly blame other people for problems you created, when the basic solution has been posted many times, including by Google Guy (also several times). Stop complaining and get off your butt and solve the problems you created.
You should. Unless you fix it, you'll be in this thread for years.
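The fix people keep describing is to pick one canonical form and 301-redirect every other variant to it at the server level. Purely to illustrate the mapping (this is not anyone's actual config; the host and paths are placeholders), the logic boils down to something like:

from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # placeholder: pick one host and stick to it

def canonical(url):
    # map any home page variant to the single URL you want indexed
    scheme, host, path, query, frag = urlsplit(url)
    if host in ("example.com", "www.example.com"):
        host = CANONICAL_HOST
    if path in ("", "/index.html", "/index.shtml"):
        path = "/"
    return urlunsplit((scheme, host, path, query, frag))

for variant in ("http://example.com",
                "http://www.example.com/index.html",
                "http://www.example.com/index.shtml"):
    print(variant, "-> 301 ->", canonical(variant))

# In practice the redirect itself lives in your server config, and your
# internal links should all point at the canonical form too.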
I don't care how many times you say it. I can buy sucky positions in the SERPs, but I can't buy a total wipeout overnight due to some of these issues (other than outright dup content and, to a lesser extent, 302's). I also can't buy going URL-only for 80% of pages in 10 days because of these issues (www/non-www).
In my case it is likely some other problem, since I got rid of non-www in Feb by fixing my internal links. And I don't think it has to do with a bunch of missing </p>'s, or the limited use of tables for formatting, or the other 1000 things we're all throwing out there.
When you don't rank for an exact, unique page title amongst 150 results, it is something beyond www/non-www; we call it a penalty.
I'm sure 95% of folks with websites don't have a clue what the heck you guys are talking about and a lot of them rank just fine.
I'm sure a portion of that 95% suddenly lost a good chunk of their traffic and don't have a clue why. That is why it's so important that Google solve the problem in such a way that the average Joe or Jane doesn't have to be a technical expert in order to avoid penalties like this.
On another topic. Jay, sure some people have been venting their frustration but there has been a lot of interesting discussion as well. I've sure learned a lot about the real world of the Internet in these Bourbon threads.
Does Google only look at exact phrase matches, or does it go beyond that and use LSI or some other method to look at dup content more broadly?
Great question and I'm guessing they are experimenting with different approaches. I'd speculate wildly that they define "duplicate content" as pages that share some percentage of text info relating to the query and then rank those on the basis of PR. PR quirks sometimes cause legitimate pages, whose content has been duped, to fall in SERPs.
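To spell that wild guess out, I picture something like the toy filter below: cluster pages whose text overlap crosses a threshold, then keep only the highest-PR member of each cluster. Every name, number and snippet here is invented for illustration; it is not Google's algorithm:

def similar(a, b, threshold=0.8):
    # crude similarity: shared words as a fraction of the shorter page
    wa, wb = set(a["text"].lower().split()), set(b["text"].lower().split())
    return len(wa & wb) / min(len(wa), len(wb)) >= threshold

def dedupe_by_pr(pages):
    survivors = []
    for page in sorted(pages, key=lambda p: p["pr"], reverse=True):
        if not any(similar(page, kept) for kept in survivors):
            survivors.append(page)  # highest-PR copy wins its cluster
    return survivors

pages = [
    {"url": "originator.example/state-guide", "pr": 3,
     "text": "hand written state travel guide with history and climate"},
    {"url": "scraper.example/copy", "pr": 5,
     "text": "hand written state travel guide with history and climate plus ads"},
    {"url": "other.example/recipes", "pr": 2,
     "text": "recipes for bourbon barbecue sauce"},
]

for p in dedupe_by_pr(pages):
    print(p["url"])  # the higher-PR scraper survives; the originator gets filtered out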