Surely the most obvious candidate must be Google's patent application 20050071741, which was published on March 30th and which represents some very rare hard(ish) facts direct from Google about how it plans to improve its search quality. With 60 interacting parameters based primarily on historical document and link analysis, it's difficult to pull out what the effects would be, especially as these would vary across different sectors (old content, for example, is sometimes seen as stale and sometimes as definitive). But the most important point is that there's no way such new factors could be introduced to the mix without producing major winners and losers, which is just what we've seen recently. For my software reviews/articles site I can certainly see how links from years ago could now be heavily discounted (though I'd argue the content is still very useful).
Having said that, while I can understand dropping due to the changes, I've definitely been blacklisted/sandboxed, which presumably means that Google has decided that I, and everyone else who's been affected so severely, am up to blackhat tricks, which is definitely not the case. The patent's focus on link analysis brings me back to the idea of too many recent links triggering a spam filter (perhaps with the use of AdSense seen as a secondary spam indicator, since the monitoring of ads is also mentioned in the patent). It's certainly a possible explanation for what's happened to my site: Google reports 3500 links, many of which I would guess are from recent scraper sites picking up from directory listings and particularly Google's own SERPs. (Google's earlier patent "based on the interconnectivity of the documents in the set" would presumably explain why Google is so susceptible to scrapers, and as my site was popular across a broad range of software-based keywords it was a natural scraper target.)
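Just to make concrete the sort of filter I'm imagining, here's a back-of-an-envelope sketch in Python (entirely my own guesswork at a "too many recent links" check, not anything Google has published; the window and ratio are invented numbers):

from datetime import date

def looks_like_link_spike(link_dates, window_days=90, spike_ratio=3.0):
    # Flag a page whose rate of newly discovered inbound links over the last
    # few months far exceeds its lifetime average - the sort of "unnatural
    # growth" signal the patent application hints at. Thresholds are made up.
    if not link_dates:
        return False
    today = date.today()
    ages = [(today - d).days for d in link_dates]
    lifetime_days = max(max(ages), window_days)
    recent_links = sum(1 for age in ages if age <= window_days)
    recent_rate = recent_links / window_days      # links per day, recently
    lifetime_rate = len(ages) / lifetime_days     # links per day, overall
    return recent_rate > spike_ratio * lifetime_rate

A site like mine, with 3500 reported links and a large chunk of them from scrapers that appeared in the last few months, would trip a check like this even though none of those links were ever solicited.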
I can see the benefit for Google and the majority of searchers in clearing out the spam but, if this is what has happened, it means that honest and useful sites are being penalized simply for being popular on Google. And possibly for signing up for AdSense too!
Of course anti-spam false positives (whether as described above or not) are inevitable, but there needs to be some workable appeals procedure, based on manual checking, for removing undeserved blacklisting. OK, it's not algorithmic/scalable, but with $50 billion in the bank I think something could and should be done for those who've lost out. The phrase "Don't Be Evil" springs to mind.
And as this forum seems to be the centre of Bourbon discussion/disgruntlement and I’m sure folk at Google are monitoring it, can we not do something about it? For example is there somewhere we can post our actual website addresses in the hope that they’ll fast-track us back into the SERPs if only to stop us moaning?
This is my first posting - sorry it’s so long.
Thanks for an interesting post.
>I can see the benefit for Google and the majority of searchers in clearing out the spam but, if this is what has happened, it means that honest and useful sites are being penalized simply for being popular on Google. And possibly for signing up for AdSense too!<
Actually we have read in several threads fellow members mentioning a severe drop in their sites' rankings in the SERPs, or the disappearance of their sites from the index altogether. It was also mentioned that the said sites had high rankings in the SERPs for their relevant competitive keyphrases. Those sites are/were not spam sites or scrapers. So what's the reason behind their "sufferings"?
Through discussion in the "Dealing with the consequences of Bourbon Update" thread, several reasons were mentioned for sites being penalized and disappearing. The most important is the 301 redirect issue (www.yoursite.com vs. yoursite.com):
[webmasterworld.com...]
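For anyone who wants to check their own domain, here's a rough way to do it (just my own sketch; example.com is a placeholder for your domain). Request both host forms and see whether one answers with a 301 pointing at the other, rather than both serving the same page:

import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # stop urllib following redirects so the status code is visible
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoRedirect())

for url in ("http://example.com/", "http://www.example.com/"):
    try:
        response = opener.open(url)
        # 200 on both forms means both are serving content: a duplicate risk
        print(url, "answered", response.getcode(), "with no redirect")
    except urllib.error.HTTPError as err:
        # with redirects disabled, a 301/302 surfaces here as an HTTPError
        print(url, "redirects with", err.code, "to", err.headers.get("Location"))

If both forms answer 200, that is the duplicate situation the 301 redirect is meant to cure.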
Guesses and assumptions are all we can do. However, I don't think that AdSense is a reason for penalizing a site, because fellow members have reported that there are still "AdSense scrapers" at the top of the SERPs.
There is a site that links to you that appears to have uncovered the Google cache entry for one of your pages.
That cache entry is findable in an allinurl:domain search.
In addition you have some 72 pages duplicated between the www form and the non-www form of your domain.
Please compare how your site tanked with how MikeNoLastName's site tanked.
Greetings.
My site parallels yours to a large degree.
Same industry
Same # of visitors lost, coincidentally
Wide number of topics within the industry
Incredibly good rankings (probably too high for me)
My decline began on Feb 2nd and finished May 5th.
I haven't come back yet except for a handful of phrases.
Had many dup pages via content theft - whole pages, whole multi-page groups, etc.
Had a possible issue with spam-like internal linking: 20-30 keyword-rich links on most pages in the left and right nav areas. The result was pages with VERY high keyword density if you count link tags.
Interesting footnote: one 3-word search has near-zero keyword density for one of the words, but the word is in 25 link tags pointing to the page. This search ranks #1 - it even beats mr softy.
I'm in the process of killing the on-page keyword density for the link tag keywords, and I'm back to positions 30-50 on many searches now (still more work to do).
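For anyone who wants to see the effect on their own pages, here's a rough little script I knocked up (my own sketch, nothing official; "page.html" is a saved copy of one of your pages and "widgets" stands in for your keyword). It totals how often a term appears inside link tags versus in the rest of the text:

from html.parser import HTMLParser

class AnchorTextSplitter(HTMLParser):
    # collect the text inside <a> tags separately from the rest of the page
    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.anchor_words = []
        self.body_words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False

    def handle_data(self, data):
        words = data.lower().split()
        (self.anchor_words if self.in_anchor else self.body_words).extend(words)

def density(words, term):
    return words.count(term) / len(words) if words else 0.0

parser = AnchorTextSplitter()
parser.feed(open("page.html", encoding="utf-8").read())
term = "widgets"
print("density in link text      :", density(parser.anchor_words, term))
print("density in body text      :", density(parser.body_words, term))
print("density counting link tags:", density(parser.anchor_words + parser.body_words, term))

On my pages the gap between the last two figures was huge, which is exactly the problem described above.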
Joe
It's easy to check for, just do:
"allinurl:yourdomain.com cache"
It will show something like:
[66.102.7.104...]
Always with an IP address.
You can also view the 60,000+ in the database by typing "allinurl: search?q=cache"
Clicking on it will USUALLY show a Google cache of your pages, which is obviously duplicated content. It SHOWS a PR of 0, but then it IS a Google domain, so who becomes the authority...?
It would be interesting to hear whether anyone else who has been dumped is seeing this or not. Also whether anyone who has NOT been dumped has one, just to determine if it could be a factor.
In response to Reseller: I picked up on the canonical index issue thanks to the forum and put in the necessary 301 redirect, but I can't believe it is the cause of such a massive drop.
Also, I definitely get the feeling that things are being done on a keyword/sector basis - I know it's not just sailorjwd who is in a similar boat to me - which would explain reports of AdSense scrapers surviving. I haven't looked into it comprehensively, but FWIW I think technology review searches are better than they were. I certainly hope the underlying reason is tackling spam; there's no other justification for putting us through this.
Thanks to theBear for pointing out the probable hijacks (and the non-www issue). These are largely inadvertent redirect links from very respectable sites, namely about.com and creativepro.com, and have been around for years, so I can't believe they are suddenly draining PageRank/traffic on such a scale.
>>Please compare how your site tanked with how MikeNoLastName's site tanked.
How can I do this - I don't see any URLs in member profiles?
To sailorjwd: commiserations, and I've wondered myself whether my internal links might count against me - 250 keyword-heavy review links in my archive section could look spammy. I don't want to change it, though, as it's very useful to visitors.
To MikeNoLastName: I did a check and one page did appear as you describe. Again, as with regular hijacking, I can't believe that this alone would suddenly explain a 75% drop (not that that means it doesn't).
And to Joe King: it is possible that I'm suffering a duplicate-content penalty from the magazine sites (basically we both have copyright and both post), though in the past these were data-driven and never figured on Google.
And you're right it's time I updated my photo :)
This makes it clear that Google is going to take historical document and link information into account when it works out rankings, which changes the landscape entirely - especially for us old-time content-focused publishers doing reviews etc. who seem to have been hit hard.
Looking at the referral entry pages and search engine phrases in my stats, it certainly seems a reasonable explanation of what's happened, with older pages and their keywords doing less well than more recent pages.
More to the point, on reflection it seems a justifiable explanation: all things being equal, most searchers will prefer recent reviews, tutorials and content generally. Obviously this is not always the case - some of my articles now have minimal traffic but are still the last word on their subject :) - but on balance I think Google is right to try to take time and historical factors into account, especially as the stress on organic content and link growth should help kill spam sites.
In retrospect, despite the topic's title, I don't think the patent has much to do with Bourbon, which seems to have been a short-term blanket near-100% penalty that has largely been removed since I made the original post (maybe an anti-spam measure). However, I still need to explain my earlier 75% drop, and the patent seems to me to be the most likely suspect.
If those who have been similarly affected look at your current referral entry pages and search engine phrases compared to the good old days, do you see a similar pattern emerge?
I think that MikeNoLastName has weighed in on the thread.
Happy to point out that which I find.
When did you add your 301s? But what is more important (are you listening, Clint) is what actions, if any, you took to clean up the mess (g1smd has been through this, as have I). I haven't seen much that actually helps in doing this.
"If those who have been similarly affected look at your current referral entry pages and search engine phrases compared to the good old days, do you see a similar pattern emerge?
As a member of the 75% down club...
Page age seems to have no effect for us. At the subdirectory (topic) level the same topics are popular as before - just the traffic is drastically reduced.
We have highly similar pages in terms of navigation, links in and out, and design - just different content; some rank #1, some #100+.
On search phrases the difference is a bit more marked - but this reflects the position of the pages rather than the topics themselves. So pages on red widgets may be more popular than blue widgets - but widgets as a topic stays constant.
The patent may be a factor in this - I haven't read it - in relation to the age of links in. But patents are often generalised to "catch-all" possibilities.
As an experiment, and to prove demand still exists, our new AdWords campaign is getting a very high click-through %.
P.S. Why is it always widgets? When I did accountancy exams it was always "XYZ company makes blue widgets, make up a balance sheet". Does anyone actually sell widgets?
I'm highly skeptical about the patent. I don't trust it.
It might simply be that Google has found ways to apply the ideas within the patent application, and they want to make sure that the competition is kept at arm's length by patent protection for the next 100 years. As an added bonus, webmasters are now in awe AND confusion about Google's all-knowing powers.
Many of these ideas might even worsen the SERPs. Google can't apply them without surveying and thorough evaluation.
All those ideas will require boffo computing power. Maybe one day. But now?
I bet many of them are way, way down the road, if at all.
Or just run this search of WebmasterWorld threads [google.com].
It's an interesting question though, sort of. But Bourbon kind of caught people's attention in the last 2 months, since it affected them directly.
"If those who have been similarly affected look at your current referral entry pages and search engine phrases compared to the good old days, do you see a similar pattern emerge?
My site lost 75% of its Google referrals on 3rd Feb 2005 (Allegra), so I might be considered the co-founder of the 75% club ;-)
As to entry pages and search phrases, I can say that they are around 40% similar to those of the "good old days".
2by4, I did see the patent topic - that's what alerted me to it - but bizarrely all comment on it seemed to fizzle out just before we were hit with the major algorithm and traffic changes that are exactly what you'd expect if it was implemented.
helleborine, I agree that the patent can't have been implemented in full and across the board, as the resulting index would bear absolutely no correlation to the old one and everyone would be up in arms. However, there's nothing to stop them cherry-picking features and implementing them in certain keyword sectors. In my area of computer software it would certainly make sense to assume that if someone searches for "xyz review" or "xyz tutorial" they are more likely to want to see one posted in the last year or so (though of course not always).
Not sure what field johnhh is in, but maybe it's not as time-sensitive as mine and reseller's.
I also think it would make sense to screen out the huge recent rise in junk links from scraper directories, which have probably been over-inflating the ranking/traffic of sites that previously did well in the SERPs and dmoz-style directories. That would create a general fall in traffic to all pages, which I'm certainly seeing as well.
And yes, many of the patent's ideas would falsely lower the rankings for particular pages (e.g. some of my articles aren't stale, they are definitive :) ), but so long as the overall effect was positive, Google would be right to implement them.
When we're looking for new factors to account for across-the-board (or sector) changes, it seems perverse to ignore Google's stated aim of implementing such changes based on historical data just because it's difficult to see exactly how they would do it or how it would work in practice. Especially as you are right that this would be a huge appeal to Google in itself - and it should be to legitimate publishers too, if it succeeds in shaking off the spam sites that are currently manipulating the existing PageRank system.
Basically I think the signal-to-noise ratio has become too much for the current PageRank citation approach, and they need to use historical data to screen out the junk and find the organically added human content - and especially the backlinks - that PageRank needs to work its magic.
If that's what's happening then I can at least understand the reasons for my drastic fall, or rather the underlying purpose, which makes me feel a bit better.
And just following up on johnhh's observation regarding his high click-through rate on AdWords, I think another very important factor that hasn't been discussed, AFAIK, is AdSense clickthrough and eCPM rates. Mine have risen dramatically recently, and during June I'd say they are more than double what they were in "the good old days". It's a welcome softening of the financial blow but, more importantly, it seems to suggest that Google is succeeding in providing more targeted traffic. Anyone else noticing a similar effect?
I'd just like to add that this was from "search" only, not "content", implying users can't see what they want and go for the ads.
IMHO any patent regarding software has to be suspect on a prior-existence basis. Perhaps I could have one on "produce listings of extracts of existing pages on the internet in response to a user-entered enquiry", rather like Amazon's "one click to purchase" one.
I don't think all of the patent can have been introduced in Bourbon, judging by the results seen in my topic area.
However I do understand the massive problems involved in applying multi-factor filters - i.e. it's really easy to get it wrong, or to get the right results in one area and the wrong results in another.
Added - this is a site that has been around for years with plenty of backlinks, no underhanded type stuff, and has always ranked extremely well on very competitive words and phrases.
And I think you're right: they would only need to implement a couple of simple features from the patent to have a major effect. As I said, I think that discounting the recent scraper-link bonanza and valuing pages that weren't created years ago more highly for technology-based keywords would go a long way to explain my general fall and its particular pattern. And if that's what was done, and searchers were happier with the SERPs as a result, then I'd just learn to accept it.
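To put a number on what "valuing newer pages more highly" might look like, here's an illustration of cherry-picking just that one idea (entirely my own sketch; the one-year half-life and the whole notion of flagging a keyword sector as time-sensitive are my inventions, not Google's):

def freshness_adjusted_score(base_score, age_days, half_life_days=365, time_sensitive=True):
    # halve a page's score for every year of age, but only in sectors where
    # searchers mostly want recent material (reviews, tutorials and so on);
    # evergreen sectors are left untouched
    if not time_sensitive:
        return base_score
    return base_score * 0.5 ** (age_days / half_life_days)

# a three-year-old review keeps only an eighth of its original score
print(freshness_adjusted_score(1.0, age_days=3 * 365))

Something that crude, applied only to time-sensitive keyword sectors and combined with discounting the scraper links, would be enough to produce the pattern I'm seeing.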
This is where the lack of any information from Google is criminal. If they are tweaking the algorithm on historical grounds, or whatever, they should say so. They should be able to justify what they are doing, or not do it. The fact that we are all hanging on GoogleGuy's postings for the slightest insight into what the Googleplex is thinking is ridiculous from what is now the world's biggest media company - especially when what they are selling is ultimately our media!
This sector is one that could be classified as evergreen, in that the content is never really out of date.
His site got hit in March and has since recovered.
Our site, also in a sector that should be considered evergreen, is on its third trip to the 75% club (actually worse than that). We thought that we had our duplicate content problem cleaned up - until we watched a pile of stuff reappear on our IP address for the site we had just cleaned up.
We have a number of questions in to Google at the moment; I doubt we will get much more than a form email back. But we did provide plenty of information, using their own search engine, to point out errors in their system.
Drawing conclusions about the SERPs from AdWords is interesting, as is drawing conclusions about a site running AdSense being in danger of getting depressed by Google.
A more likely play would be that some folks are targeting keywords to produce income and are using known holes in the system to knock out the sites with high SERP placements if possible.
This would also explain why sites that don't have AdSense get hit as well.
We had a Google cache of one of our site pages exposed, as did MikeNoLastName, as did Tom, and as did some 67,097 other pages. Would these be duplicate content problems? Who knows.
In short it could be just about anything, from the implementation of parts of the patent to simple errors in the system to folks taking advantage of holes in the system.
Google has stated that the reason for Bourbon was to implement signals of quality.
Error-free large software systems are something that has yet to be produced.
Multiple control variables and filters in a feedback based system are a process control nightmare.
So everyone please get your bets down the wheel she be a spinning.
>In short it could be just about anything, from the implementation of parts of the patent to simple errors in the system to folks taking advantage of holes in the system.<
Exactly. And your guess is as good as mine ;-)
>So everyone please get your bets down the wheel she be a spinning.<
She? From the day I heard of that famous Fat Lady who took a month to sing us a simple song, I don't like SHEs anymore ;-)
I have a funny feeling she may stop singing - assuming she has actually sung!
theBear: "A more likely play would be that some folks are targeting keywords to produce income and are using known holes in the system "
Currently I can spot a number of sites above us that appear to have "won" using techniques that "lost" before. Hence the comment above - or perhaps it is in the part of the patent that has yet to be introduced, as Bourbon appears to have allowed these sites through.
Although "Signals of Quality" may be in the eye of the beholder.