Forum Moderators: Robert Charlton & goodroi
"We deeply care about the people who are generating high-quality content sites, which are the key to a healthy web ecosystem," Singhal said.
"Therefore any time a good site gets a lower ranking or falsely gets caught by our algorithm - and that does happen once in a while even though all of our testing shows this change was very accurate - we make a note of it and go back the next day to work harder to bring it closer to 100 percent."
"That's exactly what we are going to do, and our engineers are working as we speak building a new layer on top of this algorithm to make it even more accurate than it is," Singhal said.
[wired.com...]
What do you think makes the big re-rankings differ from everflux? Is it more likely to be a significant algo change, rather than just the processing of new crawl data?
Yes, there are different degrees to this algo hit: some sites lost a little, while others have been buried so deep they can't be found.
I know of at least three sites whose Google traffic has bounced back almost to pre-update levels. Their graphs look like hooks: a big drop, then a gradual curve back up day by day. One is down only 4%, the other two around 7-8%.
The biggest "false positive" I saw was the -40% drop at DaniWeb. The other big hits looked like the kind of content Google was aiming at.
Continued Algorithm Changes
Google is working to help original content rank better and may, for instance, experiment with swapping the position of the original source and the syndicated source when the syndicated version would ordinarily rank highest based on value signals to the page. And they are continuing to work on identifying scraped content.
[searchengineland.com...]
... there are many unknowns. But Google gave a few hints: remove 'bad' pages and wait...
I think that's an amazing idea CainIV. Suggest it to Google if you haven't already
Well, old parameterized URLs (from a year ago) are still being fetched even though they no longer exist (returning 404 in GWT), and if I do a site: search, I see ALL of them listed. I thought blocking in robots.txt meant Googlebot knew NOTHING about the URL (that is what was stated in the SMX article [searchengineland.com...]). Nevertheless, I assume I can return a 410 for each of these via .htaccess to get Google to stop trying to fetch them, stop indexing the old/dead URLs, and stop reporting 404s?
While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results.
[google.com...]
Do you have the URLs themselves blocked individually, or:
User-agent: *
Disallow: /go/
If you removed the block when you changed the URLs, then Googlebot would access them and receive the 404.
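On the 410 question upthread: a minimal sketch of how that could be done in .htaccess with mod_rewrite, assuming the dead URLs all live under the /go/ path shown in the robots.txt snippet (the pattern is an assumption; adjust it to your own URLs):

```apache
# Hypothetical sketch: return 410 Gone for retired URLs under /go/
# The [G] flag sends "410 Gone" instead of the default 404
<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteRule ^go/ - [G,L]
</IfModule>
```

Note that for Googlebot to actually see the 410, the URLs can't stay blocked in robots.txt, since a blocked URL is never fetched at all.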
crobb305, there were quite a few comments in another thread a few weeks back about unusual 404s in Webmaster Tools. I had hundreds of them, almost all with unusual URLs that made no sense.
I don't know if anyone else who was demoted by this update had unusual 404s.
... there are many unknowns. But Google gave a few hints: remove 'bad' pages and wait...
---
So in terms of "removal", we can:
~ meta noindex each of them;
~ remove them from sitemap.xml
~ block them in robots.txt
Is there anything I'm missing? In addition to noindex, is anyone also recommending noarchive and nofollow?
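For the meta tag route, a minimal sketch of the per-page directives being discussed (whether noarchive and nofollow are worth adding on top of noindex is a judgment call):

```html
<!-- hypothetical page-level removal signals; goes in the <head> of each page -->
<meta name="robots" content="noindex, noarchive, nofollow">
```

One caveat worth keeping in mind: if a page is blocked in robots.txt, Googlebot never fetches it and therefore never sees the meta noindex, so the robots.txt block and the meta tag shouldn't be combined on the same URL.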
Googlebot knows the old/deleted links in that folder are 404 because they are listed as such in GWT.
crobb305, there were quite a few comments in another thread a few weeks back about unusual 404's in Webmaster Tools.
But, if I highlight the address in the address bar, copy, and repaste back into the address bar, it shows the incorrect address that Googlebot saw. It's all so odd.
Reply from TheMadScientist: That sounds like a URL encoding issue, and might very well be why it looks like pages are getting spidered when they are blocked. An encoding error could possibly 'create' a different page, not covered by the block.
Just an Example: %2Fgo%2F is /go/ encoded. If there is an encoding error or issue somewhere it could appear as /go/ in HTML but may be requested as %2Fgo%2F, which is a different URL than /go/.
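To make the encoding point concrete, a quick sketch in Python (the /go/page.htm path is just an illustration, not from the site in question):

```python
from urllib.parse import quote, unquote

path = "/go/page.htm"
encoded = quote(path, safe="")  # percent-encode everything, including the slashes
print(encoded)                  # %2Fgo%2Fpage.htm
print(unquote(encoded))         # /go/page.htm

# To a server (and to a robots.txt matcher), "%2Fgo%2Fpage.htm" and
# "/go/page.htm" are different URLs, so an encoding bug in generated
# links can sidestep a "Disallow: /go/" rule.
```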
I had that same problem.
[edited by: crobb305 at 3:49 am (utc) on Mar 14, 2011]
www.example.com/&837262intendedpage.htm