Forum Moderators: Robert Charlton & goodroi
"We deeply care about the people who are generating high-quality content sites, which are the key to a healthy web ecosystem," Singhal said.
"Therefore any time a good site gets a lower ranking or falsely gets caught by our algorithm - and that does happen once in a while even though all of our testing shows this change was very accurate - we make a note of it and go back the next day to work harder to bring it closer to 100 percent."
"That's exactly what we are going to do, and our engineers are working as we speak building a new layer on top of this algorithm to make it even more accurate than it is," Singhal said.
[wired.com...]
What do you think makes the big re-rankings differ from everflux? Is it more likely to be a significant algo change, rather than just the processing of new crawl data?
Yes, there are different degrees to this algo hit: some sites lost a little, while others have been buried so deep they can't be found.
I know of at least three sites whose Google traffic has bounced back almost to pre-update levels. Their graphs look like hooks: a big drop, then a gradual curve back up day by day. One is down only 4%, the other two around 7-8%.
The biggest "false positive" I saw was the -40% drop at DaniWeb. The other big hits looked like the kind of content Google was aiming at.
Continued Algorithm Changes
Google is working to help original content rank better and may, for instance, experiment with swapping the position of the original source and the syndicated source when the syndicated version would ordinarily rank highest based on value signals to the page. And they are continuing to work on identifying scraped content.
[searchengineland.com...]
... there are many unknowns. But Google gave a few hints: remove 'bad' pages and wait...
I think that's an amazing idea CainIV. Suggest it to Google if you haven't already
Well, old parameterized URLs (from a year ago) are still being fetched even though they no longer exist (returning 404 in GWT), and if I do a site: search, I see ALL of them listed. I thought blocking in robots.txt meant Googlebot knew NOTHING about the URL (that is what was stated in the SMX article [searchengineland.com...]). Nevertheless, I assume I can return a 410 for each of these via .htaccess to get Google to stop trying to fetch them, stop indexing the old/dead URLs, and stop reporting 404s?
While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results.
[google.com...]
Do you have the URLs themselves blocked individually, or:
User-agent: *
Disallow: /go/
If you removed the block when you changed the URLs, then Googlebot would access them and receive the 404.
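On the 410 question upthread: a minimal sketch of how that could be done in .htaccess with mod_rewrite, assuming the dead URLs all live under the /go/ path shown in the robots.txt snippet (the pattern is an assumption; adjust it to your own URLs):

```apache
# Hypothetical sketch: return 410 Gone for retired URLs under /go/
# The [G] flag sends "410 Gone" instead of the default 404
<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteRule ^go/ - [G,L]
</IfModule>
```

Note that for Googlebot to actually see the 410, the URLs can't stay blocked in robots.txt, since a blocked URL is never fetched at all.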
crobb305, there were quite a few comments in another thread a few weeks back about unusual 404s in Webmaster Tools. I had hundreds of them, almost all with unusual URLs that made no sense.
I don't know if anyone else who was demoted by this update had unusual 404s.
... there are many unknowns. But Google gave a few hints: remove 'bad' pages and wait...
---
So in terms of "removal", we can:
~ meta noindex each of them;
~ remove them from sitemap.xml
~ block them in robots.txt
Is there anything I'm missing? In addition to noindex, is anyone also recommending noarchive and nofollow?
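For the meta tag route, a minimal sketch of the per-page directives being discussed (whether noarchive and nofollow are worth adding on top of noindex is a judgment call):

```html
<!-- hypothetical page-level removal signals; goes in the <head> of each page -->
<meta name="robots" content="noindex, noarchive, nofollow">
```

One caveat worth keeping in mind: if a page is blocked in robots.txt, Googlebot never fetches it and therefore never sees the meta noindex, so the robots.txt block and the meta tag shouldn't be combined on the same URL.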
Googlebot knows the old/deleted links in that folder are 404 because they are listed as such in GWT.
crobb305, there were quite a few comments in another thread a few weeks back about unusual 404's in Webmaster Tools.
But, if I highlight the address in the address bar, copy, and repaste back into the address bar, it shows the incorrect address that Googlebot saw. It's all so odd.
Reply from TheMadScientist: That sounds like a URL encoding issue, and might very well be why it looks like pages are getting spidered when they are blocked. An encoding error could possibly 'create' a different page, not covered by the block.
Just an Example: %2Fgo%2F is /go/ encoded. If there is an encoding error or issue somewhere it could appear as /go/ in HTML but may be requested as %2Fgo%2F, which is a different URL than /go/.
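To make the encoding point concrete, a quick sketch in Python (the /go/page.htm path is just an illustration, not from the site in question):

```python
from urllib.parse import quote, unquote

path = "/go/page.htm"
encoded = quote(path, safe="")  # percent-encode everything, including the slashes
print(encoded)                  # %2Fgo%2Fpage.htm
print(unquote(encoded))         # /go/page.htm

# To a server (and to a robots.txt matcher), "%2Fgo%2Fpage.htm" and
# "/go/page.htm" are different URLs, so an encoding bug in generated
# links can sidestep a "Disallow: /go/" rule.
```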
I had that same problem.
[edited by: crobb305 at 3:49 am (utc) on Mar 14, 2011]
www.example.com/&837262intendedpage.htm