That's the Ghost Dataset.
I don't see it as a bad thing those home pages went "missing". I see it as a good thing.
Would you be kind enough to explain your rationale for this please.
[edited by: tedster at 9:28 pm (utc) on Nov. 5, 2008]
It has to do with 301s and the forward slash (lack of) at the end of the url...I suspect those authority sites who are still having issues run a content management system of some sort...I'm thinking Joomla?
ex: www.mydomain.com is set up to 301 redirect to www.example.com/. This is a must with Joomla at least; otherwise you have two instances of the same page, which could cause duplicate content issues and, most importantly, split the weight (the same idea as the commonly accepted 301 redirect from non-www to www).
so, the penalty/filter/glitch problem comes in when someone types in an invalid url
ex: www.mydomain.com/bad-ddfff should trigger a 404 header, but instead (because all pages without the forward slash are 301 redirected) it gets a 301 header (a 200 header on one domain that is having problems) instead of a proper 404 Not Found...
Obviously this works in reverse (if you have a www to non-www 301 redirect set up), as I also noticed that an authority domain (still having issues) had a 301 redirect set up on any invalid URLs that did have a forward slash on the end, redirecting them to the version without a forward slash.
ex: http://www.example.com/filename/ would 301 to http://www.example.com/example
I've had these redirects in place for some time now...very strange, but as spam techniques evolve, Google must keep up and account for all this...
I guess I will be looking for a solution! Anyone know of a Joomla solution for this?
[edited by: tedster at 4:48 am (utc) on Nov. 14, 2008]
[edit reason] use example.com - it can never be owned [/edit]
If you type in www.mydomain.com/baddjdjf/ it does produce the correct 404 header/response...but that's a rarity (IMO), as the majority of users never get the slash in. I would say fewer than 5% add a slash to an incorrectly typed URL (looking at my error logs).
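For anyone who wants to check their own site for this, here is a minimal Python sketch of the header check described above. It requests a deliberately invalid URL with and without the trailing slash and reports the raw status without following redirects. The helper names (raw_status, verdict) and the wording of the verdicts are my own assumptions, not anything Google has published.

```python
import http.client
from urllib.parse import urlsplit


def raw_status(url, timeout=10):
    """Send a HEAD request and return (status, Location header).

    http.client never follows redirects, so a 301/302 shows up as-is
    instead of being hidden behind the final page.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def verdict(status):
    """Classify the response to a URL that is known NOT to exist."""
    if status in (404, 410):
        return "ok: real not-found response"
    if status in (301, 302, 303, 307, 308):
        return "suspect: invalid URL is redirected"
    if status == 200:
        return "suspect: soft 404 (invalid URL answers 200)"
    return "other"


# Example usage (makes real network requests, so it is left commented out):
# for url in ("http://www.example.com/bad-ddfff",
#             "http://www.example.com/bad-ddfff/"):
#     status, location = raw_status(url)
#     print(url, status, location, verdict(status))
```

Some servers answer HEAD differently from GET, so it may be worth repeating the check with a GET before drawing conclusions.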
Popular, industry-leading sites with lots of traffic produce a lot more errors because of sheer volume, and they are probably subjected to more scrutiny (popular sites are crawled more) and consequently flagged/filtered. I think this is why long-standing "authority" sites who play by every rule are getting hit.
My thought is that October 31st was not a glitch but a new spam filter test, specifically meant to curb redirection spam.
Spammier sites don't really care for a real 404 error because they want the revenue that a click on an ad will bring; therefore, they make every page render 200 OK, 301/302 etc...so the user does what the spammer wants them to do.
This was a Google crackdown...but it has led to an index implosion, with all the .edu spam that has invaded. I anticipate we will probably see a slight reversal very soon, once the data is gathered and metrics are reviewed...
Some areas are now dominated by 2005 blog posts - did Google just rollback 3 years instead of 3 weeks?
Their recent inability to produce a half-reasonable set of results beggars belief. Matt Cutts said everything would be back to normal - I did not realise that this meant normal from 3 years ago!
Addendum: Countless search results that should return several thousand results are now returning under 100 - some massive data loss or a massive new filter must be at work.
Looks to me like another June 4th shake up is going on and started about 4 hours ago - massive changes in areas that I am looking at with authority after authority site being removed.
Whatever they're doing is causing our 19 page blog that is hosted on google's service to come up ahead of the main website with over 1,000 relevant pages for this specific keyword... Nice, not!
One problem with redirecting all requests that would ordinarily return a 404 response with a 301 to a 200 is that search engine bots will think all non-existent pages are real pages and lots of badness can happen because of that. As Rae notes, your 404 pages will get indexed. Search engine bots may end up spending lots of time crawling non-existent pages, which may mean they don't have time to crawl the pages you really care about being indexed. If you don't have a robots.txt file, a request for it will return a 200... Soft 404s just can cause trickiness that I wouldn't suggest setting up.
Spot on to my issues. I corrected it this evening... any idea if I need to file a reinclusion request or will it take a couple of days or so to recover?
[edited by: tedster at 5:09 am (utc) on Nov. 13, 2008]
[edit reason] add quote box [/edit]
301s should be used to move a page that did exist once to one that does exist now- not to catch URLs that never existed.*
You can still serve up custom 404s, as long as the response has the right HTTP header and is served directly by the server rather than redirected to a /404.html page.
*Note: AFAIK, there is nothing wrong with 301-ing a non-trailing slash to a trailing slash for folders either, or 301-ing mistyped IBLs to the obviously-correct URL
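To make the rule above concrete, here is a rough Python sketch of the decision a server or CMS could make for each request: 200 for URLs that exist, a 301 only when adding the trailing slash yields a folder that really exists, and a plain 404 for everything else. The function name decide and the example paths are hypothetical, and a real CMS would look URLs up in its own database rather than in two sets.

```python
def decide(path, pages, folders):
    """Return (status, location) for a requested path.

    pages   -- set of existing page URLs, e.g. {"/page.htm"}
    folders -- set of existing folder URLs, each ending in "/"
    """
    if path in pages or path in folders:
        return 200, None
    # Redirect only when the slash version is known to exist.
    if not path.endswith("/") and path + "/" in folders:
        return 301, path + "/"
    # Never-existed URLs get a real 404, not a catch-all redirect.
    return 404, None


pages = {"/page.htm"}      # hypothetical site content
folders = {"/widgets/"}

print(decide("/widgets", pages, folders))    # (301, '/widgets/')
print(decide("/bad-ddfff", pages, folders))  # (404, None)
```

The point of the design is that the redirect is specific: it fires for a known target, never as a blanket rule for anything lacking a slash.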
Also, I am not affected, and do not have any non-specific 301s
tedster, can you make it clear what you mean by "true 404 http status"?
Look at the http headers that your server sends and be sure that the response is really a 404 code, rather than something like a redirect to a "custom error page" that actually returns a 200 code.
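As an illustration of a "true 404", here is a small sketch using Python's built-in http.server. The custom error page is sent in the same response as the 404 status, with no redirect to a separate error URL. PAGES, NOT_FOUND_BODY and respond are made-up names for this example, and this is only one way to do it, not how any particular CMS works.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAGES = {"/": b"<h1>Home</h1>"}  # hypothetical site content
NOT_FOUND_BODY = b"<h1>Sorry, that page does not exist</h1>"


def respond(path):
    """Pick the status and body for a path without redirecting."""
    body = PAGES.get(path)
    if body is None:
        # Custom error page, but the status line still says 404.
        return 404, NOT_FOUND_BODY
    return 200, body


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = respond(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build a local test server; call .serve_forever() on the result."""
    return ThreadingHTTPServer(("127.0.0.1", port), Handler)
```

The visitor still sees a friendly page, while a bot reading the headers sees 404 rather than a 301 or 302 followed by a 200.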
And is the conclusion now to remove our 301 redirect?
No, I don't get any such conclusion. Each situation may be quite different.
I'd say don't take any action until you understand YOUR situation very clearly. If you need to learn something more about server technology, then use a search engine and study the authoritative sources - but don't jump into some action because of something someone else decides to do about their situation.
When I said that I thought it was a 301 filter that is causing this yoyo effect I meant that it looks to me as though Google took away the authority that 301's pass on, not that they are penalising sites that have 301's pointing to them.
However, the 404 issue is something that people should sort out regardless, IMO.
Matt said there was some data let out of the update and that was corrected. He also said there were some IP issues and those have been corrected as well. He did not go into any details, so I have nothing else to report that may help in determining why some sites returned and some haven't.
One of my sites never returned. It was exhausting trying to figure out by myself if something was afoot or not. Now I have to thank you very much for sharing your chat with MC here. The way out is not clear to me yet, but the research is fun again, as it always should be.
I had my brand name .ws site redirecting to my .com at godaddy for over three years. (The .ws domain was the initial site launched, with all the content, etc...)
One year after launching the .ws site, the .com version of the .ws domain became available for purchase (a 1 in a million, golden opportunity). It was an old, undeveloped 1998 domain. So I bought it and moved everything to the .com (over 3 years ago). I figured in the long run the .com would be the most successful (I rolled the dice and won with this decision).
Anyhow, I've had a registry 301 redirect in place at godaddy for these three years, and never paid close attention to any messages, etc... it's a 301 redirect, once you hit domain forward, you assume it's a done deal, no worries right?
Well, I should have paid closer attention to the godaddy disclaimer/message of:
"Please note that some search engines will show a 301 "Moved Permanently" as a 302 "Moved Temporarily"."
Apparently, godaddy is not using a standard straight redirect, but a redirect chain 302-301...because of infrastructure issues!
Coupled with my onpage issues (flawed CMS) causing internal redirect chains of 301-404 for invalid requests, this has turned into a double whammy...not good.
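Chains like the 302-301 and 301-404 described above can be traced hop by hop. Here is a minimal Python sketch, assuming you supply a fetch function that returns (status, Location) for a URL without following redirects. The names trace and summarize, the hostnames, and the fake response table are all invented for illustration; they do not reproduce godaddy's actual forwarding setup.

```python
from urllib.parse import urljoin

REDIRECTS = {301, 302, 303, 307, 308}


def trace(url, fetch, max_hops=10):
    """Follow a redirect chain one hop at a time.

    fetch(url) must return (status, location) WITHOUT following
    redirects. Returns a list of (status, url) tuples, one per hop.
    """
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((status, url))
        if status not in REDIRECTS or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops


def summarize(hops):
    """Render a chain as e.g. '302-301-200'."""
    return "-".join(str(status) for status, _ in hops)


# Invented responses standing in for a registrar-level forward.
FAKE = {
    "http://example.ws/": (302, "http://forward.example.net/"),
    "http://forward.example.net/": (301, "http://www.example.com/"),
    "http://www.example.com/": (200, None),
}

print(summarize(trace("http://example.ws/", FAKE.__getitem__)))  # 302-301-200
```

With a real fetch function in place of the fake table, anything other than a single 301 straight to the final 200 page is worth a closer look.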
I took down the registry redirect chain 48hrs ago, and fixed all the onpage stuff 24hrs ago...
I also filed a lengthy reinclusion request after the onpage fix to explain everything...had no clue until tonight about the offpage stuff...
My questions going forward:
1) Should I leave all 301'ing off? What if I implemented a server-side 301 on the same server where my .com resides? The .ws has managed to attain (with no promotion or content in 3 years) a PR5 ranking (PR5? WHAT?) and in all actuality is my brand name (i.e. like owning yahoo.com and yahoo.ws).
Strangely enough, the .ws is a PR5 and the .com for most of 2008 has only been a PR3 max, go figure?!?!?
I wonder if I'll ultimately be better off with no .ws to .com redirect (server side)? Maybe the .com was being held back/penalized (from a PR/rankings standpoint) all this time, in the sense of not reaching its full ranking potential, because of this offpage redirect chain?
July 25th 2008 is when the onpage flaws were inadvertently created; I'm sure that aspect is only now catching up with me.
The rankings have been uber good for super competitive terms. May 2007 was the last time I experienced any ranking fluctuations, until October 31st and Nov 6th, with no recovery.
Tedster - thanks for the recent email and thoughts
Thanks for taking the time to obtain additional input, my personal web issues after 10/31 took a little longer to resolve than I'd like, but they did come back and have settled into an acceptable range. It's nice to get some confirmation from those "inside".
The fix is a done deal. The issues have been corrected and the SERPs are as they are. If you were affected and haven't come back, then you need to dig into why this has happened to you.
Matt said there was some data let out of the update and that was corrected.
It was done on Nov 4th, when all the DCs aligned like an eclipse..
Not sure why people didn't see (or believe) my posts on that.
And nice to see MC confirm the Ghost Dataset(TM).
Although I predict that the way this "update" rolled out will happen again soon.
My website uses subfolders, like:
But I noticed that MSN and LIVE.COM have indexed my pages with the following URL (without forward slash):
And Google is indexing my pages correctly as:
So does this mean that my pages have 2 versions on the internet?
One with a forward slash and one without.
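One rough way to see whether both versions are in circulation is to take a list of URLs (from your logs or from what the engines show as indexed) and group the ones that differ only by the trailing slash. A small Python sketch follows; the function name slash_variants and the sample URLs are placeholders of mine, not the poster's actual addresses.

```python
from collections import defaultdict


def slash_variants(urls):
    """Group URLs that are identical apart from a trailing slash.

    Returns {url_without_slash: [variants...]} for every URL that
    appears in more than one form.
    """
    groups = defaultdict(set)
    for url in urls:
        groups[url.rstrip("/")].add(url)
    return {key: sorted(forms) for key, forms in groups.items()
            if len(forms) > 1}


seen = [
    "http://www.example.com/folder",    # placeholder URLs
    "http://www.example.com/folder/",
    "http://www.example.com/page.htm",
]
print(slash_variants(seen))
```

If a folder turns up in both forms and both answer 200, that would be the duplicate case; a 301 from one form to the other collapses them into a single URL.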
Coincidentally, I am also affected by the current update, and all my pages in subfolders have disappeared from the rankings.
But all my other pages (in the root directory), like http://www.example.com/page.htm,
are not affected at all and still rank at the top in Google. Only the pages in subfolders are affected by the current update!
Anyone got any good explanation?
[edited by: tedster at 8:51 am (utc) on Nov. 14, 2008]
[edit reason] switch to example.com - it will never be owned [/edit]
Of course, traffic is waaaaaaay down now.
We do not see any reason for getting hammered. Anyone?