Woke up this morning to find that Google has DROPPED the homepage from the index. I put in the URL, and Google returns a "Sorry, no information is available for the URL" message.
All of the other 150K pages on the domain remain indexed, however. It is ONLY the homepage that has been dropped.
Anyone else seeing this?
[edited by: bakedjake at 9:36 pm (utc) on Aug. 14, 2004]
Two possibilities come to mind. First, could it be that if links point to www.domain.com, domain.com, www.domain.com/index.html and plain index.html, a duplicate filter could be triggered (if such a filter exists)?
I have seen a site get hit for using multiple forms of links to the homepage, even from the same pages on the site. Those folks also used meta refreshes and messed up in a few other ways, but the homepage issue was glaring.
Second, sometimes people pull "funny stuff" out there. Could anything like that be happening, such as someone pulling tricks with redirects? There was some 302 garbage going on a while back. Are there any very unique phrases on the page that could be searched for?
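To see how many distinct URLs a site is really using for its homepage, one could normalize every internal link before counting it. The Python sketch below is a minimal, hypothetical illustration: it assumes `www.domain.com` is the preferred host and that `index.html`, `index.htm` and `index.asp` are default documents. It resolves relative links such as plain `index.html` against the page they appear on, then collapses the www/non-www and index-file forms into one URL. It says nothing about how Google's duplicate filtering actually works.

```python
from collections import Counter
from urllib.parse import urljoin, urlsplit, urlunsplit

# Assumptions for illustration only: the preferred host and the set of
# "default document" filenames are hypothetical.
CANONICAL_HOST = "www.domain.com"
INDEX_FILES = {"index.html", "index.htm", "index.asp"}


def _strip_www(host: str) -> str:
    """Return the host name without a leading 'www.'."""
    return host[4:] if host.startswith("www.") else host


def canonical_url(href: str, page_url: str, canonical_host: str = CANONICAL_HOST) -> str:
    """Resolve href against the page it appears on, then collapse
    www/non-www and index-file variants into a single URL."""
    parts = urlsplit(urljoin(page_url, href))
    host = (parts.hostname or "").lower()
    # Same site, different host spelling: force the preferred host.
    if _strip_www(host) == _strip_www(canonical_host):
        host = canonical_host
    path = parts.path or "/"
    # Treat ".../index.asp" (etc.) as the directory it lives in.
    head, _, last = path.rpartition("/")
    if last.lower() in INDEX_FILES:
        path = head + "/"
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


if __name__ == "__main__":
    page = "http://domain.com/about.html"
    hrefs = [
        "http://www.domain.com/",
        "http://domain.com",
        "http://www.domain.com/index.html",
        "index.html",
    ]
    # Count how many raw link forms collapse onto each canonical URL.
    print(Counter(canonical_url(h, page) for h in hrefs))
```

Run as a script, all four link forms from the post are counted toward `http://www.domain.com/`, which is exactly the kind of spread a site should avoid creating in the first place.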
OK.. maybe a bug.
The theory is that somehow Google lost its most current index of some large sites and has reverted back to an older version of the index. Therefore, newer pages are not indexed properly. That would then explain why Googlebot has been going nuts in indexing these sites in the past week - Googlebot has hit me more than 150,000 times since Monday. Having realized their error, Google folks are sending Googlebot out to recover what it is missing.
I don't know if I'm seeing confirmation here where none exists, but if Marcia is correct that this same problem occurred months ago, then maybe it is further evidence that Google has accidentally reverted to a previous index. That could be good news, because things will get back to normal eventually.
[webmasterworld.com...]
Another disappearing act less than a month ago
[webmasterworld.com...]
The theory is that somehow Google lost its most current index of some large sites and has reverted back to an older version of the index.
I can disprove this theory. As I've said, the problem is limited to the homepage, and only the homepage. No other pages are affected.
Marcia, I thought about the removal bug, but common sense kinda points me away from that too. I would expect a potential competitor to remove more than just MY homepage if they were trying to pull something.
I think this is a spider bug. The duplicate links/content point is interesting.
[webmasterworld.com...]
If this problem hadn't hit Google itself, I'd be more worried. Instead, I wonder if they have an algo out of control...
Somehow I don't think you'll get same-day service like Google did in restoring their homepage to the index! ;)
...could it be that if links are www.domain.com and domain.com and www.domain.com/index.html and just index.html that a dup filter could be triggered (if such exists)?
Technically it shouldn't happen; it should be figured out. But that doesn't always seem to be the case. I've seen a few sites run into problems in different ways from inconsistent linking within the sites themselves.
Google wants to take out sites that they 'think' (by their own criteria) are manipulating their search results in hopes of a better ranking.
Your site seems to have been caught by the filter and dumped in the trash can.
I have been observing this since October 2003. They keep doing the same thing: sometimes the outcome is good and sometimes it turns sour, and this month it seems to have hit quite a lot of 'established' sites. I have noticed at least 4 sites that ranked #1 for their keywords for years and are now nowhere to be seen (and in my opinion they deserve that place).
Google's objective is clear; it's just that the current algo doesn't work 100% effectively.
By manipulating, I mean over-optimizing the site with methods such as buying links, excessive link exchanges, H1 tags, ALT stuffing, etc.
IMHO, it has nothing to do with a duplicate content filter, or www vs non-www, or the conspiracy theory that Google is trying to make more money from AdWords. What they want to achieve is control over the level of manipulation.
Unluckily, some people who are not manipulating look like they are! They become victims of Google's good intentions.
Even worse, there are more factors we can't control, which I believe have contributed to the growing number of MIA victims (many of them innocent).
With so many automated tools today that add your link automatically and ask you for a link exchange, and tools that simply crawl search engine SERPs and turn them into pages with AdWords in hopes of making some money, many false links are created.
This seemed good YESTERDAY. But today, it can be the poison that sends so many innocent, established sites MIA: too many links with long anchor text that is EXACTLY the same as your page title! (These tools read your page and use your title as the anchor text, and titles are usually long.)
You don't have control over this. But sadly, I believe Google's new filter has not been able to take this into account. Thus, these false links contribute strongly to triggering the MIA penalty.
* No Control = No Harm?
There is a belief that
Google will not punish you for what other webmasters do because you don't have control over them, for example, links from bad neighbourhoods.
But it seems funny that the same people strongly believe
Google will REWARD you for what other webmasters do,
even though you don't have control over them either, for example, links from good neighbourhoods.
Knife - Cut vegetables; Weapon
Stocks - Make money; Lose Money
Drugs - Relieve pain; Abuse
War - Revolution; Rebel
If backlinks can help your site, they can bite you as well.
The one example that I've studied a little was cross linked but did not have duplicate content (that I saw).
I noticed a case similar to yours on one of the sites I monitor.
IMHO, the reason for this is the implementation of Local Rank. With Local Rank, Google doesn't want several sites from the same owner to appear at the same time in the SERPs for one search. That's a possible reason why only your www site or only your subdomain site appears. This also applies to webmasters who have several websites (on different domains) that rank well for the same keyword; I notice they are now filtered down to one site in the SERPs.
Some people reading this will ask how Google knows which sites belong to the same owner. They are not very smart about this, but they use 2 not-so-good but good-enough methods:
1. WHOIS
2. C Class IP Address
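The "Class C" check described here amounts to comparing the first three octets of two IPv4 addresses, i.e. their /24 network. Whether Google groups sites this way is the poster's speculation, not something confirmed here. A minimal sketch using Python's standard `ipaddress` module, with made-up domains and documentation-range addresses standing in for real sites:

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List


def class_c_block(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 ("Class C" in SEO usage) block an IPv4 address belongs to."""
    # strict=False lets us pass a host address and get its enclosing network.
    return ipaddress.IPv4Network(f"{ip}/24", strict=False)


def group_by_class_c(sites: Dict[str, str]) -> Dict[str, List[str]]:
    """Group domain -> IP mappings by shared /24 block (illustrative heuristic only)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for domain, ip in sites.items():
        groups[str(class_c_block(ip))].append(domain)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical domains on RFC 5737 documentation addresses.
    print(group_by_class_c({
        "a.example": "192.0.2.10",
        "b.example": "192.0.2.200",
        "c.example": "198.51.100.5",
    }))
```

Here `a.example` and `b.example` land in the same /24 group while `c.example` stands alone. Classful addressing itself is long obsolete; "Class C" survives only as shorthand for this /24 comparison.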
One really weird thing I am noticing is that when searching for "www.website.com" (my forum site) on Google, I'm seeing what seems to be a nefarious "prepaid legal services" site coming up in place of my forum site as Google's cache of my domain. Should I be worried?
- Very generic Title ("Blues Widgets") with no other text.
- Possible duplicate content on some internal pages (but none of these pages have dropped from the index, only the main page and it has 100% unique content).
- Home page is the only page getting linked to (none of the internal pages have links going to them).
- It has been gaining links faster than before (the site has been up for 4 years). In the past it gained about 1 link per month; this past month it got about 8.
I have removed the ALT tags and made it a normal-looking page... maybe Google will find me innocent :)
Google DOES NOT penalize for using ALT tags. For gosh sakes, they're there to improve usability! Google may not give them any weight, but they certainly don't penalize for having them.
If you're a PR0, you've either been caught doing (or linking to) something bad (i.e. doorway page, spam, invisible text, etc.), or you've been caught by the mysterious "glitch" that snags any number of sites after each algo "tweak".
I'm going under the assumption that this is a bug and not a penalty. I have no proof of this, other than my experience. It doesn't look or feel like a penalty to me.
FYI to others experiencing this problem - I am NOT talking about a PR0 penalty. The homepage is still returning PR via the toolbar, Google directory (!), and other PR checking tools.
So far the best guess I've heard has been some sort of combined duplicate filter/linking issue. After close investigation, there are currently four ways we are linking to the homepage:
- www.domain.com
- domain.com
- www.domain.com/index.asp
- domain.com/index.asp
All of the internal links point to the homepage via the requested form of the domain + index.asp. So, if you access the domain via domain.com, all of the internal links to the homepage point to domain.com/index.asp. Likewise, if you access the domain via www.domain.com, all of the internal links to the homepage point to www.domain.com/index.asp.
The domain (not just the homepage) has nearly 30,000 links pointing to it. Breaking the homepage links down a bit more:
- Roughly 1000 point to the www.domain.com form of the homepage
- ... domain.com form of the homepage
- ... www.domain.com/index.asp form of the homepage
- ... domain.com/index.asp form of the homepage
To reiterate:
- ... www.domain.com version of the domain.
- www.domain.com and domain.com return nothing, while www.domain.com/index.asp and domain.com/index.asp still return results as expected.
- www.domain.com/index.asp and domain.com/index.asp do not rank for any of the terms that www.domain.com used to rank for.
- The www.domain.com and domain.com forms of the homepage have been dropped.
I plan to attack this problem in the following order:
1. My next thought is to change all of the internal links to point to www.domain.com regardless of the requested domain or any other factor.
2. Google is currently maintaining two separate indexes of the site, one for www.domain.com and another for domain.com (a site: query on each returns a different number of pages). I'd like to 301 the entire domain.com version of the site to www.domain.com. This is SOP for me on all of my other sites, but tracking reasons have prevented doing it on this particular site until now.
3. Email webmaster@google.com and beg for help.
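Step 2 is a host-level 301 redirect. The site in question runs ASP, so the real fix would live in its own server configuration; the Python WSGI middleware below only sketches the logic under stated assumptions: `www.domain.com` is the preferred host (hypothetical), any request arriving on the bare `domain.com` gets a 301 to the same path and query string on the www host, and everything else passes through untouched.

```python
from urllib.parse import quote

# Hypothetical preferred host; replace with the real one.
CANONICAL_HOST = "www.domain.com"


def canonical_host_redirect(app, canonical_host: str = CANONICAL_HOST):
    """Wrap a WSGI app so requests to the bare domain get a 301 to the www host."""
    bare_host = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host

    def middleware(environ, start_response):
        # Drop any ":port" suffix and compare host names case-insensitively.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host == bare_host and bare_host != canonical_host:
            # Keep the same path and query string; only the host changes.
            path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")) or "/"
            query = environ.get("QUERY_STRING", "")
            location = f"http://{canonical_host}{path}" + (f"?{query}" if query else "")
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Type", "text/plain")])
            return [b"Moved Permanently\n"]
        # Requests already on the preferred host (or any other host) pass through.
        return app(environ, start_response)

    return middleware
```

Wrapping an application is a single call, e.g. `app = canonical_host_redirect(my_app)`, where `my_app` is any WSGI callable (a hypothetical name here). Using 301 rather than 302 matters: 301 signals a permanent move, which is what consolidating two copies of the same site calls for.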
Thoughts?
I have a site that is doing exactly this:
The home page is showing PR, but a site: search shows www.example.com as a URL only, with no cache for the home page.
About 2 months ago, I changed the .htaccess file to 301 all http://example.com requests to http://www.example.com.
I just checked the server headers and they're just fine. Barring a hosting outage that I didn't notice - I have to assume that this is a spidering problem with G.
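Checking the server headers by hand means requesting the non-www URL without following redirects and looking at the status line and `Location` header. A minimal Python sketch of that check, assuming `http://example.com/` should 301 to `http://www.example.com/`: `fetch_status` uses the standard `http.client` module (which never follows redirects on its own) and `redirect_problems` is a hypothetical helper that flags a 302 or a wrong target.

```python
import http.client
from typing import List, Optional, Tuple
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status code, Location header).

    http.client does not follow redirects, so a 301/302 is reported as-is.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirect_problems(status: int, location: Optional[str], expected_location: str) -> List[str]:
    """Hypothetical helper: list what is wrong with a canonical-host redirect."""
    problems = []
    if status == 302:
        problems.append("302 (temporary) redirect; a permanent move should use 301")
    elif status != 301:
        problems.append(f"expected 301, got {status}")
    # Only compare the target when the server actually redirected.
    if status in (301, 302) and location != expected_location:
        problems.append(f"Location is {location!r}, expected {expected_location!r}")
    return problems
```

For example, `redirect_problems(*fetch_status("http://example.com/"), "http://www.example.com/")` should return an empty list when the .htaccess rule is working. The comparison is an exact string match, so a server that omits the trailing slash would be flagged.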
[edited by: PatrickDeese at 6:50 pm (utc) on Aug. 16, 2004]
And I'd search for scraper sites, dup stuff, etc.
Can't tell you how many homepages we've seen vanish for these sorts of reasons. When it was our own pages that vanished, every time we found the issue and corrected it, the page came back, though sometimes only after a wait of 4-12 weeks.
EDIT: Oops, missed part of the thread... Google maintaining two separate indexes of the site is almost certainly the problem. It has happened to us. 301s should fix it.
[edited by: caveman at 6:56 pm (utc) on Aug. 16, 2004]
caveman, as I've said, that's typically SOP for me. You've never heard me rant about internal link architecture before. ;-) But I inherited this site, and unfortunately this happened in the middle of us trying to fix all of this. :)
www.example.com/
Google can show you the following information for this URL:
Find web pages that are similar to www.example.com
Find web pages that link to www.example.com
Find web pages that contain the term "www.example.com"
Not a very good sign: either Google couldn't get into the site, or it's in the process of being de-listed. :(
I am hoping it's a spidering problem, because suddenly about 60% of the pages are not showing their AdSense and are using the alternate ad instead.
Did you throw up the 301s once you discovered the problem, or are you saying that you think the 301s are causing the problem?
I have been adding the domain.com to www.domain.com redirect on all my sites for the past couple of months.
Prior to that, I only had a custom 404 page in my .htaccess.
All issues aside, I am hoping that my web host had some sort of technical issue or misconfiguration that affected this site and that they resolved without telling me. It could also simply be that something I did screwed things up: I was making site-wide changes a couple of weeks ago and my connection went out for a couple of hours, so maybe Google was spidering when that happened.
I really think this is a Google bug at the moment. I am seeing sites in Alexa's top 2000 returning this, even though the message "Sorry, no information was found..." usually means a site has been penalised.
And as another poster said, the same thing happened to google.com.
I would think it is likely you don't have to look any further.
I believe that Google is (thankfully) going after duplicate content all over the Internet. This leads to some situations where relatively innocent duplicate content gets bungled up. Sloppy webmastering is sometimes to blame, but sometimes it is normal old business of www versus non-www. The proactive thing to do is make it virtually impossible for Google to screw up, meaning make all your links absolute and have them pointing to the same consistent URLs.
Perhaps this shouldn't be necessary, but as a defense mechanism there is no downside.
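One way to act on this advice is to audit each page's links. The Python sketch below, built on the standard library's `html.parser` and `urllib.parse`, assumes a hypothetical preferred host `www.domain.com` and flags internal links that are either relative or point at the bare `domain.com` host; external links and in-page `#` anchors are skipped. It does not check index-file variants such as `/index.asp` versus `/`.

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urljoin, urlsplit

# Hypothetical preferred host for this sketch.
CANONICAL_HOST = "www.domain.com"


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def inconsistent_links(html: str, page_url: str,
                       canonical_host: str = CANONICAL_HOST) -> List[Tuple[str, str]]:
    """Return (href, resolved URL) for internal links that are relative
    or that use a host other than canonical_host."""
    bare_host = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    flagged = []
    for href in collector.hrefs:
        if href.startswith("#"):
            continue  # in-page anchor, not a separate URL
        absolute = urljoin(page_url, href)
        host = urlsplit(absolute).hostname or ""
        if host not in (bare_host, canonical_host):
            continue  # external link (or mailto:, etc.)
        is_relative = not urlsplit(href).scheme
        if is_relative or host != canonical_host:
            flagged.append((href, absolute))
    return flagged
```

For example, on a page fetched from `http://www.domain.com/page.asp`, it would flag `href="/"` and `href="http://domain.com/index.asp"` but leave `href="http://www.domain.com/"` alone.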