Forum Moderators: Robert Charlton & goodroi
I do not see this "issue" with any other sites I spot check - their home page is always the first listing for their site:theirsite.com searches.
So, should I assume there is a "problem" somehow with my site?
The reason I am focused on trying to see "what is wrong" is that my rankings have been slightly reduced 3 times in the past 8 weeks, most recently on Monday (they came back in 2 days twice, and once it took 8 days). By reduced, I mean organic traffic is down by 50% overall, and I can see instances where rankings have been muffled - e.g., a former page 1 #1 ranking is now page 2 #1.
Anyway, I know something is amiss. (<grin> Please don't tell me rankings shuffle, I know that, the fact is, overall traffic is down by about 50%, so something has occurred on a macro level....)
My question really is: is there a hint to the question of "why have my rankings been damaged?" in the fact that my home page does not show up when I do a site:mysite.com search? In fact, I can't see my home page anywhere! Just 40,000+ other pages from my site.
I even used to have sitelinks for my site - now they're gone. Also, when I do a standard "mysite" search, I still get a double indented result, with 2 close-to-top-level pages from my site, neither of which are my homepage, and the sitelinks are gone. In webmaster tools, I still have the sitelinks listed...
And I'm also sure other scenarios exist - have you checked other data centers, explicitly using their IP address? How about the results on Caffeine?
Using www2.sandbox.google.com I get different results, but no home page.
As for spidering, I have downloaded all my errors from Webmaster tools, and my home page is not there.
However, I have 2 possible concerns, the only things I can come up with:
1. I did have a batch of URLs which I 301ed into my home page. Could that be causing an issue? I thought with a 301, the old URL would die. You see, when I search site:mysite.com, these old URLs show up, but now, even worse, their content is my home page - thousands of them - which confuses me because I thought the 301 would kill the URL. I worry that I have caused all sorts of duplicate content on my home page. On top of 301ing these URLs, I also blocked them in robots.txt - though now I wonder if this simply resulted in googlebot never seeing my 301 redirect, because it was blocked from fetching the URL at all - dunno. (I have today changed this strategy: I removed the 301 in my httpd.conf, I removed the robots.txt block, and instead I added a meta noindex tag to these pages. My hope / understanding is that the URLs will keep working as normal (which I need for my site to function, but do not want Google indexing), but the noindex will pull them from the search results.)
2. The only other issue I had a while ago was a http://www.mysite.com:443 URL that somehow got indexed, and was actually pulling up an error page. I discovered it about a month ago; I'm not sure how long it existed. I fixed that as much as I could, so now it pulls up the home page properly, but I have been unsuccessful in rewriting the 443 out of the URL. That troubles me, but I don't know if it is fatal. The point is, for a while, the 443 URL was actually the _first_ (and only home page) result in my site:mysite.com results! Now, it has disappeared from those results, though I know the URL could feasibly still exist.
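For reference, the sort of rule I understand should strip the port (a sketch only, not my exact httpd.conf - mysite.com is obviously a placeholder, and I haven't got this working yet):

```apache
# Sketch, assuming Apache mod_rewrite is available.
# Idea: if the Host header arrives with :443 attached,
# 301 the request to the clean https hostname.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.mysite\.com):443$ [NC]
RewriteRule ^(.*)$ https://%1/$1 [R=301,L]
```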
[edited by: helpnow at 1:21 am (utc) on Oct. 2, 2009]
This is what google says about a meta noindex tag:
A noindex meta tag. When we see a noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it. If the content is currently in our index, we will remove it after the next time we crawl it.
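Concretely, that tag goes in the head of each page you want dropped (a minimal example - and note the spider has to be able to crawl the page to see it):

```html
<!-- Must appear in the <head>, and the page must NOT be blocked
     in robots.txt, or googlebot will never fetch it and never
     see this tag. -->
<head>
  <meta name="robots" content="noindex">
</head>
```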
Quite frankly, this is exactly what I understood a 301 would do too, no?
So, if the 301 caused my home page to disappear from the site: operator, then the root cause was that I had thousands of URLs all with the same content as my home page?
So, you're saying that this really could have been the cause of the site: operator issue? On top of this, I also had my rankings damaged this week, which is why I've been digging around.
I'm getting the feeling as this conversation progresses that this is all related: lots of URLs 301ed to the home page, home page disappears from the site: operator, rankings get damaged...
So, when the heck am I supposed to use a 301? Sounds like the meta noindex is the ultimate tool, certainly in this situation! If I am reading you right, I should NEVER have used the 301 for this, and it finally caught up to me! It sounds like I should just use the 301 to REWRITE bad URLs, but not to remove URLs from the index.
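The way I understand the split now (a sketch for Apache, with made-up paths): use a 301 when a URL has genuinely moved and you want its history passed along to the new URL; use noindex when the URL must keep working for visitors but stay out of the index:

```apache
# Hypothetical paths, for illustration only.

# Case 1: the content genuinely moved - a 301 passes the old
# URL's standing along to the new location.
Redirect 301 /old-article.html /new-article.html

# Case 2: the page must keep working but stay out of the index -
# do NOT redirect; serve it normally, with this in its <head>:
#   <meta name="robots" content="noindex">
```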
A robots.txt file restricts access to your site by search engine robots that crawl the web. (Note, however, that while Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web.)
So, from what I can see now, the only SURE way to get URLs out, is with noindex. noindex will get them out eventually, when google next crawls that page... to get the URLs out NOW, you can use google's remove tool [google.com...]
From what I can see, robots is an undependable tool, especially if the URLs already exist. If they don't exist yet, and you are being proactive, well, robots will get the job done until the URLs slip out. And in my experience, over the years, the URLs will slip out eventually somehow. So, better you use noindex and be 100% sure. Assuming the spider in question honors noindex. ; )
But as I analyze it all, I can't see what value robots.txt has anymore - better you use noindex to be sure. I'm clearing mine out and moving everything to noindex. Because from what I can see, if you put in a noindex, but still have it blocked in robots, the spider will never get to see the noindex, and you will be trapped in a grey zone.
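To make the trap concrete, this is the combination to avoid (placeholder path) - the Disallow stops the spider from ever fetching the page, so the noindex inside it is never seen:

```
# robots.txt - blocks crawling of everything under /private/
User-agent: *
Disallow: /private/

# ...meanwhile /private/page.html carries:
#   <meta name="robots" content="noindex">
# which googlebot can never read, because it never fetches the page.
```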
From what I can see, robots is an undependable tool, especially if the URLs already exist.
A robots.txt disallow rule tells the spider not even to request the URL. But a noindex robots meta tag can only be seen if the page gets spidered in the first place.
For an already indexed URL, a robots.txt disallow may get a slow response - but you can always speed that along by doing a URL Removal Request.
Should I wait for the 301s to all get crawled, or can I just go ahead and delete all the URLs with the URL Removal Request? My concern is, will Google "remember" the URLs and the duplicate content they had? Is it important to wait for them to all get recrawled with the 301 into the 404.html, thereby removing them as duplicate content?
When I do a site:mysite.com, let's say about 50 of them still show the duplicate content, and about 50 of them are listed with no title or description, suggesting they have been crawled to the 404.html page.
Why are they still displaying in the site: operator? Will they ever disappear, or do I have to block them in robots, or URL Removal Request them? I thought they would disappear once they hit the 404 code, no? Am I right in assuming that because some of them are now displaying with no titles/descs in site:mysite.com, the crawl has updated them to the 404 page?
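One thing worth knowing (a sketch, with a made-up path): rather than 301ing dead URLs into a 404.html page, Apache can answer them with a real 410 Gone status, which is an explicit "this URL is dead, remove it" signal rather than a redirect:

```apache
# Hypothetical path. mod_alias's "Redirect gone" returns a
# 410 Gone status for anything under /retired/, instead of
# redirecting the visitor anywhere.
Redirect gone /retired/
```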
Yes, I have this problem. I have almost 5000 pages which mistakenly took on the content of my home page. I have fixed the pages, they are slowly losing the home page content in site:mysite.com.
So, now, I've lost my nerve - I am unsure if I should just relax and wait for the crawl, or if there is something else I need to do to cement the fix / could do to speed things up. I am nervous about the URL Removal Request because I worry that I will kill the URLs before Google has a chance to fix my home page, and that it will never be able to fix the home page if I kill the URLs too soon.