| 4:32 pm on Jul 19, 2005 (gmt 0)|
bump :-) --- sorry. It took a long time to get this to show up.
| 11:33 pm on Jul 19, 2005 (gmt 0)|
What format have you used in robots.txt?
What reason did they give for the rejection?
| 11:44 am on Jul 20, 2005 (gmt 0)|
They give no reason for the rejection, just "Request Denied." As for robots.txt, this is the only entry...
All pages return a 404. I thought this was what had to be returned before G would delete the URL from the index.
Doesn't work, though. The pages are still listed and being crawled. Any help would be greatly appreciated.
| 5:40 pm on Jul 20, 2005 (gmt 0)|
| 5:50 pm on Jul 20, 2005 (gmt 0)|
I am not really sure about the removal tool, as I have never used it, but this will serve as a bump if nothing else.
I can tell you a few things.
I would be very sure that the pages are properly returning a 404 and not a 200 followed by a 404 page; I've seen that many times, and it will not prompt G to remove the page.
You could return a couple of other codes as well:
301 Moved Permanently (if there is a new URL for that content)
410 Gone (if the content has been removed for good)
Either of these would be appropriate.
It also takes quite a while for G to sort itself out with removing pages sometimes.
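The "200 followed by a 404" problem above is worth checking by hand. A minimal sketch in Python (illustration only — the host and path are made-up placeholders, not anything from this thread) that reads the very first status code the server sends, without following redirects, so a 302-to-error-page can't masquerade as a real 404:

```python
# Sketch: fetch only the FIRST response for a URL and report its raw
# status code. No redirects are followed, so a "200 then 404" or a
# 302 hop to a custom error page shows up as what it really is.
import http.client

def first_status(host, path, port=80):
    """Return (status, reason) for the first response only."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    conn.request("GET", path)
    resp = conn.getresponse()
    status, reason = resp.status, resp.reason
    conn.close()
    return status, reason

# Usage against a real (placeholder) site:
#   status, reason = first_status("www.example.com", "/deleted-page.html")
```

If this prints 200 while the page body *looks* like an error page, that is exactly the soft-404 case described above, and G will not treat it as gone.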
| 6:03 pm on Jul 20, 2005 (gmt 0)|
Thanks for answering!
According to the header checker here on Webmaster World, it is returning a 404. But I found a checker in the past that would actually follow all the hops and it worked well, especially for 301 and 302s. I am not sure if there is an extra hop using the WebmasterWorld tool.
Is there another tool available that would show this? All the tools I have found are showing 404.
Also, is there anything else that needs to be done? These pages really don't exist and I would like to delete them from the index. Request Denied is all the info G will give. The pages have been there for months now.
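Regarding a checker that "follows all the hops": here's a rough Python sketch of that idea (again just an illustration, not the tool mentioned above). It re-requests each Location header manually and records every status code along the way, so an intermediate 301/302 before the final 404 can't hide:

```python
# Sketch: follow redirects by hand and record every hop's status code.
import http.client
from urllib.parse import urlsplit, urljoin

def trace_hops(url, max_hops=10):
    """Return a list of (url, status) pairs, one per hop."""
    hops = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
        conn.request("GET", target)
        resp = conn.getresponse()
        hops.append((url, resp.status))
        location = resp.getheader("Location")
        conn.close()
        if resp.status in (301, 302, 303, 307, 308) and location:
            url = urljoin(url, location)  # Location may be relative
        else:
            break
    return hops
```

If the list shows only a single entry with status 404, there is no extra hop and the WebmasterWorld tool's reading is accurate.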
| 6:14 pm on Jul 20, 2005 (gmt 0)|
If you have checked the 404 with a couple of tools, then it should be right.
>> anything else
well, they will go away eventually, I wouldn't worry about it too much, G will get a 404 every time it tries to spider them.
| 6:28 pm on Jul 20, 2005 (gmt 0)|
That's what I am hoping for. But the problem is that the site gets a lot of referrals from G, and since it was rebuilt 4 months ago, over 50% of ALL referrals are going to the 404 page. 95% of all G referrals are going to the 404 page, and the visitors are then leaving. It's pretty bad when the top page being hit in the stats is the 404 page and the 2nd is the index page. My client is getting a little P.O.'ed about it. That's why I am trying to get G to just drop the pages.
| 4:21 pm on Jul 21, 2005 (gmt 0)|
Googlebot also shows a 404 when accessing these links in the web logs. So I cannot figure this one out. Googlebot shows a 404, the header checker shows a 404, I have <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> in the header, and I still cannot remove the URL.
Perplexed for a few months now :-(
| 8:05 pm on Jul 22, 2005 (gmt 0)|
Well, this is interesting. I resubmitted just one of the missing pages. I see the googlebot-console tried hitting the page and got a 404. In the results pane after logging into the removal tool, it still shows pending, and has been that way for the past 2 days. In my opinion, that is good. The last batch that were "request denied" came back within 24 hours as being denied. Not sure what changed, but I will check again on Monday to see if it worked.
| 1:17 pm on Jul 25, 2005 (gmt 0)|
Day four and still pending. If this works, I'll submit the rest of the URLs. It seems Googlebot is quite active, but still no changes in the index :-(
| 11:57 am on Jul 27, 2005 (gmt 0)|
I find that WebBug is a very good tool for checking exactly what comes back in the HTTP headers. You can also test for HTTP 1.0 and HTTP 1.1 etc.
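For anyone without WebBug handy, the same kind of check can be sketched with a raw socket (Python here purely for illustration; host and path are placeholders). Writing the request line yourself is what lets you choose HTTP/1.0 vs HTTP/1.1:

```python
# Sketch of what a raw header checker does: open a socket, send a
# hand-written request with the HTTP version of your choice, and
# return the server's raw status line.
import socket

def raw_status_line(host, path, version="1.0", port=80):
    req = ("GET %s HTTP/%s\r\n"
           "Host: %s\r\n"
           "Connection: close\r\n\r\n") % (path, version, host)
    with socket.create_connection((host, port), timeout=10) as s:
        s.sendall(req.encode("ascii"))
        data = b""
        while b"\r\n" not in data:
            chunk = s.recv(1024)
            if not chunk:
                break
            data += chunk
    # The status line is everything up to the first CRLF,
    # e.g. "HTTP/1.1 404 Not Found".
    return data.split(b"\r\n", 1)[0].decode("ascii", "replace")

# Usage:
#   print(raw_status_line("www.example.com", "/deleted-page.html", "1.0"))
#   print(raw_status_line("www.example.com", "/deleted-page.html", "1.1"))
```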
| 3:22 pm on Jul 27, 2005 (gmt 0)|
Talking about the removal tool:
Trying to help Google sort out a site structure, a few days ago I unintentionally wiped out a complete site using this tool, due to an error in robots.txt.
Does anyone have any idea how to undo this? Any advice would be very much appreciated.
| 3:30 pm on Jul 27, 2005 (gmt 0)|
Ouch, all of the pages will be gone for 90 or 180 days.
I am not sure that there is anything that you can do.
| 8:15 pm on Jul 27, 2005 (gmt 0)|
Link is still pending after 8 days. mmm... Sure wish I could get this thing to work!
| 9:21 pm on Jul 27, 2005 (gmt 0)|
If you do a site: search I expect that you'll find that the URLs have already been removed from the SERPs, and that all you are waiting for is the removal tool status to be updated from the SERPs database.
| 11:50 am on Jul 28, 2005 (gmt 0)|
I think I've heard of issues with 301 redirects to the homepage, so research that first. BUT... if you remove the pages, that does not mean Google will give the traffic to your new pages. So traffic-wise you are better off 301 (permanently) redirecting the existing crawled pages to new pages.
On a related topic, I changed the extensions of my site from .asp to .shtml and 301 redirected to the .shtml pages. The existing traffic went to the .shtml pages, BUT once Google dropped the .asps, the new .shtml pages never ranked the same as the old ones. Changing filenames or extensions seems like a bad idea with Google, perhaps because Google likes older pages, I don't know (and I have limited experience with it). Note also that I did this around the time of Allegra, so it might have just been that.
| 12:24 pm on Jul 28, 2005 (gmt 0)|
AHHHAHHAHHHAH (that's me screaming)
2005-07-21 16:10:27 GMT :
removal of [mysite.org...]
What the heck?!?!?!?! Why can't I get this thing to work? Would someone who knows what the heck they are doing kindly take a look? This is getting old very fast.
| 12:36 pm on Jul 28, 2005 (gmt 0)|
Okay, I will step through everything I am doing...
First, I click on "Remove an outdated link."
Next I fill in the complete URL, and then I am given the following choices as radio buttons (I am a little confused on this part)...
anything associated with this URL
snippet portion of result (includes cached version)
cached version only
I am checking "anything associated with this URL." Is this correct?
| 12:51 pm on Jul 28, 2005 (gmt 0)|
I tried WebBug....
HTTP/1.1 404 Object Not Found
Date: Thu, 28 Jul 2005 12:51:43 GMT
I am very confused.
| 1:14 pm on Jul 28, 2005 (gmt 0)|
What response do you get with a HTTP 1.0 request?
Is it still 404?
| 1:31 pm on Jul 28, 2005 (gmt 0)|
When I change the HTTP version to 1.0, I get...
HTTP/1.1 404 Object Not Found
Date: Thu, 28 Jul 2005 13:29:49 GMT
When I set it to 1.1, I get...
HTTP/1.1 404 Object Not Found
Date: Thu, 28 Jul 2005 13:30:15 GMT
When I set it to 0.9, I get...
HTTP/1.1 400 Bad Request
| 1:51 pm on Jul 28, 2005 (gmt 0)|
What I did successfully is this:
1. Sign up for a google account.
2. Go to google search and do a search for 'site:mysite.com'
Go through all the links, writing down all the wrong/dead URLs you wish to remove (if you have 10,000 links and this is not possible, just use the ones you know are definitely dead).
3. Add all these links to your robots.txt
4. Go back to your google account and go to remove URLs - automatic removal system. Link is at bottom of this page - 'Remove Content from Google's Index'
Submit your robots.txt file to google
5. Wait a bit and then google will come by.
6. Remove all this stuff from your robots.txt once complete, or keep it for a few days, changing User-agent to * so that other SEs can remove these files too.
Hope this helps. Worked a treat for me.
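For reference, the robots.txt entries from steps 3 and 6 would look something like this (the paths are made-up examples — use your own dead URLs):

```text
# Block only Google's crawler while the removal request is pending.
# Change "Googlebot" to "*" afterwards if other engines should drop
# the same pages, per step 6 above.
User-agent: Googlebot
Disallow: /old-page-1.html
Disallow: /old-section/
```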
| 12:00 pm on Jul 29, 2005 (gmt 0)|
Well, I finally broke down and emailed Google to see if they could figure out the problem. Still waiting to hear from them.
| 6:27 pm on Jul 29, 2005 (gmt 0)|
It takes about three days for a reply to come back... and they are all "cut and paste" standard answers.
| 12:29 pm on Aug 1, 2005 (gmt 0)|
You are absolutely correct, sir. Standard reply: we cannot comment on individual sites, go to Google News for more information, go to the Google FAQ on using Sitemaps, yada yada yada.
Sure wish I could figure this one out.