Does anyone know why these removal requests are being denied? Am I missing something here? I need to get these pages out of the index!
Thanks!
User-agent: *
Disallow: /cgi-bin/
All pages return a 404. I thought this was what had to be returned before G would delete the URL from the index.
It doesn't work, though. The pages are still listed and being crawled. Any help would be greatly appreciated.
Thanks
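Side note: you can sanity-check the robots.txt rule itself with Python's standard library. This is a minimal sketch; example.com and the sample /cgi-bin/ path are placeholders for your own site:

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt, then ask whether Googlebot
# is allowed to crawl a given URL under its rules.
rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")  # placeholder domain
rp.read()

# False means the Disallow rule is matching as intended.
print(rp.can_fetch("Googlebot", "http://www.example.com/cgi-bin/somepage.cgi"))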
I can tell you a few things:
I would make very sure that the pages are properly returning a 404 status code and not a 200 followed by a 404 error page. I've seen that many times, and it will not prompt G to remove the page.
You could return a couple of other codes as well:
301 Moved Permanently (if there is a new URL for that content)
or
410 Gone
Either of these would be appropriate.
It also sometimes takes G quite a while to sort itself out and actually remove pages.
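One way to see exactly what status code comes back on the first response is a quick Python check. A minimal sketch (the host and path are placeholders); http.client deliberately does not follow redirects, so a 200 or a 302 masquerading as a 404 will show up as-is:

import http.client

# Check the status of the FIRST response only; http.client does not
# auto-follow redirects, so any redirect or soft 404 is visible here.
conn = http.client.HTTPConnection("www.example.com")  # placeholder host
conn.request("HEAD", "/cgi-bin/somepage.cgi")  # placeholder path; use "GET" if the server refuses HEAD
resp = conn.getresponse()
print(resp.status, resp.reason)  # you want 404 (or 301/410), not 200
conn.close()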
Thanks for answering!
According to the header checker here on WebmasterWorld, it is returning a 404. In the past I found a checker that would actually follow all the hops, and it worked well, especially for 301s and 302s. I am not sure whether there is an extra hop that the WebmasterWorld tool isn't showing.
Is there another tool available that would show this? All the tools I have found are showing 404.
Also, is there anything else that needs to be done? These pages really don't exist and I would like to delete them from the index. Request Denied is all the info G will give. The pages have been there for months now.
Thanks
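For reference, a hop-following checker like the one described above is easy to sketch in Python. This is a minimal version; the URL is a placeholder, and it assumes each Location header carries an absolute URL:

import http.client
from urllib.parse import urlsplit

def trace_hops(url, limit=10):
    # Follow Location headers by hand and print the status of every hop,
    # so chains like 302 -> 200 or 301 -> 404 become visible.
    for _ in range(limit):
        parts = urlsplit(url)
        conn = http.client.HTTPConnection(parts.netloc)
        conn.request("GET", parts.path or "/")  # query strings omitted for brevity
        resp = conn.getresponse()
        print(resp.status, resp.reason, url)
        location = resp.getheader("Location")
        conn.close()
        if resp.status not in (301, 302, 303, 307) or not location:
            break
        url = location  # assumes an absolute URL in Location

trace_hops("http://www.example.com/cgi-bin/somepage.cgi")  # placeholder URL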
On a related topic, I changed the extensions on my site from .asp to .shtml and 301 redirected to the .shtml pages. The existing traffic went to the .shtml pages, BUT once Google dropped the .asp URLs, the new .shtml pages never ranked the same as the old ones. Changing filenames or extensions seems like a bad idea with Google, perhaps because Google favors older pages; I don't know (and I have limited experience with it). Note also that I did this around the time of Allegra, so it might have just been that.
2005-07-21 16:10:27 GMT :
removal of [mysite.org...]
request denied
What the heck?!?!?!?! Why can't I get this thing to work? Would someone who knows what the heck they are doing kindly take a look? This is getting old very fast.
First, I click on "Remove an outdated link."
Next, I fill in the complete URL and then I am given the following choices as radio buttons (I am a little confused about this part)...
anything associated with this URL
snippet portion of result (includes cached version)
cached version only
I am checking "anything associated with this URL". Is this correct?
Here is what the header checker returns:
HTTP/1.1 404 Object Not Found
Date: Thu, 28 Jul 2005 13:29:49 GMT
Content-Length: 930
Content-Type: text/html
When I set it to 1.1, I get...
HTTP/1.1 404 Object Not Found
Date: Thu, 28 Jul 2005 13:30:15 GMT
Connection: close
Content-Length: 930
Content-Type: text/html
When I set it to 0.9, I get...
HTTP/1.1 400 Bad Request
Content-Type: text/html
Content-Length: 87
Connection: close
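That 400 at 0.9 is almost certainly a red herring: a true HTTP/0.9 request is just "GET /path" with no version token or headers at all, so a request line that says "HTTP/0.9" is malformed and the server rejects it before ever looking for the page. If you want to see the status line with no tool in the middle, a raw-socket check is easy to sketch (host and path are placeholders):

import socket

def status_line(host, path, version="HTTP/1.0"):
    # Hand-write a minimal GET so nothing rewrites the request,
    # then return the first line of the server's reply.
    request = "GET %s %s\r\nHost: %s\r\nConnection: close\r\n\r\n" % (path, version, host)
    sock = socket.create_connection((host, 80))
    sock.sendall(request.encode("ascii"))
    reply = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        reply += chunk
    sock.close()
    # e.g. "HTTP/1.1 404 Object Not Found"
    return reply.split(b"\r\n", 1)[0].decode("ascii", "replace")

print(status_line("www.example.com", "/cgi-bin/somepage.cgi"))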
1. Sign up for a Google account.
2. Go to Google search and do a search for 'site:mysite.com'. Go through all the links, writing down all the wrong/dead URLs you wish to remove (if you have 10,000 links and this is not possible, just use the ones you know are definitely dead or not working).
3. Add all these links to your robots.txt (a small script for generating these lines appears after this list):
User-agent: googlebot
Disallow: /myoldfile.html
Disallow: /myoldfile1.html
Disallow: /myolddirectory/myoldfile.html
Disallow: /myolddirectory/myoldfile1.html
4. Go back to your Google account and go to remove URLs - the automatic removal system. The link is at the bottom of this page - 'Remove Content from Google's Index'. Submit your robots.txt file to Google:
[mysite.com...]
5. Wait a bit and then Google will come by.
6. Remove all this stuff from your robots.txt once complete, or keep it for a few days, changing the User-agent to * so that other SEs can remove these files too.
Hope this helps. Worked a treat for me.
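As mentioned in step 3, turning a long list of dead URLs into Disallow lines is easy to script. A minimal sketch, assuming a placeholder file deadlinks.txt with one full URL per line:

from urllib.parse import urlsplit

# Read full URLs (one per line) and emit a robots.txt Disallow block.
print("User-agent: googlebot")
with open("deadlinks.txt") as f:  # placeholder filename
    for line in f:
        url = line.strip()
        if url:
            print("Disallow: " + urlsplit(url).path)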
Disallow googlebot