

URL Removal Tool Help

Why can't I get it to work?

     

webdude

6:16 pm on Jul 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was wondering why I cannot get the URL removal tool on G to work. I am running a db program that defaults to an error page if a file is not found. I did a header check on this page and was surprised that it returned a 200 Object Found. What this means is that every time I changed a file name on my sites that queried the database, Googlebot would crawl the old URL and assume the page was fine even though the file was not found. The result for one of my sites, which was completely redone, is many old URLs pointing to this error page, all being crawled and indexed. I am worried about dupe content here and just want to delete the URLs.
I submitted about 15 of the URLs using the removal tool last week and am dumbfounded that the requests have been denied. I got a "request denied" on every single page I submitted. I changed my server settings to return a 404 Object Not Found for these database queries. I checked the page via the header checker and am now getting a 404 Object Not Found. I also added <META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW"> to the error page.
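
For reference, the raw header check I mean looks roughly like this (a minimal Python sketch; the host and path are placeholders, not my actual URLs). It does not follow redirects, so the first status code returned is the one you actually see:

import http.client

# Placeholder host and path for a database query that no longer matches a record.
conn = http.client.HTTPConnection("www.example.com")
conn.request("GET", "/db-query?id=deleted-record")
resp = conn.getresponse()

print(resp.status, resp.reason)  # should now be 404, not 200
for name, value in resp.getheaders():
    print(name + ": " + value)

conn.close()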

Anyone know any reason why these requests are being denied? Am I missing something here? I need to get these pages out of the index!

Thanks!

webdude

4:32 pm on Jul 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



bump :-) --- sorry. It took a long time to get this to show up.

lierduh

11:33 pm on Jul 19, 2005 (gmt 0)

10+ Year Member



What format have you used in robots.txt?
What was the reason they gave for rejecting them?

webdude

11:44 am on Jul 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They give no reason for the rejection, just "Request Denied." As for robots.txt, this is the only entry...

User-agent: *
Disallow: /cgi-bin/

All pages return a 404. I thought this was what had to be returned before G would delete the URL from the index.

Doesn't work, though. The pages are still listed and being crawled. Any help would be greatly appreciated.

Thanks

webdude

5:40 pm on Jul 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Anybody?

jatar_k

5:50 pm on Jul 20, 2005 (gmt 0)

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I am not really sure about the removal tool as I have never used it, but this will serve as a bump if nothing else.

I can tell you a few things

I would make very sure that the pages are properly returning a 404 and not a 200 followed by a 404; I've seen that many times, and it will not prompt G to remove the page.

You could return a couple of other codes as well:

301 Moved Permanently (if there is a new url for that content)
or
410 Gone

either of these would be appropriate.
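
Purely as an illustration (placeholder paths, and not necessarily how your db program is wired up), a catch-all error handler that returns those codes could look like this minimal Python/WSGI sketch:

from wsgiref.simple_server import make_server

# Hypothetical mapping of old URLs to their replacements.
NEW_LOCATIONS = {"/old-page": "/new-page"}

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in NEW_LOCATIONS:
        # A replacement exists, so send the spider there permanently.
        start_response("301 Moved Permanently", [("Location", NEW_LOCATIONS[path])])
        return [b""]
    # No replacement: tell the spider the page is gone for good.
    start_response("410 Gone", [("Content-Type", "text/html")])
    return [b"<html><body>This page has been removed.</body></html>"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()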

It also takes quite a while for G to sort itself out with removing pages sometimes.

webdude

6:03 pm on Jul 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jatar_k,

Thanks for answering!

According to the header checker here on WebmasterWorld, it is returning a 404. But I found a checker in the past that would actually follow all the hops, and it worked well, especially for 301s and 302s. I am not sure if there is an extra hop using the WebmasterWorld tool.
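
As far as I can tell, that old tool just followed each Location header itself so you could see every intermediate status code instead of only the final one. Something like this minimal Python sketch (placeholder URL, plain HTTP only) does the same thing:

import http.client
from urllib.parse import urlsplit, urljoin

url = "http://www.example.com/old-page"  # placeholder
for _ in range(10):  # cap the number of hops followed
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc)
    conn.request("GET", target)
    resp = conn.getresponse()
    print(resp.status, resp.reason, "<-", url)  # one line per hop
    location = resp.getheader("Location")
    conn.close()
    if resp.status not in (301, 302, 303, 307, 308) or not location:
        break
    url = urljoin(url, location)  # follow the redirect to the next hop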

Is there another tool available that would show this? All the tools I have found are showing 404.

Also, is there anything else that needs to be done? These pages really don't exist and I would like to delete them from the index. Request Denied is all the info G will give. The pages have been there for months now.

Thanks

jatar_k

6:14 pm on Jul 20, 2005 (gmt 0)

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member



If you have checked the 404 with a couple of tools, then it should be right.

>> anything else

Well, they will go away eventually. I wouldn't worry about it too much; G will get a 404 every time it tries to spider them.

webdude

6:28 pm on Jul 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's what I am hoping for. But the problem is that the site gets a lot of referrals from G, and since it was rebuilt 4 months ago, over 50% of ALL the referrals are going to the 404 page. 95% of all G referrals are going to the 404 page and the visitors are then leaving. It's pretty bad when the top page being hit in the stats is the 404 page and the 2nd is the index page. My client is getting a little P.O.'ed about it. That's why I am trying to get G to just drop the pages.

webdude

4:21 pm on Jul 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Googlebot also shows a 404 when accessing these links via the web logs. So I cannot figure this one out. Googlebot shows a 404, the header checker shows a 404, I have <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> in the header, and I still cannot remove the URL.

Perplexed for a few months now :-(

webdude

8:05 pm on Jul 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, this is interesting. I resubmitted just one of the missing pages. I see the googlebot-console tried hitting the page and got a 404. In the results pane after logging into the removal tool, it still shows pending. It has been that way for the past 2 days. In my opinion, that is good. The last batch that were "request denied" came back within 24 hours as being denied. Not sure what changed, but I will check again on Monday to see if it worked.

webdude

1:17 pm on Jul 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Day four and still pending. If this works, I'll submit the rest of the URLs. It seems Googlebot is quite active, but still no changes in the index :-(

g1smd

11:57 am on Jul 27, 2005 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I find that WebBug is a very good tool for checking exactly what comes back in the HTTP headers. You can also test for HTTP 1.0 and HTTP 1.1 etc.
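
If you want to run the same comparison by hand, a minimal Python sketch like this (placeholder host and path) sends a bare HEAD request in each protocol version and prints the status line it gets back:

import socket

def status_line(host, path, version):
    # Hand-rolled request so the HTTP version is exactly what we ask for.
    req = ("HEAD " + path + " HTTP/" + version + "\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n\r\n")
    with socket.create_connection((host, 80), timeout=10) as sock:
        sock.sendall(req.encode("ascii"))
        return sock.recv(4096).decode("iso-8859-1").splitlines()[0]

for ver in ("1.0", "1.1"):
    print(ver, "->", status_line("www.example.com", "/missing-page", ver))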

Petrocelli

3:22 pm on Jul 27, 2005 (gmt 0)

10+ Year Member



Talking about the removal tool:

Trying to help Google sort out a site structure, a few days ago I unintentionally wiped out a complete site using this tool due to an error in robots.txt.

Does anyone have any idea how to undo this? Any advice would be very much appreciated.

Peter

g1smd

3:30 pm on Jul 27, 2005 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Ouch, all of the pages will be gone for 90 or 180 days.

I am not sure that there is anything that you can do.

webdude

8:15 pm on Jul 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Link is still pending after 8 days. mmm... Sure wish I could get this thing to work!

g1smd

9:21 pm on Jul 27, 2005 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



If you do a site: search I expect that you'll find that the URLs have already been removed from the SERPs, and that all you are waiting for is the removal tool status to be updated from the SERPs database.

whitehatwizard

11:50 am on Jul 28, 2005 (gmt 0)

10+ Year Member



I think I've heard of issues with 301 redirects to the homepage, so research that first, BUT... if you remove the pages, that does not mean Google will give the traffic to your new pages. So traffic-wise you are better off 301 (permanently) redirecting the existing crawled pages to the new pages.

On a related topic, I changed extensions of my site from asp to shtml and 301 redirected to the shtml pages. The existing traffic went to the shtml, BUT once Google dropped the asps, the new shtml pages never ranked the same as the old. Changing filenames or extensions seems like a bad idea with Google, perhaps because Google likes older pages; I don't know (and I have limited experience with it). Note also that I did this around the time of Allegra, so it might have just been that.

webdude

12:24 pm on Jul 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



AHHHAHHAHHHAH (that's me screaming)

2005-07-21 16:10:27 GMT :
removal of [mysite.org...]
request denied

What the heck?!?!?!?! Why can't I get this thing to work? Would someone who knows what the heck they are doing kindly take a look? This is getting old very fast.

webdude

12:36 pm on Jul 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Okay, I will step through everything I am doing...

First, I click on "Remove an outdated link."

Next I am filling in the complete URL, and then I am given the following choices as radio buttons (I am a little confused on this part)...

anything associated with this URL
snippet portion of result (includes cached version)
cached version only

I am checking "anything associated with this URL." Is this correct?

webdude

12:51 pm on Jul 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Okay,

I tried WebBug....

HTTP/1.1 404 Object Not Found
Date: Thu, 28 Jul 2005 12:51:43 GMT
Connection: close
Content-Length: 930
Content-Type: text/html

I am very confused.

g1smd

1:14 pm on Jul 28, 2005 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



What response do you get with a HTTP 1.0 request?

Is it still 404?

webdude

1:31 pm on Jul 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When I change the HTTP version to 1.0, I get...

HTTP/1.1 404 Object Not Found
Date: Thu, 28 Jul 2005 13:29:49 GMT
Content-Length: 930
Content-Type: text/html

When I set it to 1.1, I get...

HTTP/1.1 404 Object Not Found
Date: Thu, 28 Jul 2005 13:30:15 GMT
Connection: close
Content-Length: 930
Content-Type: text/html

When I set it to 0.9, I get...

HTTP/1.1 400 Bad Request
Content-Type: text/html
Content-Length: 87
Connection: close

webdevfv

1:51 pm on Jul 28, 2005 (gmt 0)

10+ Year Member



What I did successfully is this:

1. Sign up for a google account.

2. Go to google search and do a search for 'site:mysite.com'

Go through all the links, writing down all the wrong/dead URLs you wish to remove (if you have 10,000 links and this is not possible, just use the ones you know are definitely not working/dead).

3. Add all these links to your robots.txt

User-agent: googlebot
Disallow: /myoldfile.html
Disallow: /myoldfile1.html
Disallow: /myolddirectory/myoldfile.html
Disallow: /myolddirectory/myoldfile1.html

4. Go back to your Google account and go to remove URLs (automatic removal system). The link is at the bottom of this page: 'Remove Content from Google's Index'.

Submit your robots.txt file to google
[mysite.com...]

5. Wait a bit and then google will come by.

6. Remove all this stuff from your robots.txt once complete, or keep it for a few days, changing User-agent to * so that other SEs can remove these files too.

Hope this helps. Worked a treat for me.
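
As a sanity check between steps 3 and 4, you can confirm that the Disallow lines really do block Googlebot before submitting. A minimal Python sketch (placeholder robots.txt URL and file names) using the standard library's robots.txt parser:

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")  # placeholder
rp.read()

for path in ("/myoldfile.html", "/myolddirectory/myoldfile.html"):
    blocked = not rp.can_fetch("googlebot", "http://www.example.com" + path)
    print(path, "blocked for googlebot:", blocked)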


webdude

12:00 pm on Jul 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, I finally broke down and emailed Google to see if they could figure out the problem. Still waiting to hear from them.

g1smd

6:27 pm on Jul 29, 2005 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



It takes about three days for a reply to come back... and they are all "cut and paste" standard answers.

webdude

12:29 pm on Aug 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yep g1smd,

You are absolutely correct, sir. Standard reply: we cannot comment on individual sites, go to Google news for more information, go to the Google FAQ on using sitemaps, yada yada yada.

Sure wish I could figure this one out.

 
