
Google SEO News and Discussion Forum

    
URL Removal Tool Help
Why can't I get it to work?
webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 6:16 pm on Jul 18, 2005 (gmt 0)

I was wondering why I cannot get the URL removal tool on G to work. I am running a db program that defaults to an error page if a file is not found. I did a header check on this page and was surprised that it returned a 200 Object Found. What this means is that every time I changed a file name on my sites that queried the database, Googlebot would crawl and assume the page was OK even though the file was not found. The result, for one of my sites that was completely redone, is many old URLs pointing to this error page, all still being crawled and indexed. I am worried about dupe content here and just want to delete the URLs.
I submitted about 15 of the URLs using the removal tool last week and am dumbfounded that the requests have been denied. I got a "request denied" on every single page I submitted. I changed my server settings to return a 404 Object Not Found for these database queries. I checked the page via the header checker and am now getting a 404 Object Not Found. I also added <META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW"> to the error page.
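
(For reference, a rough sketch of that kind of header check in Python; the URL is a placeholder, not the real site.)

# Rough header check: print the status the server actually returns.
from urllib.request import urlopen
from urllib.error import HTTPError

url = "http://www.example.com/some-missing-page"
try:
    resp = urlopen(url)
    print(resp.status, resp.reason)   # e.g. 200 OK -- the problem case
except HTTPError as err:
    print(err.code, err.reason)       # e.g. 404 Not Found -- what G needs to see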

Anyone know any reason why these requests are being denied? Am I missing something here? I need to get these pages out of the index!

Thanks!

 

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 4:32 pm on Jul 19, 2005 (gmt 0)

bump :-) --- sorry. It took a long time to get this to show up.

lierduh

5+ Year Member



 
Msg#: 30438 posted 11:33 pm on Jul 19, 2005 (gmt 0)

What format have you used in robots.txt?
What reason did they give for rejecting the requests?

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 11:44 am on Jul 20, 2005 (gmt 0)

They give no reason for the rejection, just "Request Denied." As for robots.txt, this is the only entry...

User-agent: *
Disallow: /cgi-bin/

All pages return a 404. I thought this was what had to be returned before G would delete the URL from the index.

Doesn't work, though. The pages are still listed and being crawled. Any help would be greatly appreciated.

Thanks

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 5:40 pm on Jul 20, 2005 (gmt 0)

Anybody?

jatar_k

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 30438 posted 5:50 pm on Jul 20, 2005 (gmt 0)

I am not really sure about the removal tool as I have never used it, but this will serve as a bump if nothing else.

I can tell you a few things, though.

I would make very sure that the pages are properly returning a 404 and not a 200 followed by a 404 error page; I've seen that many times, and it will not prompt G to remove the page.

You could return a couple of other codes as well:

301 Moved Permanently (if there is a new URL for that content)
or
410 Gone

Either of these would be appropriate.
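
(A minimal sketch of the idea, assuming a CGI-style "record not found" handler on the database side; the markup and names are placeholders. The error page itself can stay, it just has to go out with a real 404 or 410 status line instead of 200.)

# Sketch: emit a proper error status from the "record not found" branch.
def not_found_response(gone=False):
    status = "410 Gone" if gone else "404 Not Found"
    # In CGI, the Status header is sent before the blank line and the body.
    print("Status: " + status)
    print("Content-Type: text/html")
    print()
    print("<html><body><h1>" + status + "</h1></body></html>")

not_found_response(gone=True)   # use 410 when the page is permanently removed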

It also takes quite a while for G to sort itself out with removing pages sometimes.

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 6:03 pm on Jul 20, 2005 (gmt 0)

jatar_k,

Thanks for answering!

According to the header checker here on WebmasterWorld, it is returning a 404. But in the past I found a checker that would actually follow all the hops, and it worked well, especially for 301s and 302s. I am not sure if there is an extra hop that the WebmasterWorld tool isn't showing.

Is there another tool available that would show this? All the tools I have found are showing 404.
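
(For reference, a rough sketch of that kind of hop-by-hop check in Python, making the requests without following redirects automatically; the URL is a placeholder.)

# Print the status of each hop instead of silently following redirects.
import http.client
from urllib.parse import urlsplit

url = "http://www.example.com/old-page"
for _ in range(5):                          # cap the number of hops
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc)
    conn.request("GET", parts.path or "/")
    resp = conn.getresponse()
    print(resp.status, resp.reason, url)
    location = resp.getheader("Location")
    conn.close()
    if resp.status in (301, 302, 303, 307) and location:
        url = location                      # assumes an absolute Location header
    else:
        break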

Also, is there anything else that needs to be done? These pages really don't exist and I would like to delete them from the index. Request Denied is all the info G will give. The pages have been there for months now.

Thanks

jatar_k

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 30438 posted 6:14 pm on Jul 20, 2005 (gmt 0)

If you have checked the 404 with a couple of tools, then it should be right.

>> anything else

Well, they will go away eventually. I wouldn't worry about it too much; G will get a 404 every time it tries to spider them.

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 6:28 pm on Jul 20, 2005 (gmt 0)

That's what I am hoping for. But the problem is that the site gets a lot of referrals from G, and since it was rebuilt 4 months ago, over 50% of ALL referrals are going to the 404 page; 95% of all G referrals are going to the 404 page, and those visitors are then leaving. It's pretty bad when the top page being hit in the stats is the 404 page and the 2nd is the index page. My client is getting a little P.O.'ed about it. That's why I am trying to get G to just drop the pages.

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 4:21 pm on Jul 21, 2005 (gmt 0)

Googlebot also shows a 404 when accessing these links in the web logs. So I cannot figure this one out. Googlebot gets a 404, the header checker shows a 404, I have <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> in the header, and I still cannot remove the URLs.
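
(One way to confirm that from the logs is to pull out the status code on every Googlebot request. A rough sketch, assuming a common/combined-format access log; the file name is a placeholder.)

# Print the status code and URL for each Googlebot request in the log.
import re

pattern = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" (\d{3})')
with open("access.log") as log:
    for line in log:
        if "Googlebot" in line:
            m = pattern.search(line)
            if m:
                print(m.group(2), m.group(1))   # status code, then URL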

Perplexed for a few months now :-(

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 8:05 pm on Jul 22, 2005 (gmt 0)

Well, this is interesting. I resubmitted just one of the missing pages. I see that googlebot-console tried hitting the page and got a 404. In the results pane after logging into the removal tool, it still shows as pending, and has been that way for the past 2 days. In my opinion, that is good; the last batch that were "request denied" came back within 24 hours as being denied. Not sure what changed, but I will check again on Monday to see if it worked.

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 1:17 pm on Jul 25, 2005 (gmt 0)

Day four and still pending. If this works, I'll submit the rest of the URLs. It seems Googlebot is quite active, but still no changes in the index :-(

g1smd

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 30438 posted 11:57 am on Jul 27, 2005 (gmt 0)

I find that WebBug is a very good tool for checking exactly what comes back in the HTTP headers. You can also test for HTTP 1.0 and HTTP 1.1 etc.
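
(For anyone without WebBug handy, a rough Python equivalent: send a raw request over a socket so the exact status line and headers come back untouched, and vary the HTTP version. Host and path are placeholders.)

# Send a raw HEAD request and print the untouched response headers.
import socket

def raw_headers(host, path, version="1.0"):
    req = ("HEAD %s HTTP/%s\r\nHost: %s\r\nConnection: close\r\n\r\n"
           % (path, version, host))
    with socket.create_connection((host, 80)) as sock:
        sock.sendall(req.encode("ascii"))
        data = sock.recv(4096).decode("latin-1")
    return data.split("\r\n\r\n")[0]        # header block only

print(raw_headers("www.example.com", "/old-page", "1.0"))
print(raw_headers("www.example.com", "/old-page", "1.1"))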

Petrocelli

10+ Year Member



 
Msg#: 30438 posted 3:22 pm on Jul 27, 2005 (gmt 0)

Talking about the removal tool:

Trying to help Google sort out a site structure, I unintentionally wiped out a complete site a few days ago using this tool, due to an error in robots.txt.

Does anyone have any idea how to undo this? Any advice would be very much appreciated.

Peter

g1smd

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 30438 posted 3:30 pm on Jul 27, 2005 (gmt 0)

Ouch, all of the pages will be gone for 90 or 180 days.

I am not sure that there is anything that you can do.

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 8:15 pm on Jul 27, 2005 (gmt 0)

Link is still pending after 8 days. mmm... Sure wish I could get this thing to work!

g1smd

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 30438 posted 9:21 pm on Jul 27, 2005 (gmt 0)

If you do a site: search I expect that you'll find that the URLs have already been removed from the SERPs, and that all you are waiting for is the removal tool status to be updated from the SERPs database.

whitehatwizard

5+ Year Member



 
Msg#: 30438 posted 11:50 am on Jul 28, 2005 (gmt 0)

I think I've heard of issues with 301 redirects to the homepage, so research that first. BUT... if you remove the pages, that does not mean Google will give the traffic to your new pages. So traffic-wise you are better off 301 (permanently) redirecting the existing crawled pages to the new pages.

On a related topic, I changed the extensions of my site from .asp to .shtml and 301 redirected to the .shtml pages. The existing traffic went to the .shtml pages, BUT once Google dropped the .asp pages, the new .shtml pages never ranked the same as the old ones. Changing filenames or extensions seems like a bad idea with Google, perhaps because Google likes older pages; I don't know (and I have limited experience with it). Note also that I did this around the time of Allegra, so it might have just been that.
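
(A minimal sketch of that kind of 301, assuming a CGI-style handler and a server that exposes REQUEST_URI; the mapping and URLs are placeholders.)

# Map old URLs to new ones and answer with 301; otherwise fall back to 404.
import os

redirects = {
    "/old-page.asp": "http://www.example.com/new-page.shtml",
}

requested = os.environ.get("REQUEST_URI", "")
target = redirects.get(requested)
if target:
    print("Status: 301 Moved Permanently")
    print("Location: " + target)
    print()
else:
    print("Status: 404 Not Found")
    print("Content-Type: text/html")
    print()
    print("<html><body><h1>404 Not Found</h1></body></html>")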

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 12:24 pm on Jul 28, 2005 (gmt 0)

AHHHAHHAHHHAH (that's me screaming)

2005-07-21 16:10:27 GMT :
removal of [mysite.org...]
request denied

What the heck?!?!?!?! Why can't I get this thing to work? Would someone who knows what the heck they are doing kindly take a look? This is getting old very fast.

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 12:36 pm on Jul 28, 2005 (gmt 0)

Okay, I will step through everything I am doing...

First, I click on "Remove an outdated link."

Next I am filling in the complete URL, and then I am given the following choices as radio buttons (I am a little confused about this part)...

anything associated with this URL
snippet portion of result (includes cached version)
cached version only

I am checking "anything associated with this URL." Is this correct?

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 12:51 pm on Jul 28, 2005 (gmt 0)

Okay,

I tried WebBug....

HTTP/1.1 404 Object Not Found
Date: Thu, 28 Jul 2005 12:51:43 GMT
Connection: close
Content-Length: 930
Content-Type: text/html

I am very confused.

g1smd

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 30438 posted 1:14 pm on Jul 28, 2005 (gmt 0)

What response do you get with a HTTP 1.0 request?

Is it still 404?

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 1:31 pm on Jul 28, 2005 (gmt 0)

When I change the HTTP version to 1.0, I get...

HTTP/1.1 404 Object Not Found
Date: Thu, 28 Jul 2005 13:29:49 GMT
Content-Length: 930
Content-Type: text/html

When I set it to 1.1, I get...

HTTP/1.1 404 Object Not Found
Date: Thu, 28 Jul 2005 13:30:15 GMT
Connection: close
Content-Length: 930
Content-Type: text/html

When I set it to 0.9, I get...

HTTP/1.1 400 Bad Request
Content-Type: text/html
Content-Length: 87
Connection: close

webdevfv

5+ Year Member



 
Msg#: 30438 posted 1:51 pm on Jul 28, 2005 (gmt 0)

What I did successfully is this:

1. Sign up for a google account.

2. Go to google search and do a search for 'site:mysite.com'

Go through all the links, writing down all the wrong/dead URLs you wish to remove (if you have 10,000 links and this is not possible, just use the ones you know are definitely not working/dead).

3. Add all these links to your robots.txt

User-agent: googlebot
Disallow: /myoldfile.html
Disallow: /myoldfile1.html
Disallow: /myolddirectory/myoldfile.html
Disallow: /myolddirectory/myoldfile1.html

4. Go back to your google account and go to remove URLs - automatic removal system. Link is at bottom of this page - 'Remove Content from Google's Index'

Submit your robots.txt file to google
[mysite.com...]

5. Wait a bit and then google will come by.

6. Remove all this stuff from your robots.txt once the removal is complete, or keep it for a few days, changing the User-agent to * so that other SEs can remove these files too (see the example below).

Hope this helps. Worked a treat for me.
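
For step 6, the generalized file would look something like this (file names are placeholders):

User-agent: *
Disallow: /myoldfile.html
Disallow: /myoldfile1.html
Disallow: /myolddirectory/myoldfile.html
Disallow: /myolddirectory/myoldfile1.html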


webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 12:00 pm on Jul 29, 2005 (gmt 0)

Well, I finally broke down and emailed Google to see if they could figure out the problem. Still waiting to hear back from them.

g1smd

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 30438 posted 6:27 pm on Jul 29, 2005 (gmt 0)

It takes about three days for a reply to come back... and they are all "cut and paste" standard answers.

webdude

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 30438 posted 12:29 pm on Aug 1, 2005 (gmt 0)

Yep g1smd,

You are absolutely correct, sir. Standard reply: we cannot comment on individual sites, go to Google news for more information, go to the Google FAQ on using sitemaps, yada yada yada.

Sure wish I could figure this one out.
