
disallow in robots.txt

Will this cause a page to be removed?

12:17 am on May 22, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


If I want a page removed and I disallow it in robots.txt, will it eventually get removed from Google without using the removal tool? Or will it just sit there gathering dust?
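Just to be clear, I mean an entry like this in robots.txt (the path is only an example):

    User-agent: *
    Disallow: /old-page.html
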
10:40 pm on May 23, 2005 (gmt 0)

New User

10+ Year Member

joined:Apr 24, 2005
posts:15
votes: 0


If you can, use the noindex meta tag (with follow or nofollow).

In my experience, using Disallow via robots.txt will remove your page as well, but it takes a long, long time. First the cache and snippet are removed, but the page is still listed in the index without a description. Even then, the page can still come up in the SERPs....

Therefore... use meta tags...
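
Something like this goes in the <head> of each page you want dropped (noindex,follow shown here; use nofollow instead if you don't want the links on the page followed):

    <meta name="robots" content="noindex,follow">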

The URL removal tool will also work, but I have heard of problems using this tool.

2:36 am on May 24, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 8, 2005
posts:40
votes: 0


I agree with Dawg about how the search engines behave in practice.

But I disagree that it is the best method... robots should just clear those pages; it would be a much better world.

To speed up the process, it may help to feed a 404 response, via a cloaking script, to the search engine whose index you want the page dropped from.
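
A rough sketch of the idea, assuming an Apache server with mod_rewrite available (the filename and the Googlebot user-agent match are just examples):

    # .htaccess: return 404 only when Googlebot requests this page
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
    RewriteRule ^old-page\.html$ - [R=404,L]

Everyone else still gets the normal page, which is what makes it cloaking.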

4:14 pm on May 24, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd, Top Contributor of All Time, 10+ Year Member

joined:July 3, 2002
posts:18903
votes: 0


Google will not automatically remove pages mentioned in the robots.txt file.

To remove the pages from the index, you need to submit the URL of the robots.txt file to the Google URL console. The pages will then be removed within days and will stay out of the index for six months. They will stay out beyond that only if they are still mentioned in the robots.txt file at that point.

Alternatively, the robots meta tag will see pages dropped from the index within a matter of a week or so.

8:56 am on May 28, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


The reason that robots will continue to request a URL that returns a 404 is that servers do go down.
The engines don't want to drop the cache or have to re-index the site at the drop of a hat; that takes precious bandwidth. So they will keep requesting the URL, waiting for the server to come back online.

A 410 Gone response should cause it to be removed, though.
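If you're on Apache, mod_alias can do that in one line (the path is just an example):

    Redirect gone /old-page.html

That serves a 410 for the page instead of a 404.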
Thanks for all the input, guys. Personally I would use the robots.txt submission method, but I was just wondering...

9:30 am on May 28, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Mar 8, 2004
posts:311
votes: 0


A server that has gone down should not be returning 404s. Either it wouldn't respond (timeouts/connection refused) or it would return one of the 5xx server errors. It shouldn't return a 4xx client error.
11:20 am on May 29, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


Let me rephrase that: "pages go down".
How many times have I found a 404 on some important outbound link, only to wait 24 hours and see it come back?
4:35 pm on May 29, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 23, 2003
posts:801
votes: 0


g1smd wrote:
Alternatively, the robots meta tag will see pages dropped from the index within a matter of a week or so.

Or not...
I have a website I took down, leaving the pages up there with a META robots noindex.
The pages went supplemental within a week, and have stayed supplemental for 6 months now...
Since it's a free ISP, I have no access to ROBOTS.TXT or 404s on that particular site, so I've modified the pages to point to my new site, and am still waiting for them to go!
DerekH