

disallow in robots.txt

will this cause a page to be removed?

     

Reid

12:17 am on May 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If I want a page removed and I disallow it in robots.txt, will it eventually get removed from Google without using the removal tool? Or will it just sit there gathering dust?
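
For reference, the kind of robots.txt rule in question would look something like this (the path is just a placeholder for illustration):

User-agent: Googlebot
Disallow: /old-page.html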

Dawg

10:40 pm on May 23, 2005 (gmt 0)

10+ Year Member



If you can, use the noindex, follow/nofollow meta tags.

In my experience, using Disallow via robots.txt will remove your page as well, but it takes a long, long time. First the cache and snippet are removed, but your page is still listed in the index without a description, and it can still come up in the SERPs...

Therefore... use meta tags...
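
For example, something like this in the <head> of the page you want dropped (use "nofollow" instead of "follow" if you also don't want links on the page followed):

<meta name="robots" content="noindex, follow">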

The URL removal tool will also work, but I have heard of problems using it.

WA_Smith

2:36 am on May 24, 2005 (gmt 0)

10+ Year Member



I agree with Dawg about how the search engines behave in practice.

But I disagree that it is the best method... robots should just drop those pages; it would be a much better world.

To speed up the process it may help to feed a 404 response, via a cloaking script, to the search engine whose index you want the page dropped from.
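
A very rough sketch of that idea, assuming a standalone Python script rather than anyone's actual setup (the path and port are made up for illustration, and a real cloaking script should verify the crawler by reverse DNS rather than trusting the User-Agent header alone):

from http.server import BaseHTTPRequestHandler, HTTPServer

REMOVED_PATH = "/old-page.html"  # made-up path, for illustration only

class CloakingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if self.path == REMOVED_PATH and "Googlebot" in ua:
            # The crawler asking for the page we want dropped gets told it is gone
            self.send_response(404)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Not Found")
        else:
            # Everyone else still gets the normal page
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><body>The page is still here for visitors.</body></html>")

if __name__ == "__main__":
    HTTPServer(("", 8000), CloakingHandler).serve_forever()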

g1smd

4:14 pm on May 24, 2005 (gmt 0)

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



Google will not automatically remove pages mentioned in the robots.txt file.

To remove the pages from their index, you need to submit the URL of the robots.txt file to the Google URL console. The pages will then be removed within days and will stay out of the index for six months. After that time they will continue to stay out only if the pages are still mentioned in the robots.txt file.

Alternatively, the robots meta tag will see pages dropped from the index within a matter of a week or so.

Reid

8:56 am on May 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The reason that robots will continue to request a URL that returns a 404 is that servers do go down.
They don't want to drop the cache or have to re-index the site at the drop of a hat; that takes precious bandwidth. So they will continue to request it, waiting for the server to come back online.

410 Gone should cause it to be removed, though.
Thanks for all the input, guys. Personally I would use the robots.txt submission method, but I was just wondering...
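
For the 410 idea, a minimal sketch along the same lines as the earlier one; again a standalone Python script with a made-up path, not a real setup:

from http.server import BaseHTTPRequestHandler, HTTPServer

GONE_PATHS = {"/old-page.html"}  # made-up path, for illustration only

class GoneHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in GONE_PATHS:
            # 410 tells crawlers the page is gone for good, unlike 404 which may be temporary
            self.send_response(410)
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><body>Still here.</body></html>")

if __name__ == "__main__":
    HTTPServer(("", 8000), GoneHandler).serve_forever()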

py9jmas

9:30 am on May 28, 2005 (gmt 0)

10+ Year Member



A server that has gone down should not be returning 404s. Either it wouldn't respond (timeouts/connection refused) or it would return one of the 5xx server errors. It shouldn't return a 4xx client error.

Reid

11:20 am on May 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let me rephrase that: "pages go down".
How many times have I found a 404 on some important outbound link, waited 24 hours, and seen it come back?

DerekH

4:35 pm on May 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



g1smd wrote:
Alternatively, the robots meta tag will see pages dropped from the index within a matter of a week or so.

Or not...
I have a website I took down, leaving the pages up there with a META robots noindex.
The pages went supplemental within a week, and have stayed supplemental for 6 months now...
Since it's a free ISP, I have no access to ROBOTS.TXT or 404s on that particular site, so I've modified the pages to point to my new site, and am still waiting for them to go!
DerekH