Forum Moderators: goodroi


Elimination of pages from a website.

Do I have to notify the engines in some way?


Rani

1:19 pm on Mar 1, 2006 (gmt 0)

10+ Year Member



Hi,

I eliminated some pages from my website.

I also eliminated them from the Google Sitemap.

Is it enough to keep the bots from coming back to grab these pages?

Thanks.

engine

12:28 pm on Mar 2, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



By blocking the robots, the pages will naturally fall out of the index in time. So, no, you don't need to notify anyone.

Dijkgraaf

7:58 pm on Mar 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ah, but how much time?

I still see people arriving from search-engine results years after the original pages have gone.
I've got a custom error page that redirects most of them to the correct area.

But yes, disallowing those pages in robots.txt will stop the search engines' bots from requesting them.
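For instance, a robots.txt along these lines (the paths here are only illustrative) tells compliant crawlers not to request the removed pages:

```text
User-agent: *
Disallow: /widgets.htm
Disallow: /old-section/
```

One caveat: disallowing stops the crawling, but a URL blocked this way can still linger in the index as a bare listing, since the bot never gets to see a 404 or 410 for it.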

The_Zenker

9:17 pm on Mar 10, 2006 (gmt 0)

10+ Year Member



My understanding is that removed content should come back to the browser as an HTTP 404 Not Found. Even if you have a "user-friendly" error page that helps the visitor find your site and its new content, if a page has been removed, that error page should still return a 404 status. That way both the user and the SE are happy: the user finds your site, and the SE can clean up its index.

Dijkgraaf

9:46 pm on Mar 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The_Zenker, even after returning 404 codes for 2 years, the search engines still hadn't removed the results from their index.

jdMorgan

9:56 pm on Mar 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the request is made with HTTP/1.1 or with 'extended' HTTP/1.0, then a 410-Gone response is what should be returned for content that has been intentionally removed. Generally, if an HTTP/1.0 spider sends an HTTP_HOST header, it is an 'extended' HTTP/1.0 spider, and will most probably understand a 410.

Everybody but Inktomi Slurp and Jeeves used to handle 410 responses correctly, and stop requesting the file as soon as all spider hosts with that URL on their crawl list had seen the 410 response.
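As a rough sketch of that idea on Apache with mod_rewrite (an assumption about the setup; widgets.htm is just a stand-in for whichever file was removed), an .htaccess could send the 410 only to clients that speak HTTP/1.1 or send a Host header, and let older clients fall through to the normal 404:

```apache
RewriteEngine On
# Match an HTTP/1.1 request, or an 'extended' HTTP/1.0 request carrying a Host header
RewriteCond %{THE_REQUEST} HTTP/1\.1$ [OR]
RewriteCond %{HTTP_HOST} !^$
# The [G] flag answers with 410-Gone for the intentionally removed page
RewriteRule ^widgets\.htm$ - [G]
```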

Jim

Rani

5:34 am on Mar 11, 2006 (gmt 0)

10+ Year Member



207.46.98.47 - - [11/Mar/2006:05:45:35 +0200] "GET /widgets.htm HTTP/1.0" 404 5172 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"

This is what I am talking about. "widgets.htm" is a page I removed from my website a month ago. In spite of that, msnbot keeps coming back again and again to fetch it. My server replies with a 404, as you can see.

a 410-Gone response is what should be returned

What do I have to do to make the server return a 410 response instead of the 404?

Thanks.

Dijkgraaf

10:41 am on Mar 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could have a custom error page that checks whether the requested resource is one you've marked as permanently gone and, if so, issues a 410 code.
You would either have to hard-code those URLs in the error page itself, or store them in a database it can look up.
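Alternatively, if the list of removed URLs is short, plain Apache configuration can handle it without a custom error page at all (a sketch assuming Apache with mod_alias; the paths are hypothetical):

```apache
# mod_alias: answer these intentionally removed pages with 410 Gone
Redirect gone /widgets.htm
Redirect gone /old-section/obsolete-page.htm

# Everything else that is missing still gets the friendly 404 page
ErrorDocument 404 /errors/not-found.htm
```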

The_Zenker

12:26 am on Mar 12, 2006 (gmt 0)

10+ Year Member



Dijkgraaf and jdMorgan, thanks for the info on the 410. That was new to me. SEO as a whole is new to me and I have been trying to come up to speed. Appreciate the help.