Forum Moderators: open


What to do with old files spidered by google bot?


silverbytes

8:32 pm on Sep 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Googlebot is spidering old files that are no longer on the server... I see in the log that it is accessing

/directory/page.htm

which is not on the server... how is that possible?
In fact it accessed index.htm, robots.txt and that file only...

Any idea?

What is the best approach for deleted documents? Redirect them permanently to a general page on the site? Just delete them? They will still be in Google's search results but not on the server...

[edited by: WebGuerrilla at 8:36 pm (utc) on Sep. 24, 2003]
[edit reason] widgetized [/edit]

WebGuerrilla

8:39 pm on Sep 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




As long as your server is sending a proper 404, your best plan is to simply delete any old files you don't want and then forget about it. Google will continue to request the old URLs for a while, but it will eventually stop.

silverbytes

5:50 pm on Sep 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wouldn't it be better if I put a permanent redirect to the index?
I mean, users go away on 404 pages (even with custom 404s, many of them leave).
...

Or does that cause the page to stay in Google's results and never disappear?

closed

7:38 pm on Sep 25, 2003 (gmt 0)

10+ Year Member



I'd use either a permanent redirect (301) or gone (410) in your case. See:

[w3.org ]
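For readers wanting the Apache syntax for those two options, here is a minimal .htaccess sketch using mod_alias. The path /directory/page.htm is the one from the original post; using /index.htm as the redirect target is an assumption — pick whichever page actually replaces the old content. Use one directive or the other for a given URL, not both:

```apache
# Option 1 - 301 Moved Permanently: the old URL points somewhere new,
# and search engines transfer the old listing to the target
Redirect permanent /directory/page.htm /index.htm

# Option 2 - 410 Gone: the old URL is gone for good and takes no target;
# search engines should drop the listing entirely
Redirect gone /directory/page.htm
```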

berli

8:48 pm on Sep 25, 2003 (gmt 0)

10+ Year Member



Can anybody quickly give an example of how to do a 410 "Gone" on an Apache server (.htaccess)?

closed

4:31 am on Sep 26, 2003 (gmt 0)

10+ Year Member



You would use RewriteRule just like you would with a forbidden page, except that instead of using the F flag for a forbidden page, you'd use the G flag for a page that's gone.

For an example of the use of the F flag, see this page:

[httpd.apache.org ]
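To make that concrete, a minimal .htaccess sketch using mod_rewrite's G flag for the page mentioned in this thread (the exact pattern is an assumption — adjust it to match your own deleted URLs):

```apache
RewriteEngine On

# Return 410 Gone for the deleted page - analogous to using [F]
# for 403 Forbidden, but telling clients the resource is gone for good.
# The "-" means no substitution URL; the G flag also implies L (last rule).
RewriteRule ^directory/page\.htm$ - [G]
```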