This appears to happen because I have many files redirecting to the file being hit like this:
<meta http-equiv="refresh" content="0;URL=../books.htm">
<body bgcolor="#FFFFFF" text="#000000">
Multiple files redirect to 'books.htm', but when the crawler is redirected this way it does not seem to remember that it has already been sent there, so it reads the file again. As a result, 'books.htm' is getting hundreds of hits.
This is using up a lot of bandwidth. Is there any way to stop it?
Another possibility would be to use a 301 redirect instead of the meta refresh to tell robots that the content has moved permanently to the new URL. I'm not sure how the Live crawler interprets meta refreshes, but at one point, they were frowned upon as a possible spamming technique.
The third thing to do is to make sure you have no internal links to the files you are using meta refreshes on, and then just delete them. The only reason you might not want to do this would be if there were external links pointing to these files, and in this case, I think the 301 redirect would be better.
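If you go the 301 route, the rules could be generated from the existing stub files rather than typed by hand. Here is a minimal sketch in Python, assuming the server is Apache with mod_alias available (the thread does not say which server is in use). The names `refresh_target` and `redirect_rule` are made up for illustration, and the regular expression only handles meta tags written like the one quoted above.

```python
import posixpath
import re

# Matches the URL in a tag such as
# <meta http-equiv="refresh" content="0;URL=../books.htm">
META_REFRESH = re.compile(
    r'<meta[^>]*http-equiv=["\']refresh["\'][^>]*'
    r'content=["\']\s*\d+\s*;\s*url=([^"\']+)["\']',
    re.IGNORECASE,
)


def refresh_target(html):
    """Return the URL named in a meta refresh tag, or None if there is none."""
    match = META_REFRESH.search(html)
    return match.group(1).strip() if match else None


def redirect_rule(url_path, html):
    """Build an Apache 'Redirect 301' line for a stub page, or None.

    url_path is the stub's path as seen by visitors, for example
    '/history/rome/books.htm' (a hypothetical path).
    """
    target = refresh_target(html)
    if target is None:
        return None
    # Resolve the relative target against the stub's own directory.
    stub_dir = posixpath.dirname(url_path)
    resolved = posixpath.normpath(posixpath.join(stub_dir, target))
    return f"Redirect 301 {url_path} {resolved}"


if __name__ == "__main__":
    stub = '<meta http-equiv="refresh" content="0;URL=../books.htm">'
    print(redirect_rule("/history/rome/books.htm", stub))
```

Depending on the Apache version, the target may need to be a full URL including the scheme and host name rather than a path, so check that before pasting the output into a config file.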
The reason I did it that way is that I have many thousands of HTML pages arranged in quite a steep hierarchy. Each page has a 'further reading' link built into the page template, and this link goes to a books.htm page in the same directory. If I don't have any books for that page, I want it to redirect to the books.htm page at the next level down the directory structure. So I just put in a books.htm file which redirects to ../books.htm.
That way, when I get a book list for a particular page, I just have to replace the one redirect file with a page containing the book list. All the alternatives I could think of seem to involve one of the following:
1) updating all the incoming links each time I make a change.
2) maintaining a separate file with all the pages.
3) writing some sort of script to look down the tree for the first books.htm.
I wanted to avoid these things, and it's annoying if I have to change the whole of my site because of a bug in the 'live' spider.
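For what it's worth, option 3 does not have to be a large script. Here is a minimal sketch in Python, assuming the stub files can be recognised by the `http-equiv="refresh"` tag near the top of the file and that every directory uses the same file name. The function names and the `site_root` argument are hypothetical, not something from the site described above.

```python
import os


def is_redirect_stub(path):
    """Guess whether a file is one of the meta refresh stub pages."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        head = handle.read(2048).lower()
    return 'http-equiv="refresh"' in head


def nearest_book_list(start_dir, site_root, name="books.htm"):
    """Follow the same path the ../ redirects take, from start_dir to site_root.

    Returns the path of the first books.htm that is a real page rather
    than a redirect stub, or None if no such file is found.
    """
    current = os.path.abspath(start_dir)
    root = os.path.abspath(site_root)
    while True:
        candidate = os.path.join(current, name)
        if os.path.isfile(candidate) and not is_redirect_stub(candidate):
            return candidate
        if current == root:
            return None
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the top of the filesystem without meeting site_root.
            return None
        current = parent
```

Run once per directory when the pages are generated, the result could be written straight into the 'further reading' link, so a crawler would only ever see one URL for each book list.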