
Bing Search Engine News Forum

my bandwidth being used up by 'live' crawler
redirect seems to cause problem

5+ Year Member

Msg#: 3516455 posted 7:25 pm on Nov 29, 2007 (gmt 0)

I am getting hundreds of repeated reads of particular files on my site from a crawler (or spider, or whatever the correct name is) identifying itself as 'live'.

This appears to happen because I have many files redirecting to the file being hit like this:

<meta http-equiv="refresh" content="0;URL=../books.htm">
<body bgcolor="#FFFFFF" text="#000000">

There are multiple files redirecting to the file 'books.htm', but when the crawler is redirected to the file in this way, it does not appear to remember that it has been redirected there before, so it reads it again. As a result, the file 'books.htm' is getting hundreds of hits.

This is using up a lot of bandwidth. Is there any way to stop it?




10+ Year Member

Msg#: 3516455 posted 11:40 pm on Dec 1, 2007 (gmt 0)

If indeed your theory is correct, then one way to stop it would be to use robots.txt to tell SE spiders not to crawl the file(s) that are getting redirected to books.htm. Since the files have no content anyway, I don't see any harm in dropping them from the index.
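As a rough illustration of the robots.txt suggestion, here is a minimal Python sketch that walks a local copy of the site, finds every books.htm that contains a meta refresh tag, and prints a Disallow line for each one. The function names and the idea of running this against a local directory are assumptions for illustration, not something described in this thread, and it only helps if the crawler in question honours robots.txt.

```python
import re
from pathlib import Path

# Matches a meta refresh tag like the one quoted in the first post.
META_REFRESH = re.compile(r'<meta\s+http-equiv=["\']?refresh', re.IGNORECASE)


def is_redirect_stub(path: Path) -> bool:
    """Return True if the file contains a meta refresh tag."""
    return bool(META_REFRESH.search(path.read_text(errors="ignore")))


def robots_txt_for_stubs(site_root: Path) -> str:
    """Build robots.txt text that disallows every books.htm redirect stub."""
    lines = ["User-agent: *"]
    for page in sorted(site_root.rglob("books.htm")):
        if is_redirect_stub(page):
            # robots.txt paths are URL paths relative to the site root.
            url_path = "/" + page.relative_to(site_root).as_posix()
            lines.append(f"Disallow: {url_path}")
    return "\n".join(lines) + "\n"
```

Calling robots_txt_for_stubs(Path("/path/to/local/site")) returns the text to save as robots.txt in the site root. Pages that hold a real book list are left crawlable because they contain no meta refresh tag.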

Another possibility would be to use a 301 redirect instead of the meta refresh to tell robots that the content has moved permanently to the new URL. I'm not sure how the Live crawler interprets meta refreshes, but at one point, they were frowned upon as a possible spamming technique.
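To make the 301 suggestion concrete, here is a minimal sketch of what the server would send instead of the meta refresh page. It is written as a small Python WSGI application purely for illustration; the thread does not say what server software the site runs on, and STUB_PATHS, parent_books_url and app are hypothetical names. On a real site the same effect would normally be configured in the web server itself.

```python
import posixpath

# Hypothetical set of URL paths that currently hold meta refresh stubs.
STUB_PATHS = {"/a/books.htm", "/a/b/books.htm"}


def parent_books_url(url_path: str) -> str:
    """Resolve ../books.htm relative to the directory of url_path."""
    directory = posixpath.dirname(url_path)
    return posixpath.normpath(posixpath.join(directory, "..", "books.htm"))


def app(environ, start_response):
    """WSGI app: answer stub URLs with a 301, everything else with a 404."""
    path = environ.get("PATH_INFO", "/")
    if path in STUB_PATHS:
        # The status line and Location header are what tell a crawler that
        # the content has moved permanently; no page body is needed.
        start_response(
            "301 Moved Permanently",
            [("Location", parent_books_url(path)), ("Content-Length", "0")],
        )
        return [b""]
    body = b"Not found"
    start_response(
        "404 Not Found",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

With this in place, a request for /a/b/books.htm receives a 301 response whose Location header is /a/books.htm, so the crawler never downloads an intermediate HTML page.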

The third thing to do is to make sure you have no internal links to the files you are using meta refreshes on, and then just delete them. The only reason you might not want to do this would be if there were external links pointing to these files, and in this case, I think the 301 redirect would be better.


5+ Year Member

Msg#: 3516455 posted 4:07 pm on Dec 2, 2007 (gmt 0)

Thanks for the reply. I realise what I'm doing is not ideal, but as far as I could see it is valid. It's been like that for many years and no other spiders have had problems with it, until I started getting massive hits, apparently from 'live', starting about 3 months ago.

The reason I did it that way was that I have many thousand HTML pages arranged in quite a steep hierarchy. Each page has a 'further reading' link built into the page template, and this link goes to a books.htm page in the same directory. If I don't have any books for that page, I want it to redirect to the books.htm page at the next level down the directory structure. So I just put in a books.htm file which redirects to ../books.htm.

That way, when I get a book list for a particular page, I just have to replace the one redirect file with a page containing the book list. All the alternatives I could think of seem to involve one of the following:
1) updating all the incoming links each time I make a change.
2) maintaining a separate file with all the pages.
3) writing some sort of script to look down the tree for the first books.htm

I wanted to avoid these things, and it's annoying if I have to change the whole of my site because of a bug in the 'live' spider.
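For reference, the script mentioned as alternative 3 could be quite small. The sketch below is only an illustration in Python, with the hypothetical function name nearest_books_page. It assumes the redirect stubs have been deleted, so that a books.htm exists only where a real book list does, and that it runs against a local copy of the site. It starts in a page's own directory and follows the same ../ path the redirects take until it finds a books.htm.

```python
from pathlib import Path
from typing import Optional


def nearest_books_page(page_dir: Path, site_root: Path) -> Optional[Path]:
    """Return the closest books.htm at or above page_dir, stopping at site_root."""
    site_root = site_root.resolve()
    current = page_dir.resolve()
    # Refuse to search outside the site, so we never walk the whole disk.
    if site_root != current and site_root not in current.parents:
        raise ValueError(f"{page_dir} is not inside {site_root}")
    while True:
        candidate = current / "books.htm"
        if candidate.is_file():
            return candidate
        if current == site_root:
            return None  # reached the top of the site without finding one
        current = current.parent
```

The page template (or a build step) could then link each page's 'further reading' entry to whatever this function returns, so adding a book list for a section still means creating just one file.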

