
How is high bandwidth & lots of 404/301 possible with low traffic?


trader

5:08 pm on Mar 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a new domain/website where my HTTP Status Codes from AWstats so far this month show this:

404 Document Not Found 11,500 hits
301 Moved Permanently (redirect) 10,100 hits

With the additional anomaly of the site being automatically shut down twice by my server for bandwidth overuse! I had the bandwidth limit set at the normal, relatively low level I use when a new site is made.

However, the website only received 50 visitors for the same time period. Very low traffic!

Does anyone know or at least guess as to how this is possible? Thanks.

[edited by: trader at 5:14 pm (utc) on Mar. 19, 2008]

martinibuster

5:13 pm on Mar 19, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



>>>404 Document Not Found 11,500 hits
Do you have a robots.txt?

trader

5:15 pm on Mar 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No. Should I?

martinibuster

5:22 pm on Mar 19, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you want to see those 404 errors go down, yeah. Those are likely bots, which could also be behind the bandwidth issues.

trader

5:45 pm on Mar 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, but I am not too familiar with robots.txt.

Can you point me to a good example of a robots.txt file, and what it should contain to exclude bots?

martinibuster

6:08 pm on Mar 19, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



[webmasterworld.com...]

[robotstxt.org...]

[google.com...]

[help.yahoo.com...]

A rogue bot is not going to obey the robots.txt, but at least you can use it to slow down the good bots and cut down on the 404 errors. More info about spiders in the spider forum [webmasterworld.com].
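As a rough starting point, a minimal robots.txt might look something like this (the Disallow paths are just placeholders for whatever you don't want crawled, and Crawl-delay is only honored by some crawlers such as Slurp and msnbot, not Googlebot):

# robots.txt - example sketch only; adjust the paths to your own site
User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
Disallow: /scripts/

Save it as robots.txt in your document root so it is reachable at example.com/robots.txt.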

trader

2:01 pm on Mar 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, but the problem is that in all likelihood it is a rogue bot. The good bots are insignificant.

The person I purchased the domain from had a business dispute with a 3rd party who wanted to block the sale of the domain to me.

I am fairly certain this is caused by deliberate attacks. In fact, the 3rd party basically threatened to cause me trouble while I was negotiating to buy the domain (a sale he heard about at another forum and at the sale venue).

My server automatically shut the site down again today (the 3rd time) for excessive bandwidth (in spite of only a few valid visits) even though I increased the bandwidth significantly the day before. What can be done? Please help!

g1smd

7:14 pm on Mar 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I assume that the majority of the requests are for the URLs of the old site.

What do your site logs say?

martinibuster

7:42 pm on Mar 23, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



>>>Thanks, but the problem is that in all likelihood it is a rogue bot.

Did you check your log files to see what IP addresses (and any referrers) the bots are coming from?
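For reference, in the common Apache combined log format the client IP is the first field and the referrer is the next-to-last quoted string. An illustrative line with made-up values:

192.0.2.15 - - [23/Mar/2008:19:42:01 +0000] "GET /some-old-page.html HTTP/1.1" 404 512 "http://scraper.example/links.html" "SomeBot/1.0"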

Identify the bots, identify the IPs, identify referrers (yes, referrers, because they could be riding in on links to non-existent pages). If they're coming from the same IP addresses or blocks, those are easy to ban with htaccess. If they're using proxies, that's different. Incredibill is a master of that. Hey Bill, where are you?!
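A sketch of what an .htaccess ban might look like on Apache, with made-up addresses standing in for whatever your logs actually show:

# .htaccess - example only; substitute the offending IPs from your logs
Order Deny,Allow
Deny from 192.0.2.15
Deny from 198.51.100.

The trailing dot on the last line denies the whole 198.51.100.* block.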

And do put up a robots.txt to cut down on the 404 errors.

incrediBILL

8:32 pm on Mar 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



automatically shut down twice by my server for bandwidth overuse

Define high bandwidth?

The person I purchased the domain from had a business dispute with a 3rd party who wanted to block the sale of the domain to me.

That could easily explain all of the 404s and the 301s if you no longer have the same content on the server.

Perhaps the previous owner used "www." and you don't, or vice versa, so every hit to your server using the previous convention would now get redirected to the proper location.
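If that's what's happening, one common way to fold the two hostnames together on Apache is a mod_rewrite rule in .htaccess; a sketch, with example.com standing in for the real domain:

# .htaccess - redirect www to non-www (reverse the pattern if you prefer www)
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]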

The good bots looking for previously existing content could be responsible for most if not all of the 404s.

Thanks, but the problem is that in all likelihood it is a rogue bot. The good bots are insignificant.

That's not exactly true, as rogue bots often cause problems that the good bots then escalate. Rogue bots scrape my site and often mangle the URLs; the good bots then find those mangled URLs on scraper sites, and I end up with tons of resulting 404s. The only way to fix that is to 301 redirect the broken page names to the corrected page names, and the search engines eventually sort it out.
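In .htaccess on Apache, that kind of fix-up can be as simple as the following (the page names here are hypothetical):

# Map a mangled or retired URL to its current replacement
Redirect 301 /old-widgets.html http://example.com/widgets.html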

However, in your case, it sounds like you're getting hit by page names that previously existed.

Doesn't mean a rogue bot, bot net or something else couldn't be at work here, but I suspect it may be simpler than you think.

Check your AWStats under "Pages not found" and see what they are actually asking for and you'll have a better clue.

[edited by: incrediBILL at 8:33 pm (utc) on Mar. 23, 2008]

omegaman66

12:50 am on Mar 24, 2008 (gmt 0)

10+ Year Member



I would think something must be amiss if bots are generating excessive bandwidth on a new site, or even an old site for that matter, unless your bandwidth limit is very low.

What I would do is find out exactly what is being requested that isn't being found, and where the requests are coming from. If you are lucky, all you will need to do is block the IP and things will go back to normal. If you are only getting a few real visitors, then your website logs will be almost 100% from the jerk trying to cause you problems.

Either way, bots or butts, the logs should point you in the right direction.

trader

3:40 am on Apr 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for all the replies. My bad for the late response and for not replying to each of you individually. I am researching this further and looking into your good suggestions.

I managed to at least temporarily solve the problem with a much higher bandwidth allocation, and luckily my server has not shut the site down in a while.

It appears my non-use of www was a factor, as many requests came in for the www version and ended up as 404s or as 301 redirects to non-www. Here are the March AWStats HTTP status codes, which are quite high:

404 Document Not Found 18,490 55%
301 Moved permanently (redirect) 14,960 44%

P.S. Any more ideas on why these status-code counts are so high? By contrast, actual unique visitor numbers remain very low.