404 Document Not Found 11,500 hits
301 Moved Permanently (redirect) 10,100 hits
There was the additional anomaly of the site being automatically shut down twice by my server as a result of bandwidth overuse! I had my normal, relatively low bandwidth limit in place, the one I set whenever a new site is made.
However, the website only received 50 visitors for the same time period. Very low traffic!
Does anyone know, or can anyone at least guess, how this is possible? Thanks.
[edited by: trader at 5:14 pm (utc) on Mar. 19, 2008]
[robotstxt.org...]
[google.com...]
[help.yahoo.com...]
A rogue bot is not going to obey the robots.txt, but at least you can use it to slow down the good bots and cut down on the 404 errors. More info about spiders in the spider forum [webmasterworld.com].
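As a rough illustration of the robots.txt idea, here is a minimal sketch that checks a candidate robots.txt with Python's standard `urllib.robotparser` before uploading it. The file content, the `/old-section/` path, the 10-second delay and the `ExampleBot` name are all placeholders, not anything from this thread. Also keep in mind that `Crawl-delay` is a non-standard directive: some crawlers honor it and others (Googlebot, for one) ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path and the delay are placeholders.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /old-section/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved bot would skip the disallowed section...
print(parser.can_fetch("ExampleBot", "/old-section/page.html"))  # False
# ...still fetch everything else...
print(parser.can_fetch("ExampleBot", "/index.html"))             # True
# ...and wait this many seconds between requests, if it supports Crawl-delay.
print(parser.crawl_delay("ExampleBot"))                          # 10
```

This only shows what a compliant bot would do with the file. It does nothing against a bot that ignores robots.txt.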
The person I purchased the domain from had a business dispute with a 3rd party who wanted to block the sale of the domain to me.
I am fairly certain this is caused by attacks meant to cause me trouble. In fact, the 3rd party basically threatened to make trouble for me when I was negotiating to buy the domain (which he heard about at another forum and at the sale venue).
My server automatically shut the site down again today (the 3rd time) for excessive bandwidth (in spite of only a few valid visits) even though I increased the bandwidth significantly the day before. What can be done? Please help!
Did you check your log files to see what IP addresses (and any referrers) the bots are coming from?
Identify the bots, identify the IPs, identify referrers (yes, referrers because they could be riding in on links to non-existent pages). If they're coming from the same IP addresses or blocks, those are easy to ban with htaccess. If they're using proxies that's different. Incredibill is a master of that. Hey Bill, where are you?!
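If you would rather pull those numbers straight from the raw access log, something along these lines could work. It is only a sketch and assumes the server writes the common "combined" log format. The `summarize` function, the `SAMPLE` lines and the addresses in them are made up for illustration.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# ip ident user [time] "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Count hits and bytes per IP, plus hits per user agent and referrer."""
    hits, traffic, agents, referrers = Counter(), Counter(), Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        ip = m["ip"]
        hits[ip] += 1
        traffic[ip] += 0 if m["bytes"] == "-" else int(m["bytes"])
        agents[m["agent"]] += 1
        if m["referrer"] not in ("", "-"):
            referrers[m["referrer"]] += 1
    return hits, traffic, agents, referrers

# Made-up sample lines; in practice, pass an open log file instead.
SAMPLE = [
    '192.0.2.10 - - [19/Mar/2008:17:00:01 +0000] "GET /old/page1.html HTTP/1.1" 404 512 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [19/Mar/2008:17:00:02 +0000] "GET /old/page2.html HTTP/1.1" 404 512 "http://example.com/links.html" "ExampleBot/1.0"',
    '198.51.100.7 - - [19/Mar/2008:17:05:00 +0000] "GET / HTTP/1.1" 200 20480 "-" "Mozilla/5.0"',
]

hits, traffic, agents, referrers = summarize(SAMPLE)
for ip, count in hits.most_common(10):
    print(ip, count, "hits,", traffic[ip], "bytes")
```

Sorting by `traffic` instead of `hits` shows who is actually eating the bandwidth, which may not be the same address that makes the most requests.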
And do put up a robots.txt to cut down on the 404 errors.
automatically shutdown twice by my server as a result of bandwidth overuse
Define high bandwidth?
The person I purchased the domain from had a business dispute with a 3rd party who wanted to block the sale of the domain to me.
That could easily explain all of the 404s and the 301s if you no longer have the same content on the server.
Perhaps the previous owner used "www." and you don't use "www." or vice versa, therefore every hit to your server using the previous convention would now get redirected to the proper location.
The good bots looking for previously existing content could be responsible for most if not all of the 404s.
Thanks, but the problem is that in all likelihood it would be a rogue bot. The good bots are insignificant.
That's not exactly true, as rogue bots often cause problems that the good bots escalate. I have rogue bots scrape my site and often mangle URLs, which the good bots then detect on scraper sites, and I then get tons of resulting 404s. The only way to fix the problem is to 301 redirect the broken page names to the corrected page names, and the search engines eventually fix the problem.
However, in your case, it sounds like you're getting hit by page names that previously existed.
Doesn't mean a rogue bot, bot net or something else couldn't be at work here, but I suspect it may be simpler than you think.
Check your AWStats under "Pages not found" and see what they are actually asking for and you'll have a better clue.
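If AWStats is not handy, a similar "pages not found" list can be approximated from the raw access log. The sketch below assumes the usual common/combined log layout, where the quoted request is followed by the status code. `missing_paths` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it,
# e.g. "GET /old/page1.html HTTP/1.1" 404
REQUEST = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')

def missing_paths(lines):
    """Count how often each requested path returned a 404."""
    counts = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m and m["status"] == "404":
            counts[m["path"]] += 1
    return counts

# Made-up sample lines; in practice, iterate over the real log file.
sample_log = [
    '192.0.2.10 - - [19/Mar/2008:17:00:01 +0000] "GET /old/page1.html HTTP/1.1" 404 512 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [19/Mar/2008:17:00:02 +0000] "GET /old/page1.html HTTP/1.1" 404 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [19/Mar/2008:17:05:00 +0000] "GET / HTTP/1.1" 200 20480 "-" "Mozilla/5.0"',
]

for path, count in missing_paths(sample_log).most_common(20):
    print(count, path)
```

If the top entries look like real page names from the previous owner's site, that points toward leftover links and crawlers rather than an attack.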
[edited by: incrediBILL at 8:33 pm (utc) on Mar. 23, 2008]
What I would do is find out exactly what is being requested that isn't being found, and where the requests are coming from. If you are lucky, all you will need to do is block the IP and things will go back to normal. If you are only getting a few real visitors to the site, then your website logs will be almost 100% from the jerk trying to cause you problems.
Either way, bots or butts, the logs should point you in the right direction.
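Once the logs give you a short list of offending addresses, a small helper can turn it into block rules. This is a sketch only: `deny_rules` is a hypothetical helper, the addresses are placeholders, and the output uses Apache 2.4 `Require` syntax, which is an assumption about your server (older Apache 2.2 setups use `Order`/`Deny from` instead).

```python
import ipaddress

def deny_rules(addresses):
    """Build an Apache 2.4 style block that denies the given IPs or CIDR ranges."""
    nets = [ipaddress.ip_network(a, strict=False) for a in addresses]
    collapsed = []
    # collapse_addresses() needs a single IP version per call, so split first.
    for version in (4, 6):
        same_version = [n for n in nets if n.version == version]
        collapsed.extend(ipaddress.collapse_addresses(same_version))
    lines = ["<RequireAll>", "    Require all granted"]
    lines += [f"    Require not ip {net}" for net in collapsed]
    lines.append("</RequireAll>")
    return "\n".join(lines)

# Placeholder addresses from the documentation ranges, not real offenders.
print(deny_rules(["192.0.2.10", "192.0.2.11", "198.51.100.0/24"]))
```

Adjacent addresses are merged into a single range where possible, which keeps the rule list short. This helps only when the requests come from a handful of addresses or blocks. It will not do much against requests spread across many proxies.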
I managed to at least temporarily solve the problem with a much higher bandwidth allocation and luckily my server has not shut it down in a long time.
It appears my non-use of www was a factor, as there were many 404 errors from the www version to non-www (also many 301s). Here are the Match AWStats HTTP status codes, which are quite high:
404 Document Not Found 18,490 55%
301 Moved permanently (redirect) 14,960 44%
P.S. Any more ideas why these error codes are so significant? By contrast, the actual unique and visitor numbers are very low.