Forum Moderators: Ocean10000 & phranque

Why is the bot still haunting my website?

Can I have your help?

     
4:12 am on Nov 7, 2007 (gmt 0)

New User

10+ Year Member

joined:Nov 14, 2006
posts:36
votes: 0


I banned a bot by adding an IP entry in both hosts.deny and .htaccess.
However, the bot is still haunting my website endlessly, 24/7.
Two things look strange:

1. The Apache log always shows pairs of records like the following for the bot. One always has a 302 response code and the other always a 403. What does that mean?

ip-122-152-129-12.asianetcom.net - - [07/Nov/2007:12:49:25 +0900] "GET /modules.php?name=Forums&t=10948 HTTP/1.1" 302 506 "-" "Baiduspider+(+http://www.baidu.com/search/spider_jp.html)"
ip-122-152-129-12.asianetcom.net - - [07/Nov/2007:12:49:25 +0900] "GET / HTTP/1.1" 403 4114 "-" "Baiduspider+(+http://www.baidu.com/search/spider_jp.html)"

2. Webalizer shows the statistics below, in which Files is zero while Hits and KBytes keep growing every day. It seems to me the bot can still access my website. Is my blocking not working?

Hits    Files   KBytes   Visits  Hostname
140119  107929  4519965  3       crawl-66-249-73-154.googlebot.com
55677   0       125608   2       ip-122-152-129-12.asianetcom.net

My basic questions are:
1. Have I succeeded in blocking the bot?
2. Why is the bot still haunting my website?
3. Is the bot playing any tricks on my website?

11:27 am on Nov 7, 2007 (gmt 0)

Full Member from BE 

10+ Year Member

joined:Dec 3, 2006
posts:262
votes: 1


Adding a 403 rule to .htaccess only prevents the target from getting content; it does not stop the requests from filling your logs. From what I can see, you have one rule that redirects to your site's homepage, and then another that refuses access to that page. You'll need to generalize the second rule so that it replies directly with a 403.
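
One way to check whether the block is holding is to tally the response codes per host straight from the access log. The sketch below is only an illustration: it assumes Apache's combined log format, uses the two records quoted earlier in this thread as sample input, and the names LOG_RE and tally_statuses are made up for the example. If a banned host shows only 3xx/4xx codes and never a 200, it is being logged but is not getting content. As far as I know, that is also consistent with Webalizer showing Files at zero while Hits and KBytes grow, since the error pages themselves still count as traffic.

```python
import re
from collections import Counter, defaultdict

# Matches the leading fields of an Apache combined-format log line
# (assumption: the poster's server logs in this format).
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def tally_statuses(lines):
    """Return {host: Counter({status_code: count})} for the given log lines."""
    counts = defaultdict(Counter)
    for line in lines:
        match = LOG_RE.match(line)
        if match:  # silently skip lines that do not parse
            counts[match.group("host")][int(match.group("status"))] += 1
    return counts

# The two records quoted in the first post, one per line.
SAMPLE = [
    'ip-122-152-129-12.asianetcom.net - - [07/Nov/2007:12:49:25 +0900] '
    '"GET /modules.php?name=Forums&t=10948 HTTP/1.1" 302 506 "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider_jp.html)"',
    'ip-122-152-129-12.asianetcom.net - - [07/Nov/2007:12:49:25 +0900] '
    '"GET / HTTP/1.1" 403 4114 "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider_jp.html)"',
]

sample_counts = tally_statuses(SAMPLE)
for host, statuses in sample_counts.items():
    print(host, dict(statuses))
```

To run it against a real log, pass an open file object instead of SAMPLE; the log path depends on your server setup.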

My advice is to set up a proper robots.txt and Baidu won't come anymore.
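
For what it's worth, a robots.txt rule can be sanity-checked offline before you rely on it. The sketch below uses Python's standard urllib.robotparser; the robots.txt body is a hypothetical example (the thread does not show the actual file), and is_allowed is a helper name invented here. Keep in mind that robots.txt is advisory: it only helps with bots that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Baiduspider, allow everyone else.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""

def is_allowed(robots_txt, user_agent, path="/"):
    """Return True if user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

print("Baiduspider allowed:", is_allowed(ROBOTS_TXT, "Baiduspider"))
print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot"))
```

Note that the bot must be able to fetch /robots.txt for this to work, so a blanket IP ban that answers every request with a 403 would also hide the file from it.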

8:30 am on Nov 8, 2007 (gmt 0)

New User

10+ Year Member

joined:Nov 14, 2006
posts:36
votes: 0


Thanks Achernar for the response.

Actually, I didn't set any rules. I simply blocked the IP of the host.

I'm not redirecting any pages, yet the bot's requests always produce a 302 response code. The URL /modules.php?name=Forums&t=10948 is not even a valid one. Since the requests always come as a 302/403 pair, I wonder what the bot is doing, and whether it is using some trick to get around the blocking.

3:00 pm on Nov 8, 2007 (gmt 0)

Full Member from BE 

10+ Year Member

joined:Dec 3, 2006
posts:262
votes: 1


If you see a 302 for a page that doesn't exist, it means that your server is not configured (correctly) to display 404 pages.

Does this bot try to fetch a file named robots.txt? That would help us tell whether it is a good or a bad bot.

3:04 pm on Nov 8, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


The 'bot may be requesting pages from your site using a non-canonical address (www- versus non-www, for example), the server IP address, or another domain name that you have 'pointed' to your main domain.

Use a server headers checker such as the "Live HTTP Headers" add-on for Firefox/Mozilla browsers to test any/all applicable scenarios, and see if you get a 302-Found response from your server or domain-pointing service.
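
If a browser add-on is not handy, the same test can be roughed out with a short script. This is only a sketch: host_variants and fetch_status are names made up for the example, and it covers just the www/non-www case (the server IP address or a pointed domain can be passed to fetch_status by hand). Python's http.client does not follow redirects, so a 302 and its Location header are reported as the server sent them.

```python
import http.client

def host_variants(domain):
    """Return the non-www and www forms of a domain name."""
    bare = domain[4:] if domain.startswith("www.") else domain
    return [bare, "www." + bare]

def fetch_status(host, path="/", port=80, timeout=10):
    """Send one GET request and return (status code, Location header or None)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # The User-Agent value is an arbitrary label for this sketch.
        conn.request("GET", path, headers={"User-Agent": "header-check-sketch"})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Usage would be something like calling fetch_status(host, "/modules.php?name=Forums&t=10948") for each host in host_variants("example.com"), with example.com replaced by your own domain, and comparing which variant answers with the 302.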

Jim