1. The Apache log always shows pairs of records like the following for the bot. One always has a 302 response code and the other always a 403. What does that mean?
ip-122-152-129-12.asianetcom.net - - [07/Nov/2007:12:49:25 +0900] "GET /modules.php?name=Forums&t=10948 HTTP/1.1" 302 506 "-" "Baiduspider+(+http://www.baidu.com/search/spider_jp.html)"
ip-122-152-129-12.asianetcom.net - - [07/Nov/2007:12:49:25 +0900] "GET / HTTP/1.1" 403 4114 "-" "Baiduspider+(+http://www.baidu.com/search/spider_jp.html)"
2. Webalizer shows the statistics below, in which Files is zero while Hits and KBytes are still growing every day. It seems to me the bot can still access my website. Is my blocking not working?
Hits    Files   KBytes   Visits  Hostname
140119  107929  4519965  3       crawl-66-249-73-154.googlebot.com
55677   0       125608   2       ip-122-152-129-12.asianetcom.net
My basic questions are:
1. Have I succeeded in blocking the bot?
2. Why is the bot still haunting my website?
3. Is the bot playing any tricks on my website?
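One way to approach the first question from the log itself is to tally response codes per client. Below is a minimal Python sketch, assuming the log uses the Apache combined format shown in the excerpt above; the function name `tally_status` and the regular expression are illustrative and not part of Apache or Webalizer. As far as I understand Webalizer, a refused request still counts as a hit, so Hits can keep growing while Files stays at zero.

```python
import re
from collections import Counter, defaultdict

# Matches the leading fields of an Apache combined-format log line:
# host, identity, user, [timestamp], "request", status, size.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def tally_status(lines):
    """Count how often each status code was returned to each client host."""
    counts = defaultdict(Counter)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        counts[match.group("host")][match.group("status")] += 1
    return counts


# The two records quoted in the question, used here as sample input.
sample = [
    'ip-122-152-129-12.asianetcom.net - - [07/Nov/2007:12:49:25 +0900] '
    '"GET /modules.php?name=Forums&t=10948 HTTP/1.1" 302 506 "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider_jp.html)"',
    'ip-122-152-129-12.asianetcom.net - - [07/Nov/2007:12:49:25 +0900] '
    '"GET / HTTP/1.1" 403 4114 "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider_jp.html)"',
]

for host, statuses in tally_status(sample).items():
    print(host, dict(statuses))
```

If a host's tally shows no 200 responses at all, no page content was served to it, whatever the hit count says.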
My advice is to set up a proper robots.txt and Baidu won't come anymore.
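For reference, a robots.txt rule aimed at this crawler can be checked offline with Python's standard `urllib.robotparser`. This is only a sketch under assumptions: the `Baiduspider` token is taken from the user-agent string in the log excerpt above, the rules and the `example.com` URL are illustrative, and whether the crawler actually honors them is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: shut out Baiduspider, allow everyone else.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder host, not the poster's site.
print(is_allowed(ROBOTS_TXT, "Baiduspider", "http://example.com/modules.php"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/modules.php"))
```

Note that robots.txt is a request to well-behaved crawlers, not an access control; the crawler has to be able to fetch /robots.txt to see it, which an IP block would prevent.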
Actually, I didn't set any rules. I simply blocked the IP of the host.
I'm not redirecting any pages, yet the bot always gets a 302 response code. The URL /modules.php?name=Forums&t=10948 is not even a valid one. Since it always comes as a 302/403 pair, I wonder what the bot is doing, and whether it is using some trick to get around the blocking.
Use a server headers checker such as the "Live HTTP Headers" add-on for Firefox/Mozilla browsers to test any/all applicable scenarios, and see if you get a 302-Found response from your server or domain-pointing service.
Jim
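If a browser add-on is not at hand, the same header check can be scripted. The sketch below uses Python's standard `http.client`, which does not follow redirects, so a 302 and its Location header are reported as received. The function names `check_response` and `summarize` and the host in the usage comment are hypothetical. A request sent from your own machine will not reproduce an IP-based block, but it should show whether the server or a domain-pointing service redirects the URL.

```python
import http.client


def check_response(host, path="/", user_agent="Mozilla/5.0", port=80):
    """Send one GET without following redirects.

    Returns (status code, value of the Location header or None).
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def summarize(status, location):
    """Describe a response in one line, noting any redirect target."""
    if status in (301, 302, 303, 307, 308):
        return f"{status} redirect to {location}"
    return f"{status} (no redirect)"


# Usage, with www.example.com standing in for your own host:
#   status, location = check_response(
#       "www.example.com", "/modules.php?name=Forums&t=10948")
#   print(summarize(status, location))
```

If this reports a 302 pointing at "/", the second record in each log pair would be consistent with the bot following that redirect and then being refused with a 403.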