|Baidu fetched disallowed files|
and banned itself
| 8:56 pm on Apr 15, 2003 (gmt 0)|
baiduspider came back early today. It did fetch robots.txt, but apparently did not parse it correctly. It ended up fetching a disallowed file, which happens to redirect to a spider trap, and banned itself by IP address.
We have discussed Baidu at length here, and I've tried to give them the benefit of the doubt. However, they don't seem to be capable of coding their spider to correctly fetch and parse robots.txt. I'm leaving the ban in place until they get it right.
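For anyone wondering what "getting it right" looks like, here is a minimal sketch of the check a well-behaved spider is expected to make before each request. It uses Python's standard urllib.robotparser. The robots.txt content, the /trap/ path and the example.com URLs are made up for illustration and are not my actual setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; /trap/ is an invented path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if robots.txt allows this agent to request the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
# A polite spider skips any URL for which this check returns False.
print(may_fetch(parser, "baiduspider", "http://example.com/trap/page.html"))
print(may_fetch(parser, "baiduspider", "http://example.com/index.html"))
```

A spider that runs this kind of check before every request never reaches the disallowed file, so it never follows the redirect into the trap.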
| 2:42 am on Apr 16, 2003 (gmt 0)|
Jd: what is baiduspider?
| 4:12 am on Apr 16, 2003 (gmt 0)|
This is one I can say, "I TOLD YOU!"
Baiduspider is the spider for a search engine... I think in Japan (might be China)... I often found this bad boy misbehaving.
I am halfway like Don on this one... It behaved so badly on my site (about 6 months ago) that I completely banned it.
| 11:42 am on Apr 23, 2003 (gmt 0)|
I've found it has tended to misbehave also, and am dithering over whether to ban it.
In recent weeks it has several times requested the file /SIteTECh/global.Css
My server is Unix-based, so requests are case sensitive. A file /sitetech/global.css does exist, but only in lower case, so the request is generating a 404 at present because /SIteTECh/global.Css doesn't actually exist.
There are no links on any of my pages to the above, and I really doubt there are external links to a non-existent file, so why is it making a request for this file which has never existed?
| 3:11 pm on Apr 23, 2003 (gmt 0)|
I believe someone from baiduspider posted on this forum a month or two ago... you might want to do a search and see what they had to say. I seem to remember they said they had cleaned it up...
Like I said, I banned it, and have not unbanned it.... I just do not see the need to be indexed in this one. But that is my own decision based on my goals, and my sites. YMMV!
| 10:50 am on Apr 29, 2003 (gmt 0)|
I got a reply from them. They said:
"We are testing if your site is case sensitive or not. So we change some character in the filename to uppercase to get it. If we can get it, your site is not case sensitive, and we will change all characters in the urls of your site to avoid get duplicate page in your site. I am very sorry to trouble you."
-- Personally, I think it is unacceptable to deliberately cause errors on other people's sites, time after time. If they want to check for duplicate pages, they should just compare the two files...
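To show what I mean by a comparison, here is a rough sketch in Python. It assumes the spider has already fetched two pages through real links and only wants to know whether they are the same document. The function names and sample bodies are my own invention, and I have no idea how Baidu's code is actually structured.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a SHA-256 hex digest of a fetched response body."""
    return hashlib.sha256(body).hexdigest()


def are_duplicates(body_a: bytes, body_b: bytes) -> bool:
    """Treat two responses as duplicates when their bodies hash identically."""
    return content_digest(body_a) == content_digest(body_b)


# Hypothetical bodies standing in for two URLs that differ only in case
# and that the spider found through genuine links.
page_lower = b"body { margin: 0; }\n"
page_mixed = b"body { margin: 0; }\n"
page_other = b"body { margin: 1em; }\n"

print(are_duplicates(page_lower, page_mixed))
print(are_duplicates(page_lower, page_other))
```

Done this way, duplicates get detected from pages that really exist, and nobody's error log fills up with 404s for invented URLs.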