Baidu fetched disallowed files and banned itself


jdMorgan

8:56 pm on Apr 15, 2003 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Baiduspider came back early today. It did fetch robots.txt, but apparently did not parse it correctly. It ended up fetching a disallowed file, which happens to redirect to a spider trap, and banned itself by IP address.

We have discussed Baidu at length here, and I've tried to give them the benefit of the doubt. However, they don't seem to be capable of coding their spider to correctly fetch and parse robots.txt. I'm leaving the ban in place until they get it right.

Jim
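
For readers following along, here is a minimal sketch of the check a well-behaved spider is expected to make before requesting a URL. It uses Python's standard urllib.robotparser. The robots.txt contents, the /trap/ path, the "examplebot" user agent, and the example.com URLs are all assumptions for illustration; they are not taken from Jim's site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /trap/ path is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "examplebot" and the example.com URLs are placeholders.
allowed_page = is_allowed(ROBOTS_TXT, "examplebot", "http://example.com/index.html")
blocked_page = is_allowed(ROBOTS_TXT, "examplebot", "http://example.com/trap/page.html")
```

A spider that runs this kind of check and skips any URL for which it returns False would never reach a disallowed file, and so would never follow it into a trap.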

SEO practioner

2:42 am on Apr 16, 2003 (gmt 0)

10+ Year Member



Jd: what is baiduspider?

tks

carfac

4:12 am on Apr 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jim:

This is one I can say, "I TOLD YOU!"

SEO:

Baiduspider is the spider for a search engine... I think in Japan (might be China)... I often found this bad boy misbehaving.

I am halfway like Don on this one... It behaved so badly on my site (about 6 months ago) that I completely banned it.

dave

mvl22

11:42 am on Apr 23, 2003 (gmt 0)

10+ Year Member



I've found it has tended to misbehave also, and am dithering over whether to ban it.

In recent weeks it has requested several times the file

/SIteTECh/global.Css

My server is Unix-based, so requests are case-sensitive. A file /sitetech/global.css does exist, but only in lower case, i.e. the request generates a 404 at present because /SIteTECh/global.Css doesn't actually exist.

There are no links on any of my pages to the above, and I really doubt there are external links to a non-existent file, so why is it making a request for this file which has never existed?
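
One way to spot this pattern in your own logs is to look for requested paths that match a real file only when letter case is ignored. The sketch below assumes you have already extracted the requested paths from the access log and have a list of paths that exist on the server; the function name and the /index.html and /missing.html entries are made up for illustration.

```python
def find_case_variants(requested_paths, existing_paths):
    """Return (requested, existing) pairs that differ only in letter case."""
    # Index the real paths by their lower-cased form for case-insensitive lookup.
    by_lower = {path.lower(): path for path in existing_paths}
    hits = []
    for requested in requested_paths:
        match = by_lower.get(requested.lower())
        # Keep only requests that match a real path but are not identical to it.
        if match is not None and match != requested:
            hits.append((requested, match))
    return hits


# The first entry of each list comes from the post above; the rest are hypothetical.
existing = ["/sitetech/global.css", "/index.html"]
requested = ["/SIteTECh/global.Css", "/index.html", "/missing.html"]
variants = find_case_variants(requested, existing)
```

Exact matches and ordinary 404s are ignored, so only case-mangled requests like the one described above are reported.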

carfac

3:11 pm on Apr 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I believe someone from Baiduspider posted on this forum a month or two ago... you might want to do a search and see what they had to say. I seem to remember they said they had cleaned it up...

Like I said, I banned it, and have not unbanned it.... I just do not see the need to be indexed in this one. But that is my own decision based on my goals, and my sites. YMMV!

dave

mvl22

10:50 am on Apr 29, 2003 (gmt 0)

10+ Year Member



I got a reply from them. They said:

"We are testing if your site is case sensitive or not. So we change some character in the filename to uppercase to get it. If we can get it, your site is not case sensitive, and we will change all characters in the urls of your site to avoid get duplicate page in your site. I am very sorry to trouble you."

Personally, I think it is unacceptable to deliberately cause errors on other people's sites, time after time. If they want to check for duplicate pages, they should just compare the two files...
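
A rough sketch of the comparison approach: hash the bodies of pages the crawler has already fetched through real links and treat equal digests as duplicates, with no made-up URLs requested. This is only an illustration of the idea, not how Baidu's crawler works; the function names, the choice of SHA-256, and the sample bodies are assumptions.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a hex SHA-256 digest of a fetched page body."""
    return hashlib.sha256(body).hexdigest()


def is_duplicate(body_a: bytes, body_b: bytes) -> bool:
    """Treat two fetched bodies as duplicates if their digests match."""
    return content_digest(body_a) == content_digest(body_b)


# Hypothetical bodies standing in for two URLs the crawler found via links.
page_one = b"body { color: black; }\n"
page_two = b"body { color: black; }\n"
page_three = b"body { color: navy; }\n"

same_content = is_duplicate(page_one, page_two)
different_content = is_duplicate(page_one, page_three)
```

This only catches byte-identical copies, but it costs the site owner nothing beyond the requests the crawler was going to make anyway.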

 
