
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Baidu fetched disallowed files
and banned itself
jdMorgan




msg:396623
 8:56 pm on Apr 15, 2003 (gmt 0)

Baiduspider came back early today. It did fetch robots.txt, but apparently did not parse it correctly. It ended up fetching a disallowed file, which happens to redirect to a spider trap, and so it banned itself by IP address.

We have discussed Baidu at length here, and I've tried to give them the benefit of the doubt. However, they don't seem to be capable of coding their spider to correctly fetch and parse robots.txt. I'm leaving the ban in place until they get it right.
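For readers unfamiliar with the mechanism, here is a minimal sketch of the check a well-behaved spider is expected to make, plus a simplified trap that bans an IP once it requests a disallowed URL. It uses Python's standard `urllib.robotparser`. The robots.txt contents, the paths, the IP address, and the `handle_request` helper are all assumptions made up for illustration; the actual file and trap on the site are not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real one is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
Disallow: /private.html
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


# Simplified stand-in for a spider trap: the set of banned client IPs.
banned_ips: set[str] = set()


def handle_request(parser: RobotFileParser, ip: str, user_agent: str, url: str) -> str:
    """Return 'forbidden', 'banned', or 'ok' for a single request."""
    if ip in banned_ips:
        return "forbidden"      # already banned on an earlier request
    if not parser.can_fetch(user_agent, url):
        banned_ips.add(ip)      # the client ignored robots.txt, so ban it
        return "banned"
    return "ok"


parser = build_parser(ROBOTS_TXT)
first = handle_request(parser, "192.0.2.10", "baiduspider", "http://example.com/index.html")
second = handle_request(parser, "192.0.2.10", "baiduspider", "http://example.com/private.html")
third = handle_request(parser, "192.0.2.10", "baiduspider", "http://example.com/index.html")
```

A spider that ran the same `can_fetch` test before requesting `/private.html` would never reach the trap, which is the point of the ban: only clients that ignore or misparse robots.txt get caught.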

Jim

 

SEO practioner




msg:396624
 2:42 am on Apr 16, 2003 (gmt 0)

Jd: what is baiduspider?

tks

carfac




msg:396625
 4:12 am on Apr 16, 2003 (gmt 0)

Jim:

This is one I can say, "I TOLD YOU!"

SEO:

Baiduspider is the spider for a search engine... I think in Japan (might be China)... I often found this bad boy misbehaving.

I am halfway like Don on this one... It behaved so badly on my site (about 6 months ago) that I completely banned it.

dave

mvl22




msg:396626
 11:42 am on Apr 23, 2003 (gmt 0)

I've found it has tended to misbehave also, and am dithering over whether to ban it.

In recent weeks it has several times requested the file

/SIteTECh/global.Css

My server is Unix-based, so requests are case-sensitive. A file /sitetech/global.css does exist, but only in lower case, i.e. the request generates a 404 at present because /SIteTECh/global.Css doesn't actually exist.

There are no links on any of my pages to the above, and I really doubt there are external links to a non-existent file, so why is it making a request for this file which has never existed?
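One way to spot this pattern in a log is to look for 404s whose path matches a real file once case is ignored. The sketch below assumes the log has already been reduced to (path, status) pairs and that a list of existing paths is available; the `find_case_mangled` name and the sample data are hypothetical, with only the two global.css paths taken from the post above.

```python
def find_case_mangled(requests, existing_paths):
    """Return (requested, real) pairs for 404s that differ from a real path only by case."""
    # Index the real paths by their lower-cased form for case-insensitive lookup.
    by_lower = {path.lower(): path for path in existing_paths}
    hits = []
    for path, status in requests:
        real = by_lower.get(path.lower())
        if status == 404 and real is not None and real != path:
            hits.append((path, real))
    return hits


# Sample data: only the global.css paths come from the thread.
existing = ["/sitetech/global.css", "/index.html"]
log = [
    ("/index.html", 200),
    ("/SIteTECh/global.Css", 404),
    ("/missing.html", 404),
]

mangled = find_case_mangled(log, existing)
```

On a case-sensitive server, each entry in `mangled` is a request that could not have come from a real link to the file, which makes it a reasonable signal that a client is altering URLs on its own.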

carfac




msg:396627
 3:11 pm on Apr 23, 2003 (gmt 0)

I believe someone from Baidu posted a month or two ago on this forum... you might want to do a search and see what they had to say. I seem to remember they said they had cleaned it up...

Like I said, I banned it, and have not unbanned it.... I just do not see the need to be indexed in this one. But that is my own decision based on my goals, and my sites. YMMV!

dave

mvl22




msg:396628
 10:50 am on Apr 29, 2003 (gmt 0)

I got a reply from them. They said:

"We are testing if your site is case sensitive or not. So we change some character in the filename to uppercase th get it. If we can get it, your site is not case sensitive, and we will change all characters in the urls of your site to avoid get duplicate page in your site. I am very sorry to trouble you."

-- Personally I think it is unacceptable deliberately to cause errors on other people's sites, time after time. If they want to check for duplicate pages they should just do a comparison of two files...
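The comparison suggested above could look something like the following: hash the bodies of two pages the crawler has already fetched through normal links and treat equal digests as duplicates, with no deliberately broken requests needed. This is only a sketch of that idea, not how Baidu's crawler works; the function names and sample bodies are made up for illustration.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a SHA-256 hex digest of a response body."""
    return hashlib.sha256(body).hexdigest()


def are_duplicates(body_a: bytes, body_b: bytes) -> bool:
    """Treat two pages as duplicates when their bodies hash to the same digest."""
    return content_digest(body_a) == content_digest(body_b)


# Hypothetical bodies, as if fetched from two URLs that differ only by case.
page_lower = b"body { margin: 0; }\n"
page_mixed = b"body { margin: 0; }\n"
page_other = b"body { margin: 1em; }\n"

same = are_duplicates(page_lower, page_mixed)
different = are_duplicates(page_lower, page_other)
```

An exact hash only catches byte-identical pages; pages that differ by a timestamp or a session ID would need a fuzzier comparison.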


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved