Baiduspider weird hits

         

winexec

4:50 pm on Apr 28, 2016 (gmt 0)

10+ Year Member



Baiduspider accesses my site every few minutes. That wouldn't be much of a problem, but the Baidu hits are on nonexistent URLs like:

/for-sale-cat-m19/franklin+tn+sale
/jobs-cat-m69/lpn+jobs+register
/administrative-jobs-cat-74/administrative+coordinator+information+systems+manager
/distributors-wanted-cat-90/wanted+1989+cadillac+brougham+in+massachusetts

Any idea what that is? I have been seeing such entries in my logs for some time.

Thank you!

lucy24

7:57 pm on Apr 28, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is it the real Baiduspider? It seems unlikely. I see a lot of spoofed ones-- which is pretty hilarious, since it's not as if the real one would be welcomed with open arms.

If you do admit the real Baiduspider, you could make the same type of rule that many people have for the Googlebot: If the user-agent is {suchandsuch} but it doesn't come from {known suchandsuch IPs}, then deny it. Remember to poke a hole for (a) robots.txt and (b) your custom 403 page if you've got one.
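A rough sketch of that kind of rule, written as plain Python rather than as server config. Everything named here is an assumption for illustration: the single 180.76.15.0/24 network is only what gets mentioned in this thread, not a complete list of Baidu ranges, and `/403.html` is a hypothetical path for the custom error page.

```python
import ipaddress

# Assumption: only the 180.76.15.xxx range mentioned in this thread.
# A real allow list would need every range the crawler actually uses.
KNOWN_BAIDU_NETWORKS = [ipaddress.ip_network("180.76.15.0/24")]

# The "holes": paths that stay reachable for everyone.
# "/403.html" is a hypothetical name for a custom 403 page.
ALWAYS_ALLOWED_PATHS = {"/robots.txt", "/403.html"}


def should_deny(user_agent: str, remote_ip: str, path: str) -> bool:
    """Return True if the request claims to be Baiduspider but comes from an unknown IP."""
    if path in ALWAYS_ALLOWED_PATHS:
        return False
    if "baiduspider" not in user_agent.lower():
        return False
    address = ipaddress.ip_address(remote_ip)
    return not any(address in network for network in KNOWN_BAIDU_NETWORKS)
```

Calling `should_deny(ua, ip, path)` with a Baiduspider user-agent and an address outside the listed network returns True, unless the path is one of the exempted ones. The same logic can be expressed as rewrite conditions on the server; this version just makes the order of the checks explicit.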

winexec

4:19 am on Apr 29, 2016 (gmt 0)

10+ Year Member



It's the real Baiduspider (180.76.15.xxx). I 403'ed the user agent yesterday.

But what baffles me is WHY Baiduspider hits such nonexistent pages. Baidu isn't crawling my site like a spider; it hits pages which I don't have (and never had). Now I see hits on:

/peoplesoft-expert-in-murfreesboro/
/lawson-expert-in-schaumburg/
/peoplesoft-consulting-in-passaic/
/crystalreports-consultant-in-whittier/
/lawson-consultant-in-scranton/

keyplyr

6:33 am on Apr 29, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I get traffic from Baidu Search. I use an allowed IP list for Baiduspider. The spoofed UAs I see are usually from either Chinanet or China Mobile.

@winexec - Like most SEs, they pull URIs from many sources. Sometimes you can find the source, sometimes not. Try searching for those paths you posted. Possibly someone else is experiencing similar issues.

winexec

6:53 am on Apr 29, 2016 (gmt 0)

10+ Year Member



Until I used the rewrite rule for Baiduspider, the hits came every 3 minutes. Now they come about every 8 minutes.

I already searched for those paths and found them only on some AWStats pages. Baidu hits other sites too :P

All the Baidu hits use that pattern and I cannot understand the reason behind that.

lucy24

8:01 am on Apr 29, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, wait. Didn't somebody once have an analogous question, and they finally worked out it was a DNS hiccup? The requests matched real URLs from someone else's (unrelated) site, but the DNS information got garbled, so legitimate search engines were briefly requesting them from the wrong place.

Not sure I care for those paths with plus signs in them, though...

keyplyr

8:10 am on Apr 29, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Plus signs" can be additions of search terms from a query result.

winexec

8:42 am on Apr 29, 2016 (gmt 0)

10+ Year Member



It's not a DNS hiccup; it has been going on for some time.

Other websites are hit in the same way.