
Forum Moderators: Ocean10000 & incrediBILL & keyplyr


BingBot Fake

10:16 am on Jan 31, 2013 (gmt 0)

New User

joined:Oct 9, 2012
posts:34
votes: 0


Looks as though I had me a fake BingBot visit

109.72.87.192 - 109.72.87.255

109.72.87.xxx resolved to hosted.by.pcex-xxx-treme

109.72.87.xxx - - [31/Jan/2013:00:23:56 -0800] "GET / HTTP/1.1" 403 3573 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +ht---tp://www.example.com/bingbot.htm)"

Ken
12:53 pm on Jan 31, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


In htaccess I block anything that claims to be a popular search engine bot, but is not coming from a small range of specific IPs.
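
A rough sketch of that rule in Python rather than htaccess, for anyone who wants to test the logic before writing the real thing. The ranges below are documentation-only placeholders (RFC 5737), not the crawlers' actual ranges, and `ALLOWED_RANGES` and `should_block` are names made up for this example.

```python
import ipaddress

# Hypothetical allowlist: documentation-only ranges (RFC 5737) standing in
# for the real crawler ranges, which are not listed in this thread.
ALLOWED_RANGES = {
    "bingbot": [ipaddress.ip_network("192.0.2.0/24")],
    "googlebot": [ipaddress.ip_network("198.51.100.0/24")],
}


def should_block(user_agent: str, remote_ip: str) -> bool:
    """Return True when the UA claims a listed bot but the IP is outside its ranges."""
    ua = user_agent.lower()
    ip = ipaddress.ip_address(remote_ip)
    for bot, networks in ALLOWED_RANGES.items():
        if bot in ua:
            # The UA claims to be this bot: block unless the IP is in a known range.
            return not any(ip in net for net in networks)
    return False  # UA does not claim to be a listed bot; leave it to other rules
```

With this sketch, a "bingbot/2.0" UA arriving from 109.72.87.200 would be blocked, while the same UA from an address inside the placeholder range would pass.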
1:26 pm on Jan 31, 2013 (gmt 0)

New User

joined:Jan 30, 2013
posts:6
votes: 0


scrapers gunna scrape
3:39 pm on Jan 31, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1312
votes: 0


scrapers gunna scrape

In this case they appear to have scraped nothing but a 403 page.

Seems like a cunning plan.

...
10:32 pm on Jan 31, 2013 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3121
votes: 3


I've seen a lot of bing and google bot UAs from "illegal" IPs over the past few weeks - more than usual but blocks are already in place so not a problem.
11:42 pm on Jan 31, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13210
votes: 347


Careful with the bings. They've been moving into all kinds of obscure new IPs. (Apparently they missed the early-90s /8 handout by about five minutes, so they have to grab IPs where they can find them.) If it's an unfamiliar range and the visitor begins by asking for robots.txt, it may be worth looking them up.
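
One way to do that lookup is a forward-confirmed reverse DNS check: resolve the IP to a hostname, check the hostname's suffix, then resolve the hostname back and make sure it returns the same IP. A minimal sketch follows, assuming genuine Bing crawlers reverse-resolve to names ending in `.search.msn.com`; the function names are invented for this example, and the forward lookup shown handles IPv4 only.

```python
import socket

# Assumed suffix for genuine Bing crawler hostnames; adjust if the operator
# documents something different.
BING_SUFFIXES = (".search.msn.com",)


def reverse_lookup(ip):
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname):
    """A lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_bot(ip, suffixes=BING_SUFFIXES,
                    reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS: the PTR name must match and map back to the IP."""
    try:
        host = reverse(ip).lower().rstrip(".")
        if not host.endswith(suffixes):
            return False
        return ip in forward(host)
    except OSError:
        # No PTR record, lookup failure, and so on: treat as unverified.
        return False
```

The resolver functions are passed in as parameters so the logic can be exercised without touching the network. The forward step matters because whoever controls an IP range can set its PTR record to any name they like.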
8:32 pm on Feb 1, 2013 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3121
votes: 3


The ones I'm seeing are from broadband and known server ranges. They're fake. :)
1:45 pm on Feb 2, 2013 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 13, 2005
posts: 128
votes: 0


I bet we had more than 40 IPs just in the past week claiming to be BingBot. Between that and the Majestic12 (MJ12) bot fakes, it's been a fun week.
4:32 pm on Feb 2, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1835
votes: 47


Majestic12 are not fakes but the unofficial UFO visitors that haven't gotten the DRILL yet. Like that girl with the shiny teeth on the back of the free local newspaper in your town.

and such....
11:32 pm on Feb 2, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13210
votes: 347


How can you tell if an MJ12 is real or fake? Urk. That sounds like the straight-line to a joke. Last time I looked they didn't have an IP of their own so you had to take them on faith.
11:45 pm on Feb 2, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


lucy,
There are old threads by the bot owner, who was a forum participant for the longest while (at least five years).

He set up some type of confirmation, per Jim's request, tailored to your inquiry and/or websites.

In addition, the most widespread of these UAs carry the version number he cited as an early version of the software, NOT part of the bot's current version.

I've no clue how to locate these old threads.
3:59 am on Feb 3, 2013 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6546
votes: 114


How can you tell if an MJ12 is real or fake?

Isn't MJ12 the one where you sign up to get SEO data (similar to GWT) in exchange for letting them crawl your site? If so, they give you an access code to allow only their authentic bot. I did this for a while but didn't see the benefit, so I stopped allowing them.
6:25 am on Feb 3, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13210
votes: 347


Well, ###. So if I have no idea what you're talking about, it means that every "MJ12" bot I've ever met is fake? :)

:: detour to www site helpfully linked in all UA strings, presumably including the fake ones ::

OK, not so bad. Version numbers seem to fit.
7:43 am on Feb 3, 2013 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6546
votes: 114


Well, ###. So if I have no idea what you're talking about, it means that every "MJ12" bot I've ever met is fake? :)
Not at all.

The access code is just so you can allow the real one and block the others. If I remember correctly, you sign up for an account at MJ12 and during that process you choose an access code (example: lucy24), then he confirms via email. Then you use that access code in a mod_access filter allowing MJ12 bots that have the code in their UA string, but denying other MJ12 bots that don't. Then I think you need to specifically allow his bot in robots.txt as well (never did understand this part).

This opt-in filter worked very well. I just discontinued it since I didn't utilize the SEO data he offers; you may. He obviously uses your collected site data for business, but at least he is up front about it and offers a trade-off, unlike a lot of domain-info scrapers out there.
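
The filter logic itself is simple enough to sketch in Python. This is only an illustration of the idea described above: how the code actually appears in the UA string is an assumption here, the sample UA in the comment is hypothetical, and `allow_request` is a made-up name.

```python
import re

ACCESS_CODE = "lucy24"  # the example code from the post above
MJ12_PATTERN = re.compile(r"mj12bot", re.IGNORECASE)

# Hypothetical UA layout, for illustration only:
#   Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http://www.example.com/bot.php?+lucy24)


def allow_request(user_agent, access_code=ACCESS_CODE):
    """Allow MJ12 UAs only when they carry the access code; ignore everything else."""
    if not MJ12_PATTERN.search(user_agent):
        return True  # not claiming to be MJ12; other rules decide
    return access_code in user_agent
```

An MJ12 UA without the code is denied, one with the code is allowed, and unrelated UAs fall through untouched.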
1:19 pm on Feb 3, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1835
votes: 47


I've no clue how to locate these old threads.


Don,

I think these are the threads on MJ12.

[webmasterworld.com...]
[webmasterworld.com...]

Otherwise,

Most recent from Lord_Majestic:

[webmasterworld.com...]
10:39 pm on Feb 3, 2013 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3121
votes: 3


I didn't subscribe to majestic's philosophy and I also decided it was easier to block all MJ bots with both robots.txt (for legit ones) and by 403 (for illegal ones). Have to say the legit ones seem to obey robots.txt.
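
The two layers can be sketched like this. The robots.txt text is a plausible minimal version rather than a copy of any real file, and `robots_allows` and `status_for` are hypothetical helper names. The standard-library parser is used to confirm what a well-behaved crawler would conclude from the file.

```python
from urllib.robotparser import RobotFileParser

# Layer 1: robots.txt asks the legitimate bot to stay away.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /
"""


def robots_allows(agent_token, path="/"):
    """What a robots.txt-respecting crawler with this token would conclude."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent_token, path)


# Layer 2: anything that shows up anyway with an MJ12 UA gets a 403.
def status_for(user_agent):
    """Return 403 for any UA mentioning MJ12bot, otherwise 200."""
    return 403 if "mj12bot" in user_agent.lower() else 200
```

A bot that obeys robots.txt never reaches the second layer, so in practice the 403 only lands on the ones ignoring it.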
12:12 am on Feb 4, 2013 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6546
votes: 114




Then you use that access code in a mod_access filter allowing MJ12 bots

Sorry, I meant to say mod_rewrite (not mod_access.)
1:58 am on Feb 4, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13210
votes: 347


... and if you had meant mod_access, you would really have meant mod_authz-thingie ;)
3:45 am on Feb 4, 2013 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6546
votes: 114



It's that thingie that's the real challenge!