Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 131 message thread spans 5 pages: < < 131 ( 1 2 [3] 4 5 > >     
MSN fakes referrers
SEOPTI




msg:3875365
 6:14 pm on Mar 20, 2009 (gmt 0)

This has been discussed in 2007:
[webmasterworld.com...]

They're doing it again: I see hundreds of fake visitors from MSN IPs across all of my domains.

Is there any news on what they are trying to accomplish by doing this?

 

GaryK




msg:3941996
 5:45 pm on Jun 28, 2009 (gmt 0)

Another week and yet more fake referrers from MSN. Personally I've had enough. I just don't get enough traffic from MSN to make this worthwhile so as of now they're banned from all my sites.

6/24/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
65.55.104.64
msnbot-65-55-104-64.search.msn.com
[search.live.com...]

6/27/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
65.55.104.63
msnbot-65-55-104-63.search.msn.com
[search.live.com...]

dstiles




msg:3942075
 8:52 pm on Jun 28, 2009 (gmt 0)

One wonders: since bing is now referring traffic, can ALL live.com's be blocked?

GaryK




msg:3942104
 10:00 pm on Jun 28, 2009 (gmt 0)

Did the blog post we were promised in a few days to a week back on May 18th ever appear?

can ALL live.com's be blocked?

Do you mean bots or what appears to be legit traffic?

caribguy




msg:3942211
 3:01 am on Jun 29, 2009 (gmt 0)

Simple enough:
RewriteCond %{REMOTE_ADDR} ^(65\.5[2-5]\.)
RewriteCond %{HTTP_REFERER} ^(http\:\/\/search\.live\.com\/results\.aspx\?q\=) [NC] # Faked
RewriteRule (.*) - [F,L]

dstiles: no, these are ok:

hxxp://by999w.bay999.mail.live.com/mail/InboxLight.aspx?FolderID=00000000-0000-0000-0000-000000000999&InboxSortAscending=False&InboxSortBy=Date&n=999
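
For anyone not on Apache, the logic of the rule above can be restated roughly as follows. This is only a sketch in Python, not something posted in the thread: the function name `should_block` is made up for illustration, and the network 65.52.0.0/14 is simply the CIDR form of the `^65\.5[2-5]\.` pattern. A request is refused only when both conditions hold, so webmail referrers like the one above pass through.

```python
import ipaddress
import re

# 65.52.0.0/14 spans 65.52.x.x through 65.55.x.x, the same addresses
# that the RewriteCond pattern ^65\.5[2-5]\. matches.
MS_RANGE = ipaddress.ip_network("65.52.0.0/14")

# Same prefix as the referer condition, case-insensitive like the [NC] flag.
FAKE_REFERER = re.compile(
    r"^http://search\.live\.com/results\.aspx\?q=", re.IGNORECASE
)


def should_block(remote_addr: str, referer: str) -> bool:
    """Return True only when both conditions of the posted rule hold."""
    in_range = ipaddress.ip_address(remote_addr) in MS_RANGE
    return in_range and FAKE_REFERER.match(referer or "") is not None
```

The two checks are ANDed, as consecutive RewriteCond lines are by default, so ordinary visitors arriving from search.live.com on other networks are not affected.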

Pfui




msg:3942217
 3:12 am on Jun 29, 2009 (gmt 0)

Aside from an Apache 1.3.x user: mod_rewrite instructions with same-line # Comments cause errors. Traditional left-margin/whole-line # Comments are A-OK.

dstiles




msg:3942669
 8:58 pm on Jun 29, 2009 (gmt 0)

GaryK: I meant traffic - I know the bots are still tagged as msnbot (when they are behaving themselves, anyway!). Haven't seen any posting from the msn guy. Maybe he fell foul of the MS anti-publicity brigade? :)

caribguy: thanks. I need to adapt the regex, though, as I'm using IIS, which is poor on clever stuff. Hence my question: can I block all live.com referers regardless?

GaryK




msg:3946365
 5:11 am on Jul 5, 2009 (gmt 0)

Seems they're still doing it. I'm not gonna bother posting details anymore. Suffice to say my logs are full of referrer spam from msnbot again this week. I banned the bot from most of my sites, so now it's hitting the ones it can still get through to: the few I can't really ban it from because I need to see what it's doing.

Haven't seen any posting from the msn guy. Maybe he fell foul of the MS anti-publicity brigade?

I just wanted to be sure I didn't miss a blog post. I guess we've been abandoned.

Pfui




msg:3946389
 6:52 am on Jul 5, 2009 (gmt 0)

The MS-connected posters in this thread, newcomer MS_Jason [webmasterworld.com] and old-timer msndude [webmasterworld.com], haven't posted in any WebmasterWorld forum since April and May, respectively. (I don't know how msndude found the time to post as much as he did around here in the first place.)

GaryK




msg:3946495
 2:57 pm on Jul 5, 2009 (gmt 0)

I don't know how they found time either. But msndude made a promise to us that seems to have been broken, even after going so far as to state that he "really [does] enjoy helping people overcome issues." I think we're within reason to voice our unhappiness with the situation so long as it's not seen as a personal attack. Am I the only one here who still feels strongly about this issue? If so I'll let it go.

dstiles




msg:3946544
 5:20 pm on Jul 5, 2009 (gmt 0)

This is really getting stupid now. Just received a fake referer from an MSN IP with an rDNS of msnbot. The only times I've seen the dumb start of the UA "user-agent=" is from site-scrapers and really badly corrupted UAs. On the other hand, MS updates seem to corrupt UAs quite badly anyway so perhaps they're just trying out their ordinary browser? :(

UserAgents=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; InfoPath.2)

I keep promising myself I'll dump the whole parcel of fake referers. It's just finding the time to test the block.

blend27




msg:3946717
 4:01 am on Jul 6, 2009 (gmt 0)

I have subbed my time watching MSN Bots going wild with getting a groove-on with some Amazing SALSA Music!

They could have all the fun with the sum that = 7(4+~+3)

It is so much Toonier! Telling Ya! Absolutely Lovely! Let's kill this thread... They don't care :(

Pfui




msg:3947338
 1:49 am on Jul 7, 2009 (gmt 0)

Don't know if it's related to blocking the fake refs but --

Have you checked your main site(s) on bing.com? I did earlier today and it's as if we've been dropped. All that appears is the bare URL, plus a few links blocked by robots.txt ('natch).

That's it. No title, no info, no doodad to explore the site, no nothing. One. Bare. URL. And that's after 2,372 hits by msnbot last month alone (not including msnbot's variations and MS's other bots), and for a long-authenticated, long-standing site with 15,000-plus incoming links.

And the msnbot hit count is also only a fraction of MS search-admin hits because the majority of the fake ref hits are done by cloaked UAs:

msnbot-65-55-104-67.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)

Fake Ref: [search.live.com...]

In comparison, Googlebot hit the same site 1,704 times, and about 90% of the sitemap URLs appear for multiple keywords, and include titles, descriptions, etc.

Anyone else discover that when it comes to your site(s), Bing's nothing but the pits?

GaryK




msg:3947351
 2:15 am on Jul 7, 2009 (gmt 0)

Just checked it now. Sure wish I could corroborate your experience.

My main money site is in the #1 spot for all its keywords. Full description, pop-up thingy with more info and related links. The royal treatment!

MSN and its related-bots have been blocked from crawling that site for a week now.

ADDED: Did you notice msndude was on the site today? He posted in another thread [webmasterworld.com], but ignored this one. This one was far more important to us than the one he posted in.

[edited by: GaryK at 2:20 am (utc) on July 7, 2009]

Pfui




msg:3947512
 8:38 am on Jul 7, 2009 (gmt 0)

I'm happy you're A-OK with Bing, Gary, and also that msndude's haunting WW again. (Besides, not posting doesn't mean not reading:) It'll be interesting to see how long you receive Bing's royal treatment after kicking its emissaries out of your kingdom. Interesting paradoxical effect, that.

GaryK




msg:3947722
 3:04 pm on Jul 7, 2009 (gmt 0)

Never said I was A-OK with Bing. :)

Nearly 80% of my traffic comes from Google. Most of the rest comes from links on other sites. Bing, Yahoo! and a few others account for a few percent each.

And yet many days Bing would use more bandwidth than all the other crawlers combined. This despite Google being on the site almost constantly.

GaryK




msg:3951049
 6:09 pm on Jul 12, 2009 (gmt 0)

I'm not sure whether to take what happened this week personally or not.

I'm inclined to think it's far too egotistical for me to consider I make any kind of difference at all to MSN/Live/Bing.

And yet, after announcing here I had blocked msnbot, every single one of my almost 400 sites got hit by the same kind of traffic as above. Every single day for the entire week. Sometimes multiple times per day.

Same odd-ball UAs. Same fake referrers. The major difference was not one of the IP Addresses had a rDNS. WhoIs revealed every one of them belonged to Microsoft.

I'm not even going to attempt posting them all. Here's a representative sample.

How the heck do I get rid of msnbot now?

Is there any correlation between what's happening in this thread and the new thread Pfui started [webmasterworld.com]?

7/5/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
65.55.109.88
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
[search.live.com...]
---------------------------------------
7/7/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
65.55.109.186
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
[search.live.com...]
---------------------------------------
7/9/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; InfoPath.2)
65.55.109.182
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
---------------------------------------
7/10/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
65.55.110.82
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
---------------------------------------
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325)
65.55.109.208
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
---------------------------------------
7/11/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
65.55.109.242
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
---------------------------------------
7/11/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727)
65.55.107.202
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
---------------------------------------
7/11/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
65.55.110.66
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
---------------------------------------
7/11/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607)
65.55.110.110
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]

wilderness




msg:3951077
 6:34 pm on Jul 12, 2009 (gmt 0)

And yet, after announcing here I had blocked msnbot, every single one of my almost 400 sites got hit by the same kind of traffic as above. Every single day for the entire week. Sometimes multiple times per day.

Gary,
Since this thread began (nearly four months ago) I've seen these partial searches arrive at my sites in batches. For weeks there are none and then suddenly they appear.
No rhyme or reason.

GaryK




msg:3951080
 6:37 pm on Jul 12, 2009 (gmt 0)

Guess I won't take it too personally then! ;)

tpeacock




msg:3952876
 11:16 am on Jul 15, 2009 (gmt 0)

After reading this thread, I was somewhat shocked to find a large number of these fake referrals on my main site in a search from last month from search.live.com. I found that every one-word search term is highly related to my site. My main concern is keeping my CTR as high as possible on the pages (about 30%) I have ads. Unfortunately the majority of these searches hit pages with ads.

Bad News: After blocking the referrals for about a week, this morning (July 15) I discovered the same type of referrals coming from www.bing.com/search all in the IP range of 65.55.104.nnn. Examples:

7/15/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
65.55.104.66
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[bing.com...]

-----------------------------------------------------

7/15/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
65.55.104.68
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[bing.com...]

Is anyone else seeing this?

Thomas
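
Since the referrers now come in two forms, one way to spot both in a log is to parse the referrer URL instead of extending a regex. The sketch below is only an illustration in Python: the function name `search_query_from_referer` is hypothetical, and it assumes the two endpoints mentioned in this thread (search.live.com/results.aspx and www.bing.com/search) are the only ones of interest. It returns the search term so the one-word queries can be tallied.

```python
from urllib.parse import urlsplit, parse_qs

# (host, path) pairs seen as referrers in this thread; treating these as the
# only two forms worth matching is an assumption.
SEARCH_ENDPOINTS = {
    ("search.live.com", "/results.aspx"),
    ("www.bing.com", "/search"),
}


def search_query_from_referer(referer: str):
    """Return the q= term if the referer is a Live/Bing results page, else None."""
    parts = urlsplit(referer)
    host = (parts.hostname or "").lower()
    if (host, parts.path) not in SEARCH_ENDPOINTS:
        return None
    terms = parse_qs(parts.query).get("q")
    return terms[0] if terms else None
```

A referrer match on its own says nothing about whether the visit is fake; it would still need to be combined with an IP-range or rDNS check before blocking.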

Rosalind




msg:3952927
 1:09 pm on Jul 15, 2009 (gmt 0)

I've also noticed that the fake live.com referrers have switched to bing.com ones. So they have time to sort that out, but not to stop them altogether?

What are they trying to achieve with this? Is it a case of any publicity will do, no matter how bad? Has someone forgotten to inform the techies that ref. spam is strictly for desperate slimeballs?

SEOPTI




msg:3952967
 2:09 pm on Jul 15, 2009 (gmt 0)

Yes, it has definitely started again, now bing bombs all of my sites with faked search queries.

GaryK




msg:3953069
 4:51 pm on Jul 15, 2009 (gmt 0)

I did a mid-week log files analysis. Something I rarely do. But I wanted to check for bing.

36 of the 39 new user agents I found are all made-up IE UAs. And all of them contain the same kind of fake search queries Thomas cited above. Except, and this is a big except:

Every single one is coming from a Qwest data center. And for whatever reason, "Qwest" is contained within the UA string.

---

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; GTB6; .NET CLR 1.1.4322; .NET CLR 2.0.50727; Qwest 1.0; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; Qwest 1.0; MSN 9.0;MSN 9.1;MSN 9.6; MSNbQ002; MSN
97.117.86.110
97-117-86-110.slkc.qwest.net

[bing.com...]

---

I'm not sure what to make of this? An attempt to get around blocking perhaps? Cause it worked really well!

ADDED: I forgot to mention. The lack of a closing parenthesis in the UA is not my editing mistake. That's how it was in the log files.

wilderness




msg:3953093
 5:20 pm on Jul 15, 2009 (gmt 0)

Gary,
I've had some recent and unusual activity from the same Qwest (.slkc.), however the activity was on-topic searches and the visitor utilized Google, rather than Bing.
It was also a different Class B.
Over a few days the same UA utilized multiple Class C's of the same Class B. (I have a widget visitor from this area (.slkc.) that is quite resourceful in his "access attempts").

I suspect the .slkc. is just a relay station (terminology?) similar to what some of the other providers (i. e., RR and some Verizon) use that do NOT provide designated sub-net assignments.

Qwest traffic for my sites has always been a real PITA. At one time, I had every one of their ranges that I could locate denied.

Don

jdMorgan




msg:3953148
 6:06 pm on Jul 15, 2009 (gmt 0)

This is the "MSN Optimized" custom MSIE version (or IE add-on?) that corrupts the UA string, leading to the missing closing parenthesis and missing spaces between the "MSN" version numbers -- and note that just like .NET CLR, they just keep adding MSN versions rather than replacing the old version numbers. It's daft.

Donning my tinfoil hat, I'd say it might be a long-term plot to overshoot the default LimitRequestFieldsize and LimitRequestLine settings on Apache servers... I figure another seven years, and the MSIE/.NET CLR UA string will be too long for the default Apache input buffer size (8190 bytes) due to all of the accumulated .NET CLR and "MSN Optimized" updates. ;)

The Qwest substring may be another customized MSIE version, provided by Qwest to its subscribers (Qwest is the former Mountain States Telephone Company, plus Northwestern and Pacific Northwest Bell). From the UA token order, it's likely that "Qwest IE" was installed from a Qwest subscriber setup CD, and then "MSN Optimized" was later installed on top of that (after several additional .NET CLR updates).

Jim

GaryK




msg:3955020
 6:10 pm on Jul 18, 2009 (gmt 0)

There's a piece (I'm not sure of what) on TechCrunch today referencing an article on Microsoft's Bing Community blog.

We do not anticipate any problems related to our increasing emphasis on MSNBot 2, but the unexpected can't always be avoided, no matter how hard you try! As such, we wanted to preemptively alert folks to the most effective way to report bot and crawling issues to Bing's support team in case they arise.

Um, I think we've known about this issue and others for several months now.

incrediBILL




msg:3955229
 6:51 am on Jul 19, 2009 (gmt 0)

I'd say it might be a long-term plot to overshoot the default LimitRequestFieldsize and LimitRequestLine settings on Apache servers...

Unless a rather large group of cranky webmasters starts punting those UAs to the curb until MS fixes the problem.

Bing surely doesn't want bad PR? ;)

Solution1




msg:3957201
 9:42 am on Jul 22, 2009 (gmt 0)

I found that mod_security has been blocking Microsoft IP's, like 65.55.108.233, 65.55.108.234 and 65.55.108.235.

Reason why: a user agent of "Mozilla/4.0".

That's a good reason, in my opinion.

Pfui




msg:3959107
 5:46 pm on Jul 24, 2009 (gmt 0)

Aside to @jdTinfoil:)

FWIW, I keep a little stash of the longest UAs I see. Here's the hands-down winner, weighing in at a whopping 418 characters and spaces:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SIMBAR={4FF12884-DA94-11DD-9A55-0030052ADD26}; GTB5; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; Embedded Web Browser from: [bsalsa.com...] (R1 1.6); InfoPath.2; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET CLR 1.1.4322; .NET CLR 3.0.04506.30; OfficeLiveConnector.1.3; OfficeLivePatch.1.3; MSN OptimizedIE8;DEDE)

That came courtesy of a .de asking for -- wait for it -- /(null [webmasterworld.com])

Signed, Cranky Webmaster

GaryK




msg:3959827
 4:53 pm on Jul 26, 2009 (gmt 0)

Once again, out of nearly 40 new user agents this week, all but about 10 of them were related to Microsoft and their persistent use of fake referrers. I guess I don't understand why they need to create a new user agent for every visit from their spambot. Does anyone have a suggestion for me, please?

Pfui, my top five longest UAs are: 896, 645, 587, 512, 480. Here's the top three:

896 characters:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB5; .NET CLR 1.1.4322; .NET CLR 2.0.50727; AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCCC

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

CCCCCCCCCCCCCCCCCCCCCCCC; BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCCCCCC

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDD

DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD;
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

CCCCCCCCCCCCCCCCCCCCCC; AAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB;
ASDFSDFFFFFFFFFFFFFFF)

645 characters:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB5; SearchSystem8804412899; SearchSystem8926915604; SearchSystem5551447122; SearchSystem9881314576; SearchSystem3047846404; SearchSystem8500999806; SearchSystem2898910852; SearchSystem7960553825; SearchSystem9320878896; InfoPath.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; .NET CLR 3.0.04506.648; SearchSystem8804412899; SearchSystem8926915604; SearchSystem5551447122; SearchSystem9881314576; SearchSystem3047846404; SearchSystem8500999806; SearchSystem2898910852; SearchSystem7960553825; SearchSystem9320878896)

587 characters:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0(Compatible Mozilla/4.0(Compatible-EmbeddedWB 14.59 [bsalsa.com...] EmbeddedWB- 14.59 from: [bsalsa.com...] ; Mozilla/4.0(Compatible Mozilla/4.0EmbeddedWB- 14.59 from: [bsalsa.com...] ; .NET CLR 1.1.4322; .NET CLR 2.0.50727; aff-kingsoft-ciba; staticlogin:product=cbpro09&act=login&info=ZmlsZW5hbWU9UG93ZXJ3b3JkMjAwOVByby4yNTI2OS4
0MDEwLmV4ZSZtYWM9QkI0OTc0RDJCNTFGNDUzNjkzQUI4MTE
xNkQ0MENDQzgmcGFzc3BvcnQ9JnZlcnNpb249MjAwOS4wNS4y
NS4yLjI4MyZjcmFzaHR5cGU9MQ==&verify=a3510b6c7b3fbc63bdee621ecfce0c5d; MAXTHON 2.0)

All three crashed my analysis program as they overflowed the max column width for user_agent! It's now set at varchar(1000)!

:)

jdMorgan




msg:3959885
 6:41 pm on Jul 26, 2009 (gmt 0)

I toss user-agents at 768 characters. That's enough. Anything more, and it's either abuse or a visitor too stupid to say "no" to every free toolbar and screensaver/adware program on the planet -- and if so, I don't want their business.

As for MS's dozens of unidentified no-rDNS user-agents, I don't want them either. I may "pay for this" later, but at this point I really don't care any more -- They're simply too "high-maintenance" for anyone who runs a whitelist+rDNS-based access control system, and their 'bots have historically had too many problems for me to have much enthusiasm for them anymore. MS's "tide" agents don't even spoof their own MSIE browsers properly; They either add or omit spaces, or send the wrong HTTP-Accept headers, or put info for one header (or even another header's name!) into another header. These various "development" robots show a stunning lack of quality checking prior to being unleashed on the world. It is quite evident that either they have no internal 'standards' for Web agents or that if they do, there is very little to no standards-compliance testing.

Too bad for them, 'cause MSN/Live/Bing (and its users) will miss the good content that necessitated all my anti-scraper protection in the first place. Bing, bung, plonk... Have a 403.

On the UA-string length, I got one yesterday from an outfit that doesn't deserve any publicity that weighed in at 774 characters. That may not be the longest one I've ever received, but it was still a pretty nice catch -- I'm considering taking it to the taxidermist to have it mounted so I can hang it over my fireplace.

Jim
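
A length cutoff like the one Jim describes, plus a guard for a fixed-width log column like the varchar(1000) mentioned earlier, could look something like this. It is a minimal sketch in Python, not anyone's actual setup: the function names are hypothetical, and treating "toss at 768" as "reject anything longer than 768 characters" is an assumption.

```python
MAX_UA_LENGTH = 768      # cutoff quoted in the post above
UA_COLUMN_WIDTH = 1000   # column width mentioned for the log-analysis table


def ua_too_long(user_agent: str, limit: int = MAX_UA_LENGTH) -> bool:
    """True when the User-Agent string is longer than `limit` characters."""
    return len(user_agent) > limit


def clip_for_storage(user_agent: str, width: int = UA_COLUMN_WIDTH) -> str:
    """Truncate a User-Agent string so it cannot overflow a fixed-width column."""
    return user_agent[:width]
```

With these numbers the 774-character string would be refused, while the 418-, 587- and 645-character examples posted above would still get through.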

GaryK




msg:3959887
 6:50 pm on Jul 26, 2009 (gmt 0)

I'm considering taking it to the taxidermist to have it mounted so I can hang it over my fireplace.

On one of my sites we have a TOS that requires photographic proof of such extraordinary claims! :)

whitelist+rDNS-based access control system

The problem I have with using rDNS in real-time is some bots, like the MS ones, simply hit my sites too quickly and too often. And as a result, repeated rDNS lookups would, I think, bog things down worse than just letting them have at it.

posts:20804

You really need a life, Jim! ;) The only site I have that kind of post count on is my main money site that's been going since 1998.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved