MSNBot has become a constant Fast-Scraper
7 IPs crawling at max 12 pages / sec - this is out of order
AlexK

Msg#: 4401159 posted 2:22 pm on Dec 24, 2011 (gmt 0)

My site has an auto-stop, block & report system; it is most effective at stopping script kiddies operating abusive bots. For a number of weeks now, the MSNBot has been caught in its net. I've finally got sick of it, and am also reporting it here:


Date of Abuse        IP                                         Rate
-------------------  -----------------------------------------  -----------------
2011-12-23 09:25:01  65.52.108.146  [forums.modem-help.co.uk]   12 pages / second
2011-12-23 01:05:57  157.55.16.219  [forums.modem-help.co.uk]   11 pages / second
2011-12-23 00:52:13  157.55.18.9    [forums.modem-help.co.uk]   11 pages / second
2011-12-22 18:18:59  207.46.13.212  [forums.modem-help.co.uk]    7 pages / second
2011-12-22 04:02:38  207.46.195.240 [forums.modem-help.co.uk]    7 pages / second
2011-12-22 00:18:28  65.52.110.200  [forums.modem-help.co.uk]    3 pages / second
2011-12-21 04:32:16  65.52.104.26   [forums.modem-help.co.uk]    9 pages / second


The ASN for each IP above is AS8075 [cidr-report.org] (MICROSOFT). Each event above has been auto-reported daily to the relevant abuse email address; naturally, no action has been taken. Each link above shows the relevant abusive activity from that IP.

Each IP caught in abusive activity gets banned from my site for a week, with a notice explaining why. At the moment, at the end of that week the MSNBot IP simply takes up its abusive behaviour all over again, gets stopped by the Stop-Abuse routines, and is reported & banned once more.

This abusive behaviour first began on June 20 this year and was reported in this forum [webmasterworld.com]. It continued until July [webmasterworld.com]; then, thanks to a WebmasterWorld member with MS contacts, it stopped. All was quiet until a month or so back, when it started all over again.

For the record, the odd Google IP & Yahoo! IP has occasionally got caught up in this net, but never to anything like the extent of the MSNBot IPs, which I think can now be classified as endemic abuse.

 

keyplyr

Msg#: 4401159 posted 1:39 pm on Dec 25, 2011 (gmt 0)

Yesterday's logs show MSN indeed scraping all files: html, css, js, php & images.

IP: 207.46.193.53
UA: Mozilla/4.0 (compatible

Notice the missing closing parenthesis: all of us who were blocking "Mozilla/4.0 (compatible)" or "Mozilla/4.0 (compatible;)" are letting it through.
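For anyone pattern-matching UAs, here is a minimal sketch (Python, purely illustrative; the UA strings are the ones quoted in this thread, the matching logic itself is an assumption) of why an exact-match block misses the truncated string while a prefix test catches it:

# Illustrative only: exact-match blocking misses the truncated UA seen above,
# while a prefix test catches it. The rule set is hypothetical.
BLOCKED_EXACT = {
    "Mozilla/4.0 (compatible)",
    "Mozilla/4.0 (compatible;)",
}

def blocked_exact(user_agent: str) -> bool:
    return user_agent in BLOCKED_EXACT

def blocked_prefix(user_agent: str) -> bool:
    return user_agent.strip().startswith("Mozilla/4.0 (compatible")

print(blocked_exact("Mozilla/4.0 (compatible"))   # False - slips through
print(blocked_prefix("Mozilla/4.0 (compatible"))  # True  - caught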

Hmmm, I wonder if they read some of my posts a few weeks ago?

AlexK

Msg#: 4401159 posted 3:11 pm on Dec 25, 2011 (gmt 0)

AlexK:
the odd Google IP & Yahoo! IP has occasionally got caught up in this net

What good timing!

2011-12-25 04:59:01 :: 66.249.71.26 [forums.modem-help.co.uk] :: max 3 / sec

The above is a GoogleBot; the IP is on ASN AS15169 [cidr-report.org] (GOOGLE). To keep this report balanced, it does seem that Google respond quickly to the abuse reports (though not to me), as their bots seldom re-offend. However, and also in the name of balance, Google is the only search-engine to currently feature within the Top-50 Bad Hosts maintained by HostExploit [hostexploit.com] (Q3 2011 [hostexploit.com] position is #30) (Yahoo! were #33 in Q1 2011, but have since dropped off the listing).

AlexK

Msg#: 4401159 posted 12:44 pm on Dec 26, 2011 (gmt 0)

2 fresh MSNBot IPs committing abuse:

Date of Abuse        IP                                         Rate
-------------------  -----------------------------------------  ----------------
2011-12-25 16:12:09  65.52.109.194  [forums.modem-help.co.uk]   8 pages / second
2011-12-25 15:35:55  157.55.38.162  [forums.modem-help.co.uk]   4 pages / second


That is now 9 MSN IPs committing abuse upon my site. What are the chances that Microsoft will apologise for this behaviour?

Staffa

Msg#: 4401159 posted 3:54 pm on Dec 26, 2011 (gmt 0)

chances that Microsoft will apologise

Wanna bet? None.

A couple of questions, though:
- is it the Bingbot UA that they are crawling with?
- do you have a 'Crawl-delay: n' in your robots.txt?

lucy24

Msg#: 4401159 posted 5:34 pm on Dec 26, 2011 (gmt 0)

is it the Bingbot UA that they are crawling with

Probably not. As noted elsewhere in this Forum, the msnbot seems to like going out in street clothes. But the IP is kinda hard to disguise.

Pfui

Msg#: 4401159 posted 7:04 pm on Dec 26, 2011 (gmt 0)

Same Q as Staffa re UA: AlexK, what UA(s), please? And if more than one, which is/are the worst?

AlexK

Msg#: 4401159 posted 9:53 pm on Dec 26, 2011 (gmt 0)

Hi Pfui.

The site is never interested in the UAs; they are too easily forged. You cannot forge an IP. As to the worst, investigate each IP at the links above: the report gives both detail & summary. Decide whether you consider rate of scrape or number of pages to be the more important. Note that these are attempted pages (not style, script, images, etc.).

There used to be a Crawl-Delay. Not now. The SEs just ignored it, so I abandoned it.

For those who want it, here are sample access_log entries (pages anonymised):

65.52.108.146 - - [25/Dec/2011:03:01:07 +0000] "GET /page.php HTTP/1.1" 403 546 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" In:833 Out:528:63pct. "-"
157.55.16.219 - - [23/Dec/2011:20:11:46 +0000] "GET /page.php HTTP/1.1" 403 546 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" In:833 Out:528:63pct. "-"
157.55.18.9 - - [25/Dec/2011:04:02:26 +0000] "GET /page.php HTTP/1.1" 403 544 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" In:829 Out:526:63pct. "-"
207.46.13.212 - - [25/Dec/2011:04:03:28 +0000] "GET /page.php HTTP/1.1" 403 546 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" In:833 Out:528:63pct. "-"
207.46.195.240 - - [22/Dec/2011:09:16:55 +0000] "GET /page.php HTTP/1.1" 403 547 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" In:835 Out:529:63pct. "-"
65.52.110.200 - - [22/Dec/2011:01:59:12 +0000] "GET /page.php HTTP/1.1" 403 1346 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" In:2749 Out:1328:48pct. "-"
65.52.104.26 - - [22/Dec/2011:16:04:29 +0000] "GET /page.php HTTP/1.1" 403 545 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" In:831 Out:527:63pct. "-"
65.52.109.194 - - [26/Dec/2011:21:25:23 +0000] "GET /page.php HTTP/1.1" 403 547 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" In:833 Out:529:63pct. "-"
157.55.38.162 - - [25/Dec/2011:15:35:55 +0000] "GET /page.php HTTP/1.1" 503 170 "-" "msnbot/0.01 (+http://search.msn.com/msnbot.htm)" In:- Out:-:-pct. "-"

I then used the following:

egrep -c '^157\.55\.38\.162 ' /var/log/httpd/access* | awk -F: '{SUM+=$2} END {print SUM}'
(run once per IP; total pages taken/attempted from all sites in 1 month)

...and got this:

65.52.108.146 :: 8409
157.55.16.219 :: 17107
157.55.18.9 :: 5799
207.46.13.212 :: 9949
207.46.195.240 :: 5611
65.52.110.200 :: 2579
65.52.104.26 :: 6322
65.52.109.194 :: 2901
157.55.38.162 :: 125
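(For anyone who prefers one pass over the logs instead of one egrep per IP, here is a rough Python equivalent; it is only a sketch, assuming standard combined-format access logs where the IP is the first field, with the glob path and IP list simply mirroring those quoted above.)

# Rough one-pass equivalent of the egrep/awk tally above (an assumption,
# not AlexK's actual tooling). Counts log lines per watched IP across all
# access logs; assumes the IP is the first space-separated field.
import glob
from collections import Counter

WATCHED = {
    "65.52.108.146", "157.55.16.219", "157.55.18.9",
    "207.46.13.212", "207.46.195.240", "65.52.110.200",
    "65.52.104.26", "65.52.109.194", "157.55.38.162",
}

counts = Counter()
for path in glob.glob("/var/log/httpd/access*"):
    with open(path, errors="replace") as log:
        for line in log:
            ip = line.split(" ", 1)[0]
            if ip in WATCHED:
                counts[ip] += 1

for ip, total in counts.most_common():
    print(f"{ip} :: {total}")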

Pfui

Msg#: 4401159 posted 12:47 am on Dec 27, 2011 (gmt 0)

Thank you for the additional details, Alex. For those of us who do note/block UAs (and/or constrain UAs to IPs and vice-versa, as I do), your seeing mostly this --

Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

-- and this oddity (which generated a 503) --

msnbot/0.01 (+http://search.msn.com/msnbot.htm)

-- but, for example, not these --

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; WOW64; Trident/5.0)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
msnbot/2.0b (+http://search.msn.com/msnbot.htm)._

-- is all helpful info.

Too bad MSN's bad acts go so far, and so far back. [webmasterworld.com...] Here's hoping they respond to your complaints. (Aside: they have complaints of mine recently, filed through Bing Webmaster Tools, regarding significant indexing errors; but weeks along, things remain unresolved and wrong.)

AlexK

Msg#: 4401159 posted 1:46 pm on Dec 27, 2011 (gmt 0)

Another day, and a 10th MSNBot IP committing abuse upon my site:

2011-12-26 05:47:33 :: 207.46.13.144 [forums.modem-help.co.uk] :: max 12 pages / second


@Pfui:
Do not pay too much attention to some IPs getting stopped with a 403, and some with a 503. In brief, it works like this:

There are two tests:
1) Fast scraper (>= 3 pages / sec)
2) Slow scraper (forums only; > 50 pages in 1 hour)

Only fast scrapers are reported.
Once spotted, a fast scraper is given a 403 block.
If a fast scrape continues for long enough, it will be caught first by the slow-scrape routine & given a 503.
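(For illustration only, here is a rough sketch of such a two-test gate in Python; the thresholds are the ones quoted above, while the names and in-memory bookkeeping are assumptions rather than AlexK's actual routines.)

# Sketch of the two tests described above: a fast scraper gets a 403,
# a slow scraper (or a fast one that keeps going) gets a 503.
# Thresholds come from the post; everything else is assumed.
import time
from collections import defaultdict, deque

FAST_PAGES, FAST_WINDOW = 3, 1.0      # fast: >= 3 pages in 1 second
SLOW_PAGES, SLOW_WINDOW = 50, 3600.0  # slow (forums): > 50 pages in 1 hour

page_hits = defaultdict(deque)        # ip -> timestamps of recent page requests

def status_for(ip: str) -> int:
    """Return the HTTP status to serve this page request: 200, 403 or 503."""
    now = time.time()
    q = page_hits[ip]
    q.append(now)
    while q and now - q[0] > SLOW_WINDOW:
        q.popleft()                   # keep only the last hour

    if len(q) > SLOW_PAGES:
        return 503                    # slow-scrape routine wins if it keeps going
    if sum(1 for t in q if now - t <= FAST_WINDOW) >= FAST_PAGES:
        return 403                    # fast scraper: blocked (and reported)
    return 200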

bingdude



 
Msg#: 4401159 posted 8:42 pm on Dec 28, 2011 (gmt 0)

AlexK,

Have you used the Bing Webmaster Tools specifically designed to allow you to control the rate at which we crawl your website?

You can spec a lower rate whenever you choose through the custom tool.

Rosalind

Msg#: 4401159 posted 8:56 pm on Dec 28, 2011 (gmt 0)

Have you used the Bing Webmaster Tools specifically designed to allow you to control the rate at which we crawl your website?

That's what Crawl-Delay is for. And what's better, webmasters don't have to sign up to your proprietary site to be able to use it. Imagine the chaos if every SE had its own webmaster section that people had to know about, and sign up to, just to fix simple stuff like Crawl-Delay!
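For reference, the directive under discussion is just two lines in robots.txt (the delay value here is arbitrary, and as this thread goes on to note, support for it varies and it is frequently ignored):

User-agent: bingbot
Crawl-delay: 10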

dstiles

Msg#: 4401159 posted 9:50 pm on Dec 28, 2011 (gmt 0)

I haven't seen anything exceptional from bingbot BUT I have seen a lot of fake bingbots this past week - i.e. coming from IPs that do not belong to MS.

I appreciate all the IPs listed in this thread are MS but it's worth checking.
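(One common check is forward-confirmed reverse DNS; here is a minimal Python sketch, assuming a genuine bingbot IP reverse-resolves to a *.search.msn.com host that resolves back to the same IP.)

# Minimal forward-confirmed reverse DNS check for a claimed bingbot IP.
# Assumption: genuine bingbot IPs reverse-resolve under search.msn.com
# and that hostname resolves back to the same IP.
import socket

def looks_like_real_bingbot(ip: str) -> bool:
    try:
        host = socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return False
    if not host.endswith(".search.msn.com"):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return False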

On the upside of this: I used to get a lot of fake googlebots but not so much now. Getting fake bingbots suggests that hackers are switching away from google? That would be a good indication. :)

Bingdude - nice to see you posting here at last! :)

AlexK

Msg#: 4401159 posted 10:59 pm on Dec 28, 2011 (gmt 0)

@bingdude:
Thanks for taking the time to post, but your reply is wrong in so many ways (a couple of which Rosalind has already pointed out).

A basic question to you: do you consider 12 pages / second to be a reasonable rate for BingBot to crawl a site? The implication of your question is that, if I want to stop your bots hitting my site at that rate, I need to use BWT to limit them. I think you will find that every webmaster on this planet will disagree with you.

bingdude, your bots are currently committing daily abuse upon my site. Stop it sir, and do so now.

tangor

Msg#: 4401159 posted 12:14 am on Dec 29, 2011 (gmt 0)

Whew! There's a fine line between having a site indexed and crying abuse because that indexing is too "abusive". Not sure where that number is... 12 pages/sec and then done, or 12 days with no index... :(

Most of my sites are 2k pages and under, so this is not a problem for me. I can see how it might affect significantly larger sites.

I do echo agreement that Crawl-Delay should be routinely supported without any requirement to sign up. Let's not fix something that shouldn't be broken (if properly implemented).

g1smd

Msg#: 4401159 posted 12:44 am on Dec 29, 2011 (gmt 0)

Rather than measuring a nebulous pages-per-second rate (with no reference to page size, the total number of pages on the site, how long that rate was sustained, or how often it was repeated), perhaps those and some other units of measure may be more useful, both by individual IP and by each search engine?

How much bandwidth usage per search engine per month is acceptable, how long between successive pulls of the same page, and what percentage of the overall bandwidth is used by each search engine?

Quantify the access in these other terms then see if the picture looks better... or even worse.

AlexK

Msg#: 4401159 posted 1:48 am on Dec 29, 2011 (gmt 0)

tangor, I do not think that you have thought this through. I've been stopping abuse from bots for several years now, and reporting it for 18 months. Here are some recent facts from the last year for you to consider:

30/40/50+ bots trying to scrape my site each day used to be typical. That has finally dropped dramatically - Dec 28 was just 7 bots, and *that* is now typical (human hits remain unchanged).

The fastest bot caught was Technicolor [forums.modem-help.co.uk], at 403 pages / second. Imagine if that started happening on your site 50 times a day; consider also that fibre & Gigabit networks are rolling out daily - it is not quite so unlikely a prospect as we may think.

The worst in terms of volume was a BigPond bot [forums.modem-help.co.uk]. It tried to take 137,836 pages before it finally stopped. How long before that sucks all your (supposedly) unlimited bandwidth if 50 of those hit you each day?

The average number of pages stopped by my site's routines across 18 months is 4,500 pages each day (now dropping).

According to AWStats, last November humans took 74.16 GB from my site, whilst bots took 818.49 GB. Please allow me to remind you that the Modem-Help site has defence in depth against abusive scrapers: the top-25 worst ASNs are blocked at the firewall, and both fast- & slow-scrapers get blocked almost before they can start. Yet bots took 11 times as much bandwidth as humans in October & November.

My site barely creeps into being 'medium-size' in terms of visits. With all due respect to yourself, I would suggest that there are blinkers on your vision that you would do well to drop. This is far more of an issue than you seem to consider it to be.

It is bad enough having to cope with script-kiddies, bored corporate desk-jockeys & spam-criminals trying to download my million-page site each day. To then have to factor in supposedly reputable Search-Engines as equivalent abusers just takes the biscuit.

tangor

Msg#: 4401159 posted 5:23 am on Dec 29, 2011 (gmt 0)

tangor, I do not think that you have thought this through. I've been stopping abuse from bots for several years now, and reporting it for 18 months. Here are some recent facts from the last year for you to consider:


Granted. I got serious about this sometime around 2000. That said, your HURT is apparently different from mine. I certainly don't have 130k-plus pages to be sucked (see above). But then again, the Bing suck only happens on a routine basis, all my pages are indexed in short order, and it's done. Different strokes, and no heart strokes for me. As always, YMMV.

My comment was specific: Be indexed or NOT... your choice by throttle, .htaccess, or Crawl-Delay. No skin off my nose. :)

Let's not make this a Your Hurt is bigger than Mine. I promise my "hurt" is way less! I only allow FIVE (5) bots (white list). Period. Thus all those others mentioned never bothered me. (I have thought this out) Anything else is nuked with prejudice. I don't make any attempt to coddle any bots other than my select few in hopes of possible "human" traffic most likely from countries which have no valuation (conversions). Hence my Monthly Bandwidth is well within control, etc. etc. etc.

You pick and choose your battles. I start with Best Defense First (what I allow), then relax IF THERE IS A REASON TO DO SO. That has happened only once... and lasted a year then was shut down.

Bing is not a problem FOR ME... that's where my traffic is coming from these days. Again, pick and choose... but I do feel your pain. What to block? What to block? What to...

AlexK

Msg#: 4401159 posted 6:15 am on Dec 29, 2011 (gmt 0)

@tangor:
Yes, I am prone to over-reaction & shooting from the hip. I should be American! I constantly do my very best to offset it.

I'm interested that your traffic mostly comes from Bing. Microsoft has always given the least traffic of all the major SEs for me, and I could never work out why.

Most of my response was to your declaration of a `12 pages/sec' index being OK, and even apparently belittling the idea that to crawl at that rate may be abusive. I therefore thought that I should add a little more substance to my claims, and at the same time highlight an oncoming issue that I've seen little commented upon elsewhere.

@g1smd:
Do you think `nebulous' when checking your site load? Or reading the time from your watch? It's not the word that I would apply to an auto-calculated hit-rate. And I do appreciate that you are suggesting other measures are also needed. The problem is that much of what you ask for is on the pages previously linked via their IPs.

The issue here is that there needs to be some method of ID-ing an abusive scraper. Hit-rate is the simplest & easiest, and also accurate. Once identified as abusive, that IP no longer gets any site pages. Only a 403 (or 503), with a short explanation.

AlexK

Msg#: 4401159 posted 6:43 am on Dec 29, 2011 (gmt 0)

Oh good lord. Now an 11th MSN IP committing abuse:

2011-12-29 04:22:22 :: 65.52.109.152 [forums.modem-help.co.uk] :: max 5 pages / second

tangor

Msg#: 4401159 posted 6:53 am on Dec 29, 2011 (gmt 0)

Most of my response was to your declaration of a `12 pages/sec' index being OK, and even apparently belittling the idea that to crawl at that rate may be abusive. I therefore thought that I should add a little more substance to my claims, and at the same time highlight an oncoming issue that I've seen little commented upon elsewhere.

Works for me. As I said, I allow Bing. If Bing romps at 12p/s and gets out of the way, that's okay with me. After all, I let them in.

This solves the immediate problem (if there really is one):

User-agent: bingbot
Disallow: /

Or one can live with it. The log file listed above indicates you're returning a 403, which is about 500-ish BYTES -- which can't get anywhere NEAR 818+GB. Sorry, this seems like a tempest in a teapot as regards Bing. Since Bing is the next best thing to Google, what's the problem? Please, let us compare apples to apples.

AlexK

Msg#: 4401159 posted 6:49 pm on Dec 29, 2011 (gmt 0)

Well, OK tangor, it is your site after all, and if you are cool with an SE scraping your site at 12p/s that is your right. The thought that all SEs will take your attitude as the green light to employ such behaviour worldwide makes me shudder, but there you are.

dstiles

Msg#: 4401159 posted 7:47 pm on Dec 29, 2011 (gmt 0)

I have always been under the impression that the default rate for all "reasonable" bots was a maximum of 2 pages/second. Depending on whether a bot grabs images etc., and how many of those there are per page, the maximum hit-rate for a site can obviously exceed that, but the page rate should not.

That is not to say the crawl rate cannot be upped by crawl-rate directives if YOU wish.

Staffa

Msg#: 4401159 posted 9:02 pm on Dec 29, 2011 (gmt 0)

The thought that all SEs will take your attitude as the green light to employ such behaviour worldwide makes me shudder


It's rather you who are giving the SEs the green light, by not having a 'Crawl-delay' in your robots.txt. With no restrictions in place, they are free to crawl at any rate they wish.

Put a Crawl-delay in your robots.txt file and, for any SE IP that does not conform, redirect it to said file for a number of hours; it obviously needs time to memorise the content.

It works for others, it may work for Bing as well.

AlexK

Msg#: 4401159 posted 10:32 pm on Dec 29, 2011 (gmt 0)

Staffa:
you who are giving the SEs the green light, by not having a 'Crawl-delay' in your robots.txt

I wish that it were true, Staffa. It was there for all my sites for many years. They ignored it. In the end, I gave up & removed it. No point in a directive that not a single SE--including MSN, who originated it--followed.

keyplyr

Msg#: 4401159 posted 11:04 pm on Dec 29, 2011 (gmt 0)

It [the Crawl-delay directive] was there for all my sites for many years. They ignored it. In the end, I gave up & removed it. No point in a directive that not a single SE--including MSN, who originated it--followed.


I agree. I also removed it since not one SE followed it. And even though it was correctly written and passed validation, it was cited as the reason why one bot was not following the denied files list.

There's a significant issue with SEs and bot runners respecting the property of webmasters. Sadly, I used to trust people a lot more before working on the internet.

g1smd

Msg#: 4401159 posted 11:08 pm on Dec 29, 2011 (gmt 0)

I have always assumed that the Crawl-Delay parameter applied per IP and not per search engine.

I can't see how autonomous processes running on different servers, possibly on different continents, could ever hope to co-ordinate their efforts to obey a per-search-engine limit.

That said, if individual IPs are exceeding specified limits, then there's a fundamental flaw.

AlexK

Msg#: 4401159 posted 1:26 am on Dec 30, 2011 (gmt 0)

g1smd:
if individual IPs are exceeding specified limits

That has come to be the topic of this thread.

The point is that there are *no* specified limits. `Crawl-Delay' is now null & void; I think that, in hindsight, it was a good idea. But the bots ignored it, so it has fallen into the bin-bucket.

My site is built for humans, not bots. I do not mind bots crawling it if they behave themselves, and it helps if there is a payback for the site. At the moment, that is true only for Google. In terms of payback, all the rest are a waste of time on my site. In spite of that, there are no restrictions if they behave themselves.

So, the site is built for humans. No human can view 3 pages / second. No human can even obtain 3p/s without software assistance, in which case they become a bot. Hence, the trip parameter for abuse is set at 3p/s, and that seems reasonable to me as a base parameter for behaviour when browsing my site. But is it accurate?

tangor has stated an attitude to this, and I understand it totally at one level. Without the SEs my income is toast, and I'm sure that that is true for many. My site needs worldwide exposure, and I cannot summon the marketing finances at that level. Even WebmasterWorld has been forced to co-exist with the bots, and I'm certain that most sites are in the same position. The question then comes as to the nature of that co-existence. Are we to say: "sure honey; anything you like", or are there acceptable limits to behaviour? And if so, what are they?

g1smd

Msg#: 4401159 posted 1:41 am on Dec 30, 2011 (gmt 0)

I sometimes open slightly more than three pages per second in my browser when I first visit a site or new section of a site.

My browsing style is to first open all of the stuff I am interested in, in a series of tabs, and then read all of the open tabs, replying and closing as I go.

AlexK

Msg#: 4401159 posted 2:24 pm on Dec 30, 2011 (gmt 0)

@g1smd:
That is why the routines look for extended fast-scraping across a period of seconds before blocking. (Let me check the algorithm...) On my forums, if you open more than 14 pages across 7 seconds you get blocked.
