spider or scraper?

based on IP, how can you get the REAL STORY?


arnarn

4:17 am on May 7, 2006 (gmt 0)

10+ Year Member



We just logged IP access (no name available via DNS) from 66.90.95.[41 - 70], 66.90.96.[207-254].

Each IP address read about 15-20 pages and they had a major impact on our server.

The browser (user-agent) info is "randomly" different on each request, there is no referrer info, and nothing identifies it as a bot.

I've found only a few references from other forums where they banned the "C" class, but did not go into detail on what was going on.

It would be greatly appreciated if someone could identify this range of IPs as a spider or something to be blocked.

wilderness

9:26 pm on May 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> I've found only a few references from other forums where they banned the "C" class, but did not go into detail on what was going on.

Colocator backbone. Could be anybody.

arnarn

2:08 am on May 8, 2006 (gmt 0)

10+ Year Member



One note on my initial post: there was a typo in one of the IP ranges. They all came from 66.90.95.[subranges].

I did some more checking and found a DNS entry for one of the IP addresses: 244.95.90.66.in-addr.arpa IN NS ns1.bsd.forbrazil.com.br

Things are really weird with these accesses, as if they are deliberately trying to hide who they are.

Would anybody recommend blocking 66.90.95.*, or do you think the same thing will happen repeatedly with new sets of IP addresses?

Any other recommendations on how to proceed?
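For anyone repeating this kind of check, the reverse-lookup name is just the address with its octets reversed under in-addr.arpa. A minimal sketch using Python's standard ipaddress module; the helper name reverse_dns_name is made up for illustration, and 66.90.95.244 is simply the address implied by the record above:

```python
import ipaddress


def reverse_dns_name(ip: str) -> str:
    """Return the in-addr.arpa name that a reverse lookup of this address queries."""
    # reverse_pointer reverses the octets and appends the in-addr.arpa suffix.
    return ipaddress.ip_address(ip).reverse_pointer


if __name__ == "__main__":
    print(reverse_dns_name("66.90.95.244"))
```

The resulting name is what you would hand to a DNS tool to ask for PTR or NS records. The sketch only builds the name and does not perform any lookup.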

wilderness

2:20 am on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have the entire backbone range denied:

RewriteCond %{REMOTE_ADDR} ^66\.90\.(6[4-9]|[7-9][0-9]|1[01][0-9]|12[0-7])\. [OR]
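Before deploying a pattern like this, it can help to confirm which third-octet values it really covers. The sketch below is a rough check in Python, not part of the original rule set. It assumes the intended range is 66.90.64.0 through 66.90.127.255, written here as 66.90.64.0/18 (that CIDR notation is my reading of the regex, not something stated in the post). The pattern uses ordinary "|" alternation, since the forum software displays it as a broken bar.

```python
import ipaddress
import re

# The pattern from the RewriteCond line, with "|" as the alternation character.
PATTERN = re.compile(r"^66\.90\.(6[4-9]|[7-9][0-9]|1[01][0-9]|12[0-7])\.")

# Assumption: the range the rule is meant to deny, expressed as a CIDR block.
BLOCK = ipaddress.ip_network("66.90.64.0/18")


def regex_matches(ip: str) -> bool:
    """True if the mod_rewrite-style pattern matches the address string."""
    return PATTERN.match(ip) is not None


def in_block(ip: str) -> bool:
    """True if the address falls inside the assumed CIDR block."""
    return ipaddress.ip_address(ip) in BLOCK


def mismatches() -> list:
    """Third-octet values (0-255) where the regex and the CIDR block disagree."""
    return [
        n for n in range(256)
        if regex_matches(f"66.90.{n}.1") != in_block(f"66.90.{n}.1")
    ]


if __name__ == "__main__":
    print("disagreements:", mismatches())
```

An empty list from mismatches() means the regex and the assumed block agree for every third octet, so the 66.90.95.* addresses reported above fall inside the denied range.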

larryhatch

2:32 am on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I just checked my access_log for Saturday-Sunday and
66.90.95.* slammed me hard. Whole site, maybe twice.
Downloaded all .html and .txt files but no images.
'*' stands for any number, constantly changing as if trying to hide itself.
Likewise, rotating user agents (if that's the word),
various versions of FF, Opera, MSIE and so forth, like it's trying to mimic natural traffic.
No success there, it sticks out like a sore thumb.
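The pattern described here (many addresses in one "C" class, each with a different browser string) is easy to surface from a log. Below is a minimal sketch in Python, assuming Apache combined log format; the sample lines, the function names, and the threshold of three distinct agents are all made up for illustration and would need tuning against real traffic.

```python
import re
from collections import defaultdict

# Combined log format: captures the first three octets and the user-agent field.
LOG_LINE = re.compile(
    r'^(\d+\.\d+\.\d+)\.\d+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

# Hypothetical sample entries, not taken from any real log.
SAMPLE = [
    '66.90.95.41 - - [06/May/2006:10:00:00 +0000] "GET /a.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 Firefox/1.5"',
    '66.90.95.52 - - [06/May/2006:10:00:01 +0000] "GET /b.html HTTP/1.1" 200 640 "-" "Opera/8.54"',
    '66.90.95.63 - - [06/May/2006:10:00:02 +0000] "GET /c.txt HTTP/1.1" 200 128 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.10 - - [06/May/2006:10:00:03 +0000] "GET /a.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 Firefox/1.5"',
]


def agents_by_subnet(lines):
    """Group hits by the first three octets and collect the user agents seen."""
    stats = defaultdict(lambda: {"hits": 0, "agents": set()})
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in combined log format
        prefix, agent = m.groups()
        stats[prefix]["hits"] += 1
        stats[prefix]["agents"].add(agent)
    return stats


def suspicious(lines, min_agents=3):
    """Prefixes that presented at least min_agents different user agents."""
    found = agents_by_subnet(lines)
    return sorted(p for p, s in found.items() if len(s["agents"]) >= min_agents)


if __name__ == "__main__":
    print(suspicious(SAMPLE))
```

A large office or ISP proxy can also show several browsers from one range, so a result from this is a prompt to look closer, not proof of a scraper.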

Does anybody know who is doing this, how and/or why?
I don't see the point in it unless someone is trolling for email addresses or the like.

Just what is a 'colocator' in this context? -Larry

wilderness

3:53 am on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> Just what is a 'colocator' in this context?

[en.wikipedia.org...]

Larry,
99.9999% of my sites' visitors from these types of addresses have been unidentified crawls.

I have a vague recollection of one of the major search engines using one for a short while. (Wouldn't surprise me if it was MSN as they have a history of some questionable crawls without IDing themselves. Although I did NOT search my documentation.)

For the most part in Forum-11 (SSID), it's generally not even a good idea to discuss these services or even mention their crawls!
All it does is provide the outfit with free advertising.

Don

arnarn

4:09 am on May 8, 2006 (gmt 0)

10+ Year Member



Don/Wilderness..

so, am I to understand that we should not talk about these things in this forum?

I thought that was what WebmasterWorld forums were for.

Why even have a forum on Google search engine? Who knows, the bad guys might find out something bad and use it to their advantage.

Just seems a little bit too controlling from my perspective as one being impacted by such activities and nobody wanting to talk about it?

My feeling is the bad guys will always find ways, often because the "good" guys aren't informed.


larryhatch

4:09 am on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Don: I read the wiki article, but I'm still a bit confused about this.

Whoever is using these services is obviously trying to mask their identity.
Could this be a way for the SEs to uncover cloaked pages?
I don't see much other reason for a legitimate SE to hide like that.
If not SEs, then who?
I don't see the point unless somebody is trolling for email addresses or whatever. -Larry

wilderness

5:06 am on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> so, am I to understand that we should not talk about these things in this forum?

arnarn,
I don't have any affiliation with the administration of this forum.

This is the only forum that I participate in at Webmaster World and have for at least five years (save the less than one year period the forum was not active.)

Direct advertising or promotion is against the forum CHARTER.

Not wanting to promote a host (even by mention of name) whose only purpose is providing a service that is detrimental to webmasters is my own choice, as well as that of some longtime participants in this forum.

Don

wilderness

5:12 am on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> If not SEs, then who?
> I don't see the point unless somebody is trolling for email addresses or whatever.

Larry,
the reason or intent is really insignificant to me personally.

What matters is that the spider is NOT identifying itself in an acceptable manner (providing a URL in the UA), and as a result ANY webmaster may only assume the purpose of the visit to be an anonymous harvest of my (or your) pages.

Many times, we as webmasters will never have insight into a clear reason for the harvest.
Only that a harvest has either occurred or is occurring.

Providing materials to unidentified harvesters is never what I intended the materials on my web pages to be utilized for.
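As a rough illustration of the "URL in the UA" test, here is a minimal Python sketch. The function name and the example agent strings are made up, and the check is only a heuristic: a scraper can paste a URL into its user agent as easily as it can fake a browser string.

```python
import re

# Looks for an http:// or https:// URL anywhere in the user-agent string.
URL_IN_UA = re.compile(r"https?://[^\s;)]+", re.IGNORECASE)


def identifies_itself(user_agent: str) -> bool:
    """True if the user agent carries a URL where the operator can be looked up."""
    return URL_IN_UA.search(user_agent) is not None


if __name__ == "__main__":
    # Hypothetical examples: a bot that links to an info page, and a plain browser string.
    print(identifies_itself("ExampleBot/1.0 (+http://www.example.com/bot.html)"))
    print(identifies_itself("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

Ordinary browsers fail this test too, so it only says something when combined with crawl-like behavior such as the whole-site downloads described earlier in the thread.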

Don

volatilegx

3:00 pm on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> so, am I to understand that we should not talk about these things in this forum?

This topic is perfectly OK to discuss here. Don just doesn't like to mention names of scrapers because he doesn't want to inadvertently give them any free press :)

wilderness

4:58 pm on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> he doesn't want to inadvertently give them any free press

I come by it naturally, Dan.
I was employed for ten years by a newspaper whose name was a "city" followed by Free Press ;)

Don