Forum Moderators: open
Anybody know anything about this?
Thank you.
Guest IP: 00.00.000.00 » Whois
Charlotte/0.05 Index page Sat Mar 07, 2009 5:48 am
I'll leave the IP off for now. You see, this looks different from what's discussed in the thread you referred me to.
Just a new face of the same crawler, maybe?
I'm going to ban the IPs and see what happens.
See if it comes back at me with new ones.
Anyway, can't be up to any good, right?
I can always unban the IPs if I find out it's a charity organization and all sweet stuff.
But I do appreciate the help on an identification. Nobody else seems to know anything about this.
71.170.242.nnn
99.6.235.nnn
65.67.112.nnn
216.84.45.nnn
99.186.215.nnn
66.25.28.nnn
74.81.199.nnn
66.25.8.nnn
65.69.153.nnn
Admittedly, I'm no expert, but I don't ever recall seeing a "user agent" farming a website using assorted ISPs. Off and on we had 8 of those on at the same time. That would either be one very sophisticated bot program, or very low tech manual work. No?
What concerns me is whether this was specifically targeted at only my site. If so, then I won't be getting feedback from anyone else having seen this happening on their site.
Why would somebody target me specifically? Good question. Maybe it's love.
But it seems odd, and when it comes to security "odd" worries me.
Some of the scrapers or spam email harvesters use a wide range of user agents and rotate them randomly, attempting to find one that gets past your security.
However, you could also be seeing a very new crawler that's too naive to know any better than to crawl through proxies all over the place. That technique is known as proxy hijacking, where a proxy ends up claiming another site's content by redirection.
That's from two sites that I checked these addresses on. I'll go further, if needed, but I'd like to know if "proxy hijacking" can be accomplished from addresses that are not proxies?
I can provide my full list of the results from the two sites for each address, if the rules here allow it.
By the way, <incrediBILL>, I've been studying your posts from back in 2007 and since and some other sites, and the more I read the more confused I get about the actual definition of "proxy" in the expression "proxy hijacking" so please allow me to apologize for not yet getting the picture clear in my mind and asking what may seem to you to be stupid questions. I am trying.
Here's a crawl example:
Googlebot -> exampleproxy.com -> examplesite.com
Googlebot asks for a page via exampleproxy.com via some link like:
exampleproxy.com/blah/examplesite.com/index.html
So Googlebot has now crawled a page from examplesite.com using exampleproxy.com to deliver that page, and now exampleproxy.com *MAY* be credited as the source of the page instead of its rightful owner, examplesite.com.
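To make the crawl example concrete, here is a minimal Python sketch that pulls the two pieces out of a path-style proxy link. It assumes the made-up layout used above (proxy host, one prefix segment, then the original host and path); real proxies use all sorts of URL schemes, and the function name is purely illustrative.

```python
from urllib.parse import urlsplit

def split_proxy_url(url):
    """Return (proxy_host, proxied_target) for a path-style proxy URL.

    Assumes the hypothetical layout from the example above:
    <proxy host>/<prefix>/<original host>/<original path>
    """
    # urlsplit only recognises the host when a scheme (or a leading //) is present
    parts = urlsplit(url if "//" in url else "//" + url)
    # First path segment is the proxy's own prefix ("blah"); the rest is the target
    segments = parts.path.lstrip("/").split("/", 1)
    target = segments[1] if len(segments) > 1 else ""
    return parts.netloc, target

proxy, target = split_proxy_url("exampleproxy.com/blah/examplesite.com/index.html")
print(proxy)   # exampleproxy.com
print(target)  # examplesite.com/index.html
```

The crawler only ever talks to the first value. The second is just text inside the path, which is why the proxy's hostname can end up attached to the content.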
Does that make sense?
It doesn't happen often anymore, at least not that I'm seeing in Google or Live.
Google [Bot] IP: 66.249.66.106 » Whois
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) Reading topic in Politics Wed Mar 11, 2009 12:39 am
I am afraid I don't understand where the "exampleproxy.com" is located in that information above.
In fact, since we started keeping records a month or so ago I think my people have identified about 15 "user agents" that are listed on <honeypot> and <stopforumspam> as being problems.
I was just trying to match that list with the one you have as a sticky above, and it seems we are doing something wrong, or something different.
But, back on-topic, it's that Charlotte bla.bla we're focusing on here. If the addresses aren't proxies, then is it still proxy hijacking? If not, what might we be seeing?
I was just wondering if anyone else had seen this so I could assure myself that it was not targeted only at my site.
I'm afraid I picked up a few "enemies" over the years while running another rather large site and that's why I wondered if somebody was up to more than just no good -- like real bad stuff.
I've got a post about this over on phpBB, so between here and there and keeping an eye out on some other known "we ID baddies" sites I'll see in a week or so if anyone else has seen this mystery gal named Charlotte. Hope I don't get in trouble before then.
My Moderation Team Leader informed me yesterday my site has been listed on some sort of blacklist, so I've got to figure that out next.
But like I indicated above, the more I seem to learn, the more questions seem to come up. It wouldn't be so bad if I didn't have brick-and-mortar work to deal with, as well.
But if you are really <incrediBILL> with super incredible powers, Bill, you could slow down the rotation of this planet and get us all a few more hours per day, right? Then you would be famous off the Net, as well.
LOL - My powers are the ability to work at home and avoid a corporate job which already gives me a few extra hours per day everyone else wastes getting ready for and going to/from that job, so technically I have a few more hours in my day.
About the crawler, I wouldn't be too concerned. I get literally hundreds of things like this daily, and if I stopped to worry about them all I'd never get anything done whatsoever. So I have firewalled my site to the best of my ability, and if anything does get through, they're more determined than I am at that point.
FWIW, there is a very big internet "black market" that makes money leeching the content of others so if you're being targeted it means you're successful and probably a leader in your field as these people don't waste their time scraping losers.
In that case, there are some really stupid people out there, because I doubt my site is the leading anything. Down the road, maybe, but not now.
Anyway, I appreciate you taking the time to help me out here.
I am fairly certain these are coming from botnets. Most are from domestic IP blocks, some of which are running a primitive server of some kind, suggesting a proxy has been installed or (more likely?) someone was careless installing their OS.
Purpose: No idea. Possibly scraping content for use in hyping their trojanned sites, but at least several of the targeted sites use querystrings, so possibly the bots are looking for vulnerable servers.
Full reverse DNS that identifies the crawler like the big SEs:
crawl2.nat.svl.searchme.com.
The following UA:
"Mozilla/5.0 (compatible; Charlotte/1.1; [searchme.com ])"
Operating from the following location:
network:IP-Network:208.111.154.0/24
network:Auth-Area:208.111.128.0/18
network:Org-Name:Kavam, Inc.
Anything else should be treated as a spoof, fake, whatever you want to call it.
[edited by: incrediBILL at 11:30 pm (utc) on Mar. 15, 2009]
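For anyone who wants to automate the rule above, here is a rough Python sketch. The network and domain come straight from the post; the function name, the sample IPs, and the idea of passing the reverse DNS name in as a string are assumptions made for illustration. Treat the values as a 2009 snapshot, not a current allowlist.

```python
import ipaddress

# Values quoted in the post above (a 2009 snapshot, may no longer be valid)
CRAWLER_NETWORK = ipaddress.ip_network("208.111.154.0/24")
CRAWLER_DOMAIN = ".searchme.com"

def looks_like_real_charlotte(ip, reverse_dns_name, user_agent):
    """True only if the UA, the source network, and the reverse DNS name all match."""
    claims_charlotte = "Charlotte/" in user_agent
    in_network = ipaddress.ip_address(ip) in CRAWLER_NETWORK
    # Reverse DNS answers often carry a trailing dot, e.g. "crawl2.nat.svl.searchme.com."
    host = reverse_dns_name.rstrip(".").lower()
    has_domain = host.endswith(CRAWLER_DOMAIN)
    return claims_charlotte and in_network and has_domain

ua = "Mozilla/5.0 (compatible; Charlotte/1.1; [searchme.com ])"
# 208.111.154.10 is a hypothetical address inside the quoted /24
print(looks_like_real_charlotte("208.111.154.10", "crawl2.nat.svl.searchme.com.", ua))
# 192.0.2.10 is a documentation-only address standing in for a spoofer with no reverse DNS
print(looks_like_real_charlotte("192.0.2.10", "", ua))
```

The reverse DNS name has to come from somewhere. One option is Python's `socket.gethostbyaddr(ip)`, ideally followed by a forward lookup of the returned name to confirm it maps back to the same IP, since whoever controls an address block controls its reverse DNS.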
208.111.154.249
Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.11) Gecko/20080109 (Charlotte/0.9t; [searchme.com ])
robots.txt? NO
-----
204.62.53.36
Mozilla/5.0 (compatible; Charlotte/1.0t; [searchme.com ])
robots.txt? NO
OrgName: Searchme, Inc.
OrgID: KAVAM
Address: 800 W El Camino Real, Suite 100, Mountain View, CA 94040
NetRange: 204.62.52.0 - 204.62.55.255
CIDR: 204.62.52.0/22
-----
209.249.86.17
Mozilla/5.0 (compatible; Charlotte/1.0b; [searchme.com...]
robots.txt? YES
209.249.86.210
Mozilla/5.0 (compatible; Charlotte/1.0t; [searchme.com...]
robots.txt? NO
OrgName: Abovenet Communications, Inc
CustName: Kavam
Address: 1735 Lundy Ave, San Jose, CA 95131
NetRange: 209.249.86.0 - 209.249.86.255
CIDR: 209.249.86.0/24
[edited by: incrediBILL at 11:32 pm (utc) on Mar. 15, 2009]
[edit reason] fixed urls [/edit]
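If you want to double-check that the NetRange and CIDR lines in whois output like the above agree, or test whether a logged IP falls inside a block, Python's standard `ipaddress` module can do it. This is only a sketch: the helper name is made up, and the ranges are the ones quoted in the post.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Convert a whois-style NetRange (first and last address) into CIDR blocks."""
    blocks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(block) for block in blocks]

print(range_to_cidrs("204.62.52.0", "204.62.55.255"))   # ['204.62.52.0/22']
print(range_to_cidrs("209.249.86.0", "209.249.86.255")) # ['209.249.86.0/24']

# Membership test for one of the logged hits
hit = ipaddress.ip_address("204.62.53.36")
print(hit in ipaddress.ip_network("204.62.52.0/22"))    # True
```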
I'm wondering why some of the IPs you referenced aren't set up for reverse DNS yet?
Are those older IPs no longer in use? All the current ones hitting my server provide reverse DNS, as I showed above.
2.) I can't really discuss the reverse DNS thing, sorry -- I'm a Web geek, not a Network geek:)
But I can tell you that on 02-16-09, the second IP I mentioned, 204.62.53.36, hit as a bare address. No hostname. The other address-only hits date back to 09-08 and I got their what-where data from WHO*S today. In between, I've seen host hits akin to your mention:
crawl1.nat.svl.searchme.com
crawl2.nat.svl.searchme.com
3.) Hmm. I wonder if the address-only hosts are sandboxes vis-a-vis the 1.0b and 1.0t versions? At this end, a quick skim suggests that the searchme.com host hits used 1.1 exclusively. (Haven't grepped for 0.05.)