QPCreep Test Rig

We are not indexing, just testing

         

pendanticist

2:18 am on Dec 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



69.28.130.*** - - [06/Dec/2003:17:06:54 -0800] "GET /robots.txt HTTP/1.1" 200 1524 "-" "QPCreep Test Rig ( We are not indexing, just testing )"
69.28.130.*** - - [06/Dec/2003:17:06:54 -0800] "GET / HTTP/1.1" 403 480 "-" "QPCreep Test Rig ( We are not indexing, just testing )"

Had this one previously banned on a hunch that it's a variant of QuepasaCreep [google.com].

We are not indexing, just testing...for whom?

Any ideas?

Pendanticist.

claus

4:30 am on Dec 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Caution, it could be an object of mass confusion... at least post #6 of this thread [webmasterworld.com] lists a similar IP as belonging to Lawrence Livermore Labs. I really don't know what such noble scientists would need a web crawler for; I thought real scientists used Google...

Otoh, the same thread has it on another IP belonging to the SE on [quepasa.com...] ... and on some other IPs as well. I don't really know what to think of it, as that text is clearly not in Spanish, although it's probably intended to offer a little comfort to worried webmasters.

/claus

Goober

4:07 pm on Dec 7, 2003 (gmt 0)

10+ Year Member



Got this on my site also:

Host: 69.28.130.*** Url: / Http Code : 200
Date: Dec 07 02:41:56 Http Version: HTTP/1.1 Size in Bytes: 11276
Referer: - Agent: QPCreep Test Rig ( We are not indexing, just testing ) /<THANKS>/

I checked on Arin and it gave this:

OrgName: Limelight Networks, LLC
OrgID: LLNW
Address: 8936 North Central Avenue
City: Phoenix
StateProv: AZ

Cheers,
Goober

dorward

10:07 am on Dec 8, 2003 (gmt 0)

10+ Year Member



I've been hit by it too.

69.28.130.*** - - [08/Dec/2003:09:40:25 +0000] "GET /tmp/konq.png HTTP/1.1" 404 3876 "-" "QPCreep Test Rig ( We are not indexing, just testing )" 0 dorward.me.uk
69.28.130.*** - - [08/Dec/2003:10:03:39 +0000] "GET /robots.txt HTTP/1.1" 200 84 "-" "QPCreep Test Rig ( We are not indexing, just testing )" 0 dorward.me.uk
69.28.130.*** - - [08/Dec/2003:10:03:39 +0000] "GET /foaf.rdf HTTP/1.1" 200 9295 "-" "QPCreep Test Rig ( We are not indexing, just testing )" 0 dorward.me.uk

... and it's ignoring my robots.txt:

User-agent: *
Disallow: /tmp/
Disallow: /images/
Disallow: /notes/
Disallow: /lib/
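
For anyone who wants to confirm that a logged request really falls under a Disallow rule, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes Python 3 and takes the rules and paths from the post above; the helper name `is_allowed` is made up for illustration. It only says what the rules permit, not why the bot fetched what it did.

```python
from urllib.robotparser import RobotFileParser

# robots.txt rules as posted above
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
Disallow: /images/
Disallow: /notes/
Disallow: /lib/
"""

# User-agent string copied from the log lines above
USER_AGENT = "QPCreep Test Rig ( We are not indexing, just testing )"


def is_allowed(path: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the rules above permit user_agent to fetch path.

    Hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Paths taken from the log excerpt above
    for path in ("/tmp/konq.png", "/foaf.rdf"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

Under these rules the request for /tmp/konq.png is disallowed, while /foaf.rdf is permitted.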

Oaf357

10:30 pm on Dec 9, 2003 (gmt 0)

10+ Year Member



69.28.130.229 - - [09/Dec/2003:14:39:14 -0600] "GET /robots.txt HTTP/1.1" 200 372 "-" "QPCreep Test Rig ( We are not indexing, just testing )"
69.28.130.229 - - [09/Dec/2003:14:39:15 -0600] "GET / HTTP/1.1" 200 5 "-" "QPCreep Test Rig ( We are not indexing, just testing )"

NSLOOKUP does not resolve but a WHOIS gives me:

[Query: 69.28.130.229, Server: whois.arin.net]

OrgName: Limelight Networks, LLC
OrgID: LLNW
Address: 8936 North Central Avenue
City: Phoenix
StateProv: AZ
PostalCode: 85020
Country: US

ReferralServer: rwhois://rwhois.llnw.net:4321/

NetRange: 69.28.128.0 - 69.28.191.255
CIDR: 69.28.128.0/18
NetName: LLNW-1
NetHandle: NET-69-28-128-0-1
Parent: NET-69-0-0-0-0
NetType: Direct Allocation
NameServer: DNS.LAX.LLNS.NET
NameServer: DNS.LGA.LLNS.NET
NameServer: DNS.SJC.LLNS.NET
NameServer: DNS.IAD.LLNS.NET
Comment: Network reassignments available via
Comment: rwhois.llnw.net 4321
RegDate: 2003-03-07
Updated: 2003-07-09

OrgAbuseHandle: WP215-ARIN
OrgAbuseName: Petrisko, William
OrgAbusePhone: +1-602-850-5095
OrgAbuseEmail: ipadmin@limelightnetworks.com

OrgTechHandle: WP5-ARIN
OrgTechName: Petrisko, William
OrgTechPhone: +1-602-850-3089
OrgTechEmail: billp@wjp.net

# ARIN WHOIS database, last updated 2003-12-08 19:15
# Enter? for additional hints on searching ARIN's WHOIS database.

[End of Data]

llns.net looks like a hosting provider so this could easily be anything at this point. I'm not going to block it yet but I'm interested to find out what it is.
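
If anyone does decide to filter on the range, here is a small sketch with Python's standard `ipaddress` module that checks an address against the CIDR block from the ARIN record. The function name `in_llnw_block` is made up for illustration. Since the block appears to belong to a provider rather than to the crawler's operator, matching the whole /18 could also catch unrelated traffic.

```python
import ipaddress

# CIDR block from the ARIN record above (NetName LLNW-1)
LLNW_BLOCK = ipaddress.ip_network("69.28.128.0/18")


def in_llnw_block(ip: str) -> bool:
    """Return True if ip falls inside 69.28.128.0/18 (hypothetical helper)."""
    return ipaddress.ip_address(ip) in LLNW_BLOCK


if __name__ == "__main__":
    # First and last address of the block; should agree with the NetRange line
    print(LLNW_BLOCK[0], "-", LLNW_BLOCK[-1])
    # Address from the log lines above
    print(in_llnw_block("69.28.130.229"))
```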

jim_w

10:34 pm on Dec 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Lately, I've stopped blocking pretty much everything. You can't have too many links to your site. Of course my product can be used by any business, and there are no emails to harvest, so let 'em come.

alexwilli

1:12 pm on Dec 11, 2003 (gmt 0)

10+ Year Member



69.28.130.** Dec 11 05:00:00... Http Code : 404... QPCreep Test Rig ( We are not indexing, just testing )

The worst part about their robot, besides ignoring robots.txt, is that the thing is just plain dumb. I receive about 4 hits a day from it and it invariably gets 3 404s because the thing tries to read .htm files even though all my pages and internal links are .html.

I'd say their testing is going terribly.

mcdave

2:11 pm on Dec 11, 2003 (gmt 0)

10+ Year Member



I've banned it. I had over a thousand hits from it in a 12 hour period (400 odd on the robots.txt file, and 686 on the index page of my blog). It should have picked up that my site is in English, given that <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> might be a bit of a hint, as well as the .uk suffix, in which case it ought to have decided not to bother spidering me more often than once in the odd blue moon.
But as it serves no useful purpose (why should I help them test? they won't even index me because I'm not in Spanish) and as it's so stupid, banning seems the only option.
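
Per-path counts like the ones above can be pulled from an access log with a few lines of Python. This is a rough sketch, assuming the Apache "combined" layout seen in the log excerpts in this thread; the names `LOG_LINE` and `count_hits` are made up for illustration, and the two sample lines are copied from an earlier post.

```python
import re
from collections import Counter

# Matches the Apache "combined" layout used in the log lines in this thread.
# Any extra fields after the user agent are ignored.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_hits(lines, agent_substring):
    """Count requests per path for user agents containing agent_substring."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and agent_substring in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


# Two log lines quoted earlier in the thread
SAMPLE = [
    '69.28.130.229 - - [09/Dec/2003:14:39:14 -0600] "GET /robots.txt HTTP/1.1" '
    '200 372 "-" "QPCreep Test Rig ( We are not indexing, just testing )"',
    '69.28.130.229 - - [09/Dec/2003:14:39:15 -0600] "GET / HTTP/1.1" '
    '200 5 "-" "QPCreep Test Rig ( We are not indexing, just testing )"',
]

if __name__ == "__main__":
    for path, n in count_hits(SAMPLE, "QPCreep").most_common():
        print(n, path)
```

To run it on a real log, pass an open file object instead of `SAMPLE`; the path to that file depends on your server setup.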

Oaf357

10:43 pm on Dec 11, 2003 (gmt 0)

10+ Year Member



I guess the question now is do we webmasters let them use us as guinea pigs or not?

I think it's great that they are trying to do real world testing but if this test is sucking up our bandwidth uselessly then why should we help?

Where exactly should the line be drawn? There will be more of this in the future.

alexwilli

10:51 pm on Dec 11, 2003 (gmt 0)

10+ Year Member



The one reason I'm tempted to turn it off is that they don't list an email address or any other contact information in the User Agent, which is SOP for crawlers.

I'll give them another day, but Saturday they're going on my block list.
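
A quick way to flag user agents that carry no contact details is to look for an email address or a URL in the string. This is a rough heuristic sketch, not a standard: the patterns and the name `has_contact_info` are assumptions, and the second user agent below is a made-up example.

```python
import re

# Loose patterns: good enough for eyeballing logs, not for validation
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def has_contact_info(user_agent: str) -> bool:
    """Return True if the user agent contains something like an email or URL."""
    return bool(EMAIL.search(user_agent) or URL.search(user_agent))


if __name__ == "__main__":
    agents = [
        # From the logs in this thread
        "QPCreep Test Rig ( We are not indexing, just testing )",
        # Hypothetical crawler, for contrast
        "ExampleBot/1.0 (+http://example.com/bot; bot@example.com)",
    ]
    for agent in agents:
        print(has_contact_info(agent), agent)
```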

pendanticist

6:43 am on Dec 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



69.28.130.*** - - [19/Dec/2003:18:19:11 -0800] "GET /robots.txt HTTP/1.1" 200 1524 "-" "QuepasaCreep ( crawler'at'quepasacorp.com )"
69.28.130.*** - - [19/Dec/2003:18:19:11 -0800] "GET / HTTP/1.1" 403 480 "-" "QuepasaCreep ( crawler'at'quepasacorp.com )"

As you can see, I've got it banned. However, you'll also note they seem to have changed their UA to include a contact addy.

Perhaps they're set to begin?

On the other hand:

[search.msn.com...]

Pendanticist.

keyplyr

7:32 am on Dec 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I was getting their test crawls for a few days, but now the 'new and improved' bot is coming round, and this one requests, and is obeying, robots.txt:

69.28.130.*** - - [19/Dec/2003:14:11:11 -0800] "GET /robots.txt HTTP/1.1" 200 734 "-" "QuepasaCreep ( crawler@quepasacorp.com )"

So h*tp://www.quepasa.com isn't who they are?

claus

11:20 am on Dec 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The domains quepasacorp.com and quepasa.com have the same owner: the US-Hispanic SE. Whois info for quepasacorp.com:

Registrant:
quepasa corporation
410 N. 44th Street Suite 450
Phoenix, AZ 85008
US

/claus

pendanticist

2:38 pm on Dec 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmmmm. Since my site is an academic portal type domain, I'm wondering if I ought to allow them some latitude and see how this pans out.

Always a good thing to have bi/multi-lingual SEs. :)

Pendanticist.

aodonline

1:50 pm on Dec 24, 2003 (gmt 0)

10+ Year Member



"QPCreep Test Rig ( We are not indexing, just testing )"
I'm banning them since every other request is for the robots.txt file.

NetRange: 69.28.128.0 - 69.28.191.255
CIDR: 69.28.128.0/18

On the other hand, has anyone been hit by these:
"Mozilla/4.0 compatible ZyBorg/1.0 Dead Link Checker (wn.zyborg@looksmart.net; [WISEnutbot.com)"...]
"SearchGuild_DMOZ_Experiment chris@searchguild.com"
"Exalead NG/MimeLive Client (convert/http/0.147)"
"Szukacz/1.5 (robot; www.szukacz.pl/jakdzialarobot.html; info@szukacz.pl)"
"Xenu Link Sleuth 1.2e"
"sitecheck.internetseer.com (For more info see: [sitecheck.internetseer.com)"...] (Never even requested their service)
"http://www.almaden.ibm.com/cs/crawler [c01]"

kevinpate

2:05 pm on Dec 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



aodonline,

The first three in your list I am not familiar with.
Each of the remaining ones in your list is on my 'I hear ya knocking but ya can't come in' list, and has been for quite some time.

Oaf357

2:55 pm on Dec 24, 2003 (gmt 0)

10+ Year Member



I've never seen the .pl or SearchGuild one but all the others I've seen before and if my memory serves me right I allow them all (maybe not Xenu Link Sleuth but I can't remember off the top of my head).

claus

3:18 pm on Dec 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Xenu is a piece of link-checker software. I think you will be able to find info on all of them by using the "site search" at the top of the page.