Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
ResearchProject
Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4121207 posted 9:05 am on Apr 24, 2010 (gmt 0)

U.S. taxpayer dollars at work in higher ed...

planetlab4.csres.utexas.edu
ResearchProject/1.0 this_request_is_part_of_a_research_project_and_should_be_harmless

robots.txt? NO

 

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4121207 posted 8:47 pm on Apr 24, 2010 (gmt 0)

That was yesterday. Today, from Taiwan...

gate.tp2rc.edu.tw
ResearchProject/1.0 this_request_is_part_of_a_research_project_and_should_be_harmless

robots.txt? NO

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4121207 posted 9:35 pm on Apr 24, 2010 (gmt 0)

...And from Howard University (howard.edu) in Washington, D.C. ...

138.238.250.*
ResearchProject/1.0 this_request_is_part_of_a_research_project_and_should_be_harmless

robots.txt? NO

Per the OP's Host info, if this is yet another PlanetLab (planet-lab.org) thing, too bad its "researchers at top academic institutions and industrial research labs" apparently disdain standard UA ID and bot-running activity. Hrrmph.

[edited by: incrediBILL at 12:11 am (utc) on May 24, 2010]
[edit reason] Obscured IPs for HOWARD.EDU [/edit]

tangor

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member, Top Contributor of the Month



 
Msg#: 4121207 posted 11:20 pm on Apr 24, 2010 (gmt 0)

Dear Heart... do we expect anything different? The bots, bot-handlers, and those scraping the web think they can sell URLs they consider merchandising-worthy to the clueless seeking the holy grail of MFA, hoping to make a killing in a saturated market where data is rapidly becoming mundane.

My fun thought is that no matter how big hard drives become, or how fast the connections, anyone attempting to "index the web" is butt stupid, since there is way too much slop out there which is not worth a plugged nickel. The real giggle is they have to spend tons of funds to play their games. We, as webmasters, spend our dollars in business/connection. And we can kill their business profile in .htaccess or similar, and they can't hurt us that much.

thetrasher

5+ Year Member



 
Msg#: 4121207 posted 2:02 pm on Apr 25, 2010 (gmt 0)

planetlab03.cs.washington.edu
planetlab4.ani.univie.ac.at
gschembra3.diit.unict.it

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4121207 posted 5:37 pm on Apr 25, 2010 (gmt 0)

@thetrasher: Did those Hosts run the "ResearchProject" bot against your site(s)? (There are 1090 planetlab nodes at 503 sites world-wide.)

FWIW: There's no "ResearchProject" per se on the planet-lab.org site, either by name or as an active or inactive project. Unless we see it run from another planetlab-specific subdomain somewhere, it may just be someone's pass-around effort.

Regardless of who's running it, I'm curious to know what it tries to do when not 403'd from the get-go but for robots.txt, which it neglects to get.
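For anyone who wants to check the "robots.txt? NO" question against their own logs, here is a minimal Python sketch. It assumes an Apache/Nginx "combined" log format, and the names LOG_RE and robots_report are made up for illustration. Note that the first field is only a hostname if your server does hostname lookups; otherwise it will be an IP.

```python
import re

# Assumed layout: Apache/Nginx "combined" log format. Adjust if your
# server writes a different format.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def robots_report(lines, ua_substring):
    """For every client whose User-Agent contains ua_substring, report
    whether it requested /robots.txt at least once (True) or never (False)."""
    seen = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m or ua_substring not in m.group("ua"):
            continue  # unparseable line, or a different user agent
        host = m.group("host")
        path = m.group("path").split("?", 1)[0]  # ignore any query string
        seen[host] = seen.get(host, False) or path == "/robots.txt"
    return seen
```

Feeding it the lines of an access log with "ResearchProject" as the substring gives one True/False entry per requesting host.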

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4121207 posted 8:29 pm on Apr 25, 2010 (gmt 0)

Subdomain-wise, this name's close enough for me:

planet5.cs.ucsb.edu
ResearchProject/1.0 this_request_is_part_of_a_research_project_and_should_be_harmless

22:39:14 /
22:39:14 /

robots.txt? NO

thetrasher

5+ Year Member



 
Msg#: 4121207 posted 1:34 pm on May 23, 2010 (gmt 0)

ResearchProject has a new name

dplanet1.uoc.es
Mozilla/5.0 (Windows; U; Windows NT 6.1; pl; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3

planetlab1.pjwstk.edu.pl
Mozilla/5.0 (Windows; U; Windows NT 6.1; pl; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3

planetlab4.csres.utexas.edu
Mozilla/5.0 (Windows; U; Windows NT 6.1; pl; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributor of the Month



 
Msg#: 4121207 posted 12:13 am on May 24, 2010 (gmt 0)

PlanetLab runs lots of different projects.

I block anything that returns RDNS with "planetlab" and have for years.
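A rough Python sketch of that kind of rDNS check, for illustration only. The function names and the injectable resolver parameter are assumptions (the resolver argument is there so the check can be tested without touching DNS), and this is not a description of how anyone in this thread actually implements the block.

```python
import socket

def reverse_dns(ip, resolver=socket.gethostbyaddr):
    """Return the lowercased PTR hostname for ip, or None if the lookup fails."""
    try:
        return resolver(ip)[0].lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None

def is_planetlab(ip, needle="planetlab", resolver=socket.gethostbyaddr):
    """True when the reverse-DNS name of ip contains needle."""
    host = reverse_dns(ip, resolver)
    return host is not None and needle in host
```

A substring test on "planetlab" would catch hosts such as planetlab4.csres.utexas.edu, but not planet5.cs.ucsb.edu or gate.tp2rc.edu.tw from earlier in the thread, so it is only one layer.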

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4121207 posted 12:54 am on May 24, 2010 (gmt 0)

Ditto, Bill. And I redirect "^planet" now. And experientially troublesome domains including "planet" (theplanet.com, planetsmb.net, planet.com, planetarabia.com, planethutch.com, ipplanet.com, etc.).

Aside: Those names remind me of when "planet" had the same cachet "cloud" does today.
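To see which hostnames the "^planet" rule and the listed domains would actually catch, a small Python sketch follows. The names STARTS_WITH_PLANET, LISTED_DOMAINS and matches_rule are hypothetical, and the code only decides whether a hostname matches; whether to redirect or 403 is left to the server config.

```python
import re

# "^planet": hostname begins with "planet" (case-insensitive).
STARTS_WITH_PLANET = re.compile(r"^planet", re.IGNORECASE)

# Domains named in the post above.
LISTED_DOMAINS = {
    "theplanet.com", "planetsmb.net", "planet.com",
    "planetarabia.com", "planethutch.com", "ipplanet.com",
}

def matches_rule(hostname):
    """True if hostname starts with "planet" or belongs to a listed domain."""
    host = hostname.lower().rstrip(".")
    if STARTS_WITH_PLANET.search(host):
        return True
    # Match the domain itself or any subdomain of it, but not look-alikes
    # that merely end with the same letters.
    return any(host == d or host.endswith("." + d) for d in LISTED_DOMAINS)
```

Under this sketch planet5.cs.ucsb.edu and planetlab1.pjwstk.edu.pl match, while dplanet1.uoc.es and gschembra3.diit.unict.it do not, so a hostname rule alone misses some of the nodes reported in this thread.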

enigma1

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4121207 posted 10:06 am on May 24, 2010 (gmt 0)

As far as I know, lots of universities offer proxies so students can connect from anyplace and access various services (e.g., scholarly documents). This means that if a student's browser or system is compromised, an outsider can use the proxy, and it will be the institution's server that shows up. I do see quite a few unrelated requests in my server logs that are similar.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4121207 posted 6:51 pm on Oct 18, 2010 (gmt 0)

Another variation on the 'bad UAs (and/or bad-acting profs/students) from good schools come' theme:

sysnet95.ucsd.edu
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

robots.txt? NO

rIP: 137.110.222.105
Hx: [botsvsbrowsers.com...]

From the looks of Bots vs Browsers's info, that IP's spoofed Googlebot two ways, too.

Dijkgraaf

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4121207 posted 8:47 pm on Oct 18, 2010 (gmt 0)

From ucsd.edu I've had both the spoofed Googlebot and Slurp UAs, as well as requests pretending to be a Firefox browser.

Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100403 Firefox/3.6.3
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100402 Ubuntu/9.10 (karmic) Firefox/3.5.9
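One common way to catch a spoofed crawler UA like the ones above is a forward-confirmed reverse DNS check: look up the PTR name for the IP, make sure it sits under the search engine's crawler domain, then resolve that name forward and confirm it maps back to the same IP. A minimal Python sketch follows. The suffix table is an assumption to verify against each engine's current documentation, the function names are made up, and the lookups are IPv4 only.

```python
import socket

# Assumed crawler hostname suffixes; check the engines' own docs before
# relying on these.
CRAWLER_SUFFIXES = {
    "Googlebot": (".googlebot.com", ".google.com"),
    "Yahoo! Slurp": (".crawl.yahoo.net",),
}

def claimed_crawler(user_agent):
    """Return the crawler token the UA string claims to be, or None."""
    ua = user_agent.lower()
    for token in CRAWLER_SUFFIXES:
        if token.lower() in ua:
            return token
    return None

def is_genuine(ip, user_agent,
               reverse=socket.gethostbyaddr,
               forward=socket.gethostbyname_ex):
    """None if the UA claims no known crawler; otherwise True only when
    reverse DNS lands in the crawler's domain and the forward lookup of
    that name includes the original IP."""
    token = claimed_crawler(user_agent)
    if token is None:
        return None
    try:
        host = reverse(ip)[0].lower()
        if not host.endswith(CRAWLER_SUFFIXES[token]):
            return False
        return ip in forward(host)[2]
    except OSError:  # failed lookups are treated as "not verified"
        return False
```

With this check, a Slurp UA arriving from sysnet95.ucsd.edu fails at the suffix test, since the host is not under the assumed Yahoo crawler domain.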

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved