
Search Engine Spider and User Agent Identification Forum

    
Quora link checkers
Something useful from AWS
incrediBILL




msg:4491717
 12:10 am on Sep 6, 2012 (gmt 0)


I was tinkering with Quora and posted a link to see what they used for validation, and sure enough, they're using AWS. Finally, something useful that gives me an actual reason to punch a hole in the AWS firewall.

50.16.83.146 - - [05/Sep/2012:23:59:18 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Quora Link Preview/1.0 (http://www.quora.com)"
23.20.62.58 - - [05/Sep/2012:23:59:19 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Quora Link Preview/1.0 (http://www.quora.com)"
23.20.14.25 - - [05/Sep/2012:23:59:35 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Python-urllib/2.7"
23.20.14.25 - - [05/Sep/2012:23:59:35 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Python-urllib/2.7"

Not sure why they needed to check it 4 times. The first two requests, from two different IPs, properly identified themselves with a Quora user agent; the next two, both from 23.20.14.25, used the default Python-urllib UA. Pretty sloppy programming all around.
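For anyone who wants to pull the same breakdown out of their own logs, here is a minimal sketch that tallies requests per IP and user agent. It assumes the standard Apache combined log format shown above; the names `LOG_RE`, `LINES`, and `tally` are just illustrative, not anything from a real tool.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)

# The four log lines quoted in the post.
LINES = [
    '50.16.83.146 - - [05/Sep/2012:23:59:18 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Quora Link Preview/1.0 (http://www.quora.com)"',
    '23.20.62.58 - - [05/Sep/2012:23:59:19 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Quora Link Preview/1.0 (http://www.quora.com)"',
    '23.20.14.25 - - [05/Sep/2012:23:59:35 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Python-urllib/2.7"',
    '23.20.14.25 - - [05/Sep/2012:23:59:35 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Python-urllib/2.7"',
]


def tally(lines):
    """Count requests per (ip, user agent) pair, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match:
            counts[(match.group("ip"), match.group("ua"))] += 1
    return counts


counts = tally(LINES)
for (ip, ua), n in sorted(counts.items()):
    print(ip, n, ua)
```

Run against the lines above, this groups the four hits into three IP/UA pairs, with the Python-urllib pair counted twice.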

Sadly, punching a hole in the firewall for them leaves a gaping hole for scrapers using AWS. I need to put my noodle to work and figure out some ID scheme that lets vendors on a shared IP pool identify themselves without relying on rDNS, because this situation is only going to get bigger as more sites transition to cloud computing.
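To make the problem concrete, here is a rough sketch of the kind of exception rule I mean: let a request through from otherwise-blocked cloud space only if the IP is on an exception list and the user agent matches a known prefix. Everything here is an assumption for illustration: the exception list is just the three addresses seen in the log above (a real rule would need the provider's published ranges), and `allow_from_cloud` is a made-up helper, not any firewall's actual API.

```python
import ipaddress

# Assumption: exceptions limited to the addresses observed in the log above.
# A real deployment would substitute the cloud provider's published ranges.
EXCEPTION_NETWORKS = [
    ipaddress.ip_network("50.16.83.146/32"),
    ipaddress.ip_network("23.20.62.58/32"),
    ipaddress.ip_network("23.20.14.25/32"),
]

# Assumption: only the UA prefix that identified itself properly is trusted.
ALLOWED_UA_PREFIXES = ("Quora Link Preview/",)


def allow_from_cloud(ip, user_agent):
    """Return True only if the IP is in an exception network AND the UA is recognized."""
    addr = ipaddress.ip_address(ip)
    in_exception = any(addr in net for net in EXCEPTION_NETWORKS)
    ua_ok = user_agent.startswith(ALLOWED_UA_PREFIXES)
    return in_exception and ua_ok


QUORA_UA = "Quora Link Preview/1.0 (http://www.quora.com)"
print(allow_from_cloud("50.16.83.146", QUORA_UA))            # exception IP, known UA
print(allow_from_cloud("23.20.14.25", "Python-urllib/2.7"))  # exception IP, default UA
print(allow_from_cloud("203.0.113.5", QUORA_UA))             # IP outside the exceptions
```

The weakness is obvious: the user agent string is trivial to spoof, so once the exception covers a whole shared range, any scraper in that range can claim to be the link checker. That is exactly the gaping hole, and why some stronger ID scheme is needed.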

 
