Quora link checkers

Something useful from AWS

12:10 am on Sep 6, 2012 (gmt 0)

I was tinkering with Quora and posted a link to see what they used for validation and sure enough they're using AWS. Finally, something useful that gives me an actual reason to punch a hole in the AWS firewall.

- - [05/Sep/2012:23:59:18 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Quora Link Preview/1.0 (http://www.quora.com)"
- - [05/Sep/2012:23:59:19 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Quora Link Preview/1.0 (http://www.quora.com)"
- - [05/Sep/2012:23:59:35 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Python-urllib/2.7"
- - [05/Sep/2012:23:59:35 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Python-urllib/2.7"

Not sure why they needed to check it 4 times. Two IPs properly identified themselves in the user agent; the next two used the default Python UA. Pretty sloppy programming all around.
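For anyone who wants to spot this pattern in their own logs, here is a rough sketch that tallies user agents and flags the ones that are just a library default. It assumes the access-log layout shown in the post (timestamp, request, status, size, referer, user agent) with the IP column already stripped. The names `ENTRY`, `GENERIC_PREFIXES`, `parse` and `is_generic_agent` are made up for this example, and the prefix list is only a starting guess, not a complete list.

```python
import re
from collections import Counter

# The four requests from the post, IP column already removed.
LOG_LINES = [
    '- - [05/Sep/2012:23:59:18 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Quora Link Preview/1.0 (http://www.quora.com)"',
    '- - [05/Sep/2012:23:59:19 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Quora Link Preview/1.0 (http://www.quora.com)"',
    '- - [05/Sep/2012:23:59:35 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Python-urllib/2.7"',
    '- - [05/Sep/2012:23:59:35 +0000] "GET /sample.php HTTP/1.1" 200 8810 "-" "Python-urllib/2.7"',
]

# Match from the timestamp onward so the stripped leading fields do not matter.
ENTRY = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
    r'(?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed list of default library user agents; extend to taste.
GENERIC_PREFIXES = ("Python-urllib", "curl", "Wget", "Java", "libwww-perl")


def parse(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = ENTRY.search(line)
    return match.groupdict() if match else None


def is_generic_agent(agent):
    """True when the UA is a library default rather than a named service."""
    return agent.startswith(GENERIC_PREFIXES)


entries = [e for e in map(parse, LOG_LINES) if e]
agent_counts = Counter(e["agent"] for e in entries)
generic_hits = [e for e in entries if is_generic_agent(e["agent"])]

for agent, count in agent_counts.items():
    label = "generic" if is_generic_agent(agent) else "identified"
    print(f"{count}x {label}: {agent}")
```

A user agent is trivially spoofable, so this is only good for spotting sloppy clients, not for deciding who gets through the firewall.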

Sadly, punching a hole in the firewall for them leaves a gaping hole for scrapers using AWS. I need to put my noodle to work and figure out some ID scheme that lets vendors use a shared modem pool and identify themselves without relying on rDNS, because this situation is only going to get bigger as more sites transition to cloud computing.
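To make the idea concrete, here is one possible shape such an ID scheme could take. It is purely an illustration, not something worked out in this post or used by any vendor: the vendor gets a shared secret out of band and signs each request with an HMAC over its vendor ID, the path and a timestamp, and the site checks the signature instead of the source IP or rDNS. Every name here (`VENDOR_KEYS`, `sign_request`, `verify_request`, the "example-vendor" ID, the 300 second window) is hypothetical.

```python
import hashlib
import hmac
import time

# Hypothetical registry: vendor ID -> secret issued out of band.
VENDOR_KEYS = {"example-vendor": b"shared-secret-issued-out-of-band"}

# Assumed replay window; signatures older or newer than this are rejected.
MAX_SKEW_SECONDS = 300


def sign_request(vendor_id, path, timestamp, key):
    """Vendor side: sign the vendor ID, path and timestamp with the secret."""
    message = f"{vendor_id}\n{path}\n{timestamp}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_request(vendor_id, path, timestamp, signature, now=None):
    """Site side: accept only a known vendor with a fresh, valid signature."""
    key = VENDOR_KEYS.get(vendor_id)
    if key is None:
        return False
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign_request(vendor_id, path, timestamp, key)
    # Constant-time comparison on bytes to avoid leaking match position.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

In practice the vendor ID, timestamp and signature would ride along in request headers, and the firewall rule would become "AWS range plus valid signature" rather than "AWS range". The obvious cost is that every vendor has to register and every site has to hand out keys.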