Getty Images (206.28.73.1) ignoring robots.txt

Fake user-agent, tries to grab all pages

         

jazzguy

8:18 pm on Aug 18, 2003 (gmt 0)

10+ Year Member



A bot from 206.28.73.1, which resolves to Getty Images, tried to rapidly grab every page linked from the home page of one of my sites. It never requested robots.txt, so it went after disallowed pages as well. The bot used a fake user-agent of "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)".

I don't know what their purpose was in trying to harvest my site, but I now have their entire range blocked.

Getty Images, Inc. CW-206-28-72-A (NET-206-28-72-0-1) 206.28.72.0 - 206.28.79.255

[edited by: jazzguy at 10:29 pm (utc) on Aug. 18, 2003]

wilderness

9:09 pm on Aug 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From their page:
"At this time we are strictly a business-to-business website and provide photography only for commercial, corporate, editorial, publishing and similar uses."

Thanks for the heads-up, jazzguy.
RewriteCond %{REMOTE_ADDR} ^206\.28\.(7[2-9])\. [OR]
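
For anyone who wants to double-check a pattern like this before putting it in .htaccess, here is a rough Python sketch. It takes the range from the whois line in the first post, collapses it into CIDR form, and confirms that the regex matches every address in the range but not the addresses immediately on either side. The function names are made up for illustration, and it assumes Python's re module treats this simple pattern the same way Apache's regex engine does.

```python
import ipaddress
import re

# Range quoted from the whois record in the first post.
FIRST = ipaddress.IPv4Address("206.28.72.0")
LAST = ipaddress.IPv4Address("206.28.79.255")

# Pattern copied from the RewriteCond line above.
PATTERN = re.compile(r"^206\.28\.(7[2-9])\.")


def cidr_blocks(first, last):
    """Return the CIDR blocks that exactly cover first..last (hypothetical helper)."""
    return list(ipaddress.summarize_address_range(first, last))


def pattern_covers_range(pattern, first, last):
    """True if pattern matches every address in first..last
    and does not match the address just below or just above the range."""
    inside = all(
        pattern.match(str(ipaddress.IPv4Address(n)))
        for n in range(int(first), int(last) + 1)
    )
    neighbours_excluded = (
        not pattern.match(str(first - 1)) and not pattern.match(str(last + 1))
    )
    return inside and neighbours_excluded


if __name__ == "__main__":
    print(cidr_blocks(FIRST, LAST))
    print(pattern_covers_range(PATTERN, FIRST, LAST))
```

The check only probes the two neighbouring addresses outside the range, so it is a sanity test and not proof that nothing else matches. The CIDR form can be handy if you prefer blocking at a firewall that takes network/prefix notation.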

jomaxx

5:01 pm on Aug 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'll bet they are spidering images and checking them for embedded watermarks, looking for sites using their pictures without paying royalties.