Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Amazon Image Cache/0.5 libwwwperl/5.808
brokaddr



 
Msg#: 4512891 posted 3:57 am on Oct 27, 2012 (gmt 0)

Is this truly "Amazon" scraping my images?

Page:
/thumb.php?img=images/widget.jpg&w=150&h=150

IP:
72.21.217.33

User Agent:
Amazon Image Cache/0.5 libwwwperl/5.808


If so, what are they doing? I'm not an Amazon client.

 

wilderness




 
Msg#: 4512891 posted 6:49 pm on Oct 27, 2012 (gmt 0)

There is some interesting reading in two of the SERPs [google.com]

FWIW, this UA (libwwwperl) is one that almost every beginner blacklists, and nothing here suggests that first impression is wrong.

brokaddr



 
Msg#: 4512891 posted 9:26 pm on Oct 27, 2012 (gmt 0)

wilderness,

That link only gave 3 results for me, and the most interesting-looking one couldn't be reached.

I thought I had libwwwperl blacklisted, and it turns out I (sort of) did, but only in the hyphenated form:
SetEnvIfNoCase User-Agent "libwww-perl/" bad_bot

I will be updating the entry to cover this unhyphenated version as well!
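
For what it's worth, a single pattern can cover both spellings. The following is only a sketch: the `-?` makes the hyphen optional, so it should match "libwww-perl" and "libwwwperl" alike. The deny block assumes Apache 2.2-style access control in .htaccess, which nothing in this thread confirms; a 2.4-style equivalent is left in comments. The bad_bot variable name is the one from the rule above.

```apache
# Flag any User-Agent containing "libwww-perl" or "libwwwperl" (hyphen optional).
SetEnvIfNoCase User-Agent "libwww-?perl" bad_bot

# Apache 2.2 syntax (assumed): refuse requests flagged above.
Order Allow,Deny
Allow from all
Deny from env=bad_bot

# Apache 2.4 equivalent, if mod_access_compat is not loaded:
# <RequireAll>
#     Require all granted
#     Require not env bad_bot
# </RequireAll>
```

Dropping the trailing slash from the pattern also catches a UA that omits the version number, at the cost of a slightly broader match.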

dstiles




 
Msg#: 4512891 posted 9:55 pm on Oct 27, 2012 (gmt 0)

Unless you have a particular need to let SOME Amazon IPs into your site, block ALL Amazon IP ranges. There are a few lists of them in this forum - go for the latest.
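
As a rough illustration of blocking by address rather than by UA, the sketch below denies the single IP reported at the top of this thread plus one CIDR range. The range shown is a documentation placeholder (203.0.113.0/24), not a real Amazon range; actual ranges would have to come from a current list such as the ones mentioned above. Apache 2.2-style syntax is assumed.

```apache
# Apache 2.2 syntax (assumed). Allow everyone, then carve out exceptions.
Order Allow,Deny
Allow from all

# The address reported in the first post of this thread.
Deny from 72.21.217.33

# PLACEHOLDER range for illustration only; substitute ranges from a current list.
Deny from 203.0.113.0/24

# Apache 2.4 equivalent of the two Deny lines, inside <RequireAll>:
#     Require all granted
#     Require not ip 72.21.217.33
#     Require not ip 203.0.113.0/24
```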

lucy24




 
Msg#: 4512891 posted 10:16 pm on Oct 27, 2012 (gmt 0)

:: detour to own htaccess ::

Heh. I've got "libwww-perl" commented out because it's what the Link Checker uses. But it would probably make more sense to un-comment the block (in my case part of a BrowserMatch list) and just restore the # when I actually do check links, which is not often.
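
One possible alternative to toggling the # by hand, sketched under assumptions: keep the block active and clear the flag again for the link checker. The `!bad_bot` form (unset a variable) is standard SetEnvIf/BrowserMatch syntax, but the "W3C-checklink" token is an assumption about which link checker is meant and what it sends, and bad_bot is an assumed variable name. Anyone can spoof that token, so this trades a little safety for convenience.

```apache
# Part of a hypothetical BrowserMatch list; bad_bot is an assumed variable name.
BrowserMatch "libwww-perl" bad_bot

# Directives are evaluated in order, so this later line clears the flag
# for requests whose User-Agent also contains the (assumed) checker token.
BrowserMatch "W3C-checklink" !bad_bot
```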

wilderness




 
Msg#: 4512891 posted 10:24 pm on Oct 27, 2012 (gmt 0)

"just restore the # when I actually do check links, which is not often."


Same procedure I use for Xenu.

wilderness




 
Msg#: 4512891 posted 3:23 am on Oct 28, 2012 (gmt 0)

"That link only gave 3 results for me, and the most interesting looking one couldn't be reached."


One of the SERPs provided an explanation of how to use the Amazon Cache with WordPress pages to offer links to product catalogs.

FWIW, there were six results in the search. No idea why you saw fewer.

lucy24




 
Msg#: 4512891 posted 5:34 am on Oct 28, 2012 (gmt 0)

I got six results-- but the first two are from the same domain and came up "missing" in the browser, and no. 4 (the least promising of the batch, I was just being thorough) threw a "page load error".

You can get those first two in -- hahaha -- cached versions. But they are not very interesting or useful. The WordPress link seems to work.
"From the image information within the product data, an image for each product is fetched and staged. This is again implemented in perl making use of wget."

By amazing coincidence I've also got "wget" blocked. Or rather "Wget"; don't know why only that form.
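
On the "Wget" versus "wget" point: GNU Wget's default User-Agent starts with a capital W ("Wget/" followed by the version), and BrowserMatch is case-sensitive, which may be why only that form appears in older block lists. A minimal sketch that catches either spelling, again assuming a bad_bot variable:

```apache
# Case-sensitive: matches "Wget/1.x" but not a lower-case "wget".
# BrowserMatch "Wget" bad_bot

# Case-insensitive: matches "Wget", "wget", "WGET", and so on.
BrowserMatchNoCase "wget" bad_bot
```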

Bewenched




 
Msg#: 4512891 posted 1:34 am on Oct 31, 2012 (gmt 0)

Yeah, not liking the result about Product Image clouds at ALL!
I'd really hate to have to start watermarking images again. GRRR.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved