Obvious things to do
- Value add... well, I always did that anyway.
- Keep local copies of images. Always did that too.
- "hide" affiliate links via redirect script & robots.txt.
But it occurred to me that we are all likely using the same images. So it would be a simple matter to compare the md5 sigs of our images, and if a lot of images with the same md5 sig appear elsewhere, say on Amazon, then the bot could conclude we are a scraper or affiliate.
Personally I am going to start adding random bits to the images I use to make them unique.
Well, what do you think? Paranoia?
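To make the worry concrete, here is a rough sketch of the kind of comparison a bot could run. This is speculation about what is possible, not a description of what any search engine actually does, and `shared_fraction` is a made-up name for illustration.

```python
import hashlib

def shared_fraction(my_images, other_images):
    """Fraction of my images whose MD5 digest also appears in the other set.

    Both arguments are iterables of raw image bytes (hypothetical inputs).
    """
    mine = {hashlib.md5(data).hexdigest() for data in my_images}
    theirs = {hashlib.md5(data).hexdigest() for data in other_images}
    if not mine:
        return 0.0
    return len(mine & theirs) / len(mine)
```

A value near 1.0 would mean nearly every image on the site is byte-for-byte identical to one found elsewhere.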
1. Getting an md5 of an image - I kind of understand the idea, but not the specifics. Is it an md5 of just the file name or the actual image data? How do you get an md5 sig - what tools, languages, etc?
2. How to add random bits to images? I've seen COM components that can manipulate images - is that the idea? How many random bits would you have to add to significantly change the md5?
Aside from saving local copies of images - another option is to reverse proxy the image requests back to the merchant. This way you can save a ton of disk space (especially if using a datafeed and there are tons of images).
thanks!
It's a cryptographic hashing function that reduces any input to a 128-bit (16-byte) code, usually written as 32 hex characters. Here it would be taken over the actual image data, not the file name. The code is not strictly unique, but in practice almost no 2 sets of data will produce the same one (when they do, it is called a collision). There are other types of hashing functions though.
You use a function in your language of choice or write one. In my case I am using PHP. I don't remember the DLL for it on Windows, but it's there.
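For anyone who wants to see it spelled out, here is a minimal sketch in Python using the standard `hashlib` module (PHP has a built-in `md5_file()` that does the same job). The function name `md5_of_file` and the chunk size are just choices made for this example.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the MD5 digest of a file's contents as 32 hex characters."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large images do not have to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes give the same result no matter what they are named, which is exactly why the file name does not matter here.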
==2. How to add random bits to images? I've seen COM components that can manipulate images - is that the idea? How many random bits would you have to add to significantly change the md5?==
A single bit flip would be enough. Again, I am using PHP, so I just add a watermark via GD.
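A quick way to convince yourself of the single-bit claim is the sketch below, in Python. The byte string is only a stand-in for real image data, and `flip_bit` is a helper written for this example. Flipping an arbitrary bit in a real JPEG or PNG may corrupt the picture, which is why re-encoding with a watermark through an image library is the more practical route.

```python
import hashlib

def flip_bit(data, bit_index):
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

# Stand-in for image bytes (an assumption; any byte string works).
original = bytes(range(256)) * 4
changed = flip_bit(original, 1000)

# The two digests printed here are different.
print(hashlib.md5(original).hexdigest())
print(hashlib.md5(changed).hexdigest())
```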
==Aside from saving local copies of images - another option is to reverse proxy the image requests back to the merchant. This way you can save a ton of disk space (especially if using a datafeed and there are tons of images).==
Yeah, I agree, that's a great approach. I do something similar, but I maintain a cache of the image on my server after the first request.
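The fetch-once-then-cache idea can be sketched roughly like this in Python. It is not my actual script, just an illustration: `cached_image`, `fetch_from_merchant`, and the cache layout (one file per URL, named by the md5 of the URL) are all assumptions made for the example.

```python
import hashlib
import os
import urllib.request

def fetch_from_merchant(url):
    """Download the image bytes from the merchant's server."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def cached_image(url, cache_dir, fetch=fetch_from_merchant):
    """Return image bytes, hitting the merchant only on the first request."""
    os.makedirs(cache_dir, exist_ok=True)
    # Name the cache file after the md5 of the URL so it is filesystem-safe.
    name = hashlib.md5(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = fetch(url)
    with open(path, "wb") as f:
        f.write(data)
    return data
```

This trades some disk space back for speed, since only the images that are actually requested end up stored. A real setup would also want some expiry so stale images get refreshed.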