The problem: I have a LARGE list of books that I would like to index by author/title/ISBN/reviews...
As you can imagine, all this info is available on Amazon.
I read something about "Amazon Web Services", but they don't seem appropriate for this project.
So I checked their robots.txt (at the bottom of this message), and as far as I can tell it doesn't require me to stay away from the pages I need...
So I wrote a spider in Perl and I'm currently running it.
My question for you is about its fairness:
1) Am I allowed to do this?
2) Currently I run a search every 3+int(rand(30)) seconds, i.e. after a random delay of 3 to 32 seconds.
Do you think that delay is too short? Too long? Am I worrying too much?
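
To put numbers on the delay in question 2: in Perl, rand(30) returns a value from 0 up to but not including 30, so 3+int(rand(30)) is a whole number of seconds from 3 to 32, averaging 17.5. Below is a rough sketch of the same arithmetic, written in Python rather than the Perl the spider uses. The names next_delay, mean_delay and searches_per_day are made up for illustration.

```python
import random

def next_delay(base=3, spread=30):
    """Python equivalent of Perl's 3+int(rand(30)): an integer in 3..32."""
    return base + random.randrange(spread)

def mean_delay(base=3, spread=30):
    """Average delay in seconds, assuming every integer value is equally likely."""
    return base + (spread - 1) / 2

def searches_per_day(base=3, spread=30):
    """Average number of searches in 24 hours, ignoring the time each request takes."""
    return 24 * 60 * 60 / mean_delay(base, spread)

if __name__ == "__main__":
    print("example delay (s):", next_delay())
    print("mean delay (s):", mean_delay())
    print("searches per day:", round(searches_per_day()))
```

So, under these assumptions, the spider averages a bit under 3.5 searches per minute, or roughly 4,900 per day if it runs around the clock.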
Sorry if this is the wrong question or the wrong place, but I found it hard to find information on this (apart from the good O'Reilly book Spidering Hacks, of course).
Thank you all!
Alessandro
--- Amazon's robots.txt
# Disallow all crawlers access to certain pages.
User-agent: *
Disallow: /exec/obidos/account-access-login
Disallow: /exec/obidos/change-style
Disallow: /exec/obidos/flex-sign-in
Disallow: /exec/obidos/handle-buy-box
Disallow: /exec/obidos/tg/cm/member
Disallow: /gp/cart
Disallow: /gp/flex
Disallow: /gp/product/e-mail-friend
Disallow: /gp/product/product-availability
Disallow: /gp/product/rate-this-item
Disallow: /gp/sign-in
Disallow: /gp/reader
Disallow: /gp/sitbv3/reader
Disallow: /gp/richpub/syltguides/create
Disallow: /gp/customer-media
Disallow: /gp/gfix
Disallow: /gp/associations/wizard.html
Disallow: /gp/dmusic/order
Disallow: /gp/legacy-handle-buy-box.html
Disallow: /gp/aws/ssop
Disallow: /gp/yourstore
Disallow: /gp/gift-central/organizer/add-wishlist
Disallow: /gp/gurupamacro
Disallow: /gp/vote
Disallow: /gp/music/wma-pop-up
Disallow: /gp/customer-images
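
One way to check a URL against rules like the ones above before fetching it is to feed them to a robots.txt parser. This is only a sketch in Python (not the Perl spider itself), using the standard library's urllib.robotparser on a short excerpt of the file. The agent name BookIndexer, the base URL and the path /some/book/page are assumptions chosen for illustration. It only reports what the robots.txt rules disallow, nothing more.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt of the rules quoted above (not the full file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /gp/cart
Disallow: /gp/sign-in
Disallow: /gp/reader
Disallow: /gp/vote
"""

def build_rules(robots_txt):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

rules = build_rules(ROBOTS_TXT)
AGENT = "BookIndexer"            # hypothetical user-agent name
BASE = "https://www.amazon.com"  # assumed base URL

if __name__ == "__main__":
    # Disallow rules match by path prefix, so /gp/reader/... is covered too.
    for path in ["/gp/cart", "/gp/reader/anything", "/some/book/page"]:
        verdict = "allowed" if rules.can_fetch(AGENT, BASE + path) else "disallowed"
        print(path, verdict)
```

In a real spider the same check would run before every request, with the parser loaded from the site's live robots.txt rather than a pasted excerpt.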