Forum Moderators: phranque
from their page: [web.archive.org...]
....http://web.archive.org/200109*/http://www.mysite.com*
This returns all URLs that begin with ...http://www.mysite.com which were archived in September 2001.
Also, anyone know why they lag so far behind in showing their archives?
(they state 6 to 12 months, but why so stale?)
I am hoping to use the archive.org to show copycat webmasters that my content was there before they copied it.
Alexa doesn't deep crawl sites with minimal traffic. So, you will need to install the Alexa toolbar and visit all the pages you want the crawler to capture... you can check and you will see that the crawler visits within 24 hours.
Alexa donates archived pages to the archive after six months... apparently this is to assuage copyright risk.
Regarding the crawler and your robots.txt, I believe that the wayback machine automatically checks your robots.txt file every time somebody does a search on your site, with minimal caching. This way you could exclude your site's content from the wayback machine by editing your robots.txt file, in real-time.
If the ia_archiver hasn't visited your site in over two months (it takes them two months to complete each crawl), it is probably because none of the Alexa toolbar users visited your site.