At some point after our correspondence, a robots.txt exclusion request specific to the Wayback Machine was placed on the live blog. That request was automatically recognized and processed by the Wayback Machine, and the blog archives were excluded, unbeknownst to us (the process is fully automated). The robots.txt exclusion from the web archive remains automatically in effect because the request is still present on the live blog. Also, the blog URL, which previously pointed to an msnbc.com page, now points to a generic parked page.
[edited by: not2easy at 9:10 pm (utc) on Apr 25, 2018]
[edit reason] splice cleanup [/edit]
The robots.txt file will do two things [web.archive.org]:
1: It will remove documents from your domain from the Wayback Machine.
2: It will tell us not to crawl your site in the future.
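The Archive's own exclusion pipeline isn't shown here, but the robots.txt check behind both effects can be sketched with Python's standard `urllib.robotparser`, assuming the exclusion is an ordinary per-agent rule. The user-agent token `ia_archiver` (the one older Internet Archive help text told site owners to use), the sample blog URL and the `is_excluded` helper are illustrative assumptions, not the Archive's actual code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that opts out of the Wayback Machine only.
# "ia_archiver" is assumed as the Archive's user-agent token; every other
# crawler gets an empty Disallow, which means "allow everything".
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_excluded(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url.

    Hypothetical helper: it only evaluates the rules; fetching the live
    robots.txt and hiding archived captures are left out of this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://blog.example.com/2008/01/some-post.html"  # made-up URL
    print(is_excluded(ROBOTS_TXT, "ia_archiver", page))  # True
    print(is_excluded(ROBOTS_TXT, "Googlebot", page))    # False
    # Once the Wayback-specific rule is gone from the live file, the check passes again.
    print(is_excluded("User-agent: *\nDisallow:\n", "ia_archiver", page))  # False
```

Because the rule is re-read from whatever the live site serves, an exclusion like the one described above stays in effect for as long as the file contains it, which matches the "remains automatically in effect" behaviour the Archive described.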
The Internet Archive (Wayback Machine) has never been truthful about supporting robots.txt.
You can disallow their crawler (Archive-It) and they will still come back and scrape your pages while posing as an ordinary human browser.