Forum Moderators: DixonJones

Wayback Machine UA?

keyplyr

9:49 am on Mar 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




Anyone know what will work to keep outa the wayback machine? I've requested removal by email numerous times, but have had no reply.

This used to work, but no longer does:

In robots.txt:

User-agent: ia_archiver
Disallow: /

In .htaccess:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver
RewriteRule ^.* - [F,L]
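
For anyone wanting to sanity-check the robots.txt part, here is a minimal sketch using Python's standard urllib.robotparser. It only shows how a compliant parser reads those two lines; it says nothing about whether a given crawler actually honors them. The helper name is_allowed and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines quoted in the post above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url.

    is_allowed is a hypothetical helper, not part of any crawler's code.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host.
    print(is_allowed(ROBOTS_TXT, "ia_archiver", "http://example.com/index.html"))
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/index.html"))
```

With these rules, a compliant ia_archiver is excluded from every page while other user agents are unaffected.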

Brett_Tabke

11:10 am on Mar 13, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Still does work - they are just behind.

tigger

11:18 am on Mar 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Anyone know what will work to keep outa the wayback machine

Just out of interest why?

Key_Master

1:56 pm on Mar 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You have to allow ia_archiver to fetch robots.txt (without getting a 403); otherwise, people will still be able to search the wayback machine for old copies of your site.

jdMorgan

2:27 pm on Mar 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, allow everyone to access robots.txt, always.

In .htaccess:


RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver
RewriteRule !^robots\.txt$ - [F]

(With [F], [L] is redundant: [F] returns 403 Forbidden immediately and stops rule processing.)
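
To make the logic of the rule explicit, here is a rough Python emulation of what it decides. It is only a sketch under some assumptions: the .htaccess sits in the site root (so the leading slash is stripped before the pattern is tested), "200" simply stands for "the request is not blocked by this rule", and status_for is a made-up name.

```python
import re

# Same patterns as the RewriteCond / RewriteRule above.
# No [NC] flag is used there, so matching is case-sensitive.
UA_PATTERN = re.compile(r"^ia_archiver")
ROBOTS_PATTERN = re.compile(r"^robots\.txt$")


def status_for(user_agent: str, path: str) -> int:
    """Emulate the rule: 403 for ia_archiver unless it asks for robots.txt.

    Assumes a root-level .htaccess, where the pattern is matched
    against the path without its leading slash.
    """
    relative = path[1:] if path.startswith("/") else path
    is_archiver = UA_PATTERN.search(user_agent) is not None
    is_robots = ROBOTS_PATTERN.search(relative) is not None
    if is_archiver and not is_robots:
        return 403  # what [F] sends
    return 200  # placeholder for "handled normally"


if __name__ == "__main__":
    print(status_for("ia_archiver", "/robots.txt"))
    print(status_for("ia_archiver", "/index.html"))
    print(status_for("Mozilla/5.0", "/index.html"))
```

The point of the negated pattern is that the crawler can still read the Disallow line in robots.txt while being refused everything else.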

Jim

keyplyr

6:54 pm on Mar 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>Still does work - they are just behind. - Brett

Well, "behind" makes no sense in our case, since we just re-appeared. Coincidentally, I just now received a reply stating they "would look into it." But from what I have read on numerous newsgroups, robots.txt is intermittently ignored by this crawler. Because our .htaccess is no longer working, I was thinking they had changed or added an identifier, and I would like to know what it is.

>(keep outa the wayback machine) Just out of interest why? - tigger

Mainly because our website looked like crap 4 years ago - LOL

NorthernStudio

6:58 pm on Mar 23, 2003 (gmt 0)

10+ Year Member



>I just now received a reply stating they "would look into it."

Don't expect a follow-up. I received the exact same message a month ago when I complained about their practice of posting public links to the robots.txt files of sites that block them. I find this extremely rude and spiteful.

Since it's now a public document thanks to Alexa, I've included text in our robots.txt file explaining our reasons for blocking them. It also points out the sections of their "privacy" policy that explain the tracking the Alexa toolbar does, since most people aren't aware of it when they download what is considered by many to be "spyware."

I understand when people question why I won't have the site archived. I'd like to believe that I have valid reasons when I update our sites, and that I am the best judge of their timeliness and the relevance of the material. Links change, we've moved, content is no longer relevant, events have long passed, our opinions may have changed, etc.

WM

Brett_Tabke

7:35 pm on Mar 23, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Nice hack, JD. Thanks.