
Archive.org now sometimes using generic user-agent?


SumGuy

1:20 pm on Oct 24, 2025 (gmt 0)




Near as I can tell, the Internet Archive's crawler always (or almost always) uses one of these user-agents:

Mozilla/5.0 (compatible; archive.org_bot +http: // archive . org/details/archive.org_bot)
Mozilla/5.0 (compatible; archive.org_bot +http: // archive . org/details/archive.org_bot) Zeno/(something) warc/(something)
Mozilla/5.0 (compatible; LAC_IAHarvester/3.3.0; +https: // archive . org/details/archive.org_bot)
Mozilla/5.0 (compatible; special_archiver/3.1.1 +http: // www . archive . org/details/archive.org_bot)

Recently I got a hit from 207.241.225.61 (wwwb-spn35.us.archive.org) where the UA was this:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36

Because that Chrome version is stale, the request got my "I think you're a bot" page.

Now why did they do that?
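
For anyone wanting to special-case this, here is a rough Python sketch of the kind of check described above: spot the declared archive.org user-agents, and otherwise flag a browser UA whose Chrome version looks stale unless the request comes from an archive.org host. The cutoff `MIN_CHROME_MAJOR`, the function names, and the category labels are all made up for illustration, and the hostname is assumed to come from a reverse DNS lookup that has already been forward-confirmed. This is not how any particular bot-detection setup actually works.

```python
import re

# Tokens taken from the user-agents listed in the post above.
ARCHIVE_UA_TOKENS = ("archive.org_bot", "LAC_IAHarvester", "special_archiver")

# Hypothetical cutoff: Chrome majors below this count as "stale".
MIN_CHROME_MAJOR = 135

CHROME_RE = re.compile(r"Chrome/(\d+)\.")


def is_declared_archive_ua(ua):
    """True if the UA string names one of the known archive.org crawlers."""
    return any(token in ua for token in ARCHIVE_UA_TOKENS)


def chrome_major(ua):
    """Return the Chrome major version in the UA, or None if absent."""
    match = CHROME_RE.search(ua)
    return int(match.group(1)) if match else None


def is_archive_host(hostname):
    """True for archive.org itself or any subdomain of it.

    Assumes the hostname was obtained by reverse DNS and then
    forward-confirmed; that lookup is not shown here.
    """
    host = hostname.rstrip(".").lower()
    return host == "archive.org" or host.endswith(".archive.org")


def classify(ua, hostname):
    """Sort a request into one of four illustrative categories."""
    if is_declared_archive_ua(ua):
        return "declared-archive-bot"
    if is_archive_host(hostname):
        return "archive-host-generic-ua"
    major = chrome_major(ua)
    if major is not None and major < MIN_CHROME_MAJOR:
        return "suspect-stale-browser"
    return "ordinary"


generic_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36"
)
print(classify(generic_ua, "wwwb-spn35.us.archive.org"))
```

With the hit described above, the host check runs before the stale-version check, so the request would be let through instead of getting the bot page.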

tangor

11:57 am on Oct 28, 2025 (gmt 0)




archive.org has been robots.txt complaint for as long as I can remember. I wonder if they have changed operations.*

*Things change all the time.

lucy24

5:28 pm on Oct 28, 2025 (gmt 0)




Psst, SumGuy, you can put it all inside [ code ] tags.

robots.txt complaint
Mwa ha ha.

:: irrelevantly wondering how long it will take Hathi Trust to figure out that if they disallow w3-checklink* in robots.txt they won't have to keep serving up 403s ::


* A minor point of irritation every time I post a new ebook, which is preceded by thorough link checking.
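
To illustrate the robots.txt point, here is a small Python sketch using the standard library's `urllib.robotparser` to show how a well-behaved checker would read such a disallow rule. The user-agent token `w3-checklink` is copied from the post as written and may not match what the link checker actually sends, so confirm it from your own logs. The `example.org` URL and the second bot name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the token is taken verbatim from the post
# and should be checked against the checker's real user-agent string.
ROBOTS_TXT = """\
User-agent: w3-checklink
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant client asks before fetching; a disallowed one never
# sends the request, so the server has no 403 to serve.
checker_allowed = parser.can_fetch("w3-checklink", "https://example.org/some/page")
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.org/some/page")

print("link checker allowed:", checker_allowed)
print("other bot allowed:", other_allowed)
```

This only helps if the client honors robots.txt in the first place; a client that ignores the file still has to be refused by the server.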