Internet Archive Python

         

Angonasec

2:09 pm on Sep 26, 2016 (gmt 0)



I managed to get all our sites pulled from Archive.org some years ago. A pedestrian, though painless, procedure.

Though this serpent obeyed robots.txt, as of today their CIDR is blocked.

207.241.225.65 - - [25/Sep/2016] "GET /robots.txt HTTP/1.1" 301 596 "-" "python-requests/2.11.0"
207.241.225.65 - - [25/Sep/2016] "GET /robots.txt HTTP/1.1" 200 3144 "-" "python-requests/2.11.0"

NetName: INTERNET-ARCHIVE-1
207.241.224.0 - 207.241.239.255
207.241.224.0/20 Ber...locked

Kindly let me know if you dig up any more of their ranges.
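
For anyone double-checking the arithmetic, the whois range above can be collapsed to CIDR notation with Python's standard ipaddress module. This is only a minimal sketch: the helper name range_to_cidrs is made up for illustration and does not come from any tool mentioned in this thread.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Collapse an inclusive IPv4 range into the smallest list of CIDR blocks.

    Hypothetical helper, for illustration only.
    """
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    # summarize_address_range yields the networks that exactly cover start..end
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


if __name__ == "__main__":
    # Range reported by whois for INTERNET-ARCHIVE-1
    print(range_to_cidrs("207.241.224.0", "207.241.239.255"))
```

The whois range should come back as a single /20, matching the block listed above. A range that does not align to one prefix would come back as several smaller blocks instead.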

keyplyr

2:19 pm on Sep 26, 2016 (gmt 0)



I had a bit more trouble getting sites out of Archive.org. Took me over a year to remove several sites. They would confirm the site was removed, then a few weeks later I would find the sites back in their Wayback Machine. I eventually had to file DMCAs.

Ironically, Archive.org does not support noarchive meta tags or X-Robots-Tag: noarchive.

FYI: in addition to the assigned crawl ranges Archive.org uses, they also come covertly from various other ranges not assigned to them, faking common browser UAs.

blend27

1:02 pm on Sep 27, 2016 (gmt 0)



INTERNET-ARCHIVE-2
208.70.24.0 - 208.70.31.255
208.70.24.0/21
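
With two ranges now in hand, a quick way to test whether an address from the logs falls inside either one is sketched below. The names BLOCKED_NETWORKS and is_blocked are hypothetical, and this is a log-checking aid under those assumptions, not a replacement for server or firewall rules. It only matches the ranges quoted in this thread, so it would not catch requests arriving from ranges that are not assigned to Archive.org.

```python
import ipaddress

# Ranges quoted in this thread; extend the list as more turn up.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("207.241.224.0/20"),  # INTERNET-ARCHIVE-1
    ipaddress.ip_network("208.70.24.0/21"),    # INTERNET-ARCHIVE-2
]


def is_blocked(ip):
    """Return True if ip falls inside any listed network (hypothetical helper)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    # The address from the log lines earlier in the thread
    print(is_blocked("207.241.225.65"))
```

Keeping the ranges in one list means a newly discovered CIDR is a one-line addition.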

Angonasec

12:41 pm on Sep 29, 2016 (gmt 0)



Thanks Alex... another CIDR to block.

207.241.224.0/20 Internet Archive