I did a quick search online and didn't immediately see an answer to this question so I am posting here to see if anyone knows.
I noticed that a website is hosting a large number of my lectures. It would be a pain to have to list every single infringing URL on their site.
To get safe harbor, I would assume that the site would at least need to be able to show that the documents were stored because of some temporary, transitory process, were posted by someone other than the site itself, or came about because of non-human crawling activity. If they can't show this, then it doesn't seem like they could claim safe harbor. If they are crawling, they must at some point have stored the URL and then the document associated with that URL. Such a record would constitute evidence that their acquisition of the document might have safe harbor status.
If I know that all documents under some path on my website are copyrighted by me, it seems like I could ask for a takedown of all documents on their site that their crawler obtained by downloading URLs beginning with a fixed prefix. My question is: to what degree has the notion of identification been worked out with regard to the DMCA?