I don't think they're planning to obey robots.txt. I think they're just looking for ideas about what to get next.
Compliant entities are supposed to interpret robots.txt as broadly as possible, so if you have a rule matching "python" or "curl" (case-INsensitive) they should follow it. Someone hereabouts, possibly phranque, once explained it in some detail. But really, I tend to doubt that compliance forms any part of their intention.
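For illustration only, here is roughly what that broad matching amounts to, as a minimal Python sketch. The function name `rule_applies` is made up for this example, and real crawlers may well do it differently; the only thing taken from the above is the idea of a case-insensitive match on a name like "python" or "curl".

```python
def rule_applies(rule_agent: str, request_agent: str) -> bool:
    """Return True if a robots.txt User-agent token covers this requester.

    Hypothetical helper: a case-insensitive substring match, which is one
    reading of "interpret robots.txt as broadly as possible".
    """
    token = rule_agent.strip().lower()
    if not token:
        return False  # an empty token should not match everything
    if token == "*":
        return True   # the wildcard group applies to everyone
    return token in request_agent.lower()
```

Under this reading, a rule naming "python" would cover python-requests/2.22.0 and Python-urllib alike, while leaving anything that only says Mozilla to the wildcard group.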
:: detour to logs ::
Lot of this kind of thing:
aa.bb.cc.dd - - [31/Aug/2019:14:19:10 -0700] "GET /robots.txt HTTP/1.1" 200 3152 "-" "python-requests/2.22.0"
aa.bb.cc.dd - - [31/Aug/2019:14:19:10 -0700] "GET / HTTP/1.1" 403 1837 "-" "python-requests/2.22.0"
Well, they do tend to request robots.txt before their other requests, in contrast to the popular malign-robot behavior of asking only after a series of (usually blocked) page requests.
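If you want to check this across a whole log rather than by eye, something like the following Python sketch might do. It assumes the log is in the same combined format as the two lines quoted above and that the lines arrive in chronological order; the names `LOG_LINE` and `asked_robots_first` are invented for the example.

```python
import re

# Pattern for the combined log format shown in the sample lines.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

def asked_robots_first(lines):
    """Map each (ip, agent) pair to True if its first logged request
    was for /robots.txt. Lines that do not parse are skipped."""
    first_path = {}
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue
        key = (m["ip"], m["agent"])
        # setdefault keeps only the first path seen for each visitor
        first_path.setdefault(key, m["path"])
    return {key: path == "/robots.txt" for key, path in first_path.items()}
```

Feeding it the two sample lines would flag that visitor as having asked for robots.txt first, 403 on the next request notwithstanding.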
At one time I must have seen a lot of “Python-urllib”, because I find a robots.txt disallow. They're still around, but haven't asked for robots.txt in the recent past. Over on the “install a deadbolt” side (as opposed to the robots.txt “post a No Admittance sign” side) I've got a comprehensive block on
^[Pp]ython
where the opening anchor doesn't mean “it’s OK if you say Python somewhere further along” but simply that Python always happens to come first--exceptions are vanishingly rare--so the server doesn't need to check the whole thing.
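To see what the anchored pattern does and doesn't catch, here is a quick Python check. The pattern is the one above; the helper name `is_python_client` and the non-python-requests sample strings are just assumptions for the demonstration.

```python
import re

# Same pattern as the block rule: "Python" or "python" at the very start.
PYTHON_UA = re.compile(r"^[Pp]ython")

def is_python_client(user_agent: str) -> bool:
    """True if the User-Agent string begins with Python/python."""
    return PYTHON_UA.match(user_agent) is not None
```

Because of the anchor, a string that mentions Python only further along would slip through, which is the rare exception already noted.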
Edit: I've stopped checking for “Mozilla” at all. By this time, almost 90% of all requests--including almost 3/4 of blocked requests--claim to be Mozilla, and most of the rest are known quantities one way or the other. So it’s no longer as dispositive as it was a few years ago.
If someone comes in claiming to be Chrome or Firefox, I set an environment variable called “lying_bot”. This is not used directly for access control, but causes robots.txt (which is really robots.php) to issue the minimalist
User-Agent: *
Disallow: /
version. Yes, this also means that if humans snoopily ask for robots.txt, they probably won't see the real thing. But oh well.
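The real thing is PHP, but the logic is simple enough to sketch in Python. This is not the actual robots.php, only a guess at its shape: `robots_body` and `MINIMAL_ROBOTS` are invented names, and how “lying_bot” gets set in the first place is left out entirely.

```python
# The minimalist version served to flagged visitors.
MINIMAL_ROBOTS = "User-Agent: *\nDisallow: /\n"

def robots_body(environ: dict, real_robots: str) -> str:
    """Pick which robots.txt text to serve.

    environ stands in for the server environment; "lying_bot" is the
    variable named above. Anything truthy there gets the minimal version.
    """
    if environ.get("lying_bot"):
        return MINIMAL_ROBOTS
    return real_robots
```

So a request with the variable set gets the two-line disallow-everything file, and everyone else gets the full one.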