Forgive me if my quotes are less snazzy than the forum allows. If there is a mechanism to do it better, I can't find it.
they go from the index to a page in the hidden section that is impossible to get to directly from the index.
Regarding your investigations, have you checked your raw access logs in relation to visitors who claim to be coming to the hidden URLs from "index"? It is quite common for bots to use http://example.com/ as a referrer, where example.com is your site.
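A minimal sketch of that log check, assuming Apache/Nginx "combined" log format, a site host of `example.com`, and a hypothetical hidden section under `/hidden/` (adjust all three). It lists requests for hidden URLs whose Referer claims to be the site's own index page, which is the pattern worth inspecting by hand:

```python
import re
from urllib.parse import urlparse

# "combined" log format: IP ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def suspicious_hits(lines, site_host, hidden_prefix):
    """Yield (ip, path, referer) for hidden-area requests that claim the index as referer."""
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        ref = urlparse(m.group("referer"))
        # hostname is lowercased by urlparse; "-" (no referer) gives None
        claims_index = ref.hostname == site_host and ref.path in ("", "/", "/index.html")
        if m.group("path").startswith(hidden_prefix) and claims_index:
            yield m.group("ip"), m.group("path"), m.group("referer")
```

Feeding it the raw log (for example `open("access.log")`) and then looking at the other requests from each flagged IP usually shows quickly whether a human browser or a script is behind them.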
I think that examination of raw access logs is more likely to point to non-human behavior - the tell is in the "coming from the index" part.
Looks to me like a human accessing your first page from a Yandex search (the search parameter looks typical of Yandex); then the user turns on a scraping tool in stealth mode (hidden UA) to capture your assets.
Have you looked at changing file permissions (CHMOD) for specific resources so that the server does not allow access to "everyone"?
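A minimal sketch of the permissions idea, assuming the web server runs as a different user from the file's owner and is not in the file's group. Under that assumption the server can no longer read the file and will typically answer with a 403. Note that this shuts out every web visitor to that resource, not just unwanted ones:

```python
import os
import stat

def restrict_to_owner(path):
    """Remove group and world permissions, leaving owner read/write only (chmod 600)."""
    # os.chmod sets the mode exactly; the process umask does not apply here
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```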
Does anyone other than you legitimately visit these URLs? Most of the time, it's enough to categorically block access to specified areas from everywhere but your own IP (assuming it doesn't change too often).
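In practice this is usually done in server configuration (for example Apache 2.4's `Require ip` directive), but the logic is simple enough to sketch. The protected prefix `/private/` and the address `203.0.113.7` below are hypothetical placeholders for your own area and IP:

```python
import ipaddress

# Hypothetical values: substitute your own protected path and your own IP(s)
PROTECTED_PREFIX = "/private/"
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.7/32")]

def is_allowed(client_ip, path):
    """Allow everything outside the protected area; inside it, only the listed networks."""
    if not path.startswith(PROTECTED_PREFIX):
        return True
    addr = ipaddress.ip_address(client_ip)
    # An IPv6 address tested against an IPv4 network is simply "not in" it
    return any(addr in net for net in ALLOWED_NETWORKS)
```

Using networks rather than single addresses makes it easy to widen the rule to your ISP's range if your own IP changes often.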
It's obviously nicest if you can stop unwanted visitors from making the request in the first place. But blocking is a solid second-best.
Compare that to the URLs labeled "2": it's a session that started about two hours after that session.
That's when the user could have run the script (scraping tool). Sorry, I still don't see the mystery, other than that you don't know who it is. I see a lot of this.
It is absolutely impossible for anyone or anything to scrape, stumble upon, accidentally find, click on or see that path.
What I am saying is that the path is created the same way it is when you do it.
Are these visits with apparent Yandex referer human or humanoid? The main difference is that the humanoids don't request image files, just scripts and stylesheets, and they don't seem to execute scripts.
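One rough way to apply that test to a log, sketched under two assumptions: requests are grouped by IP as a crude stand-in for a session, and file extensions are enough to tell images from scripts and stylesheets. The labels are a heuristic, not proof either way:

```python
from collections import defaultdict
from pathlib import PurePosixPath

IMAGE_EXT = {".gif", ".jpg", ".jpeg", ".png", ".webp", ".svg", ".ico"}
SCRIPT_STYLE_EXT = {".js", ".css"}

def classify_sessions(requests):
    """requests: iterable of (ip, path). Returns {ip: label} from the kinds of files fetched."""
    kinds = defaultdict(set)
    for ip, path in requests:
        ext = PurePosixPath(path.split("?", 1)[0]).suffix.lower()  # ignore query strings
        if ext in IMAGE_EXT:
            kinds[ip].add("image")
        elif ext in SCRIPT_STYLE_EXT:
            kinds[ip].add("script/style")
        else:
            kinds[ip].add("page")
    labels = {}
    for ip, seen in kinds.items():
        if "image" in seen:
            labels[ip] = "probably human"
        elif "script/style" in seen:
            labels[ip] = "possible humanoid"  # scripts and CSS, but no images
        else:
            labels[ip] = "pages only"
    return labels
```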
they execute any script on my webpages
Do I block humans or bots?
Or is there a missing "never"?
Going by headers, they're fully humanoid; that's why they've been getting in.
We don't have a YandexDude, do we?
Pragma: no-cache
Accept-Language: en-us
Accept: */*
The Pragma: header is pretty rare, though not nonexistent, among humans-in-general; it's far more commonly sent by search engines, although not by the YandexBot. (Interestingly, it is also sent by another humanoid that's been active of late: the one from Drake Holdings at 204.79.180-181, which appears to be doing some kind of Bing-related investigation.)
Accept-Language: ru-RU
Accept: image/gif, image/jpeg, image/pjpeg, image/pjpeg, application/x-shockwave-flash, */*
Note the “image/pjpeg, image/pjpeg” duplication. It isn't unique to these humans; it seems to be more common with older browsers and some MSIE versions. Claiming to speak Russian is understandable, since these requests come from bona fide Russian IPs. But, again, it's only Russian.
Accept-Language: ru, uk;q=0.8, be;q=0.8, en;q=0.7, *;q=0.01
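The header oddities described in this thread can be checked mechanically. This is a minimal sketch assuming the request headers are available as a dict; each flag is a weak signal on its own (a single language such as `en-us` is common among real visitors too), so it is only meant for scoring or for picking log entries to inspect:

```python
def header_flags(headers):
    """Return which of the oddities discussed above appear in a dict of request headers."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    flags = []
    if h.get("pragma", "").strip().lower() == "no-cache":
        flags.append("pragma-no-cache")
    accept = [t.strip() for t in h.get("accept", "").split(",") if t.strip()]
    if len(accept) != len(set(accept)):
        flags.append("duplicate-accept-type")  # e.g. image/pjpeg listed twice
    langs = [p.split(";")[0].strip() for p in h.get("accept-language", "").split(",") if p.strip()]
    if len(langs) == 1:
        flags.append("single-language")
    return flags
```

With the humanoid headers quoted above (`Pragma: no-cache`, `Accept-Language: ru-RU` and the duplicated `image/pjpeg`), all three flags fire; the multi-language `ru, uk;q=0.8, ...` header with `Accept: */*` raises none.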
I block all traffic claiming a Yandex referer.
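The thread doesn't say how that block is implemented, so this is only one possible sketch in application code: test whether any label of the Referer's hostname is `yandex`, which covers the country variants (`yandex.ru`, `yandex.com.tr`, ...). It will miss other Yandex domains such as `ya.ru`, and it turns away real Yandex searchers along with the fakes:

```python
from urllib.parse import urlparse

def has_yandex_referer(referer):
    """True if the Referer's hostname has a 'yandex' label, e.g. yandex.ru or www.yandex.com.tr."""
    host = urlparse(referer).hostname or ""  # hostname is None for "-" or an empty header
    return "yandex" in host.split(".")
```

A request handler would then answer 403 whenever this returns True.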
Interestingly, this massive fake traffic with a Yandex referrer first appeared at the same time as a really extensive Yandex crawl of my sites.
Do you also deny the YandexBot, given that blocking the referer already eliminates humans who are legitimately using Yandex?
I do block the crawler.