Hello there everyone!
I've written an e-commerce platform from scratch. There's an admin section, and as a convenience the header template uses switches to display an admin link only to users with sufficient privileges. Recently, I've begun getting visits, arriving via Yandex, to pages that are supposed to be completely hidden from everyone else, bots included. I'd love to find out how Yandex came upon these links.
So, here's what I've done so far to try to figure out how the links ended up visible (so far, every attempt has been fruitless):
The first thing I did was try to decipher the Yandex redirect URL so I could backtrack from it. Unfortunately, I can find NO info on how to do this, and the redirect does not match what I get when I enter a search term on the search engine myself. For instance, the "text" var is completely empty and all the info seems to be packed into the "etext" var, and I can't figure out how that one is used. Maybe "encrypted text"? I tried plugging all the various vars into the search URL on the Yandex site, but every attempt resulted in a blank search page.
The URL in question:
[noparse]http://yandex.ru/clck/jsredir?from=yandex.ru%3Bsearch%3Bweb%3B%3B&text=&etext=1271.RJS9ZfLhVdj6nXam87qy4e0e-DG9BQd_KlyA1gFVBu1uuZOuUSRTgOEasX71Cupm.fe839c38b17c539463c0b2f7d01d86940f4b3320&uuid=&state=_BLhILn4SxNIvvL0W45KSic66uCIg23qh8iRG98qeIXmeppkgUc0YL_nDC5hqtEQ6WayFoZKRZE&data=UlNrNmk5WktYejY4cHFySjRXSWhXUFJiWDhna1NqZnBmd1YzNG43VS13RUpmdUZXdnBLOHdkMFlqUzVDamF1OVBVb2xkMmtvMUxXWUxJM1hSVW5hS2x5R1R6LVpCcGVXZFZZNkprR0JOSUVPc3d0ZnBVOXpDV295ckZDdFpqS3l4WkZSOFF3c0RmVTN2ZkhIYWIwT0JzNVQyWko5ME9vMw&b64e=2&sign=08505d8afebc7cb1b4568d3e92c11ecb&keyno=0&cst=AiuY0DBWFJ7IXge4WdYJQXbYQp9t5VF6sf_IfF4r6pdt0ojCe4cFQNegojWnJn8UToJJyLyR96RrC_bl9mqJxfCjbo3nl3EPqUjNd2ADc0Zxar8tKC1hQd4R3WTMI1AD3dVkg_IhwheNgkWXjuLnig&ref=orjY4mGPRjk5boDnW0uvlrrd71vZw9kp5uQozpMtKCXdCnh-_wii4V8gT36dWFhYdLgT8HVc5IPL1yluhUPYHlzmn9nr8Aaa3y8eC13fJRd5RgTTAPeGmg&l10n=ru&cts=1481853806438&mc=4.32492874929[/noparse]
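For anyone who wants to poke at the URL, here is a minimal Python sketch of how I've been pulling it apart. It only splits the query string and tries a URL-safe base64 decode on a value; it does not decrypt anything. The helper names are my own, and reading "cts" as a millisecond Unix timestamp is a guess based on its size, not something Yandex documents as far as I can tell.

```python
import base64
import binascii
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs


def split_redirect(url):
    """Return the query parameters of a redirect URL as a flat dict."""
    query = urlsplit(url).query
    # keep_blank_values=True so empty parameters such as text= and uuid= survive
    params = parse_qs(query, keep_blank_values=True)
    return {name: values[0] for name, values in params.items()}


def try_base64(value):
    """Attempt a URL-safe base64 decode; return bytes, or None on failure."""
    padded = value + "=" * (-len(value) % 4)  # restore stripped padding
    try:
        return base64.urlsafe_b64decode(padded)
    except (binascii.Error, ValueError):
        return None


def cts_to_utc(cts):
    """Interpret cts as milliseconds since the Unix epoch (an assumption)."""
    return datetime.fromtimestamp(int(cts) / 1000, tz=timezone.utc)


if __name__ == "__main__":
    # Trimmed copy of the URL above; paste the full one to inspect every var.
    sample = ("http://yandex.ru/clck/jsredir?from=yandex.ru%3Bsearch%3Bweb%3B%3B"
              "&text=&l10n=ru&cts=1481853806438")
    for name, value in split_redirect(sample).items():
        print(name, "=", repr(value))
```

If the "cts" guess is right, the click happened on 2016-12-16 (UTC). Whatever comes out of decoding "data" may well be another opaque layer, so I wouldn't expect a readable query from it.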
Next, I downloaded the entire site via wget twice, once with a browser UA and once with the Yandex search bot's UA (the UA is how my site distinguishes bots, so it can hide logins and human-specific content from them). Searching through all the downloaded content, I was unable to find any instance of the URLs in question.
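The search over the downloaded copies can be sketched roughly like this in Python. The function name, the mirror directory, and the "/admin/" fragment in the usage lines are placeholders of mine, not the real paths.

```python
import os


def find_mentions(root, needles):
    """Yield (file path, line number, needle) for every needle found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # errors="ignore" so binary files (images, archives) don't abort the scan
            with open(path, encoding="utf-8", errors="ignore") as handle:
                for lineno, line in enumerate(handle, start=1):
                    lowered = line.lower()
                    for needle in needles:
                        if needle.lower() in lowered:
                            yield path, lineno, needle


if __name__ == "__main__":
    # "mirror-yandex-ua" and "/admin/" are hypothetical; substitute the wget
    # output directory and fragments of the hidden URLs.
    for hit in find_mentions("mirror-yandex-ua", ["/admin/"]):
        print(hit)
```

Running it once per mirror (browser UA and bot UA) keeps the two result sets separate, which makes it obvious if a link is only exposed to one kind of visitor.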
I checked my sitemap.xml just to make sure the URLs didn't accidentally end up in there. All clean.
Finally, I did tons of searches on the Yandex site to see if I could stumble upon something but I can hardly find the site mentioned in the search engine, much less find the no-no URLs.
So, in the absence of any forward progress, I've blocked all Yandex bots and now automatically ban any visitor that either shows a Yandex URL as the referrer or uses Yandex's YaBrowser. This doesn't hurt the site, since it sells to US customers only and Yandex has only ever been a source of malicious visits. Another point of interest is that Yandex is the only search engine acting as the go-between for these hidden links.
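The ban check boils down to something like the following, sketched here in Python rather than the site's actual code. The function names and the marker list are mine. Matching on "yandex" and "yabrowser" in the UA is a deliberately blunt rule, and the referrer test only looks for a "yandex" label in the hostname, so it would also catch something like yandex.example.com.

```python
from urllib.parse import urlsplit

# Lowercase substrings to look for in the User-Agent header (my own list).
YANDEX_UA_MARKERS = ("yabrowser", "yandex")


def referrer_is_yandex(referrer):
    """True if the referrer's hostname contains a 'yandex' label or is ya.ru."""
    host = (urlsplit(referrer).hostname or "").lower()
    return "yandex" in host.split(".") or host == "ya.ru"


def should_block(referrer, user_agent):
    """True if either the User-Agent or the referrer points at Yandex."""
    ua = (user_agent or "").lower()
    if any(marker in ua for marker in YANDEX_UA_MARKERS):
        return True
    return referrer_is_yandex(referrer or "")
```

Both headers are trivially spoofable, so this only stops visitors who don't bother hiding where they came from.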
There are a few scenarios I've imagined for how these links could have been seen. I'm keeping in mind that Yandex might not be the source: the links could have been picked up by Yandex on a malicious link-sharing site, or the visitors in question might be using the Yandex search engine to obfuscate where the inbound links really come from. At this point, I honestly have no clue. Regardless, here are my thoughts:
1) My code was faulty. Although all the pages check out now, maybe at one point my security checks weren't doing their job when the crawler hit. The fact that only one search engine is showing up with the links makes this somewhat unlikely.
2) Site got hacked. It's not very likely: the site keeps track of all visitors in a 30-day running window, and I'm always on the site, constantly monitoring the visits to see what's going on. They'd have to find a way to bypass the tracking system, which is pretty unlikely.
3) Database got scraped. Maybe they got the links from the database, either on the web server or at the remote backup location.
4) I inadvertently shared them somehow. Often, when I'm asking for help on design or PHP forums, I'll save the generated HTML file on the server so others can see the page in question. I try to be careful to strip out the sensitive bits, but perhaps I missed something once.
So that's it, I think. If you have an idea for deciphering the Yandex redirect URL, or for how else I might track down where these links originated on the web, I'd love to hear it. Thanks for your time!