
Forum Moderators: phranque


Tracking down how "hidden" URLs showed up on search engine

Yandex seems to be directing visitors to the batcave

     
1:45 pm on Dec 21, 2016 (gmt 0)

New User

joined:Dec 21, 2016
posts: 15
votes: 1


Hello there everyone!

I've written an e-commerce platform from scratch. There's an admin section, and as a convenience the template uses switches to display an admin link in the header only to those with sufficient privileges. Recently, I've begun getting visits via Yandex to pages that are supposed to be completely hidden from everyone else, bots included, and I'd love to find out how Yandex came upon these links.

So, here's what I've done so far to try to figure out how the links ended up visible (so far, all of it fruitless):

1) The first thing I did was try to decipher the Yandex redirect URL to backtrack. Unfortunately, I can find NO info on how to do this, and the redirect does not match what Yandex produces when I enter a search term on the search engine myself. For instance, the "text" var is completely empty and all the info seems to be packed into the "etext" var, and I can't figure out how that one is used. Maybe "encrypted text"? I tried plugging all the various vars into the search URL on the Yandex site but all my efforts resulted in a blank search page.

The URL in question:
[noparse]http://yandex.ru/clck/jsredir?from=yandex.ru%3Bsearch%3Bweb%3B%3B&text=&etext=1271.RJS9ZfLhVdj6nXam87qy4e0e-DG9BQd_KlyA1gFVBu1uuZOuUSRTgOEasX71Cupm.fe839c38b17c539463c0b2f7d01d86940f4b3320&uuid=&state=_BLhILn4SxNIvvL0W45KSic66uCIg23qh8iRG98qeIXmeppkgUc0YL_nDC5hqtEQ6WayFoZKRZE&data=UlNrNmk5WktYejY4cHFySjRXSWhXUFJiWDhna1NqZnBmd1YzNG43VS13RUpmdUZXdnBLOHdkMFlqUzVDamF1OVBVb2xkMmtvMUxXWUxJM1hSVW5hS2x5R1R6LVpCcGVXZFZZNkprR0JOSUVPc3d0ZnBVOXpDV295ckZDdFpqS3l4WkZSOFF3c0RmVTN2ZkhIYWIwT0JzNVQyWko5ME9vMw&b64e=2&sign=08505d8afebc7cb1b4568d3e92c11ecb&keyno=0&cst=AiuY0DBWFJ7IXge4WdYJQXbYQp9t5VF6sf_IfF4r6pdt0ojCe4cFQNegojWnJn8UToJJyLyR96RrC_bl9mqJxfCjbo3nl3EPqUjNd2ADc0Zxar8tKC1hQd4R3WTMI1AD3dVkg_IhwheNgkWXjuLnig&ref=orjY4mGPRjk5boDnW0uvlrrd71vZw9kp5uQozpMtKCXdCnh-_wii4V8gT36dWFhYdLgT8HVc5IPL1yluhUPYHlzmn9nr8Aaa3y8eC13fJRd5RgTTAPeGmg&l10n=ru&cts=1481853806438&mc=4.32492874929[/noparse]
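For anyone who wants to pick a redirect like this apart, here is a minimal Python sketch that splits the query string into its vars and tries a URL-safe base64 decode on one of them. The shortened example URL and the function names are made up for illustration. What "etext", "state" and "cst" actually hold is not something I can vouch for, so this only exposes the structure, not the meaning.

```python
import base64
import binascii
from urllib.parse import urlsplit, parse_qs


def split_redirect(url):
    """Return the query variables of a redirect URL as a plain dict."""
    query = urlsplit(url).query
    # keep_blank_values=True so empty vars such as "text=" are not dropped
    parsed = parse_qs(query, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


def try_b64(value):
    """Attempt a URL-safe base64 decode; return None if decoding fails."""
    padded = value + "=" * (-len(value) % 4)  # restore stripped padding
    try:
        return base64.urlsafe_b64decode(padded)
    except (binascii.Error, ValueError):
        return None


# Hypothetical, shortened stand-in for the real redirect URL
example = ("http://yandex.ru/clck/jsredir?from=yandex.ru%3Bsearch%3Bweb%3B%3B"
           "&text=&etext=1271.abc&b64e=2&data=aGVsbG8")
params = split_redirect(example)
for name, value in params.items():
    print(name, "=", repr(value))
print("data decoded:", try_b64(params["data"]))
```

Running the same two functions over the full URL at least shows which vars are empty and which ones decode to something. A decoded value may itself be another encoded blob, so treat whatever comes out as a clue rather than an answer.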

Next, I downloaded the entire site via wget, using both a browser UA and a Yandex search UA (it's how my site distinguishes bots, to hide logins and human-specific content). Performing a search through all the downloaded content, I was unable to find any instance of the URLs in question.
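The search through the downloaded copy can be done with a few lines of Python. This is only a sketch: the directory name "mirror" and the search string are placeholders, not the real paths or URLs.

```python
import os


def find_in_mirror(root, needles):
    """Yield (path, needle) for every file under root that contains a needle."""
    wanted = [n.encode("utf-8") for n in needles]
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                # read as bytes so binary files and odd encodings don't break the scan
                with open(path, "rb") as handle:
                    blob = handle.read()
            except OSError:
                continue  # unreadable file, skip it
            for needle in wanted:
                if needle in blob:
                    yield path, needle.decode("utf-8")


if __name__ == "__main__":
    # "mirror" and "/admin/" are placeholders for the wget output dir and the hidden path
    for path, needle in find_in_mirror("mirror", ["/admin/"]):
        print(path, "contains", needle)
```

Running it once per downloaded copy (browser UA and bot UA) keeps the two result sets separate.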

I checked my sitemap.xml just to make sure it didn't get accidentally placed in there. All clean.

Finally, I did tons of searches on the Yandex site to see if I could stumble upon something but I can hardly find the site mentioned in the search engine, much less find the no-no URLs.

So, in the absence of any forward progress with this, I took the steps of forbidding any Yandex bots as well as automatically banning any user that is either showing the Yandex URL as a referrer or using Yandex's YaBrowser. This doesn't hurt the site as it sells product to 'Murrica only and Yandex has been the source of only malicious visits. Another point of interest is that Yandex is the only search engine to be the go-between for these hidden links.
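Roughly, the autoban rule boils down to the check below. This is a simplified Python sketch rather than the platform's actual code, and the function name is made up. Both values come from request headers, which the client controls, so a match only says what the request claims about itself.

```python
from urllib.parse import urlsplit


def should_autoban(referer, user_agent):
    """True if the referrer is a yandex.ru host or the UA identifies as YaBrowser."""
    host = (urlsplit(referer).hostname or "").lower()
    # match yandex.ru itself and its subdomains, but not e.g. notyandex.ru
    if host == "yandex.ru" or host.endswith(".yandex.ru"):
        return True
    return "yabrowser" in user_agent.lower()


if __name__ == "__main__":
    print(should_autoban("http://yandex.ru/clck/jsredir?text=", "Mozilla/5.0"))
    print(should_autoban("https://www.example.com/", "Mozilla/5.0"))
```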

There are a few scenarios I've imagined that could have been the genesis for these links getting seen. I'm keeping in mind that Yandex might not be the source. The links could have been picked up by Yandex on a malicious site sharing links, or the visitors in question might be using the Yandex search engine to obfuscate the inbound links. At this point, I honestly have no clue. Regardless, here are my thoughts:

1) My code was faulty. Although all the pages check out now, maybe at one point my security checks weren't doing their job when the crawler hit. The fact that only one search engine is showing up with the links makes this somewhat unlikely.
2) Site got hacked. It's not very likely, since the site keeps track of all visitors in a 30-day running window and I'm always on the site, constantly monitoring the visits to see what's going on. They'd have to find a way to bypass the tracking system, which is pretty unlikely.
3) Database got scraped. Maybe they got the links from the database, either on the web server or at the remote backup location.
4) I inadvertently shared them somehow. Often, when I'm asking for help on design or PHP forums, I'll save the generated HTML file on the server so others can see the page in question. I try to be careful to strip out the sensitive bits but perhaps I missed it once.

So that's it, I think. If you either have an idea for deciphering the Yandex redirect URL or one concerning how else I might track down the origination of these links on the web, I'd love to hear it. Thanks for your time!
1:43 pm on Feb 18, 2017 (gmt 0)

New User

joined:Dec 21, 2016
posts: 15
votes: 1


Hi there MrKen and thanks for adding to the thread! I find the topic awfully interesting, to say the least.

First and foremost, NEVER pass your secret stuff via GET. Everyone and their grandmother can view that information nowadays.

Secondly, I suspect you're on the right track with the suspicion of the data being scraped from your browser history somehow. As I explained above, the dynamic URLs that I use for admin access are never posted anywhere to be found and are only usable if someone is logged in as me, which I would see in my tracker. They're getting the data somehow from the browser, I'm sure.

This is the last 24 hours of autobans:

[imgur.com...]

The admin session in those links was active and being used by me yesterday. They seem to be receiving and then using the data in real time. Also of interest is that the Yandex bans are only trying to access a restricted area. I almost never see one of these attempts trying to visit a publicly accessible page. I think the exploit bot is designed to scrape urls with keywords (edit, ban, admin, etc.)

Third: You're also right about the spoofed Yandex referring URLs. I've blocked Yandex for two months now and am still getting referral URLs from yandex.ru and they are admin URLs that were generated after the bot was blocked.

I don't use the WoT addon but do use many addons that are allowed to view and alter my web visits, like tampermonkey, UA switcher, Adguard, etc. so my information is getting siphoned via other means. Since I changed my URL composition, I no longer feel a sense of urgency to track that particular leak down but will eventually get around to it since it would be interesting to find out how it's happening.

I would mention that although in this case, I think it's useless to block it, many sites, mine included, see no negative impact in blocking Yandex. It all depends on the content/intended audience. I wouldn't block it on a site hoping for an audience wider in range than just 'Murrica.
3:55 pm on Feb 18, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3852
votes: 215


14 seconds later there were simultaneously 8 different IP's at exactly the same 'second' to different js files
This does not indicate spoofed IP addresses, it indicates that the script is being run via bot-net. The IPs don't show up in block lists because they are real people whose computers have been compromised by some malicious "Amazing Free Download" or app and have no idea that "they" ever visited your site. They might wonder why their computer's performance has declined, but the visits are done via the bot net script in the background, not real visits.
5:26 pm on Feb 18, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14924
votes: 653


Dang, not2easy, you beat me to it ;) I was going to say simply:
You cannot spoof an IP address
UA, yes. Referer, yes. Miscellaneous headers, anything you like. But IP, no.
3:56 am on Feb 19, 2017 (gmt 0)

New User

joined:Feb 18, 2017
posts: 2
votes: 0


Dang, not2easy, you beat me to it ;) I was going to say simply:
You cannot spoof an IP address
UA, yes. Referer, yes. Miscellaneous headers, anything you like. But IP, no.

Really?
"the sender's address in the header can be altered, so that to the recipient it appears that the packet came from another source"
[en.wikipedia.org]

Anyway, I'm not here to argue whether IPs or referrals can be spoofed. I'm not here to discuss the pros and cons of blocking Yandex referrals.
I just wanted to give a 'Heads Up' that there are people out there that have access to stuff that you don't really want them to have.

Keep in mind that we are not dealing with a bunch of High School hackers. We are dealing with professional hackers that know exactly what they are doing.
Right now, there are thousands of webmasters out there whose 'secret' urls have been compromised, and they don't know it.
6:25 am on Feb 19, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14924
votes: 653


:: detour to assorted legitimate lookups, including That Other Forum* ::

Yes, yes, all right, you can send a fake IP address with your request ... but unless the requester's sole purpose is DDoS, it won't do anything. It's not like a telemarketer sending fake Caller ID information to get you to pick up the phone, or a fake return address on mail (whether e or snail) to get you to open it.

If you, located at 11.22.33.44, send a request to example.com saying "I'm from 22.33.44.55 and I want to see suchandsuch file", then example.com will send the requested file to 22.33.44.55. This may hurt 22.33.44.55 or it may hurt example.com, depending on what else is going on, but the one thing it can never result in is someone at 11.22.33.44 seeing the requested material; they won't even know whether it was sent. In fact it's very unlikely that even 22.33.44.55 will see the material, since they didn't ask for it and have no reason to look for it.


* Oh, lord, can they get snippy. Today I learned about the "XY Question".
1:36 pm on Feb 19, 2017 (gmt 0)

New User

joined:Dec 21, 2016
posts: 15
votes: 1


Since MrKen isn't here to discuss the content of my thread, I vote we come back around to me :)

Yesterday, I took a gander at the traceroutes for all the hits with the Yandex referrers. 100% of them for the last month (the amount of time I keep detailed tracking logs for) originated in Moscow, so I went ahead and blocked access for the city. I want to keep the ban list clear of them so I can see if any at all are originating from elsewhere. So far, I've not seen anything.

I'll update if anything fun pops up!
9:38 am on Feb 20, 2017 (gmt 0)

New User

joined:Feb 20, 2017
posts: 4
votes: 0


Hi there.

I stumbled upon this thread while investigating faked yandex-referrals to hidden URLs in my system. URLs which I believe have never been published at all - so I also suspect that some plugin/extension/whatever is leaking them. I've created a set of "bait" links and used them in different browsers, sent some by email etc to see if I can find the source.
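A sketch of how such bait links could be generated and matched afterwards, in Python. The base URL, the channel names and the function names are placeholders, not the actual setup described here. Each channel gets its own random token, so a later hit on a bait URL points back to exactly one channel.

```python
import secrets


def make_bait_links(base_url, channels):
    """Create one unique bait URL per channel; return (token->channel, channel->url)."""
    token_to_channel = {}
    links = {}
    for channel in channels:
        token = secrets.token_urlsafe(16)  # random, unguessable path segment
        token_to_channel[token] = channel
        links[channel] = f"{base_url.rstrip('/')}/{token}"
    return token_to_channel, links


def leaked_channel(request_path, token_to_channel):
    """Map a requested path back to the channel whose bait link it matches, else None."""
    token = request_path.rstrip("/").rsplit("/", 1)[-1]
    return token_to_channel.get(token)


if __name__ == "__main__":
    # placeholder base URL and channel names
    tokens, links = make_bait_links("https://example.com/bait",
                                    ["chrome", "firefox", "email"])
    for channel, url in links.items():
        print(channel, url)
```

Each bait link should be opened or sent through one channel only, otherwise a hit can't be attributed.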

I'll post an update here if I find anything interesting.
9:52 am on Feb 20, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12110
votes: 775


Hi hultee and welcome to WebmasterWorld [webmasterworld.com]
1:00 pm on Feb 20, 2017 (gmt 0)

New User

joined:Dec 21, 2016
posts: 15
votes: 1


I would love to hear more about what you find, hultee. Unfortunately, the same angle didn't work for me, likely due to simply not using it enough.

Could you list all the addons you're utilizing? We can compare them and see if anything useful can be made of it.
1:13 pm on Feb 20, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12110
votes: 775


BTW - there's no such thing as a "hidden" url. If a file resides on the server and can be accessed by ftp or http, it can also be accessed by a remote request using those protocols.

The file does not need to be linked to from a web page or email. The 3rd party doesn't need to know the name of the file. All they need do is access your server, open the directory and get the files. Many ways to do it, all very simple.
1:14 pm on Feb 20, 2017 (gmt 0)

New User

joined:Feb 20, 2017
posts: 4
votes: 0


os x
chrome - the only extension I'm using right now is stylish (I removed/disabled a few extensions when I first started to suspect that it was my computer, but can't remember exactly which ones)

Also.. My system runs on Heroku. All the visited URLs are present in Heroku's router logs, which I also transfer to papertrail, though these are (hopefully) encrypted =)
1:22 pm on Feb 20, 2017 (gmt 0)

New User

joined:Feb 20, 2017
posts: 4
votes: 0


keyplyr: The URLs I'm talking about are hidden in the sense that they can't be found by navigating/crawling. My application is not file-based, there is no ftp access etc. Yes, you could potentially do a brute-force attempt at finding the URLs, but then I would see those attempts in my logs, and I don't. Some component, most likely on my local computer, is leaking these URLs.
2:18 pm on Feb 20, 2017 (gmt 0)

New User

joined:Dec 21, 2016
posts: 15
votes: 1


There are such things as "hidden" URLs when you're combining path variables and these files or paths used in the URL don't actually exist in any form.

<no personal links please>

While you'll just get dropped to the index with a 404 response, much in the same way as if you typed anything else behind the TLD, it works just fine for me, sending me to the tracking module in my admin panel. There are no instances of that URL anywhere on any page (this page excluded :) ), in any link or elsewhere. It's a random string stored in a database that's valid for just my authenticated user, while everyone else simply gets a 404 response.

That's pretty much the definition of "hidden".
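In rough Python terms the idea looks like the sketch below. It is an illustration only, not the platform's code: the dict stands in for the database table, and the names are invented.

```python
import hmac


def resolve_admin_path(path_token, session_user, stored_tokens):
    """Return an HTTP status: 200 only if the token belongs to the logged-in user."""
    if session_user is None:
        return 404  # not logged in: looks like any other nonexistent path
    expected = stored_tokens.get(session_user)
    if expected is None:
        return 404  # this user has no admin token at all
    # constant-time comparison, so response timing does not reveal partial matches
    if hmac.compare_digest(path_token.encode(), expected.encode()):
        return 200
    return 404


if __name__ == "__main__":
    tokens = {"admin": "x9Qw-example-token"}  # stand-in for the database row
    print(resolve_admin_path("x9Qw-example-token", "admin", tokens))
    print(resolve_admin_path("x9Qw-example-token", None, tokens))
```

The point is that the path alone gets nobody in: without the matching authenticated session the response is the same 404 as for any made-up path.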



[edited by: not2easy at 2:23 pm (utc) on Feb 20, 2017]
[edit reason] please see ToS/Charter [/edit]

6:39 pm on Mar 1, 2017 (gmt 0)

New User

joined:Feb 20, 2017
posts: 4
votes: 0


I tracked the leak down to chrome on my local computer using a set of bait links. Wireshark reveals that the stylish extension (custom CSS styling) makes a request to its "api" every single time I visit a page.

[ghacks.net...]

So basically the plugin is now owned by an analytics company monitoring every single URL that you visit, not even just hostnames.. classy stuff!
6:46 pm on Mar 1, 2017 (gmt 0)

New User

joined:Dec 21, 2016
posts: 15
votes: 1


Thanks so much for the update, hultee! I removed the extension and will see what happens to the traffic.