now what's Yandex up to?

         

lucy24

6:32 pm on Dec 27, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In the course of looking up something else, I found this headscratcher.

Over the past 6 months or so, the YandexBot--the real one afaik, not an impostor--has been asking for nonexistent URLs with names of the form
/\h{8}-\h{4}-\h{4}-\h{4}-\h{12}
for example
/374c902b-7a1f-4998-b7e0-878ca7b04e62
where \h is short for [a-f0-9] i.e. any hexadecimal character. (SubEthaEdit permits this useful shorthand; sadly I don't think anyone else does.) No extension.
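For anyone who wants to grep their own logs for this, here is a minimal Python sketch with the \h shorthand expanded to the explicit class. The 8-4-4-4-12 hex layout has the same shape as the canonical text form of a UUID, so the sketch also asks the standard uuid module which version number the string would carry if it is one. The function names are made up for illustration, and nothing here says why YandexBot requests these paths.

```python
import re
import uuid

# \h is editor shorthand; standard regex engines need the class spelled out.
H = "[a-f0-9]"

# The request pattern described in the post: /8-4-4-4-12 hex digits, no extension.
UUID_PATH = re.compile(rf"^/{H}{{8}}-{H}{{4}}-{H}{{4}}-{H}{{4}}-{H}{{12}}$")

# The site-verification filename pattern mentioned in the post's footnote.
YANDEX_VERIFY = re.compile(rf"^/yandex_{H}{{16}}\.txt$")


def is_uuid_style_path(path: str) -> bool:
    """True if the request path has the 8-4-4-4-12 hex shape."""
    return UUID_PATH.match(path) is not None


def uuid_version(path: str):
    """Version number if the path parses as an RFC 4122 UUID, else None."""
    if not is_uuid_style_path(path):
        return None
    return uuid.UUID(path.lstrip("/")).version


if __name__ == "__main__":
    sample = "/374c902b-7a1f-4998-b7e0-878ca7b04e62"
    print(is_uuid_style_path(sample), uuid_version(sample))
```

Running the same check over the request column of an access log would show whether every such request shares one version number or whether they vary.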

Multiple sites, always a completely different string, but always the same pattern. Does anyone recognize the format? Is it, for example, something used by a major CMS (and, if so, would it be equivalent to Google's routine soft-404 requests)? It's got no resemblance to the site-verification file* they use in their own WMT.

As noted in other threads, Yandex does have a habit of doing weird and inexplicable things. But at least it doesn't prompt the kind of jaw-clenching terror you get when Google does something new.


* /yandex_\h{16}\.txt

keyplyr

9:47 pm on Dec 27, 2016 (gmt 0)




I haven't noticed anything exceptional from YandexBot. They've always had long-tail referrers at times. I attribute it to a unique database structure.

In general though, non-existent file requests have been increasing on the sites I watch. It used to be Bing that had amnesia, but the other SEs slowly started requesting them as well; now to the point of normality.

keyplyr

6:06 am on Dec 30, 2016 (gmt 0)




BTW - thanks for reminding me Yandex has a WMT. I had forgotten and hadn't got around to opening an account.

Cool... new tools to play with!

keyplyr

12:52 pm on Jan 3, 2017 (gmt 0)




Their robots.txt checker doesn't like my wildcard (*) syntax. I pretty much knew the other SEs besides Google would have problems with it, but had forgotten to give it further thought. I ended up reverting to standards (if there is such a thing for robots.txt).
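To show the difference between the two styles, here is a rough Python sketch. It assumes the wildcard syntax in question is the common extension where * matches any run of characters and a trailing $ anchors the end of the URL; a rule with neither is a plain prefix match. The helper names and sample rules are hypothetical and are not the rules from my file or anyone's actual parser.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one Allow/Disallow path rule into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters,
    a trailing '$' anchors the end, everything else is literal.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and rejoin them with '.*' for each '*'.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def rule_matches(rule: str, path: str) -> bool:
    """True if the rule applies to the path (matching starts at the first character)."""
    return rule_to_regex(rule).match(path) is not None


if __name__ == "__main__":
    for rule, path in [
        ("/dir/", "/dir/page.html"),          # plain prefix rule
        ("/private/*.html", "/private/a/b.html"),
        ("/*.php$", "/index.php"),
        ("/*.php$", "/index.php?x=1"),
    ]:
        print(rule, path, rule_matches(rule, path))
```

A parser that only does prefix matching would treat the * in the last three rules as a literal character, so those rules would quietly stop matching anything.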

And they keep suggesting I choose a Host: example.com site as default to my mirrors... mirrors? Do they know something I don't?