thirteen

msg:4317256 | 2:52 am on May 25, 2011 (gmt 0) |
How do you submit a sitemap to Yandex so their bots will know to crawl your site?
|
Leosghost

msg:4317257 | 3:00 am on May 25, 2011 (gmt 0) |
Don't worry... they will crawl... unless you block them deliberately.
|
lucy24

msg:4317261 | 3:17 am on May 25, 2011 (gmt 0) |
Where "block" means htaccess, because they do not obey robots.txt. They read it avidly-- this fooled me for a long time-- but then ignore it.
|
Staffa

msg:4317349 | 10:20 am on May 25, 2011 (gmt 0) |
I agree with lucy24, and on top of that Yandex also shows up regularly as a "search referrer" with the query being my domain name. Yeah, right.
|
jecasc

msg:4317364 | 11:41 am on May 25, 2011 (gmt 0) |
Yeah right Yandex. But as long as your bot constantly triggers DoS protection on my server you remain blocked. What is the use of hitting the same webpage a dozen times anyway? I do not change my content every 2 seconds.
|
dstiles

msg:4317650 | 10:02 pm on May 25, 2011 (gmt 0) |
Lucy, I have allowed yandex bots on my sites for quite a while and they do not hit pages "blocked" by robots.txt - at least, I haven't seen it happen in my security logs. In all, my experience of yandex bots on validated rDNS is a good one. Yandex look-alikes - now that may be something else.
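For anyone wondering what "validated rDNS" looks like in practice, here is a minimal sketch of a forward-confirmed reverse DNS check. The hostname suffixes are an assumption based on what Yandex documents for its crawlers, so verify them against current guidance. The function name `is_validated_yandex` and the injectable `reverse`/`forward` lookups are hypothetical, added only so the logic can be tried without network access.

```python
import socket

# Assumption: suffixes Yandex documents for its crawler hostnames.
# Check Yandex's current guidance before relying on this list.
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def is_validated_yandex(ip, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for an IP claiming to be Yandex.

    reverse(ip) -> hostname and forward(hostname) -> list of IPv4 strings
    are injectable; by default they use the standard socket lookups.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, so nothing to validate
    if not host.endswith(YANDEX_SUFFIXES):
        return False  # PTR points elsewhere: a look-alike
    try:
        # The hostname must resolve back to the same IP.
        return ip in forward(host)
    except OSError:
        return False
```

A look-alike that merely sends a Yandex user-agent fails the suffix test, and one that fakes its PTR record fails the forward lookup.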
|
lucy24

msg:4317689 | 11:29 pm on May 25, 2011 (gmt 0) |
I never realized until I took a closer look that Yandex always uses the identical IP: 77.88.26.26 for the regular textbot, 95.108.158.238 for the imagebot. That makes it easier to check.

On 10 May I closed off directory /bbb/ to robots. Directory /aaa/ has been closed since more or less the day of its creation (at least 5 years ago).

Textbot's activities, omitting other directories:
11 May: robots.txt
12 May: robots.txt
13 May: /bbb/file, robots.txt, /bbb/file
14 May: robots.txt
15 May: /bbb/file, robots.txt, /bbb/file
(these are all different files, I think, but I didn't look closer)

At this point I locked out 77.88.26.26. Forgot about 95.108.158.238, so they stuck around a few days longer. Incidentally, it took about this long-- five days-- for the googlebot to stop crawling directory /bbb/.

Yandex carries on, in spite of meeting a steady stream of 403 instead of 200:
16 May: /bbb/file, robots.txt
17 May: /bbb/file, robots.txt
18 May: /bbb/file, /aaa/file, robots.txt, /bbb/file
19 May: /bbb/file, robots.txt, /aaa/file
20 May: /bbb/file, robots.txt
21 May: /bbb/file, robots.txt, /bbb/file
22 May: robots.txt
23 May: robots.txt
24 May: robots.txt, /bbb/file, /bbb/file

Two weeks seems an awfully long time for a robot to not get the message that a particular directory is off limits.
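As a rough illustration of what a compliant crawler should conclude from such a setup, the sketch below feeds a robots.txt to Python's standard `urllib.robotparser`. The file contents are hypothetical, since the post does not quote the real robots.txt, and the helper name `allowed` is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the real file is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /aaa/
Disallow: /bbb/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Yandex"):
    """Return True if the parsed rules let this agent fetch the path."""
    return parser.can_fetch(agent, path)


for path in ("/aaa/file", "/bbb/file", "/index.html"):
    print(path, "allowed" if allowed(path) else "disallowed")
```

Under these assumed rules, every request for /aaa/file or /bbb/file in the listing above is one that a robot honoring robots.txt would not have made.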
|
dstiles

msg:4318270 | 10:01 pm on May 26, 2011 (gmt 0) |
Yandex has a much wider range than two IPs. I block the image bot but let in lots of IPs for the text bot.
|
lucy24

msg:4318294 | 10:34 pm on May 26, 2011 (gmt 0) |
Oh, I know. My actual blocking has 77.88.0.0/18 and 95.108.128.0/17. It's only when I looked in my raw logs that I realized how consistent they are. Regional, maybe. My text editor speaks fluent RegEx so I told it to find ^.*?(77\.88\.|Yandex) in the raw logs. The Find All window gives the results in two different colors, making it very easy to eyeball. I'm glad I didn't constrain the search to /bbb/ or I would never have noticed they're still trying to get into /aaa/ even though it has always been roboted.
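The same search can be sketched outside a text editor. The snippet below reuses the pattern from the post unchanged and adds a check against the two blocked ranges using the standard `ipaddress` module. The sample log lines, the placeholder user-agent, and the helper names `yandex_lines` and `is_blocked` are all made up for illustration.

```python
import ipaddress
import re

# The pattern from the post, unchanged. Note that "77\.88\." is not
# anchored to the start of an address, so it can also match inside
# other IPs such as 177.88.x.x.
PATTERN = re.compile(r"^.*?(77\.88\.|Yandex)")

# The two ranges the post says are blocked.
BLOCKED = [
    ipaddress.ip_network("77.88.0.0/18"),
    ipaddress.ip_network("95.108.128.0/17"),
]


def yandex_lines(lines):
    """Keep only the log lines that the pattern matches."""
    return [line for line in lines if PATTERN.search(line)]


def is_blocked(ip):
    """Return True if the address falls inside one of the blocked ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)


# Made-up sample lines, not taken from any real log.
SAMPLE = [
    '77.88.26.26 - - [16/May/2011:04:10:00 +0000] "GET /bbb/file HTTP/1.1" 403 -',
    '192.0.2.10 - - [16/May/2011:04:11:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '95.108.158.238 - - [16/May/2011:04:12:00 +0000] "GET /img.png HTTP/1.1" 403 - "-" "ExampleBot-Yandex/0.0"',
]

for line in yandex_lines(SAMPLE):
    ip = line.split()[0]
    print(ip, "blocked" if is_blocked(ip) else "not blocked")
```

The third sample line is caught only because its user-agent contains "Yandex", which is why matching on both the IP prefix and the name is useful.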
|
lucy24

msg:4355777 | 2:50 am on Aug 27, 2011 (gmt 0) |
Wow, didn't realize how old this thread was. Seem to have misplaced a few months. Around mid-August I decided to give the regular yandexbot (77.88 range) another shot. So far they have been behaving nicely. Then a couple of days ago, for arcane technical reasons, I had to un-Deny the imagebot (95.108 range). Could have re-blocked it via mod_rewrite but didn't have the energy. It took the imagebot about 12 hours to realize that it was no longer blocked-- and then it went absolutely hysterical with excitement. It's been picking up everything in sight at a blazing pace, sometimes as little as 1 minute, 15 seconds apart. (I didn't make up this number. 75 seconds seems to be its absolute speed limit. Most of the time it's more like three minutes between hits. Do you suppose Yandex is on dialup? :)) I've been keeping close track, and so far the imagebot has only made one visit to a place it wasn't supposed to go. But I've decided to cut it some slack because the last time it tried to get that specific file, the directory wasn't yet roboted-out. So the URL was already on the shopping list.
|
umairrockx

msg:4357768 | 9:35 pm on Sep 1, 2011 (gmt 0) |
The truth is that no one is better than Google. The others just claim to be; if they were better, then they would be on TOP! But they are not. Google's algorithm is far better than any other search engine's.
|
Staffa

msg:4357797 | 11:23 pm on Sep 1, 2011 (gmt 0) |
@lucy24 Have you had any visitors from Yandex yet? And if so, where were they from? The reason I'm asking: I just went through a bunch of log files, and Yandexbot visits often even though it is blocked for the moment. I am debating whether or not to let it in, but my main concern is their location. To get the most from any of my sites, the visitor needs a reasonable understanding of English (a translation service won't do), and I don't want to get just the RU script kiddies as visitors.
|
lucy24

msg:4357818 | 1:06 am on Sep 2, 2011 (gmt 0) |
"Have you had any visitors from Yandex yet? And if so, where were they from?"
You mean human visitors coming from yandsearch? I've always had a scattering of those. :: shuffling papers :: Oh dear God. Someone in Kazakhstan wants to read Grandmother Puss? Untranslated, at that. I'm glad you asked, because it turns out I've missed a whole group of robots. Call them secondary robots: they go to yandex.ru and search for my domain name. If they get redirected (to www.), they hand off to a brother robot to pick up the page. In general it's something like Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) from assorted places, generally in the 95. area. Three or four times a month, front page only, no images, no skin off my nose. Scattered amongst Messrs. Roboto are a handful of bona fide human searches. I particularly like the one who asked for сусанна дура and was handed the Susanna Memorial Doorway. Wonder what they were really looking for? Not a hotlink, anyway; I eventually had to block image searching in this area because people kept grabbing names.
|
Staffa

msg:4357830 | 2:24 am on Sep 2, 2011 (gmt 0) |
Thank you Lucy24, that was enlightening. As I said higher up in this thread, I have also seen Yandex as referrer for a search for any of my domain names .... as if ;o) I'll just keep Yandex blocked from crawling since it doesn't seem worth it to let them in.
|
dstiles

msg:4358153 | 10:24 pm on Sep 2, 2011 (gmt 0) |
The point of the OP is that yandex is now available in English so anyone looking for an alternative to a certain other SE now has another choice. I saw yandexbot coming from Turkey this week and earlier in August I got a bot IP from USA. Hopefully, at present, they share all of the information collected, since by far the most yandexbot hits I get are from Russia and they have been crawling (very politely) for a long time.
|
Staffa

msg:4358159 | 10:51 pm on Sep 2, 2011 (gmt 0) |
Thank you dstiles, that's interesting. I'll keep my eyes open for the bot coming from an IP outside RU.
|