
Forum Moderators: goodroi

Robots.txt code format.

     
1:55 pm on Jun 24, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 19, 2004
posts:951
votes: 12


So my robots.txt file starts with:

User-agent: *
Crawl-delay: 10

I want to disallow the Yandex bot. Is this the right code?

====
User-agent: *
Crawl-delay: 10

User-agent: YandexDisallow: / # blocks access to the whole site
====
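One way to check a draft like this before uploading it is to feed the text to a robots.txt parser. The sketch below uses Python's standard-library urllib.robotparser; the helper name is_allowed and the example.com URL are placeholders, not anything from the thread. Python's parser is only one implementation, so other crawlers may read the fused line differently.

```python
from urllib.robotparser import RobotFileParser

# The draft exactly as posted, with "Yandex" and "Disallow" fused on one line.
DRAFT = """\
User-agent: *
Crawl-delay: 10

User-agent: YandexDisallow: / # blocks access to the whole site
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# example.com is a placeholder URL used only for illustration.
draft_blocks_yandex = not is_allowed(DRAFT, "Yandex", "https://example.com/page.html")
print("Draft blocks Yandex:", draft_blocks_yandex)
```

With this parser the fused line is read as an odd user-agent name with no rules attached, so the draft does not block Yandex at all.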
2:16 pm on June 24, 2018 (gmt 0)

Preferred Member

Top Contributors Of The Month

joined:Nov 13, 2016
posts:596
votes: 90


You are missing a line break after "Yandex", but I assume it's a typo.

[yandex.com...]

ps: hum, I might be wrong, apparently, we can do what you did, I didn't know that. sorry.
2:18 pm on June 24, 2018 (gmt 0)

Preferred Member

Top Contributors Of The Month

joined:Nov 13, 2016
posts:596
votes: 90


I am confusing myself, forget my comment.
2:22 pm on June 24, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 19, 2004
posts:951
votes: 12


Hmm, also it seems from google search that Yandex bot doesn't obey robots.txt ...
2:27 pm on June 24, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 19, 2004
posts:951
votes: 12


Yes, I added the following to robots.txt:

User-agent: *
Crawl-delay: 10

User-agent: Yandex
Disallow: /
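A minimal sketch for checking the corrected file, assuming Python's standard-library urllib.robotparser is an acceptable stand-in for a well-behaved crawler. The URL and the name SomeOtherBot are placeholders. Real crawlers, Yandex included, use their own parsers, so this only shows that the file is well-formed and says what was intended.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10

User-agent: Yandex
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URL = "https://example.com/some/page.html"  # placeholder URL

# The Yandex group matches any agent whose name contains "yandex".
yandex_allowed = parser.can_fetch("Yandex", URL)
yandexbot_allowed = parser.can_fetch("YandexBot/3.0", URL)

# Every other agent falls back to the "*" group: allowed, with a crawl delay.
other_allowed = parser.can_fetch("SomeOtherBot", URL)
other_delay = parser.crawl_delay("SomeOtherBot")

print(yandex_allowed, yandexbot_allowed, other_allowed, other_delay)
```

Here the Yandex agents are refused, while any other bot is allowed and sees the 10-second crawl delay from the "*" group.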
10:53 am on July 10, 2018 (gmt 0)

New User from IN 

joined:July 10, 2018
posts:2
votes: 1


If you want to block all the Yandex bots then:

User-agent: Yandex
Disallow: /
5:17 pm on July 10, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15450
votes: 739


"it seems from google search that Yandex bot doesn't obey robots.txt"
I think it had issues years ago, but currently it's compliant.

:: detour to check ::

Oh, cripes, what are they doing in a roboted-out directory? (No, not Yandex, someone else I'd authorized.)

:: wandering off to address unexpected and unrelated issue ::
8:36 pm on July 10, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:9056
votes: 752


@Shiv Bhan Singh

Happy to have you join WebmasterWorld! Others will greet you as well, with a link to charter and all that happy stuff.

Brilliant in reminding all of us to check our robots.txt directives from time to time. :)
3:45 am on July 11, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


Hi Shiv Bhan Singh and welcome to WebmasterWorld [webmasterworld.com]

@born2run - Yandex does respect robots.txt, but does not support some directives that some other bots support.

This is why robots.txt fails at much of what it was originally intended to do. It never became a formal standard and is interpreted differently by different robots.

BTW - Why would anyone want to disallow one of the top search engines in the world? Yandex has offices in Silicon Valley close to Google, Yahoo & Facebook. They are a major player and contribute more than just search. They can be highly beneficial to websites. Open a Yandex Webmaster Tools [webmaster.yandex.com] account to manage your presence at Yandex and find out more about the way they support robots.txt.
10:39 am on Oct 10, 2018 (gmt 0)

New User

joined:Sept 26, 2018
posts:4
votes: 0


User-agent: *
Disallow: /framework/
Allow: /framework/admin-ajax.php

Can anyone verify this,

which I am using for <snip>

[edited by: engine at 10:48 am (utc) on Oct 10, 2018]
[edit reason] Please see WebmasterWorld TOS [/edit]

10:56 am on Oct 10, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


Sorry, that does not accomplish what you want.

When you Disallow: /framework/ you disallow anything after that, so admin-ajax.php would also be disallowed.
5:12 pm on Oct 10, 2018 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4166
votes: 262


That is true for many bots, but not for Google. The code as posted can work for Googlebot. Once you Disallow the /framework/ directory, not all robots honor the Allow: line, but Google does: it applies the most specific (longest) matching rule, so the order of the Allow: and Disallow: lines does not matter to Googlebot.
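The difference between bots here comes down to how a parser picks among rules that all match the same path. Below is a rough sketch of two common strategies, using the rules from the post above. The function names are made up for illustration, and the matching is simplified to plain prefixes with no * or $ wildcard handling. Google's documentation describes the longest-match approach; some simpler parsers stop at the first rule that matches.

```python
# Rules from the post, in the order they were written.
RULES = [
    ("disallow", "/framework/"),
    ("allow", "/framework/admin-ajax.php"),
]


def longest_match_allows(rules, path):
    """Most specific (longest) matching prefix wins; Allow wins a tie."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, prefix in rules:
        if path.startswith(prefix):
            is_allow = kind == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


def first_match_allows(rules, path):
    """The first rule whose prefix matches decides, in file order."""
    for kind, prefix in rules:
        if path.startswith(prefix):
            return kind == "allow"
    return True


AJAX = "/framework/admin-ajax.php"
OTHER = "/framework/other.php"

print("longest match:", longest_match_allows(RULES, AJAX), longest_match_allows(RULES, OTHER))
print("first match:  ", first_match_allows(RULES, AJAX), first_match_allows(RULES, OTHER))
print("first match, Allow listed first:", first_match_allows(RULES[::-1], AJAX))
```

Under longest-match, admin-ajax.php is allowed and the rest of /framework/ is blocked, whatever the line order. Under first-match, the file as posted blocks admin-ajax.php too, and it only gets through if the Allow: line is listed before the Disallow: line.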
7:24 pm on Oct 10, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


Good point, thanks for reminding me of that irregularity. It's things like this that sway me to use other tactics and avoid robots.txt.
 
