Forum Moderators: goodroi

What Makes a Good robots.txt File?

         

Broaster

4:13 am on Apr 17, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



I'm curious about what makes a good robots.txt file.

I always hear it can make or break your traffic or crawling time.




[edited by: not2easy at 4:03 am (utc) on Apr 21, 2018]
[edit reason] cleanup [/edit]

lucy24

5:04 am on Apr 17, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I always hear
You are listening to the wrong people.

That is: obviously an inappropriate robots.txt can hurt your traffic... if it says in full
User-Agent: *
Disallow: /
and you were hoping for search-engine referrals. But I suspect that isn't what your sources meant.
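To see what that two-line file does to a rule-following crawler, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The bot name and URLs are placeholders, not anything from this thread, and real crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# The two-line "block everything" file quoted above.
BLOCK_ALL = """\
User-Agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())

# "ExampleBot" and example.com are placeholders for illustration.
for url in ("https://example.com/", "https://example.com/blog/post-1"):
    print(url, parser.can_fetch("ExampleBot", url))
```

Every path starts with `/`, so the parser should report every URL as off limits for any agent that honors the file.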

Google--specifically--ignores the “Crawl-Delay” directive, if that's what you meant by “crawling time”. You have to set it in WMT/GSC instead, or trust them to pace themselves appropriately.

But what does any of this have to do with AdSense? This thread has strayed somewhat from the original question, which was about allowing the AdSense bot to do its thing.

Broaster

12:06 am on Apr 21, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



lucy24, are you replying to me?

What is a good robots.txt file? I keep getting different answers: some say keep it simple, while others have a long list in their file.

I don't know; I never asked about AdSense in this thread.

lucy24

1:48 am on Apr 21, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I never asked about Adsense
Well, you’re not OP. If you've got general questions about robots.txt, those are most appropriately asked in the robots.txt subforum [webmasterworld.com].

A good robots.txt file is one that does what you want it to do. The reason the question gets different answers is that “what you want it to do” varies from one site to another.

The robots.txt file for my test site actually says, in full, what I proposed as a bad example above:
User-Agent: *
Disallow: /
For that specific site, it is a good robots.txt because it keeps law-abiding robots--notably search engines--from making further requests. (The only thing better than a blocked request is a request that isn't made in the first place.) Non-law-abiding robots are excluded by other means.

:: wandering off in search of someone with scissors ::

keyplyr

4:32 am on Apr 21, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Only a few of the active bots even support robots.txt. The major search engines support robots.txt directives... Google, Bing, Yandex, DuckDuckGo, and a couple of others, but most bots do not even request this file. A few bots request it, but disobey it. A few others use it to find out where you don't want them to go, then they go there.

The robots.txt file never did become a standard. It tried to be, but there are too many differences in interpretation. Some bots support wildcards (*) and others do not. Some support cross-domain, others do not, etc.

Broaster

4:47 am on Apr 21, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



I just have a WordPress blog and I'm scared to tamper with the robots.txt, so I'm looking for what would be a decent simple file setup for someone who just blogs about sports for a news-type website.

keyplyr

4:50 am on Apr 21, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Broaster - a simple file that disallows no one looks like this:
User-Agent: *
Disallow:
And has nothing else (or you could just leave it totally blank).

You don't even need a robots.txt :)
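As a rough check that the explicit allow-all file and a totally blank file behave the same way, here is a small sketch with Python's standard-library `urllib.robotparser`. The helper name `allows`, the bot name, and the URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

def allows(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if this robots.txt text lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

ALLOW_ALL = "User-Agent: *\nDisallow:\n"  # the file shown above
BLANK = ""                                # a totally blank file

url = "https://example.com/any/page.html"  # placeholder URL
print("explicit allow-all:", allows(ALLOW_ALL, url))
print("blank file:        ", allows(BLANK, url))
```

Both lines should print `True`: an empty `Disallow:` excludes nothing, and a file with no rules excludes nothing either.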

lucy24

6:25 am on Apr 21, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm scared to tamper with the robots.txt
What does it currently say? Does it even exist? Does WP create a robots.txt by default? This strikes me as a pretty pointless venture, since the disallowed directories aren't normally linked from anywhere--meaning that robots wouldn't even know about them unless you draw their attention--and malign robots certainly don't pay attention to robots.txt. You might think they'd at least use it to learn the exact names of disallowed directories ... but judging by the requests I receive every day on non-WP sites, robots seem to find it easier just to barge in and ask for everything regardless.

Travis

8:51 am on Apr 21, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



What Makes a Good robots.txt File?

First of all, a "good" robots.txt file only works with "good" robots. As said, a lot of them simply ignore it.

That being said, a good robots.txt file:

- doesn't (accidentally) block legitimate bots from indexing the content you want to be indexed.
- prevents bots from wasting your "crawl budget" by visiting resources which are not intended to be indexed.
- is simple, because rules that are too complex might not be understood correctly by robots (and it's easy to make mistakes while writing complex rules).
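The first two points in that list can be checked mechanically before a file goes live. The sketch below uses Python's standard-library `urllib.robotparser`; the `audit` helper, the rules, and the paths are all hypothetical, and Python's parser does plain prefix matching, so a real crawler may read wildcard rules differently.

```python
from urllib.robotparser import RobotFileParser

def audit(robots_txt, agent, want_crawled, want_skipped):
    """Return (url, problem) pairs for URLs the rules treat the wrong way."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for url in want_crawled:
        if not parser.can_fetch(agent, url):
            problems.append((url, "blocked but should be crawlable"))
    for url in want_skipped:
        if parser.can_fetch(agent, url):
            problems.append((url, "crawlable but should be blocked"))
    return problems

# Hypothetical rules: "Disallow: /blog" was meant to hide one page,
# but as a prefix rule it also covers everything under /blog/.
ROBOTS = """\
User-agent: *
Disallow: /search/
Disallow: /blog
"""

issues = audit(
    ROBOTS,
    "Googlebot",
    want_crawled=["/", "/blog/game-recap.html"],
    want_skipped=["/search/results.html"],
)
for url, problem in issues:
    print(url, "->", problem)
```

Here the audit should flag only `/blog/game-recap.html`, the kind of accidental block the first bullet warns about.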

keyplyr

9:01 am on Apr 21, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm looking for what would be a decent simple file...
He's asking what time it is and you're all explaining how to build a clock.

Travis

10:35 am on Apr 21, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Sorry if I misunderstood the question of the OP.

I thought it was the same as when someone asks which bots or IPs to block and, instead of giving a list, you answer that they need to do their own research. So I was just giving pointers for the research needed to write a good robots.txt. Sorry.

lucy24

5:30 pm on Apr 21, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



He's asking what time it is and you're all explaining how to build a clock.
Well, if he wants a prefab, one-size-fits-all robots.txt (Broaster? still with us?) he'll need to stroll over to That Other Forum ... where, instead of receiving the “We don’t write your code for you” lecture, he’ll receive the equally exasperating “This question has already been asked and answered” lecture.

Editorial comment: Benign, well-intentioned robots who ignore robots.txt can pretty well be counted on your fingers. So yes, robots.txt remains an effective way to separate the sheep from the goats. Or vice versa, depending on your personal ovicaprid preference.

Martin Potter

6:22 pm on Apr 21, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Broaster, FWIW, here is my robots.txt file:

User-agent:SemrushBot
User-agent:Wotbox
User-agent:Baiduspider
User-Agent:Crawler
Disallow: /

User-agent: *
Crawl-delay: 2
Disallow: /data/

Sitemap: https://example.com/sitemap.xml

Not sure why I added Wotbox, maybe something I read about it. And I disallowed my /data/ directory because that's where I keep graphics for my project pages on another website (just easier this way).
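For anyone who wants to see how a rule-following parser reads that file, here is a sketch using Python's standard-library `urllib.robotparser` (`site_maps()` needs Python 3.8 or later). The paths tested are invented for illustration. As noted earlier in the thread, Google ignores Crawl-delay, so the value reported here only matters to bots that honor it.

```python
from urllib.robotparser import RobotFileParser

# The file as posted above, with the sitemap URL as given.
ROBOTS = """\
User-agent:SemrushBot
User-agent:Wotbox
User-agent:Baiduspider
User-Agent:Crawler
Disallow: /

User-agent: *
Crawl-delay: 2
Disallow: /data/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

# Paths below are placeholders, not real pages on the poster's site.
print("SemrushBot /index.html:", parser.can_fetch("SemrushBot", "/index.html"))
print("Googlebot  /index.html:", parser.can_fetch("Googlebot", "/index.html"))
print("Googlebot  /data/a.png:", parser.can_fetch("Googlebot", "/data/a.png"))
print("Crawl-delay for others:", parser.crawl_delay("Googlebot"))
print("Sitemaps:", parser.site_maps())
```

The four named bots share one group and are shut out entirely; everyone else falls through to the `*` group, which only fences off `/data/`.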

Broaster

9:20 pm on Apr 23, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Lucy, this is my current robots.txt:

User-agent: *
Disallow: /suggest/?*

Sitemap: http:// www. NNNNNNNNN .com/sitemapindex.xml
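Since wildcard support varies between bots, as mentioned earlier in the thread, it can help to see what that Disallow line covers for a crawler that does understand `*`. The sketch below assumes Google-style matching (`*` matches any run of characters, a trailing `$` anchors the end, everything else is a literal prefix); the `rule_matches` helper and the test paths are made up for illustration.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern, assuming Google-style wildcards:
    '*' is any run of characters, a trailing '$' anchors the end, and
    everything else is matched literally as a prefix."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

RULE = "/suggest/?*"
for path in ("/suggest/?q=scores", "/suggest/", "/suggestions/?q=scores"):
    print(path, rule_matches(RULE, path))
```

Under these assumptions the rule only covers `/suggest/` URLs that carry a query string; the bare `/suggest/` page stays crawlable. The trailing `*` adds nothing, because `/suggest/?` alone already matches the same URLs as a prefix.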