Forum Moderators: martinibuster

Is my robots.txt ok?

         

ariel1238a

6:30 pm on Apr 7, 2018 (gmt 0)

5+ Year Member



I am using Blogger, and these lines appear in my robots.txt file. Is that OK?
---
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow:
Allow: /
---
Thanks

keyplyr

7:06 pm on Apr 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What is it you are trying to accomplish by using that file?

Currently it is not really doing anything.

ariel1238a

7:12 pm on Apr 7, 2018 (gmt 0)

5+ Year Member



Sorry, my English is not perfect.
I want to allow AdSense to crawl my whole site. I mean that I don't want any URL to be blocked for AdSense.
Is it OK?

keyplyr

7:33 pm on Apr 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, this robots.txt does not block Mediapartners-Google (the AdSense bot).
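
If you want to check this yourself, here is a minimal sketch using Python's standard-library urllib.robotparser. The example.com URL, the "SomeOtherBot" name, and the can_crawl helper are made up for illustration, and Python's parser is only one interpretation of robots.txt, so Google's own handling may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the first post, copied as-is.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow:
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this parser thinks user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL and bot names, used only for the demonstration.
    page = "https://example.com/2018/04/some-post.html"
    for bot in ("Mediapartners-Google", "SomeOtherBot"):
        print(bot, can_crawl(ROBOTS_TXT, bot, page))
```

An empty Disallow: line means "nothing is disallowed", so both bots should come back as allowed.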

ariel1238a

8:43 pm on Apr 7, 2018 (gmt 0)

5+ Year Member



Thanks

azlinda

11:57 pm on Apr 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does anyone even need a robots.txt file if they're not blocking access to anything?

keyplyr

12:24 am on Apr 8, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@azlinda - robots.txt does not block anything. You can request some files not to be indexed, but that is not blocking.

Only a few of the active bots even support robots.txt. The major search engines support robots.txt directives... Google, Bing, Yandex, DuckDuckGo, and a couple of others, but most bots do not even request this file. A few bots request it, but disobey it. A few others use it to find out where you don't want them to go, then they go there.

The robots.txt file never did become a standard. It tried to be, but there are too many interpretation differences. Some bots support wildcards (*) and others do not. Some support cross-domain, others do not, etc.

tangor

12:33 am on Apr 8, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



robots.txt is valid for bots that respect it. Otherwise it is ignored by bad bots (who may or may not actually be "bad").

It is good form, as a webmaster, to have one, even if it is blank.

azlinda

1:42 am on Apr 8, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@Keyplyr - Sorry, I misspoke. I meant to say if they have nothing that they don't want indexed. Thanks for your response. It was helpful.

lucy24

6:18 am on Apr 8, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



if they have nothing that they don't want indexed
It's important to remember that a robots.txt Disallow--which prevents compliant robots such as search engines from crawling a page--has nothing to do with Noindex. Pages that have never been crawled can still theoretically end up in the index, because the search engine has seen links to the pages. The only way for a search engine to see a "noindex" instruction (whether in a header or an on-page meta) is to let it crawl.

robzilla

11:53 am on Apr 8, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



User-agent: Mediapartners-Google 
Disallow:

User-agent: *
Disallow:
Allow: /

Strictly speaking, this is pointless. You're telling the Mediapartners-Google bot that it's allowed to crawl the whole site, and then you're declaring that every bot is allowed to do so -- which obviously also includes Mediapartners-Google. So you might as well declare only this:

User-agent: *
Disallow:

(Note that the "Allow" directive is not part of the robots.txt standard, but some bots like Googlebot and bingbot do support it.)

Or this:
 

(i.e. an empty robots.txt file)

To be fair, your robots.txt works equally well, and I see Blogger adds in the Mediapartners-Google line by default, presumably because they don't want any of the restrictions that may apply to other bots in succeeding lines to restrict the movements of the Mediapartners-Google bot.
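
The point about the separate Mediapartners-Google group can be illustrated with a rough sketch, again using Python's urllib.robotparser. The "Disallow: /search" rule, the example.com URLs, and the "SomeOtherBot" name are assumptions added for the demonstration; they are not in the original poster's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical variant: the "*" group now carries a restriction,
# while the Mediapartners-Google group stays unrestricted.
ROBOTS_WITH_RESTRICTION = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots_txt and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    restricted = "https://example.com/search/label/news"
    for bot in ("Mediapartners-Google", "SomeOtherBot"):
        print(bot, is_allowed(ROBOTS_WITH_RESTRICTION, bot, restricted))
```

In this parser a bot that finds a group naming it uses that group and ignores the "*" group, so the /search restriction applies to SomeOtherBot but not to Mediapartners-Google.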

super70s

6:16 am on Apr 16, 2018 (gmt 0)

10+ Year Member Top Contributors Of The Month



My robots.txt looks like robzilla's second example, but I also have my sitemap's URL on the third line. Does anyone else do this? I must have read this was a good idea somewhere along the way.

robzilla

7:43 am on Apr 16, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's one of the ways [sitemaps.org] you can tell search engines about your sitemap(s), so that's perfectly fine.
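
As a small sketch of what that looks like to a parser: Python's urllib.robotparser (3.8 or later) exposes Sitemap: lines through site_maps(). The file contents below are my guess at the layout being described, and the example.com sitemap URL and the list_sitemaps helper are purely hypothetical.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Assumed layout: an allow-everything group plus a Sitemap line.
ROBOTS_WITH_SITEMAP = """\
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
"""


def list_sitemaps(robots_txt: str) -> List[str]:
    """Return the sitemap URLs declared in robots_txt (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(list_sitemaps(ROBOTS_WITH_SITEMAP))
```

The Sitemap line is independent of the User-agent groups, so it does not change what any bot is allowed to crawl.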

lucy24

10:05 pm on Apr 16, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I must have read this was a good idea somewhere along the way.
Google--to name but one search engine, selected wholly at random--recognizes the Sitemap: line.

not2easy

4:06 am on Apr 21, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The question about What Makes a Good robots.txt File was split off into its own discussion. It can be found here: [webmasterworld.com...]