Link rel to robots.txt




What's the link tag to pull in robots.txt?

0verdose

4:47 pm on Aug 21, 2013 (gmt 0)

Sorry peeps, this should be a standard, easy thing, but I've managed to forget it and get confused now that I'm getting on a bit.

I've tried linking to robots.txt in two ways, and neither seems to work.

I've tried both the below in the <head>:

<link rel="robots" href="robots.txt">

and

<link rel="robots" href="/robots.txt">

Is there something I'm missing?

not2easy

6:44 pm on Aug 21, 2013 (gmt 0)

robots.txt does not need any links; robots know where to look for it. If your question comes from some problem in GWT (Google Webmaster Tools), it is more likely due to an incorrect format than to robots not finding the file. If you are doing something completely different that somehow requires this unusual link in the <head>, then just ignore my response; I have never seen anything where a link like that would serve any purpose.

g1smd

6:44 pm on Aug 21, 2013 (gmt 0)

You don't need to link to the robots.txt file at all.

It must be placed in the root of the site. Bots only look there.
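To illustrate the point, here is a minimal Python sketch (the helper name robots_url is hypothetical) of how a crawler can derive the single robots.txt location from any page URL: it keeps the scheme and host, including any port, and replaces the rest with /robots.txt. A <link> tag in the page plays no part in this. Because the host is kept, www.example.com and example.com count as separate sites, each with its own robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url (hypothetical helper).

    Only the scheme and host (with any port) are kept; the page's path,
    query string and fragment are replaced by the fixed path /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for page in [
    "http://www.example.com/",
    "http://www.example.com/blog/2013/08/post.html?ref=home",
    "https://www.example.com:8443/shop/",
]:
    print(page, "->", robots_url(page))
```

However deep the page sits in the site, the answer is always the root of that host, which is why the file has to live there.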

0verdose

7:28 pm on Aug 21, 2013 (gmt 0)

I see... in that case, the contents of the file are:

User-agent: *
Disallow:
Disallow: /cgi-bin/

I'm checking Google Analytics for the first time, and it says my robots file has never been checked?

Maybe it just hasn't updated correctly and I need to be more patient? It just seems strange, as the rest of the site has been crawled at least 3 times recently.

phranque

8:32 pm on Aug 21, 2013 (gmt 0)

Your robots.txt file will not be tracked by Analytics.
Have you tested your robots.txt file in GWT?
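Google Analytics counts visits through a JavaScript tag that runs when a browser renders an HTML page, so a crawler fetching the plain-text robots.txt never shows up there. The server's access log is where those fetches appear. A minimal Python sketch, assuming an Apache/nginx "combined" format log; robots_fetches is a hypothetical helper, and the sample lines (including ExampleBot) are made up for illustration:

```python
import re
from collections import Counter

# One line of the Apache/nginx "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def robots_fetches(lines):
    """Count requests for /robots.txt, keyed by (user-agent, HTTP status)."""
    hits = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # skip lines in some other format
        path = match.group("path").split("?", 1)[0]  # ignore any query string
        if path == "/robots.txt":
            hits[(match.group("agent"), match.group("status"))] += 1
    return hits


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [21/Aug/2013:18:02:11 +0000] "GET /robots.txt HTTP/1.1" 200 52 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [21/Aug/2013:18:02:12 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [21/Aug/2013:19:10:00 +0000] "GET /robots.txt HTTP/1.1" 404 210 "-" '
    '"ExampleBot/1.0"',
]

for (agent, status), count in robots_fetches(SAMPLE_LOG).items():
    print(f"{count} x {status}  {agent}")
```

A 200 status means the file was served; a run of 404s for /robots.txt would suggest it is not actually sitting at the root.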

lucy24

10:42 pm on Aug 21, 2013 (gmt 0)

Disallow:
Disallow: /cgi-bin/

What is this pair of lines intended to mean? Ordinarily you'd only have

Disallow:

(without argument) if you wanted to make it plain that robots are allowed to run wild wherever they like.
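One way to see why the pair is confusing is to feed it to a real parser. The sketch below uses Python's standard-library urllib.robotparser (the wrapper name allowed and the variant names are hypothetical). That parser checks a group's rules in file order, stops at the first match, and reads an empty Disallow as "allow everything", so in the original file the /cgi-bin/ line is never reached. Crawlers that instead pick the most specific matching rule (Google documents a longest-match approach) would presumably still block /cgi-bin/. The two readings differ, which is the problem.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Ask Python's stdlib robots.txt parser whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, path)


# The file as posted: an empty Disallow followed by a /cgi-bin/ rule.
ORIGINAL = """User-agent: *
Disallow:
Disallow: /cgi-bin/
"""

# Unambiguous alternative 1: let robots go everywhere.
ALLOW_ALL = """User-agent: *
Disallow:
"""

# Unambiguous alternative 2: keep robots out of /cgi-bin/ only.
BLOCK_CGI = """User-agent: *
Disallow: /cgi-bin/
"""

for name, text in [("original", ORIGINAL), ("allow all", ALLOW_ALL), ("block cgi-bin", BLOCK_CGI)]:
    print(f"{name:14} /index.html={allowed(text, '/index.html')} "
          f"/cgi-bin/form.pl={allowed(text, '/cgi-bin/form.pl')}")
```

Either alternative gives the same answer under both matching styles; the original does not. This is also a quick local check to run before testing the file in GWT.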

0verdose

11:57 am on Aug 22, 2013 (gmt 0)

The Disallow bit wasn't so important; I just left it in to remind myself how to disallow certain parts of the site... I can remove it.

So if it just reads:

User-agent: *
Disallow:

and it's named robots.txt and placed at www.domain.com/robots.txt, it will be found regardless? There are no other factors?

I'm sure I checked it and it works; I will investigate further and find out more about where I saw that it was never crawled.

Thanks so much for your responses.

jimbeetle

1:13 pm on Aug 22, 2013 (gmt 0)

Web Robots Pages [robotstxt.org] will explain everything you need to know about robots.txt and a bit more. It includes the basics plus links to how the SEs have extended the protocol.

I usually think it best practice to simply use the straight protocol -- none of the SE extensions -- because I can then assume that *most* compliant bots will obey the directives.
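As a rough illustration of sticking to the straight protocol, the Python sketch below flags lines that go beyond the original 1994 robots exclusion standard described on robotstxt.org, which defines only User-agent and Disallow fields with plain path prefixes (plus # comments). It treats any other field, such as Allow, Crawl-delay or Sitemap, and any * or $ pattern in a Disallow path as an extension. The function name extension_lines and the sample file are hypothetical, and the sketch does not judge whether a particular crawler supports a flagged line.

```python
# Fields defined by the original 1994 robots exclusion standard.
STANDARD_FIELDS = {"user-agent", "disallow"}


def extension_lines(robots_txt: str):
    """Yield (line_number, line, reason) for lines beyond the 1994 standard."""
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, sep, value = line.partition(":")
        field = field.strip().lower()
        value = value.strip()
        if not sep:
            yield number, raw, "not a 'field: value' line"
        elif field not in STANDARD_FIELDS:
            yield number, raw, f"non-standard field {field!r}"
        elif field == "disallow" and ("*" in value or "$" in value):
            yield number, raw, "wildcard pattern in path"


# Made-up example mixing plain rules with common search-engine extensions.
SAMPLE = """User-agent: *
Disallow: /cgi-bin/
Disallow: /*.pdf$
Allow: /cgi-bin/public/
Crawl-delay: 10
Sitemap: http://www.example.com/sitemap.xml
"""

for number, line, reason in extension_lines(SAMPLE):
    print(f"line {number}: {line}  ({reason})")
```

A file that produces no output here uses only the basic directives, so most compliant bots should read it the same way.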