Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Link rel to robots.txt
what's the link tag to pull robots.txt

 4:47 pm on Aug 21, 2013 (gmt 0)

Sorry peeps, this one should be a standard easy thing I've managed to forget and get confused with now I'm getting on a bit.

I'm linking to robots.txt with two options, and neither seems to work.

I've tried both of the below in the <head>:

<link rel="robots" href="robots.txt">


<link rel="robots" href="/robots.txt">

Is there something I'm missing?



 6:44 pm on Aug 21, 2013 (gmt 0)

robots.txt does not need any links; robots know where to look for it. If your question is because of some problem in GWT, it is more likely due to an incorrect format than to robots not finding the file. If you are doing something completely different that somehow requires this unusual link in the <head>, then just ignore my response; I have never seen anything where a link like that would serve any purpose.


 6:44 pm on Aug 21, 2013 (gmt 0)

You don't need to link to the robots.txt file at all.

It must be placed in the root of the site. Bots only look there.
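
To make the "root of the site" point concrete, here is a minimal Python sketch that works out the single location a crawler checks, starting from any page URL on the site. The function name robots_url and the example page URL are illustrative assumptions, not something from this thread.

```python
from urllib.parse import urljoin

def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site serving page_url."""
    # The leading slash makes the reference root-relative, so any path,
    # query string, or fragment on page_url is discarded.
    return urljoin(page_url, "/robots.txt")

# Hypothetical page URL, used only for illustration.
print(robots_url("http://www.domain.com/some/dir/page.html"))
```

Whatever page you start from, the result is the same root URL (here http://www.domain.com/robots.txt), which is why a <link> tag in the <head> adds nothing.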


 7:28 pm on Aug 21, 2013 (gmt 0)

I see... in that case the contents of the file are:

User-agent: *
Disallow: /cgi-bin/

I'm just starting to check Google Analytics for the first time and it says my robots file has never been checked?

Maybe it just hasn't updated correctly and I need to be more patient? It just seems strange, as the rest of the site has been crawled at least 3 times recently.


 8:32 pm on Aug 21, 2013 (gmt 0)

Your robots.txt file will not be tracked by Analytics.
Have you tested your robots.txt file in GWT?
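
If you want a quick local check as well, the rules posted above can be run through Python's standard urllib.robotparser. This is only a sketch of how one compliant parser reads the file, not what GWT itself does; the helper name is_allowed, the agent name ExampleBot, and the test URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line file quoted earlier in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots_txt and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Hypothetical agent name and URLs.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://www.domain.com/index.html"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://www.domain.com/cgi-bin/form.cgi"))
```

The first call should report that the page is allowed and the second that anything under /cgi-bin/ is blocked.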


 10:42 pm on Aug 21, 2013 (gmt 0)

Disallow: /cgi-bin/

What is this pair of lines intended to mean? Ordinarily you'd only have

Disallow:

(without argument) if you wanted to make it plain that robots are allowed to run wild wherever they like.
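
As a rough illustration of the "without argument" case, the sketch below feeds a file with a bare Disallow line to Python's urllib.robotparser and checks a few paths. The agent name ExampleBot and the sample paths are assumptions chosen for the example.

```python
from urllib.robotparser import RobotFileParser

# A record whose Disallow line has no argument: nothing is blocked.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# Hypothetical paths, including one that the earlier file blocked.
sample_paths = ["/", "/cgi-bin/form.cgi", "/private/page.html"]
results = {path: parser.can_fetch("ExampleBot", path) for path in sample_paths}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

Every path comes back as allowed, /cgi-bin/ included, which is the "run wild" behaviour described above.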


 11:57 am on Aug 22, 2013 (gmt 0)

The disallow bit wasn't so important, I just left it in to remind myself how to disallow certain parts of the site... I can remove it.

So if it just reads:

User-agent: *

and it's named robots.txt and placed at www.domain.com/robots.txt, it will be found regardless? There are no other factors?

I'm sure I checked it and it works, will investigate further and find more details about where I saw it was never crawled.

Thanks so much for your response.


 1:13 pm on Aug 22, 2013 (gmt 0)

Web Robots Pages [robotstxt.org] will explain everything you need to know about robots.txt and a bit more. It includes the basics plus links to how the SEs have extended the protocol.

I usually think it best practice to simply use the straight protocol -- none of the SE extensions -- because I can then assume that *most* compliant bots will obey the directives.
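
One way to check a file against that advice is sketched below. It assumes "the straight protocol" means only the User-agent and Disallow fields with plain path prefixes, and treats anything else (Allow, Crawl-delay, Sitemap, * or $ in a path) as an extension. The function name find_extensions and the sample file are hypothetical.

```python
# Fields assumed to belong to the original protocol.
CORE_FIELDS = {"user-agent", "disallow"}

def find_extensions(robots_txt: str) -> list:
    """Return the lines that rely on something beyond User-agent/Disallow."""
    flagged = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() not in CORE_FIELDS:
            flagged.append(line)  # e.g. Allow, Crawl-delay, Sitemap
        elif field.lower() == "disallow" and ("*" in value or value.endswith("$")):
            flagged.append(line)  # wildcard matching is an extension
    return flagged

# Hypothetical file mixing core lines and extensions.
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin/
Allow: /cgi-bin/public/
Disallow: /*.pdf$
Crawl-delay: 10
Sitemap: http://www.domain.com/sitemap.xml
"""

for line in find_extensions(SAMPLE):
    print("extension:", line)
```

Lines that are not flagged are the ones most compliant bots can be expected to understand.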

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved