
Forum Moderators: goodroi

Link rel to robots.txt

what's the link tag to pull robots.txt

     
4:47 pm on Aug 21, 2013 (gmt 0)

New User

joined:Aug 19, 2013
posts: 4
votes: 0


Sorry peeps, this one should be a standard easy thing I've managed to forget and get confused with now I'm getting on a bit.

I've tried linking to robots.txt in two ways, and neither seems to work.

I've tried both of the below in the <head>:

<link rel="robots" href="robots.txt">

and

<link rel="robots" href="/robots.txt">

Is there something I'm missing?
6:44 pm on Aug 21, 2013 (gmt 0)

Moderator from US 

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:2572
votes: 48


robots.txt does not need any links; robots know where to look for it. If your question comes from some problem reported in GWT, it is more likely due to an incorrectly formatted file than to robots not finding it. If you are doing something completely different that somehow requires this unusual link in the <head>, then just ignore my response. I have never seen anything where a link like that would serve any purpose.
6:44 pm on Aug 21, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


You don't need to link to the robots.txt file at all.

It must be placed in the root of the site. Bots only look there.
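To illustrate the point about the root: a minimal sketch, assuming a bot that follows the standard convention, of how the robots.txt location is worked out from any page URL. The function name robots_url and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit


def robots_url(page_url: str) -> str:
    """Return the one location a compliant bot would check for this page's robots.txt."""
    parts = urlsplit(page_url)
    # The page's path, query string and fragment are ignored;
    # only the scheme and host (plus port, if any) are kept.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# Hypothetical page deep inside the site
print(robots_url("http://www.domain.com/some/dir/page.html?x=1"))
```

Nothing in the page's <head> feeds into this, which is why a <link> tag makes no difference.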
7:28 pm on Aug 21, 2013 (gmt 0)

New User

joined:Aug 19, 2013
posts: 4
votes: 0


I see... in that case, the contents of the file are:

User-agent: *
Disallow:
Disallow: /cgi-bin/

I'm just starting to check Google Analytics for the first time, and it says my robots file has never been checked?

Maybe it just hasn't updated correctly and I need to be more patient? It just seems strange, as the rest of the site has been crawled at least 3 times recently.
8:32 pm on Aug 21, 2013 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10544
votes: 8


Your robots.txt file will not be tracked by Analytics.
Have you tested your robots.txt file in GWT?
10:42 pm on Aug 21, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12721
votes: 244


Disallow:
Disallow: /cgi-bin/

What is this pair of lines intended to mean? Ordinarily you'd only have

Disallow:

(without argument) if you wanted to make it plain that robots are allowed to run wild wherever they like.
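If you want to see the difference between the two lines for yourself, here is a rough sketch using Python's standard urllib.robotparser. The helper name is_allowed and the sample paths are assumptions for illustration. Each rule is tested in its own file, because parsers do not all resolve the combined pair the same way.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Empty Disallow: nothing is blocked
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# Disallow with a path: only that directory is blocked
BLOCK_CGI = "User-agent: *\nDisallow: /cgi-bin/\n"

print(is_allowed(ALLOW_ALL, "/cgi-bin/test.pl"))
print(is_allowed(BLOCK_CGI, "/cgi-bin/test.pl"))
print(is_allowed(BLOCK_CGI, "/index.html"))
```

This only shows how one parser reads the rules; individual search engines may interpret edge cases differently.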
11:57 am on Aug 22, 2013 (gmt 0)

New User

joined:Aug 19, 2013
posts: 4
votes: 0


the disallow bit wasn't so important, I just left it in to remind myself how to disallow certain parts of the site... I can remove it.

So if it just reads:

User-agent: *
Disallow:

and it's named robots.txt and placed at www.domain.com/robots.txt, it will be found regardless? There are no other factors?

I'm sure I checked it and it works. I will investigate further and find more details about where I saw it was never crawled.

Thanks so much for your response..
1:13 pm on Aug 22, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member jimbeetle is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 26, 2002
posts:3292
votes: 6


Web Robots Pages [robotstxt.org] will explain everything you need to know about robots.txt and a bit more. It includes the basics plus links to how the SEs have extended the protocol.

I usually think it best practice to simply use the straight protocol -- none of the SE extensions -- because I can then assume that *most* compliant bots will obey the directives.