
Forum Moderators: goodroi

Link rel to robots.txt

what's the link tag to pull robots.txt

   
4:47 pm on Aug 21, 2013 (gmt 0)



Sorry peeps, this one should be a standard easy thing I've managed to forget and get confused with now I'm getting on a bit.

I've tried linking to robots.txt in two ways, and neither seems to work.

I've tried both the below in the <head>:

<link rel="robots" href="robots.txt">

and

<link rel="robots" href="/robots.txt">

Is there something I'm missing?
6:44 pm on Aug 21, 2013 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



robots.txt does not need any links; robots know where to look for it. If you are asking because of some problem reported in GWT (Google Webmaster Tools), it is more likely due to an incorrect format than to robots not finding the file. If you are doing something completely different that somehow requires this unusual link in the <head>, then just ignore my response. I have never seen anything where a link like that would serve any purpose.
6:44 pm on Aug 21, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



You don't need to link to the robots.txt file at all.

It must be placed in the root of the site. Bots only look there.
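
To illustrate the "root only" point, here is a minimal sketch of how a crawler could work out where to look, starting from any page URL on the site. The function name `robots_txt_url` is made up for this example, and the logic is an assumption about typical crawler behaviour, not any particular bot's code.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt location for the host that serves page_url.

    The path, query string and fragment of the page are discarded;
    only the scheme and host are kept, with /robots.txt as the path.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.domain.com/some/dir/page.html?x=1"))
```

Whatever page the bot starts from, the result depends only on the scheme and host, which is why a <link> tag in the <head> has no effect on where the file is looked up.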
7:28 pm on Aug 21, 2013 (gmt 0)



I see... in that case the contents of the file are:

User-agent: *
Disallow:
Disallow: /cgi-bin/

I'm just starting to check Google Analytics for the first time and it says my robots file has never been checked?

Maybe it just hasn't updated yet and I need to be more patient? It just seems strange, as the rest of the site has been crawled at least 3 times recently.
8:32 pm on Aug 21, 2013 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Your robots.txt file will not be tracked by analytics.
Have you tested your robots.txt file in GWT?
10:42 pm on Aug 21, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Disallow:
Disallow: /cgi-bin/

What is this pair of lines intended to mean? Ordinarily you'd only have

Disallow:

(without argument) if you wanted to make it plain that robots are allowed to run wild wherever they like.
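
For anyone who wants to check the difference locally, here is a rough sketch using Python's standard `urllib.robotparser`. The helper `allowed`, the bot name `ExampleBot` and the test paths are made up for the example. It only compares the two rules separately; how a file containing both lines together is resolved may differ between parsers, so that case is deliberately left out.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, path, agent="ExampleBot"):
    """Parse robots.txt lines and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, "http://www.domain.com" + path)


# An empty Disallow: nothing is off limits.
allow_all = ["User-agent: *", "Disallow:"]

# A Disallow with a path: only that directory is off limits.
block_cgi = ["User-agent: *", "Disallow: /cgi-bin/"]

if __name__ == "__main__":
    print(allowed(allow_all, "/cgi-bin/script.pl"))
    print(allowed(block_cgi, "/cgi-bin/script.pl"))
    print(allowed(block_cgi, "/index.html"))
```

If the aim is to keep bots out of /cgi-bin/ and nothing else, the single `Disallow: /cgi-bin/` line on its own is the unambiguous way to say it.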
11:57 am on Aug 22, 2013 (gmt 0)



The disallow bit wasn't so important; I just left it in to remind myself how to disallow certain parts of the site... I can remove it.

So if it just reads:

User-agent: *
Disallow:

and it's named robots.txt and placed at www.domain.com/robots.txt, it will be found regardless? There are no other factors?

I'm sure I checked it and it works. I will investigate further and find more details about where I saw that it was never crawled.

Thanks so much for your response..
1:13 pm on Aug 22, 2013 (gmt 0)

WebmasterWorld Senior Member jimbeetle is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Web Robots Pages [robotstxt.org] will explain everything you need to know about robots.txt and a bit more. It includes the basics plus links to how the SEs have extended the protocol.

I usually think it best practice to simply use the straight protocol -- none of the SE extensions -- because I can then assume that *most* compliant bots will obey the directives.
 
