Forum Moderators: goodroi

robots.txt vs index,follow

Who will win?

brakkar

10:39 am on Oct 24, 2004 (gmt 0)

10+ Year Member



Hello,
Let's say I have a directory disallowed for all robots through robots.txt: /cgi-bin/

Now, on my front page, which is not in /cgi-bin/, I have the "index,follow" meta tag and a link to a page in the /cgi-bin/ directory.

That looks like a conflict: robots.txt tells the robots not to go into /cgi-bin/, but the meta tag on the page says to follow all links.

So who will win there? What will the robots do?

Brakkar

g1smd

5:02 pm on Oct 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They will follow links to places that you have said they can go, and will not follow links to places you said to ignore.
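As a rough illustration of that rule, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt text, the www.example.com URLs and the "ExampleBot" user agent are assumptions made up for the example; real crawlers may interpret robots.txt somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt matching the question: /cgi-bin/ is closed to all robots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

def allowed_links(links, user_agent="ExampleBot"):
    """Return only the links that robots.txt permits this user agent to fetch."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return [url for url in links if parser.can_fetch(user_agent, url)]

# Links found on the front page (made-up URLs for illustration).
links_on_frontpage = [
    "http://www.example.com/about.html",
    "http://www.example.com/cgi-bin/form.cgi",
]
crawlable = allowed_links(links_on_frontpage)
```

Here the /cgi-bin/ link is dropped before any request is made, even though the front page says "follow".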

That said, the URL in /cgi-bin/ that you link to will probably still be listed, because there is a link to it. It will appear without a title or description (because the robot was not allowed to read the content). Search engines keep that URL-only entry because they have seen a link pointing to that place, and they keep a list of all the links they have seen, whether they work or not. A further flag then says whether that URL should appear in results or not. (The same is true for duplicate content: Google has a list of all the pages but shows only one of them in results.)
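To make the "list of seen URLs plus a flag" idea concrete, here is a toy model in Python. It is only a sketch of the description above, not how any search engine actually stores its index; UrlRecord, note_link, store_content and result_line are hypothetical names invented for the example.

```python
from dataclasses import dataclass

@dataclass
class UrlRecord:
    url: str
    title: str = ""                # stays empty when the content was never read
    description: str = ""
    show_in_results: bool = True   # the "further flag" described above

def note_link(index, url):
    """Remember a URL because some page linked to it."""
    return index.setdefault(url, UrlRecord(url))

def store_content(index, url, title, description):
    """Attach title and description once the page has actually been crawled."""
    record = note_link(index, url)
    record.title, record.description = title, description

def result_line(record):
    # With no stored content, only the bare URL is left to display.
    return record.title or record.url

index = {}
note_link(index, "http://www.example.com/cgi-bin/form.cgi")  # seen, never crawled
store_content(index, "http://www.example.com/about.html", "About us", "Who we are")
listing = [result_line(r) for r in index.values() if r.show_in_results]
```

The blocked URL ends up in the listing as a bare URL, while the crawled page shows its title.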

I have a number of pages with a noindex meta tag on them. If you search for part of the page URL, you can find that page in the index (without a title and description), but you cannot find it in any normal search because nothing on the page has been indexed.

Lord Majestic

6:13 pm on Oct 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So who will win there? What will the robots do?

IMO robots.txt has priority, and meta tags only take effect for a robot scanning that one page. For example, if one of your pages carries "nofollow" but someone else also links to the same target page, then AFAIK it is still okay for a robot to request that page unless robots.txt says otherwise.
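A small sketch of that precedence, written with Python's standard html.parser and urllib.robotparser. The page contents, host names and the "ExampleBot" user agent are made up for illustration, and PageScanner and urls_to_fetch are hypothetical helpers; this shows the reasoning only and does not claim to match any particular search engine.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

class PageScanner(HTMLParser):
    """Collects links and the follow/nofollow directive of a single page."""
    def __init__(self):
        super().__init__()
        self.follow = True   # assumed default when no robots meta tag is present
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            self.follow = "nofollow" not in [d.strip() for d in content.split(",")]
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

def urls_to_fetch(pages, robots_lines, user_agent="ExampleBot"):
    robots = RobotFileParser()
    robots.parse(robots_lines)
    found = []
    for page_url, html in pages.items():
        scanner = PageScanner()
        scanner.feed(html)
        if not scanner.follow:
            continue  # nofollow silences only the links on this one page
        for href in scanner.links:
            target = urljoin(page_url, href)
            # robots.txt has the final say, whatever page the link came from.
            if robots.can_fetch(user_agent, target) and target not in found:
                found.append(target)
    return found

pages = {
    "http://www.example.com/a.html":
        '<meta name="robots" content="index,nofollow"><a href="/shared.html">s</a>',
    "http://other.example.org/b.html":
        '<meta name="robots" content="index,follow">'
        '<a href="http://www.example.com/shared.html">s</a>'
        '<a href="http://www.example.com/cgi-bin/form.cgi">f</a>',
}
targets = urls_to_fetch(pages, ["User-agent: *", "Disallow: /cgi-bin/"])
```

In this sketch /shared.html is still queued because the second page links to it with "follow", while the /cgi-bin/ URL is never requested because robots.txt disallows it.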