Allowing a page to be indexed but not what it links to?

 12:18 pm on May 18, 2008 (gmt 0)

I'm not sure how this can be implemented (or if it even can).

The issue is that I have a table of contents (TOC) page that I'd like the search engines to index. But that TOC links to around 1,000 assets, all of which require you to log in first.

So the crawlers are just seeing a login page for all of them.

How can I tell the crawlers not to follow the links, but just to index the TOC page?

There is no common folder name that I can disallow in the robots.txt file.

For example, here is a link to an asset:

The only option I can see is to add rel="nofollow" to the links on the TOC page. However, that's only a hint not to rank the linked pages; they will still be included anyway (info from Google).

Any suggestions ?



 10:47 pm on May 18, 2008 (gmt 0)

On the page that has the links:

<meta name="robots" content="nofollow">
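To illustrate what that tag asks of a crawler, here is a minimal Python sketch of a hypothetical crawler that honors page-level nofollow: if the robots meta tag on the TOC page contains nofollow, none of the page's links are queued. The class and function names are made up for illustration; real engines' crawlers are not public and may treat the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects robots meta directives and every <a href> on a page (hypothetical crawler)."""

    def __init__(self):
        super().__init__()
        self.directives = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # The content value is comma-separated, e.g. "noindex, nofollow".
            self.directives.update(d.strip().lower() for d in content.split(",") if d.strip())
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def links_to_queue(html):
    """Return the links a crawler honoring page-level nofollow would queue."""
    parser = RobotsMetaParser()
    parser.feed(html)
    if "nofollow" in parser.directives:
        return []  # page-level nofollow: none of this page's links are followed
    return parser.links


toc = ('<html><head><meta name="robots" content="nofollow"></head>'
       '<body><a href="/assets/1">Asset 1</a></body></html>')
print(links_to_queue(toc))  # []
```

Note that the TOC page itself is still indexable here; only its outgoing links are left alone.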


 10:22 am on May 19, 2008 (gmt 0)

g1smd >> I read a bit about it, and it seems like what I need. However, it's still only an indication to the search engines not to rank the links, not an instruction not to follow them (even though that's what the word says; go figure).

Nevertheless, it's now added. Thanks!


 4:46 pm on May 19, 2008 (gmt 0)

Well, not exactly. Nofollow means don't follow. But that doesn't necessarily prevent the engines from indexing the nofollowed URL (i.e., the target URL), so they may retain a record of the target URL, even if they don't index the target page's contents.

If you don't want the engines to see and record the URLs of the password-protected pages, you can link to those pages via redirects and run the redirects through a directory that is disallowed in robots.txt. That way the engines never get to the target URLs at all (assuming that no other pages link directly to the password-protected pages).
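A minimal Python sketch of the redirect idea, assuming a hypothetical /go/ directory for the redirect URLs and made-up /members/ paths for the protected assets. It uses the standard library's urllib.robotparser to show that a crawler obeying robots.txt may fetch the TOC page but not the /go/ links on it, so it never requests them and never sees the redirect's Location header.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /go/ holds the redirect URLs the TOC page links to.
ROBOTS_TXT = """\
User-agent: *
Disallow: /go/
"""

# Hypothetical mapping from redirect path to the real, password-protected asset.
REDIRECTS = {
    "/go/1": "/members/asset-1",
    "/go/2": "/members/asset-2",
}


def redirect_target(path):
    """Where the server's 301/302 for this path would point (None if unknown)."""
    return REDIRECTS.get(path)


rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The TOC page is crawlable...
print(rp.can_fetch("Googlebot", "http://www.example.com/toc.html"))  # True
# ...but the redirect URLs it links to are not, so a compliant crawler
# never requests them and never learns the /members/ target URLs.
print(rp.can_fetch("Googlebot", "http://www.example.com/go/1"))  # False
```

The protection only holds as long as no crawlable page links straight to the /members/ URLs.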


 4:58 pm on May 19, 2008 (gmt 0)

caveman > that's actually a good idea...


 5:03 pm on May 19, 2008 (gmt 0)

Caveman, I was under the impression that Matt stated they would drop the link entirely from their "discovery".

"The nofollow attribute is just a mechanism that gives webmasters the ability to modify PageRank flow at link-level granularity. Plenty of other mechanisms would also work (e.g. a link through a page that is robot.txt'ed out), but nofollow on individual links is simpler for some folks to use. There's no stigma to using nofollow, even on your own internal links; for Google, nofollow'ed links are dropped out of our link graph; we don't even use such links for discovery. By the way, the nofollow meta tag does that same thing, but at a page level."
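As quoted, a link carrying rel="nofollow" is dropped from Google's link graph and not used for discovery. A minimal Python sketch of that link-level behavior for a hypothetical crawler, filtering a page's anchors before queuing them; the names and HTML are illustrative only, not how any engine actually implements it.

```python
from html.parser import HTMLParser


class DiscoveryParser(HTMLParser):
    """Collects hrefs, skipping any <a> whose rel attribute includes nofollow."""

    def __init__(self):
        super().__init__()
        self.discovered = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        # rel is a space-separated token list, e.g. rel="external nofollow".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.discovered.append(href)


def discover(html):
    """Return the hrefs a crawler honoring link-level nofollow would use for discovery."""
    parser = DiscoveryParser()
    parser.feed(html)
    return parser.discovered


page = ('<a href="/public/intro.html">Intro</a> '
        '<a href="/assets/1" rel="nofollow">Asset 1</a>')
print(discover(page))  # ['/public/intro.html']
```

Unlike the page-level meta tag, this lets other links on the same page stay followable.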


 5:34 pm on May 19, 2008 (gmt 0)

venti, I was responding to the nofollow meta suggestion discussed in the two posts immediately above mine, not the nofollow attribute at the link level. Also, I wasn't referring just to G. Regarding G, you are correct; that is what Matt said. Note that he essentially equated redirecting through a robots.txt'ed page/directory with using nofollow at the link level.


 6:44 pm on May 19, 2008 (gmt 0)

*** The nofollow attribute is... ***

I didn't discuss the nofollow attribute.

I was talking about the nofollow meta tag.

That's a completely different thing.


 2:15 pm on May 21, 2008 (gmt 0)

Is <meta name="robots" content="noindex"> not OK?


 12:47 am on May 24, 2008 (gmt 0)

Yes, it should be.
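noindex and nofollow control different things: noindex asks engines to keep the page carrying the tag out of the index, while nofollow asks them not to follow that page's links. So placement matters: on the TOC page, noindex would keep the TOC itself out; on the login pages, it would keep those out. A minimal Python sketch mapping a robots meta content value to those two flags, handling only the basic index/noindex/follow/nofollow tokens (the function name is hypothetical).

```python
def robots_flags(content):
    """Map a robots meta content value to (may_index, may_follow).

    Only the basic tokens are considered; anything else is ignored.
    """
    tokens = {t.strip().lower() for t in content.split(",")}
    may_index = "noindex" not in tokens   # noindex: keep THIS page out of the index
    may_follow = "nofollow" not in tokens  # nofollow: don't follow THIS page's links
    return may_index, may_follow


print(robots_flags("noindex"))   # (False, True)
print(robots_flags("nofollow"))  # (True, False)
```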

