allowing & disallowing (robots.txt)

Forum Moderators: goodroi

mike2010

8:06 pm on Apr 29, 2011 (gmt 0)

I have most of my site SEO'd with SEO urls. but there are still a lot of links (internally) that go to /folder/file.php

so I have in my robots.txt -

Disallow: /folder/file.php

rightfully so, as a lot of that junk I don't need to be in Google.

but there are like 4 directories now that transform into SEO'd directories once the right parameter is reached. And i'd like those 4 directories to be allowed by Google

example :

/folder/file.php?config=special

goes to the new SEO'd directory once clicked on -

/folder/file/special/

but /folder/file.php?config=old

goes to another non-seo'd .php file that I don't need to be in Google.

So how do I nitpick here with the /folder/file.php to allow certain directories and disallow others ?

the weird part is, the new SEO'd directories will display ONLY after the /folder/file.php?config=special part is clicked on. (redirected rewrite)

any help is much appreciated.

mike2010

3:51 pm on Apr 30, 2011 (gmt 0)

nevermind.. I guess i'll just use

..<a href="signin.php" rel="nofollow">sign in</a>..

on the php links I wanna block..

I shoulda thought of this earlier. duh

mike2010

5:49 pm on Apr 30, 2011 (gmt 0)

I've got another question..

How could I tell my "no follows" are working..if I recently added new ones towards particular links?

And does "no following" a link eventually remove it from google if it was previously cached before ?

But i'm expecting a reply is going to say I need to restrict in Robots.txt to pull them from the cache..

If so, then to cut it short, THIS is basically what I need done -

I would like to disallow this

/folder/file.php?config=&forecast=zandh

but allow this -

/folder/file.php?config=&forecast=zandh&state=in&zipcode=&country=us

could someone please tell me if this is in any way possible ?

The disallowed file above goes to a non-seo'd page. But the allowed file above goes to a seo'd page.
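A minimal sketch of how this could work, assuming the crawler resolves conflicts the way Google documents it: the matching rule with the longest path wins, and Allow wins a tie. The Allow pattern below is an assumption chosen for illustration (it is the shortest prefix that only the wanted URL has), and `is_allowed` is a hypothetical helper, not part of any library. Other crawlers may resolve Allow/Disallow conflicts differently.

```python
import re

# Hypothetical rules, equivalent to this robots.txt fragment:
#   Disallow: /folder/file.php
#   Allow: /folder/file.php?config=&forecast=zandh&state=
RULES = [
    ("disallow", "/folder/file.php"),
    ("allow", "/folder/file.php?config=&forecast=zandh&state="),
]


def _pattern_to_regex(pattern):
    """Translate a robots.txt path pattern: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, url_path):
    """Return True if url_path may be crawled under longest-match precedence."""
    best = None  # (pattern length, is_allow); longer wins, Allow wins a tie
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if _pattern_to_regex(pattern).match(url_path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


for path in (
    "/folder/file.php?config=&forecast=zandh",
    "/folder/file.php?config=&forecast=zandh&state=in&zipcode=&country=us",
    "/folder/file/special/",
):
    print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

Under these assumptions the short URL is blocked by the existing Disallow, while the longer URL matches both rules and the longer Allow pattern takes precedence.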

lucy24

2:53 am on May 1, 2011 (gmt 0)

How could I tell my "no follows" are working..if I recently added new ones towards particular links?

Just wait for the next new robot to show up. If you've got a decently sized site, that should be, oh, within the next half-hour or so.

And does "no following" a link eventually remove it from google if it was previously cached before ?

But i'm expecting a reply is going to say I need to restrict in Robots.txt to pull them from the cache.

Far as I can tell, once google knows a page exists, they'll keep crawling it forever. And once they know a page doesn't exist, they'll still keep crawling it forever just to make sure it doesn't rematerialize. (Analogy to the common cold presents itself.)

You'd think, wouldn't you, that after trying a page eight or ten times and not finding it, the computer would go back one step and check whether your internal links to the page are still there, or whether it's still on your sitemap, or...

mike2010

3:20 pm on May 2, 2011 (gmt 0)

but blocking it in Robots.txt IS supposed to tell google to remove it, if it's already in their index, right ?

"no follow" i'm guessing is just a minor deterrent..to help content ratio on a particular page... ?

phranque

1:02 am on May 3, 2011 (gmt 0)

blocking a url with robots.txt is supposed to exclude the robot from crawling that url.
it doesn't prevent that url from appearing in the index, either as a url-only snippet or possibly with information obtained from sources other than the document at that url, such as the anchor text of an inbound link.
it is only about crawling, so there is no implied instruction for removal from the index.

also note that a robots.txt-excluded url that has a noindex meta robots tag or X-Robots-Tag header may still be indexed, because the robots.txt exclusion prevents the document carrying the noindex instruction from being requested.

the rel="nofollow" anchor attribute is intended to prevent the transfer of PR and anchor text to the target url. it will also drop the target url from the link graph, unless the url is otherwise discovered, such as through an xml sitemap or another anchor that is not nofollowed.
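On the earlier question of how to tell whether the nofollows are in place: one part that can be checked locally is whether the attribute actually made it into the served markup. A rough sketch using only the Python standard library; `audit_links` is a hypothetical helper name and the sample HTML is made up from the urls in this thread. It says nothing about how a given crawler treats the links, only what the page contains.

```python
from html.parser import HTMLParser


class LinkRelAudit(HTMLParser):
    """Collect every <a href> and whether its rel attribute includes 'nofollow'."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, is_nofollow)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return  # named anchors without a target are not links
        # rel is a space-separated token list, e.g. rel="nofollow noopener"
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def audit_links(html):
    parser = LinkRelAudit()
    parser.feed(html)
    parser.close()
    return parser.links


sample = (
    '<a href="signin.php" rel="nofollow">sign in</a> '
    '<a href="/folder/file/special/">special</a>'
)
for href, nofollow in audit_links(sample):
    print(href, "nofollow" if nofollow else "followed")
```

Any href reported as followed is still a normal link as far as the markup goes, so a url that is nofollowed in one place and linked normally in another can still be discovered.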

mike2010

6:13 pm on May 4, 2011 (gmt 0)

so what's the tag to remove pages from google then?

noindex ?

nice explanation btw.

phranque

11:38 pm on May 4, 2011 (gmt 0)

the best way to remove a url from the index is to allow crawling of that url and provide a noindex meta robots tag or an X-Robots-Tag HTTP header.
if the content for a url no longer exists, the best way to remove that url from the index is to allow crawling of that url and provide a 410 Gone status code response.
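As a rough illustration of those two responses, here is a small sketch using Python's built-in `http.server`. The path sets and the names `plan_response` and `make_server` are hypothetical, chosen only for this example; on a real site the same effect would normally come from the web server or application config. The urls involved must not be disallowed in robots.txt, otherwise the crawler never requests them and never sees either signal.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical examples: a url whose content is gone for good, and a url
# prefix that should stay reachable for visitors but drop out of the index.
GONE_PATHS = {"/folder/retired.php"}
NOINDEX_PREFIXES = ("/folder/file.php",)


def plan_response(path):
    """Return (status code, extra headers) for a request path (query string included)."""
    if path in GONE_PATHS:
        return 410, {}
    if path.startswith(NOINDEX_PREFIXES):
        return 200, {"X-Robots-Tag": "noindex"}
    return 200, {}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers = plan_response(self.path)
        body = b"Gone" if status == 410 else b"<html><body>ok</body></html>"
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build a local test server; call .serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), Handler)
```

Running `make_server().serve_forever()` and requesting `/folder/file.php?config=old` with a tool that shows response headers should show the `X-Robots-Tag: noindex` header, while `/folder/retired.php` should return status 410.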