
Sitemaps, Meta Data, and robots.txt Forum

    
allowing & disallowing (robots.txt)
mike2010




msg:4306094
 8:06 pm on Apr 29, 2011 (gmt 0)

I have most of my site SEO'd with SEO URLs, but there are still a lot of internal links that go to /folder/file.php

so I have in my robots.txt -

Disallow: /folder/file.php


rightfully so, as a lot of that junk I don't need to be in Google.

but there are about 4 directories now that turn into SEO'd directories once the right parameter is passed, and I'd like those 4 directories to be allowed for Google.

example :

/folder/file.php?config=special

goes to the new SEO'd directory once clicked on -

/folder/file/special/


but /folder/file.php?config=old

goes to another non-seo'd .php file that I don't need to be in Google.

So how do I pick and choose here with /folder/file.php, to allow certain directories and disallow others?

The weird part is, the new SEO'd directories will display ONLY after the /folder/file.php?config=special link is clicked on (it redirects via a rewrite).

any help is much appreciated.

 

mike2010




msg:4306343
 3:51 pm on Apr 30, 2011 (gmt 0)

never mind.. I guess I'll just use

..<a href="signin.php" rel="nofollow">sign in</a>..

on the php links I wanna block..

I shoulda thought of this earlier. duh

mike2010




msg:4306350
 5:49 pm on Apr 30, 2011 (gmt 0)

I've got another question..

How could I tell my "no follows" are working, if I recently added new ones to particular links?

And does "no following" a link eventually remove it from Google if it was previously cached?

But I'm expecting the reply is going to say I need to restrict them in robots.txt to pull them from the cache..

If so, then to cut it short, THIS is basically what I need done -

I would like to disallow this

/folder/file.php?config=&forecast=zandh

but allow this -

/folder/file.php?config=&forecast=zandh&state=in&zipcode=&country=us

could someone please tell me if this is in any way possible?

The disallowed file above goes to a non-seo'd page. But the allowed file above goes to a seo'd page.
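Editor's sketch: Google documents that when an Allow rule and a Disallow rule both match a URL, the most specific (longest) rule wins, and a tie goes to Allow. Under that rule, a Disallow on the short URL plus an Allow on the longer prefix should give the split asked for here. The Python below is only a simplified model of that precedence, not Google's actual parser; the function names and the rule list are made up for illustration, and other crawlers may resolve conflicts differently.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, url_path):
    """Apply longest-match precedence; ties go to Allow.

    rules is a list of (directive, pattern) pairs for one user-agent group.
    """
    best_len = -1
    best_allow = True  # no matching rule means crawling is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(url_path):
            allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len = len(pattern)
                best_allow = allow
    return best_allow

# Assumed rules, equivalent to these two robots.txt lines:
#   Disallow: /folder/file.php?config=&forecast=zandh
#   Allow: /folder/file.php?config=&forecast=zandh&state=
RULES = [
    ("Disallow", "/folder/file.php?config=&forecast=zandh"),
    ("Allow", "/folder/file.php?config=&forecast=zandh&state="),
]

short_url = "/folder/file.php?config=&forecast=zandh"
long_url = "/folder/file.php?config=&forecast=zandh&state=in&zipcode=&country=us"
print(is_allowed(RULES, short_url))  # blocked by the Disallow rule
print(is_allowed(RULES, long_url))   # the longer Allow rule wins
```

The same idea covers the earlier case: a Disallow on /folder/file.php with a longer Allow on /folder/file.php?config=special.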

lucy24




msg:4306429
 2:53 am on May 1, 2011 (gmt 0)

How could I tell my "no follows" are working, if I recently added new ones to particular links?

Just wait for the next new robot to show up. If you've got a decently sized site, that should be, oh, within the next half-hour or so.

And does "no following" a link eventually remove it from Google if it was previously cached?

But I'm expecting the reply is going to say I need to restrict them in robots.txt to pull them from the cache..

Far as I can tell, once google knows a page exists, they'll keep crawling it forever. And once they know a page doesn't exist, they'll still keep crawling it forever just to make sure it doesn't rematerialize. (Analogy to the common cold presents itself.)

You'd think, wouldn't you, that after trying a page eight or ten times and not finding it, the computer would go back one step and check whether your internal links to the page are still there, or whether it's still on your sitemap, or...

mike2010




msg:4306950
 3:20 pm on May 2, 2011 (gmt 0)

but blocking it in robots.txt IS supposed to tell Google to remove it, if it's already in their index, right?

"no follow", I'm guessing, is just a minor deterrent.. to help the content ratio on a particular page?

phranque




msg:4307133
 1:02 am on May 3, 2011 (gmt 0)

blocking a url with robots.txt is supposed to exclude the robot from crawling that url.
it doesn't prevent that url from appearing in the index, either as a url-only snippet or possibly with information obtained from sources other than the document at that url, such as the anchor text of an inbound link.
it is only about crawling, so there is no implied instruction for removal from the index.

also note that a robots.txt-excluded url that has a noindex meta robots tag or X-Robots-Tag header may still be indexed, because the robots.txt exclusion prevents the crawler from requesting the url and so from ever seeing the noindex instruction.

the rel="nofollow" [google.com] anchor attribute is intended to prevent the transfer of PR and anchor text to the target url and will also drop the target url from the link graph unless the url is otherwise discovered such as through an xml sitemap or another anchor that is not nofollowed.
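Editor's sketch: one way to confirm that the attribute actually made it into the markup is to parse a page and list each anchor with whether its rel value contains nofollow. This uses only Python's standard html.parser; the class name, function name, and sample HTML are hypothetical, and it checks the markup only, not how any search engine treats the link.

```python
from html.parser import HTMLParser

class LinkAudit(HTMLParser):
    """Collect (href, is_nofollow) for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel is a space-separated token list, e.g. rel="nofollow noopener"
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))

def audit(html):
    """Return the list of (href, is_nofollow) pairs found in html."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser.links

# Sample markup for illustration only.
sample = (
    '<a href="signin.php" rel="nofollow">sign in</a> '
    '<a href="/folder/file/special/">special</a>'
)
for href, nofollow in audit(sample):
    print(href, "nofollow" if nofollow else "followed")
```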

mike2010




msg:4307942
 6:13 pm on May 4, 2011 (gmt 0)

so what's the tag to remove pages from Google, then?

noindex ?

nice explanation btw.

phranque




msg:4308078
 11:38 pm on May 4, 2011 (gmt 0)

the best way to remove a url from the index is to allow crawling of that url and provide a noindex meta robots tag or an X-Robots-Tag HTTP header.
if the content for a url no longer exists, the best way to remove that url from the index is to allow crawling of that url and provide a 410 Gone status code response.
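Editor's sketch: the two removal signals described above, shown as a tiny handler built on Python's standard http.server. The paths are hypothetical placeholders, and a real site would normally set these in its web server or application config instead. As noted earlier in the thread, the urls must stay crawlable (not blocked in robots.txt), or the crawler never sees either signal.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths, chosen only for illustration.
NOINDEX_PATHS = {"/folder/file.php"}   # still served, but should leave the index
GONE_PATHS = {"/folder/retired.php"}   # content no longer exists

def response_for(path):
    """Return (status, headers) for a request path, ignoring any query string."""
    bare = path.split("?", 1)[0]
    if bare in GONE_PATHS:
        return 410, {}
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if bare in NOINDEX_PATHS:
        headers["X-Robots-Tag"] = "noindex"
    return 200, headers

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers = response_for(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        if status == 200:
            self.wfile.write(b"<html><body>ok</body></html>")

# To try it locally:
#   HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

The in-page equivalent of the header is a meta robots tag with content="noindex" in the document head.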


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved