Disallow Robots.txt
jshpik1
 10:36 am on Jul 3, 2011 (gmt 0)

Hello,

Somehow Google has found URLs that I can't seem to find links to anywhere on the site, and they return a 404. How would I disallow these:

/c/clan-of-xymox/agonised-by-love/
/c/clarks/untitled-1/
/c/clay-aiken/because-you-loved-me/

In these examples my domain comes before the /c/ portion. The rule should disallow all links with this format. All of them seem to be from the category "C", but that page should not exist in that format. For example, these should also be disallowed:

/a/clan-of-xymox/agonised-by-love/

All I can figure is that Google picked these up when I was playing with pretty links a very long time ago... Thanks!

 

incrediBILL
 11:22 am on Jul 3, 2011 (gmt 0)

Go into Google Webmaster Tools and have the /c/ and /a/ directories removed from the index permanently, which should stop the crawling. Blocking them in robots.txt is just as easy: block the paths '/c/' and '/a/', unless you use those directories for anything else.
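For reference, a minimal robots.txt along those lines might look like the following, assuming /c/ and /a/ are not used for any live content (the file sits at the site root):

User-agent: *
Disallow: /c/
Disallow: /a/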

jshpik1
 11:33 am on Jul 3, 2011 (gmt 0)

I don't want to block /c/ or /a/ wholesale; I want to block posts with that URL layout. The real post URL looks like this:

/agonised-by-love/

Not this:

/c/clan-of-xymox/agonised-by-love/

g1smd
 4:25 pm on Jul 3, 2011 (gmt 0)

If those URLs rank, I would probably (at least initially) redirect those requests so you retain the traffic.

RewriteRule ^c/([^/]+)/([^/]+)/$ http://www.example.com/$2/ [R=301,L]

Once the old URLs are deindexed, I might simply block them using the 410 code.

RewriteRule ^c/([^/]+)/([^/]+)/$ - [G]
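Since the stray URLs also show up under other single-letter prefixes (the /a/ example in the first post), a generalised sketch of the rule above could look like this, assuming it lives in the site's root .htaccess and no legitimate pages sit under single-letter directories:

RewriteEngine On
# Redirect any /x/category/post-name/ URL (single-letter prefix) to /post-name/
RewriteRule ^[a-z]/([^/]+)/([^/]+)/$ http://www.example.com/$2/ [R=301,NC,L]

Once those URLs have dropped out of the index, the same pattern with '- [G]' would answer with a 410 instead of redirecting.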
zabalex
 7:18 pm on Oct 31, 2011 (gmt 0)

Hi,
It's not going to work to block URLs or directories that aren't present on your server; if they aren't there, what exactly would you block? Googlebot is picking them up from somewhere else, not from your root directory. You can control your own server, but not anyone else's. You may have submitted an article that contains those URLs as anchor links (through an article directory's programming error, of course), so you need to find the source and correct the links so they point to the valid URLs. In the meantime you can do a 301 redirect to a suitable page on your website to avoid the 404 errors.

regards
zabalex
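If the goal is just the catch-all redirect zabalex describes, a one-line mod_alias sketch could do it as well; redirecting to the homepage is only an assumption here:

# Hypothetical: 301 any stray single-letter-prefixed URL to the homepage (mod_alias)
RedirectMatch 301 ^/[a-z]/[^/]+/[^/]+/$ http://www.example.com/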
