Home / Forums Index / Code, Content, and Presentation / Blogging SEO Blog Administration

Blogging SEO Blog Administration Forum

    
Disallow Robot.txt
jshpik1

5+ Year Member



 
Msg#: 4334375 posted 10:36 am on Jul 3, 2011 (gmt 0)

Hello,

Somehow Google has found URLs that I can't seem to find links to anywhere on the site, and they return a 404. How would I disallow these:

/c/clan-of-xymox/agonised-by-love/
/c/clarks/untitled-1/
/c/clay-aiken/because-you-loved-me/

In these examples my domain comes before the /c/ portion. I want to disallow all links with this format. All of them seem to be from the category "C", but those pages should not exist in that format. For example, this should also be disallowed:

/a/clan-of-xymox/agonised-by-love/

All I can figure is that Google picked these up when I was playing with pretty links a very long time ago... Thanks!

 

incrediBILL

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributor of the Month



 
Msg#: 4334375 posted 11:22 am on Jul 3, 2011 (gmt 0)

Go into Google WMT and have the C and A directories removed from the index permanently, which should stop the crawling. Blocking them in robots.txt is just as easy: block the paths '/c/' and '/a/', unless you use those for anything else.
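
For anyone who wants to check what such rules would cover before uploading them, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the www.example.com host and the allowed() helper are assumptions for illustration, not taken from the poster's site. The stdlib parser only does simple prefix matching, which is enough for these two rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking the two stray path prefixes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /c/
Disallow: /a/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path):
    """Return True if a crawler may fetch the path under the rules above."""
    # www.example.com is a placeholder host.
    return parser.can_fetch("Googlebot", "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/c/clan-of-xymox/agonised-by-love/",
                 "/a/clan-of-xymox/agonised-by-love/",
                 "/agonised-by-love/"):
        print(path, "allowed" if allowed(path) else "blocked")
```

Because the rules end in a slash, only URLs under the /c/ and /a/ prefixes are affected. A real post such as /agonised-by-love/ or a slug that merely starts with "c" stays crawlable.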

jshpik1

5+ Year Member



 
Msg#: 4334375 posted 11:33 am on Jul 3, 2011 (gmt 0)

I don't want to block c or a as such; I want to block posts with that URL layout. The real post URL looks like this:

/agonised-by-love/

Not this:

/c/clan-of-xymox/agonised-by-love/

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 4334375 posted 4:25 pm on Jul 3, 2011 (gmt 0)

If those URLs rank, I would probably (at least initially) redirect those requests so you retain the traffic.

RewriteRule ^c/([^/]+)/([^/]+)/$ http://www.example.com/$2/ [R=301,L]

Once the old URLs are deindexed, I might simply block them using the 410 code.

RewriteRule ^c/([^/]+)/([^/]+)/$ - [G]
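
To see which requests that pattern would catch, here is a rough Python sketch that applies the same regular expression outside Apache. The redirect_target() helper and the sample paths are illustrative assumptions. It mimics per-directory (.htaccess) matching, where the leading slash is stripped before the pattern is tested, and it builds the same target as the first rule by reusing the second captured group.

```python
import re

# Same pattern as in the RewriteRule lines above.
OLD_URL = re.compile(r"^c/([^/]+)/([^/]+)/$")


def redirect_target(path):
    """Return the URL the 301 rule would send a request to, or None if no match."""
    # In .htaccess context mod_rewrite tests the path without its leading slash.
    relative = path[1:] if path.startswith("/") else path
    match = OLD_URL.match(relative)
    if match is None:
        return None
    # group(2) plays the role of $2 in the RewriteRule substitution.
    return "http://www.example.com/" + match.group(2) + "/"


if __name__ == "__main__":
    for path in ("/c/clan-of-xymox/agonised-by-love/",
                 "/a/clan-of-xymox/agonised-by-love/",
                 "/agonised-by-love/"):
        print(path, "->", redirect_target(path))
```

As written the pattern only covers the /c/ prefix, so the /a/ example from the first post would need a second rule or a character class such as ^[ac]/ (an untested suggestion, not something from the posts above).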
zabalex



 
Msg#: 4334375 posted 7:18 pm on Oct 31, 2011 (gmt 0)

Hi,
Blocking URLs or directories that are not present on your server is not going to work. If they are not there, what are you going to block? Googlebot is picking them up from somewhere else, not from your root directory. You can control your own server but not other people's. You may have submitted an article that contains those URLs as anchor links (probably through a programming error at an article directory), so you need to find the source and correct the links to point to the valid URL. In the meantime you can do a 301 redirect to some page on your website to avoid the 404 error.

regards
zabalex

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved