

Disallow in robots.txt

     

jshpik1

10:36 am on Jul 3, 2011 (gmt 0)

10+ Year Member



Hello,

Somehow Google has found URLs that I can't seem to find links to anywhere on the site, and they all return a 404. How would I disallow these:

/c/clan-of-xymox/agonised-by-love/
/c/clarks/untitled-1/
/c/clay-aiken/because-you-loved-me/

In these examples my domain comes before the /c/ portion. I want to disallow all links with this format. All of them seem to come from the category "C", but that page should not exist in that format. For example, these should also be disallowed:

/a/clan-of-xymox/agonised-by-love/

All I can figure is that Google picked these up when I was playing with pretty links a very long time ago... Thanks!

incrediBILL

11:22 am on Jul 3, 2011 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Go into Google WMT (Webmaster Tools) and have the C and A directories removed from the index permanently, which should stop the crawling. Blocking them in robots.txt is just as easy: block the paths '/c/' and '/a/', unless you use those for anything else.
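For reference, a minimal robots.txt sketch along those lines (assuming the file sits at the domain root and nothing you do want crawled lives under /c/ or /a/) would be:

User-agent: *
Disallow: /c/
Disallow: /a/

Bear in mind that robots.txt only stops crawling; anything already indexed still needs the WMT removal request (or a redirect/410) to drop out.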

jshpik1

11:33 am on Jul 3, 2011 (gmt 0)

10+ Year Member



I don't want to block /c/ or /a/ themselves; I want to block posts with that URL layout. The actual post URL looks like this:

/agonised-by-love/

Not this:

/c/clan-of-xymox/agonised-by-love/

g1smd

4:25 pm on Jul 3, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



If those URLs rank, I would probably (at least initially) redirect those requests so you retain the traffic.

RewriteRule ^c/([^/]+)/([^/]+)/$ http://www.example.com/$2/ [R=301,L]


Once the old URLs are deindexed, I might simply block them using the 410 code.

RewriteRule ^c/([^/]+)/([^/]+)/$ - [G]
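Since the /a/ URLs need the same treatment, the pattern could presumably be widened with a character class; a sketch assuming only the /a/ and /c/ prefixes are affected and neither is a real directory on the site:

RewriteRule ^[ac]/([^/]+)/([^/]+)/$ http://www.example.com/$2/ [R=301,L]

and later, once deindexed:

RewriteRule ^[ac]/([^/]+)/([^/]+)/$ - [G]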

zabalex

7:18 pm on Oct 31, 2011 (gmt 0)



Hi,
It's not going to work to block URLs or directories that don't exist on your server. If they aren't there, what exactly are you going to block? Googlebot is picking them up from somewhere else, not from your root directory. You can control your own server, but not other people's. You may have submitted an article that contains those URLs as anchor links (perhaps through an article directory's programming error), so you need to find the source and correct the links to point to the valid URLs. In the meantime you can do a 301 redirect to a page on your website to avoid the 404 errors.

regards
zabalex
 
