Forum Moderators: Robert Charlton & goodroi


Using Robots.txt to Block googlebot

denisl

1:05 pm on May 23, 2011 (gmt 0)

10+ Year Member Top Contributors Of The Month



In another thread here, there is a reference to Matt Cutts saying not to use robots.txt to block bot access to duplicates [youtube.com...]
This has me concerned.

On one of my sites, a few years ago, I had a stupid URL redirect script: it turned out that any URL could be added as a parameter in the address bar, and the script would redirect to it.
When my visitor numbers shot up from a couple of thousand a day to 50 thousand, I found there were many links to that script of mine from forums, with (well, you can guess the kind of stuff that was added as a parameter).
I immediately removed the script and got G to remove that page from their index. I also put a block on that old (and now non-existent) URL in robots.txt.

However, there are still many links out there to that URL (Google seems to keep finding more), and every day a number of people hit my site at that URL (and get a 404).

What concerns me is that in WMT, Google shows 47,000 URLs blocked by robots.txt, on a site with under a thousand pages indexed. Where do I go from here? Should I remove the block and let googlebot get a 404 with all that c#!p in the URL?

rainborick

2:24 pm on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I wouldn't be concerned about the high number in the blocked URLs report. I would be concerned about (a) my users getting 404s, and (b) links to my site being wasted. If you can set up a 301 redirect for these malformed URLs, you'll be solving all of these issues at once. Just be sure to unblock these URLs in robots.txt if you do install the redirect. Good luck!
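A minimal sketch of such a redirect in Apache's .htaccess, assuming (hypothetically) that the old script lived at /redirect.php; adjust the path to match the real script's URL:

```apache
# Hypothetical path: change /redirect.php to the old script's URL.
RewriteEngine On
# Match any request for the old script, with or without parameters,
# and 301 it to the homepage. The trailing "?" in the target drops
# the query string, so none of the junk parameters are carried over.
RewriteRule ^redirect\.php$ /? [R=301,L]
```

This sends visitors (and any link value) to the homepage instead of a 404, but only once the URL is unblocked in robots.txt so crawlers can see the redirect.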

denisl

2:38 pm on May 23, 2011 (gmt 0)

10+ Year Member Top Contributors Of The Month



@rainborick
Thank you, but those links were from bad neighbourhoods - at least, they were using my redirect script to redirect to a bad neighbourhood.

tedster

2:47 pm on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In Matt's video, his point is that using robots.txt should be the "last resort" - but he never says "don't use it."

In your case, the robots.txt disallow was put in place while googlebot still saw the URLs resolving, correct? If so, then googlebot never gets a chance to see that they are now 404 - and that is Matt's point.
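To illustrate (with a hypothetical path standing in for the old script's URL):

```
# robots.txt - hypothetical path for illustration.
# While this line is present, googlebot never re-fetches the URL,
# so it never discovers that it now returns a 404:
User-agent: *
Disallow: /redirect.php
# Deleting the Disallow line lets googlebot crawl the URL, receive
# the 404 (or a 410), and eventually drop those links from its index.
```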

I'd suggest giving the video another close listen or two. It's a highly nuanced piece of communication.