Forum Moderators: goodroi

GoogleBot muxing URLs and I want to block it

12:25 pm on Jun 30, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 3, 2002
votes: 0

I am having a strange problem where Googlebot has found URLs that don't exist, yet they serve 200. I would like to block all of these URLs on the site. This has caused over 5000 pages of dupe content. I have not been able to find where the URLs are coming from, whether malicious or not, and would like to make sure I am getting the syntax right before I add this to the robots.txt file.

All good URLs should contain a ?_function=(whatever) at the beginning of the query string like this...


Googlebot (or some other entity) is finding query strings like this...


...and other variations of this, BUT all the muxed variations start with ?ForumMasterThreads_uid1=(n)

I would like to disallow all of these but need to make sure to allow all others. Is this the correct syntax and would this work?

User-agent: Googlebot
Disallow: /*?ForumMasterThreads_uid1

By the way, I checked, double checked, triple checked and yes... quadruple checked my code, and there are no pages that generate these URLs. Any help would be appreciated.

4:25 pm on June 30, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 27, 2001
votes: 0

I think you'll have to do this at the .htaccess level. Rewrite the request to a 404 if it has the strange parameter. For query string examples see:
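A minimal mod_rewrite sketch of that idea (my own illustration, assuming Apache and that every bad query string starts with ForumMasterThreads_uid1, as described above):

```apache
# .htaccess sketch: answer 404 when the query string begins with the
# stray parameter. With R=404 the substitution is ignored and Apache
# returns a plain 404 response, stopping further rewriting.
RewriteEngine On
RewriteCond %{QUERY_STRING} ^ForumMasterThreads_uid1= [NC]
RewriteRule ^ - [R=404,L]
```

Good URLs, whose query strings start with _function=, never match the RewriteCond and are served normally.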


6:16 pm on June 30, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 3, 2002
votes: 0

Well, that's fine for Apache, etc... what about windoze? I can achieve the same thing using robots.txt, can I not? I do not want any queries going to...


6:31 pm on June 30, 2008 (gmt 0)

Senior Member

joined:Jan 27, 2003
votes: 0

Your disallow line is syntactically correct and will work for Googlebot: the asterisk is a wildcard, a question mark is treated as a literal character rather than an operator, and robots exclusion rules are prefix-matched.

Note, though, that parameter order in a URL is not normally significant. This URL would work in most instances:


So, as long as you're sure you aren't linking to parameters in different orders things should be OK.
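To make that matching behaviour concrete, here is a small Python sketch (my own illustration, not Google's code) that mimics Googlebot-style rule matching: '*' is a wildcard, every other character is literal, and the rule is anchored at the start of the path:

```python
import re

def googlebot_rule_to_regex(rule):
    # '*' matches any run of characters; everything else (including '?')
    # is escaped and matched literally; '^' anchors at the path start.
    return re.compile('^' + '.*'.join(re.escape(part) for part in rule.split('*')))

pat = googlebot_rule_to_regex('/*?ForumMasterThreads_uid1')

# Blocked: the query string begins with the stray parameter.
assert pat.match('/threads?ForumMasterThreads_uid1=42')

# Not blocked: a good URL whose query string starts with _function.
assert not pat.match('/threads?_function=view')

# The parameter-order caveat: if the stray parameter appears second,
# no '?' directly precedes it, so the rule no longer matches.
assert not pat.match('/threads?_function=view&ForumMasterThreads_uid1=42')
```

The last case is exactly why the parameter-order warning matters: the rule only catches URLs where ForumMasterThreads_uid1 immediately follows the '?'.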

Incidentally, there's a chance Google is creating these URLs as a result of its 'form crawling' behaviour [webmasterworld.com].

5:25 pm on July 1, 2008 (gmt 0)

New User

5+ Year Member

joined:June 25, 2008
posts: 2
votes: 0

With Windows you can log into the IIS manager and manually click on each file and re-direct it. I just did this today for a bunch of old .html files that we had which are now being re-directed to .asp files. I checked with Google and they had duplicate content. Over time, all of the .html's will switch over to .asp's in their index once Google picks up on all of the 301 re-directs.
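For bulk rules, the IIS URL Rewrite module (a separate Microsoft add-on, not installed by default) can express this once in web.config instead of clicking each file. A hedged sketch of the .html-to-.asp 301 redirect described above, assuming that module is installed:

```xml
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- 301 every .html request to the matching .asp URL -->
        <rule name="html-to-asp" stopProcessing="true">
          <match url="^(.*)\.html$" />
          <action type="Redirect" url="{R:1}.asp" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```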

2:29 pm on July 11, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd · Top Contributor of All Time · 10+ Year Member

joined:July 3, 2002
votes: 0

The robots.txt rule will block spidering, but the URLs may well hang around in Google SERPs as URL-only listings for a very long time after that.

You are better off in the long run adding a redirect, but the robots.txt rule is a good way to start.
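A hedged .htaccess sketch of that longer-run approach (assuming Apache): instead of merely blocking the stray URLs, 301 them to the same path with the bad query string stripped, so Google can consolidate them:

```apache
# Redirect any URL whose query string begins with the stray parameter
# to the same path with no query string (the trailing '?' in the
# substitution strips the query string in Apache mod_rewrite).
RewriteEngine On
RewriteCond %{QUERY_STRING} ^ForumMasterThreads_uid1= [NC]
RewriteRule ^(.*)$ /$1? [R=301,L]
```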

