Forum Moderators: Robert Charlton & goodroi


Robots.txt and Google


realmaverick

1:17 pm on Jan 10, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've just noticed over 100,000 HTTP errors in WMT.

They're coming from URLs such as this one:

http://www.example.com/index.php?app=core&module=reports&rcom=uploads&id=30786

These URLs were obviously for users to report content; we've since removed that ability for guests. However, Google is still visiting them daily.

First question: would over 100,000 of these errors affect quality score?

Second question: via robots.txt, how do I block these URLs, seeing as they're not in a directory?

Could I, for example, use:

Disallow: http://www.example.com/index.php?app=core&module=reports

Would this catch them all? Is there a way to use a wildcard here, or is this not possible via robots.txt at all?

Thanks a lot.

netmeg

5:07 pm on Jan 10, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you have any reason not to block URLs with ? in them?

User-agent: *
Disallow: /*?
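For anyone unsure what that rule actually matches: `*` and `$` are Google extensions to the original robots.txt standard, not universally supported. Here's a rough sketch of how a Googlebot-style matcher treats `Disallow` patterns (the helper name and the prefix-match behavior are my assumptions based on Google's documented handling, not any official implementation):

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Rough sketch of Googlebot-style Disallow matching.

    '*' matches any run of characters and '$' anchors the end of the
    URL; both are Google extensions. A pattern without '$' is treated
    as a prefix match, as in the original robots.txt convention.
    """
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)  # '?' , '&' etc. are literal in robots.txt
    return re.match(regex, path) is not None

# The rule above blocks any URL whose path contains a query string:
rule = "/*?"
print(robots_pattern_matches(rule, "/index.php?app=core&module=reports&id=30786"))  # True
print(robots_pattern_matches(rule, "/forum/topic-123.html"))  # False
```

So `Disallow: /*?` would catch every one of those report URLs (and anything else with a `?` in it), which is why it's worth checking first that no parameterized URLs need to stay crawlable.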

Marketing Guy

5:14 pm on Jan 10, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can dictate how Google handles these via Webmaster Tools > URL Parameters. Just select "narrows content" and "no URLs" (and double-check the examples won't exclude URLs you might want to keep).

Had a similar issue with an ecommerce client site I just took on: a 2k-page site with 50k pages indexed. :-/ Can't say that removing these URLs had any noticeable impact, so I wouldn't worry about it too much. Google seems smart enough not to count those URLs in any significant way.