Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Googlebot trying to index comment post forms


catch2948

4:02 pm on Mar 20, 2006 (gmt 0)

10+ Year Member



I have a website, which is powered in part by visitor comments on various articles and tutorials making up the site.

Over the past 2 days, I have noticed that Googlebot is going directly after comment forms, in the following manner:

GET /widgets/widgetcomment.php?id=abc
GET /widgets/widgetcomment.php?id=abc
GET /widgets/widgetcomment.php?id=abc
...

with "abc" being a number (different in each case) denoting the article or page ID.

Please notice from the above that each request is for a specific comment form, and the requests come in no particular order.

While this in itself seems strange enough to me, the part that is even more strange is that the content (from which each comment form is linked) is entirely new, and has not been visited by Googlebot yet. So how is Googlebot getting all of these urls, if it hasn't visited the pages they are linked from, and why?

As each comment page that has been requested is only a form to enter a comment on a specific page, there is no real content on the page. Just a web form for comment entry. I am concerned about Google thinking that these are "spam" pages, or doorway pages of some sort.

Should I ban Googlebot from spidering these URLs? And how would I go about finding out how it got these URLs in the first place, without it having spidered the pages they are linked from? Or should I just not worry about it?

Thanks in advance ...

moltar

2:17 am on Mar 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Definitely deny all bots access to those pages via robots.txt.

/widgets/widgetcomment.php?id=

should do the trick.
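Since robots.txt rules are prefix matches against the path (plus query string), you can drop the `?id=` part entirely and block every variant at once. A minimal sketch, assuming the script really lives at that path:

```
User-agent: *
Disallow: /widgets/widgetcomment.php
```

If you only want to block Google's crawler rather than all bots, use `User-agent: Googlebot` instead of the wildcard.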

As for how G found those pages - are you sure it's a real G bot? Check the IP against whois to make sure. It could be a spammer bot masquerading as Googlebot.
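Whois alone can be spoofed around; the more reliable check Google recommends is a reverse DNS lookup on the IP followed by a forward lookup to confirm. A sketch in Python (the `googlebot.com` / `google.com` suffixes are Google's documented crawler domains; the function names are mine):

```python
import socket

# Reverse-DNS names for genuine Googlebot IPs end in one of these domains.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")

def is_google_hostname(hostname: str) -> bool:
    """Check that a hostname belongs to one of Google's crawler domains."""
    # Strip any trailing dot, then require a dot-delimited suffix match
    # so "fake-googlebot.com.evil.net" does not pass.
    return hostname.rstrip(".").endswith(GOOGLE_DOMAINS)

def verify_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)      # reverse lookup
        if not is_google_hostname(hostname):
            return False
        # Forward-confirm: the name must resolve back to the same IP,
        # otherwise the PTR record could simply be forged.
        _, _, addresses = socket.gethostbyname_ex(hostname)
        return ip in addresses
    except socket.herror:
        return False  # no reverse DNS entry at all
```

The forward-confirmation step is the important part: anyone controlling their own IP block can point its PTR record at `googlebot.com`, but they cannot make Google's DNS resolve that name back to their IP.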

catch2948

2:59 am on Mar 21, 2006 (gmt 0)

10+ Year Member



Already confirmed that it is definitely Googlebot. Will set up the exclusion in robots.txt ...

Thanks :-)