Forum Moderators: Robert Charlton & goodroi


Google indexes 1000s of my pages by searching my form

         

whatson

4:52 am on Sep 14, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How do you stop Google from filling in a booking form, and indexing pages with all sorts of dates and combinations, but the exact same content?

lucy24

6:45 am on Sep 14, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



By going to the section of GWT where you tell them to ignore certain parameters.

:: avoiding the question of whether g### is personally filling in forms with its little robotic hands, or simply indexing whatever version of the page happened to be "up" when they stopped by ::
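Telling Google to ignore a parameter amounts to URL normalization: drop the parameters that don't change the content, and the many crawled variants collapse into one URL. Below is a minimal Python sketch of that idea. The parameter names `date` and `perpage` are hypothetical stand-ins for whatever your booking form actually submits. The sketch models the concept only. It is not how Google processes the GWT setting internally.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that change the URL but not the page content.
IGNORED_PARAMS = frozenset({"date", "perpage"})

def normalize_url(url, ignored=IGNORED_PARAMS):
    """Return the URL with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in ignored]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

crawled = [
    "http://example.com/bookingform.php?date=2011-09-14&perpage=10",
    "http://example.com/bookingform.php?date=2011-09-15&perpage=20",
    "http://example.com/bookingform.php?perpage=50",
]

# All three crawled URLs collapse to one normalized URL.
print({normalize_url(u) for u in crawled})
# {'http://example.com/bookingform.php'}
```

Parameters that really do change the content (a room type, say) survive normalization, which is why you only list the ones that produce duplicates.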

scooterdude

7:24 am on Sep 14, 2011 (gmt 0)

10+ Year Member



You could put a robots noindex meta tag on the form page.
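A minimal sketch of that idea, assuming your own code renders the form's result pages. It emits `<meta name="robots" content="noindex">` whenever the URL carries a query string, treating any such URL as a form-generated variant. The function name and the query-string rule are hypothetical choices for illustration. The tag only works if the crawler is allowed to fetch the page and see it.

```python
from urllib.parse import urlsplit

def robots_meta_for(url):
    """Return a robots meta tag for the page at `url`, or "" for none.

    Hypothetical rule: any URL with a query string is a form-generated
    variant and should be kept out of the index.
    """
    if urlsplit(url).query:
        return '<meta name="robots" content="noindex">'
    # No tag needed: index, follow is the default behaviour.
    return ""

print(robots_meta_for("/bookingform.php?date=2011-09-14"))
print(repr(robots_meta_for("/bookingform.php")))
```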

dstiles

7:26 pm on Sep 14, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



if real-browser then
show form on page
end if

Works for me. That way you catch bad SEs and bad bots as well.
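The pseudocode above can be sketched in Python as follows. This assumes the "real browser" test is a crude User-Agent heuristic. The token list and function names are hypothetical, and because User-Agent strings are trivially spoofed, this will not stop a bot that pretends to be a browser.

```python
from typing import Optional

# Hypothetical substrings that identify crawlers; extend as needed.
BOT_TOKENS = ("bot", "spider", "crawler", "slurp")

# Placeholder markup; the real form fields are omitted.
BOOKING_FORM_HTML = '<form action="/bookingform.php" method="get"><!-- fields omitted --></form>'

def is_real_browser(user_agent: Optional[str]) -> bool:
    """Crude heuristic: missing or bot-like User-Agents are not browsers."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return not any(token in ua for token in BOT_TOKENS)

def render_booking_section(user_agent: Optional[str]) -> str:
    """Show the form only to clients that look like real browsers."""
    if is_real_browser(user_agent):
        return BOOKING_FORM_HTML
    return ""
```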

walkman

8:39 pm on Sep 14, 2011 (gmt 0)



If the page is unique and you don't need it in the search results, you may also want to add noindex, nofollow to it. For example, some sites do that to search.php, just in case, so that search?term=yahoo is not indexed.
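Besides the meta tag, the same directive can be sent as an `X-Robots-Tag` HTTP header, which Google also honours. Here is a minimal sketch as a Python WSGI handler standing in for search.php. The handler name and the `term` parameter are assumptions for illustration.

```python
from html import escape
from urllib.parse import parse_qs

def search_app(environ, start_response):
    """Hypothetical WSGI handler standing in for search.php."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    term = params.get("term", [""])[0]
    body = f"<p>Results for {escape(term)}</p>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Same effect as <meta name="robots" content="noindex, nofollow">,
        # but sent in the HTTP response instead of the HTML.
        ("X-Robots-Tag", "noindex, nofollow"),
    ]
    start_response("200 OK", headers)
    return [body]
```

For a quick local test it can be served with `wsgiref.simple_server.make_server("localhost", 8000, search_app).serve_forever()`.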

Simsi

9:23 pm on Sep 14, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not sure if this will work yet, but I just spotted the same issue, so I used the canonical tag to point it at the undecorated URL.
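A minimal sketch of that canonical approach, assuming every query-string variant shows the same content as the bare page. The function name is hypothetical. Keep in mind that rel="canonical" is a hint, and Google may choose not to follow it.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_link(url):
    """Build a rel="canonical" tag pointing at the URL minus its query string."""
    parts = urlsplit(url)
    bare = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(bare)}">'

print(canonical_link("http://example.com/bookingform.php?date=2011-09-14&perpage=20"))
# <link rel="canonical" href="http://example.com/bookingform.php">
```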

whatson

7:21 am on Sep 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It creates new pages every day, and all of the URLs are different.
Same with drop-down boxes, e.g. results per page (10, 20, 50, etc.): it makes URLs for all of them.
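The URL count explodes because each independent form field multiplies the number of distinct URLs. Here is a back-of-the-envelope Python sketch. The field names and values (a year of arrival dates, 1 to 14 nights, three page sizes) are hypothetical.

```python
from datetime import date, timedelta
from itertools import product
from urllib.parse import urlencode

# Hypothetical booking-form fields and the values a crawler could submit.
arrival_dates = [date(2011, 9, 15) + timedelta(days=d) for d in range(365)]
nights = range(1, 15)
per_page = [10, 20, 50]

def form_urls():
    """Yield every URL a crawler could generate by filling in the form."""
    for arrive, n, pp in product(arrival_dates, nights, per_page):
        query = urlencode({"arrive": arrive.isoformat(), "nights": n, "perpage": pp})
        yield f"/bookingform.php?{query}"

count = sum(1 for _ in form_urls())
print(count)  # 365 * 14 * 3 = 15330
```

Because the date window moves forward each day, a fresh batch of URLs also appears daily, which matches what you are seeing.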

tedster

7:26 am on Sep 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google has been doing that for several years - and they get more adventurous all the time, taking on more complex form inputs. It's really important, in my opinion, to block those URLs from being crawled, not just from being indexed. The robots.txt Disallow rule you need should be relatively simple unless you have many forms.

jinxed

8:04 am on Sep 15, 2011 (gmt 0)

10+ Year Member



I just add a canonical tag to the form page; never had any issues...

deadsea

10:11 am on Sep 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Put the form in robots.txt:

User-agent: *
Disallow: /bookingform.php
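You can check that rule with Python's standard `urllib.robotparser` before uploading it. Robots.txt rules are prefix matches, so the single Disallow line also covers every query-string variant of the form. Python's parser implements only the original prefix rules, not Google's `*` and `$` wildcard extensions, but plain prefix matching is all this rule needs. The example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /bookingform.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://example.com/bookingform.php",
    "http://example.com/bookingform.php?date=2011-09-14&perpage=20",
    "http://example.com/rooms.html",
):
    print(url, parser.can_fetch("Googlebot", url))
# The two bookingform.php URLs print False; rooms.html prints True.
```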

Sgt_Kickaxe

10:30 am on Sep 15, 2011 (gmt 0)



robots.txt is useless for blocking googlebot in some instances, for example Google is ignoring robots.txt on pages with their +1 button, just something to keep in mind.

whatson

11:16 am on Sep 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Could all these duplicates be a Panda offense by any chance?

jinxed

11:32 am on Sep 15, 2011 (gmt 0)

10+ Year Member



Well, either way, it's not helping that they are being indexed.

azeemtamboli

3:31 pm on Sep 15, 2011 (gmt 0)

10+ Year Member



You can generate a robots.txt file with the Google Webmaster Tools generator, add the
Disallow: /bookingform.php rule to it, and then upload that file to the root of your server.

Robert Charlton

7:01 pm on Sep 15, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



robots.txt is useless for blocking googlebot in some instances, for example Google is ignoring robots.txt on pages with their +1 button, just something to keep in mind.

Sgt_Kickaxe - Not exactly. There's a fine point in the example you're citing, and it's important to keep it in mind. There's enough misinformation floating around as is. This was the discussion where the +1 button issue came up...

Googlebot getting caught in robots.txt spider trap
http://www.webmasterworld.com/google/4346138.htm

This was the clarification posted. I'm adding some emphasis this time....
Googler, Jenny Murphy, provides a quick answer:
The +1 Button interacts with robots.txt and other crawler directives in an interesting way. Since +1's can only be applied to public pages, we may visit your page at the time the +1 Button is clicked to verify that it is indeed public. This check ignores crawler directives. This does not, however, impact the behavior of Google web search crawlers and how they interact with your robots.txt file.

I feel that Google should pull back from indexing forms. That said, you would be amazed at how many academic sites there are out there, with often valuable content, that use form-driven navigation entirely.

But, regarding robots.txt, I don't think that Google is willfully disregarding robots.txt to unearth content we don't want shown. Just something to keep in mind.