If the form action starts http:// then I'd expect it to be crawled. At some point (maybe already?), other links may be followed too.
The question I have is: Why would G want to follow form links? Can't see it adding to the quality of its index, more probably degrading it. Seems like it would be simpler and better if G just ignored forms altogether.
|form action starts http:// |
Good point, but no, it is a relative link.
|Why would G want to follow form links? |
That is what I am wondering. If these pages end up in the index then it really doesn't add anything useful, but I guess there may be cases where site navigation is done using this method.
Lots of scripts use default values, so even without a user input the output makes sense. If not, the useless page won't rank high and the quality of the index is not affected.
AFAIK Googlebot follows absolute and relative URLs in form action.
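For anyone wondering what "following a relative URL in a form action" would involve, here is a minimal sketch of the resolution step using Python's standard urljoin. The page URL and the second and third actions are made-up examples, and this only shows how any client resolves a relative action against the page it was found on. It says nothing about what Googlebot does internally.

```python
from urllib.parse import urljoin


def resolve_action(page_url, action):
    """Resolve a form's action attribute against the URL of the page holding it."""
    return urljoin(page_url, action)


# Hypothetical page URL, for illustration only.
page_url = "http://www.example.com/polls/index.html"

for action in ("/spider-trap.php", "results.php", "http://www.example.com/search.php"):
    print(action, "->", resolve_action(page_url, action))
```

A root-relative action replaces the whole path, a bare filename is resolved against the page's directory, and an absolute action is used unchanged.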
I considered the value of looking at forms some time ago. Normally (for the best of that class of sites) they provide quality content, but not always. Sometimes they lead to nothing more than PPC listings from a search engine or a website-made affiliate program directory [spelling bad, websites bad].
My first thought when I read the thread title was that they are determining what class the page with the form belongs to, i.e. the theme of that URL. But Googlebot could also learn about other URLs by following and indexing those pages, instead of just checking them.
Very interesting. I hope enough people post more information to follow up on this detail.
olias - I'm sorry, I don't want to offend you, but are you sure?
I had a shock a few weeks ago when I thought I had seen the same thing and even went so far as changing the form to a POST to see if googlebot would follow that too.
I had a random question appear on my site, with select or radio inputs for the answer. I saw in my logs a request that looked like this
pagename.php?question_id=7
coming from googlebot. I thought I was seeing something new until I realised that I had put a link at the bottom of the form saying 'click here to see results' and that googlebot was following that link. When a question is answered, the request actually looks like this
pagename.php?question_id=7&answer_id=15
because of a hidden variable containing the ID of the question that is being answered. With no question ID, I'd made the default action to be to display the previous answers to the questions.
Maybe something like this is happening for you too? If not then perhaps you want to try changing the form to a POST request to see if googlebot continues to crawl the form. Just out of interest...
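To check this in your own logs, a rough sketch like the one below separates the two kinds of request described above by looking at the query string. The parameter names question_id and answer_id come from the post. The function name and the labels it returns are made up for illustration, and your own script's parameters will differ.

```python
from urllib.parse import urlsplit, parse_qs


def classify_request(path):
    """Guess whether a logged request came from the results link or the form.

    Assumes the scheme described in the post: a submitted form always
    carries answer_id, while the 'see results' link carries only question_id.
    """
    params = parse_qs(urlsplit(path).query)
    if "answer_id" in params:
        return "form submission"
    if "question_id" in params:
        return "results link"
    return "other"


for logged in ("pagename.php?question_id=7",
               "pagename.php?question_id=7&answer_id=15"):
    print(logged, "->", classify_request(logged))
```

If every Googlebot request falls into the "results link" bucket, the bot is probably following an ordinary link rather than the form.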
Thanks for all your thoughts and ideas.
As an experiment I put the following form on one of my pages - but it was only served up to Googlebot, no actual browsers saw it. (I should add I am not normally into cloaking!)
<form name="AForm" method="get" action="/spider-trap.php">
<select name="choice" size="1">
<option value="25">25. The Spider Trap</option>
</select>
<input type="submit" value="Result">
</form>
Sure enough a couple of days later /spider-trap.php?choice=25 was crawled by Googlebot.
For me that pretty much confirms it has effectively crawled the form by piecing together the fairly simple bits of information.
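To make "piecing together the bits" concrete, here is a minimal sketch of how any crawler could build that URL from the form markup: read the action, take the default value of each field, and append them as a query string. It uses only Python's standard library. The class and function names and the page URL are invented for illustration, and picking the first option of a select is a simplification. This is a guess at the general technique, not a description of Googlebot's actual code.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class GetFormParser(HTMLParser):
    """Collects the action and default field values of the first GET form."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = []            # (name, value) pairs in document order
        self._select_name = None    # name of the <select> currently open
        self._select_done = False   # True once a value was recorded for it

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and self.action is None:
            # A missing method attribute means GET.
            if (a.get("method") or "get").lower() == "get":
                self.action = a.get("action") or ""
        elif tag == "select":
            self._select_name = a.get("name")
            self._select_done = False
        elif tag == "option" and self._select_name and not self._select_done:
            # Simplification: take the first option, ignoring "selected".
            self.fields.append((self._select_name, a.get("value") or ""))
            self._select_done = True
        elif tag == "input" and a.get("name") and a.get("type") != "submit":
            self.fields.append((a["name"], a.get("value") or ""))

    def handle_endtag(self, tag):
        if tag == "select":
            self._select_name = None


def form_to_url(page_url, html):
    """Return the URL a GET submission with default values would request."""
    parser = GetFormParser()
    parser.feed(html)
    if parser.action is None:
        return None  # no GET form found
    url = urljoin(page_url, parser.action)
    query = urlencode(parser.fields)
    return url + "?" + query if query else url


FORM_HTML = """
<form name="AForm" method="get" action="/spider-trap.php">
<select name="choice" size="1">
<option value="25">25. The Spider Trap</option>
</select>
<input type="submit" value="Result">
</form>
"""

# The page URL is a placeholder; only the path and query matter here.
print(form_to_url("http://www.example.com/page.html", FORM_HTML))
# -> http://www.example.com/spider-trap.php?choice=25
```

The path and query match the /spider-trap.php?choice=25 request seen in the logs, which fits the idea that nothing more than the form's own markup is needed to produce it.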
I'm not sure this is a good idea.
I offer a service that has a ludicrously simple sign-up procedure. You select your timezone (as an offset from GMT) from a drop-down box and hit "Go".
If I hadn't got the heads-up on this, I'd have had Googlebot happily creating itself 25-odd accounts!
Like many sites, certain actions on my sites trigger email to be sent to both my and certain other addresses - again not something I want to be triggered automatically.
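One way to guard against this, sketched below, is to let only POST requests change anything, since HTTP defines GET as a "safe" method that should not have side effects. The experiment reported in this thread only shows a GET form being crawled, and whether the bot would ever submit a POST is not established here. The handler name, the gmt_offset field and the in-memory accounts list are all hypothetical stand-ins for a real sign-up script.

```python
SAFE_METHODS = {"GET", "HEAD"}


def handle_signup(method, params, accounts):
    """Hypothetical sign-up handler: only a POST may create an account."""
    method = method.upper()
    if method in SAFE_METHODS:
        # A crawler that builds a GET URL from the form lands here:
        # show the form again and change nothing.
        return "show_form"
    if method != "POST":
        return "method_not_allowed"
    # State change (and any notification email) happens only on POST.
    accounts.append({"gmt_offset": int(params["gmt_offset"])})
    return "account_created"


accounts = []
print(handle_signup("GET", {"gmt_offset": "1"}, accounts), len(accounts))
print(handle_signup("POST", {"gmt_offset": "1"}, accounts), len(accounts))
```

With this split, a crawled sign-up.php?gmt_offset=1 style request would just redisplay the form instead of creating an account or sending mail.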
I'd be interested to hear opinions on this as well. I guess the brains at the 'plex have thought about the consequences and decided to give the 'bot these skills, but I'd like to understand a bit more about their justification.
One thing googlebot might be looking for is 'quick link' select boxes... only real use I can think of... it could normally just parse those links from the form though.
I think it's a dumb idea too - more often than not, it will find no new links and just mess up people's questionnaires etc.
Olias - Did you notice anything different about the useragent version # or the request headers?