
Google News Archive Forum

    
Googlebot crawling GET forms with one variable?
Any better explanations welcome!
olias

10+ Year Member



 
Msg#: 16379 posted 5:59 pm on Aug 22, 2003 (gmt 0)

Last night I added a little quiz with 24 options to one of my sites; people submit their choice from a SELECT box.

That makes for about the simplest form possible: every possible result page can be derived from the form action, the select name and the option values.
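As a rough illustration of that derivation (not a claim about how Googlebot does it), the sketch below builds every result URL from the three pieces of information. The action `/quiz.php`, the select name `choice` and the option values 1 to 24 are hypothetical stand-ins, since the real form is not shown.

```python
from urllib.parse import urlencode


def form_result_urls(action, select_name, option_values):
    """Build every URL a GET form with a single SELECT can produce."""
    return [f"{action}?{urlencode({select_name: value})}" for value in option_values]


# Hypothetical quiz form: action "/quiz.php", select named "choice", options 1..24.
urls = form_result_urls("/quiz.php", "choice", range(1, 25))
print(len(urls))
print(urls[0])
```

Nothing here needs the form to be submitted by a person; the page source alone is enough to list all 24 URLs.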

This morning I found that Googlebot had crawled 17 of the results pages despite there being no other links to them. My first reaction was that it was probably the Toolbar visit/crawling thing that has been the subject of much debate, but on investigation I found at least one page that had been crawled before anyone had ever viewed it.

To my mind that rules out Toolbar theories and the usual Googleguy response about referral leakage, so I am left with the idea that the form itself has been crawled. Does anyone have any better ideas?

 

ciml

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 16379 posted 6:06 pm on Aug 22, 2003 (gmt 0)

If the form action starts http:// then I'd expect it to be crawled. At some point (maybe already?), other links may be followed too.

jimbeetle

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 16379 posted 6:12 pm on Aug 22, 2003 (gmt 0)

The question I have is: Why would G want to follow form links? Can't see it adding to the quality of its index, more probably degrading it. Seems like it would be simpler and better if G just ignored forms altogether.

olias

10+ Year Member



 
Msg#: 16379 posted 7:05 pm on Aug 22, 2003 (gmt 0)

"form action starts http://"

Good point, but no, it is a relative link.

"Why would G want to follow form links?"

That is what I am wondering. If these pages end up in the index then it really doesn't add anything useful, but I guess there may be cases where site navigation is done using this method.

SebastianX

10+ Year Member



 
Msg#: 16379 posted 7:16 pm on Aug 22, 2003 (gmt 0)

Lots of scripts use default values, so even without user input the output makes sense. If not, the useless page won't rank high and the quality of the index is not affected.
AFAIK Googlebot follows absolute and relative URLs in form action.
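For reference, a relative form action is resolved against the URL of the page that contains the form, in the same way as any other link. A minimal sketch with Python's standard `urljoin`; the page URL and the `results.php` actions are made-up examples, and `resolve_action` is just a hypothetical helper name.

```python
from urllib.parse import urljoin


def resolve_action(page_url, action):
    """Resolve a form's action attribute against the URL of the containing page."""
    return urljoin(page_url, action)


# Hypothetical page that contains the form.
page_url = "http://www.example.com/quiz/index.html"

for action in ("/spider-trap.php", "results.php", "http://other.example.com/results.php"):
    print(action, "->", resolve_action(page_url, action))
```

So a crawler that can follow relative links has no extra work to do for a relative action.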

wasmith

10+ Year Member



 
Msg#: 16379 posted 6:14 am on Aug 23, 2003 (gmt 0)

I considered the value of looking at forms some time ago. Normally the best of that class of sites provide quality content, but not always: sometimes the forms lead to nothing more than PPC listings from a search engine, or a website-made affiliate program directory [spelling bad, websites bad].

My first thought when I read the thread title was that they are determining what class the page with the form belongs to, i.e. the theme of that URL. But Googlebot could also learn about other URLs by following and indexing those pages, instead of just checking them.

Very interesting. I hope enough people post more information to follow up on this detail.

sabai

10+ Year Member



 
Msg#: 16379 posted 7:22 am on Aug 23, 2003 (gmt 0)

olias - I'm sorry, I don't want to offend you, but are you sure?

I had a shock a few weeks ago when I thought I had seen the same thing and even went so far as changing the form to a POST to see if googlebot would follow that too.

I had a random question appear on my site, with select or radio inputs for the answer. I saw in my logs a request that looked like this pagename.php?question_id=7 coming from googlebot. I thought I was seeing something new until I realised that I had put a link at the bottom of the form saying 'click here to see results' and that googlebot was following that link. When a question is answered, the request actually looks like this pagename.php?question_id=7&answer_id=15 because of a hidden variable containing the ID of the question that is being answered. With no question ID, I'd made the default action to be to display the previous answers to the questions.
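The difference between the two requests can be checked mechanically when going through the logs. A small sketch, assuming the parameter names from the example above (`question_id` and `answer_id`); `is_form_submission` is a hypothetical helper, not part of any log tool.

```python
from urllib.parse import parse_qs, urlsplit


def is_form_submission(request_path):
    """True only if the request carries answer_id, which only the form adds."""
    params = parse_qs(urlsplit(request_path).query)
    return "answer_id" in params


print(is_form_submission("pagename.php?question_id=7"))
print(is_form_submission("pagename.php?question_id=7&answer_id=15"))
```

A request with only `question_id` is explained by the plain results link; only one that also carries `answer_id` points to the form itself.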

Maybe something like this is happening for you too? If not then perhaps you want to try changing the form to a POST request to see if googlebot continues to crawl the form. Just out of interest...

olias

10+ Year Member



 
Msg#: 16379 posted 6:16 pm on Aug 29, 2003 (gmt 0)

Thanks for all your thoughts and ideas.

As an experiment I put the following form on one of my pages - but it was only served up to Googlebot, no actual browsers saw it. (I should add I am not normally into cloaking!)

<form name="AForm" method="get" action="/spider-trap.php">
<select name="choice" size="1">
<option value="25">25. The Spider Trap</option>
</select>
<input type="submit" value="Result">
</form>

Sure enough a couple of days later /spider-trap.php?choice=25 was crawled by Googlebot.

For me that pretty much confirms it has effectively crawled the form by piecing together the fairly simple bits of information.
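To show how little is needed to piece the URL together, here is a sketch that parses the test form above with Python's standard `html.parser` and builds the GET URL for each option. This is only an illustration of the idea; nothing in this thread shows how Googlebot actually does it, and the class name `GetFormParser` is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

FORM_HTML = """
<form name="AForm" method="get" action="/spider-trap.php">
<select name="choice" size="1">
<option value="25">25. The Spider Trap</option>
</select>
<input type="submit" value="Result">
</form>
"""


class GetFormParser(HTMLParser):
    """Collect the URLs a GET form with SELECT boxes would produce."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.select_name = None
        self.in_get_form = False
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # A form without a method attribute defaults to GET.
            self.in_get_form = (attrs.get("method") or "get").lower() == "get"
            self.action = attrs.get("action") or ""
        elif tag == "select" and self.in_get_form:
            self.select_name = attrs.get("name")
        elif tag == "option" and self.in_get_form and self.select_name:
            query = urlencode({self.select_name: attrs.get("value") or ""})
            self.urls.append(f"{self.action}?{query}")

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_get_form = False
        elif tag == "select":
            self.select_name = None


parser = GetFormParser()
parser.feed(FORM_HTML)
print(parser.urls)
```

The submit button has no name attribute, so it contributes nothing to the query string, which matches the URL seen in the logs.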

dmorison

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 16379 posted 6:26 pm on Aug 29, 2003 (gmt 0)

I'm not sure this is a good idea.

I offer a service that has a ludicrously simple sign-up procedure. You select your timezone (as an offset from GMT) from a drop-down box and hit "Go".

If I hadn't got the heads-up on this, I'd have had Googlebot happily creating itself 25-odd accounts!

Like many sites, certain actions on my sites trigger email to be sent to both my own and certain other addresses - again, not something I want to be triggered automatically.

I'd be interested to hear opinions on this as well. I guess the brains at the 'plex have thought about the consequences and decided to give the 'bot these skills, but I'd like to understand a bit more about their justification.
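One common defence against this kind of side effect is to let GET requests only display things and to require POST for anything that creates accounts or sends mail, in line with the HTTP convention that GET should be safe. The sketch below is framework-free and purely illustrative: `handle_signup`, the `tz_offset` parameter and the in-memory `accounts` list are all hypothetical, and the thread has only shown Googlebot following a GET form.

```python
SAFE_METHODS = {"GET", "HEAD"}


def handle_signup(method, params, accounts):
    """Create an account only on POST; a GET or HEAD merely shows the form."""
    method = method.upper()
    if method in SAFE_METHODS:
        return 200, "signup form"
    if method != "POST":
        return 405, "method not allowed"
    # State change happens here and nowhere else.
    accounts.append({"tz_offset": int(params["tz_offset"])})
    return 201, "account created"


accounts = []
print(handle_signup("GET", {"tz_offset": "0"}, accounts), len(accounts))
print(handle_signup("POST", {"tz_offset": "-5"}, accounts), len(accounts))
```

With this split, a crawler that pieces together a GET URL from the form sees only the form page and triggers nothing.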

sabai

10+ Year Member



 
Msg#: 16379 posted 5:47 am on Aug 30, 2003 (gmt 0)

One thing googlebot might be looking for is 'quick link' select boxes... only real use I can think of... it could normally just parse those links from the form though.

I think it's a dumb idea too - more often than not, it will find no new links and just mess up people's questionnaires etc.

Olias - Did you notice anything different about the useragent version # or the request headers?


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved