|Google Crawling Out of Referral logs?|
Google started indexing SERPS from our new site search engine. The question comes up, HOW are they getting those search links? These are full keyword searches as performed by users.
When someone searches Google/Bing/Yahoo/AlltheWeb and then comes to WebmasterWorld, we highlight those keywords on the page and also print "try this search on webmasterworld" with a full keyword link that performs the search on our site search engine. So a raw http link is there on the screen, but only when someone arrives with a referral from a search engine. How does that link get into GoogleBot?
The only way I can think of is if Google is reading pages via the toolbar or via that Google accelerator proxy. Or is this just reconfirmation that Google is crawling out of its referral logs?
|brotherhood of LAN|
Perhaps you could try altering the order of the search GET variables in each place the link appears, to see which route it is coming through. Maybe too late now though?
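One way to read that suggestion is as a fingerprinting test: emit the same search link with a different parameter order in each place it can appear, then see which order turns up in Googlebot's requests. A rough sketch, assuming hypothetical parameter names (`terms`, `mode`) and hypothetical channel labels; none of these are taken from the actual site search:

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical parameter orders, one per place the link can be emitted.
CHANNEL_ORDERS = {
    "referral_banner": ("terms", "mode"),  # the "try this search" link
    "results_page": ("mode", "terms"),     # links printed on result pages
}

def build_query(channel, terms, mode="site"):
    """Build a query string whose parameter order identifies the channel."""
    values = {"terms": terms, "mode": mode}
    return urlencode([(name, values[name]) for name in CHANNEL_ORDERS[channel]])

def classify_query(raw_query):
    """Return the channel whose parameter order matches a logged query string."""
    seen = tuple(name for name, _ in parse_qsl(raw_query, keep_blank_values=True))
    for channel, order in CHANNEL_ORDERS.items():
        if seen == order:
            return channel
    return None  # order not recognised, so the source is unknown
```

Running `classify_query` over the query strings Googlebot requests would show which variant it picked up. This only tells you something if the crawler requests the URL exactly as it found it, without reordering the parameters.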
via the spybar
Could it be Firefox (Safe Browsing) URL data that is collected as well?
What if someone does a WebmasterWorld search, then links to one of the search results that includes the highlighted term in the URL? Is that the URL pattern you're seeing requested by Googlebot?
You might be able to find some of these links if you download and scan through the "Links to your site" from webmaster tools.
This could be a good case for the canonical tag.
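For what the canonical tag idea might look like in practice, here is a small sketch that strips a highlight parameter from the requested URL and prints a canonical link to the clean version. The parameter name `hilite` and the example URLs are assumptions for illustration, not the real WebmasterWorld URL scheme:

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical name of the parameter that carries the highlighted keywords.
HIGHLIGHT_PARAMS = {"hilite"}

def canonical_link(url):
    """Return a canonical link tag for the URL with highlight parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in HIGHLIGHT_PARAMS
    ]
    # Rebuild the URL without the highlight parameters and without any fragment.
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    return '<link rel="canonical" href="%s">' % escape(clean, quote=True)
```

The tag would go in the head of the highlighted page, so that every keyword variant points back at the one plain URL.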
Most logical answer for me is that it probably crawls the Web History that Google keeps from users logged into a G account.
> Could it be a FireFox(safebrowsing)
Might be Chrome safe browsing too.
Brett_Tabke, my understanding is no, not directly. I have seen it on one of our large sites: gbot tests bogus query strings to see if they return a proper 404 error. In our case I managed to fix a major issue [snip potential security issue]
As for how gbot discovered it: as said above, it only needs one on-page link or bookmark to a search result page that the gtoolbar can pick up from a user, and that gets passed along for the next crawl. Because it's a search string, gbot will test bogus search requests while it is at it.
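On the "proper 404" point, a rough sketch of what the search script could check before answering. The parameter names (`terms`, `mode`) and the rules are assumptions made up for this example, not how any particular site does it:

```python
from urllib.parse import parse_qsl

# Hypothetical set of parameters the search script actually understands.
KNOWN_PARAMS = {"terms", "mode"}

def status_for_search(raw_query):
    """Return 200 for a well-formed search request, 404 for anything else."""
    pairs = parse_qsl(raw_query, keep_blank_values=True)
    names = [name for name, _ in pairs]
    if not names or set(names) - KNOWN_PARAMS:
        return 404  # no parameters, or at least one unknown parameter
    if len(names) != len(set(names)):
        return 404  # repeated parameters never appear in links the site prints
    terms = dict(pairs).get("terms", "").strip()
    return 200 if terms else 404  # a search with no keywords is not a real page
```

For example, `status_for_search("bogus=1")` gives 404 while `status_for_search("terms=ajax+help")` gives 200, so a crawler probing made-up query strings gets a clear answer instead of a soft 200 page.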
Here are the steps:
User searches for, say, ajax help and lands on a link that is useless to Gbot, which is [webmasterworld.com...], as it uses a POST form rather than GET and all the rest of the search params are missing, so it is not crawlable. BUT on that page are the search results returned, one of which is:
Now that is bookmarkable, and many will find it interesting enough to post on their sites. Gbot collects it and spiders it, AND finds inside the page the link for the other option, which says search for this or that on WebmasterWorld. It was there a few minutes ago; have you removed it, Brett_Tabke? Anyway, that link is the one GBot tried as [snip], and it also tries it as [snip].... or a similar request which does not exist, though I believe the ".../?terms=" structure existed but was buggy, which is probably why it was removed.
[edited by: Brett_Tabke at 4:45 am (utc) on Apr 22, 2010]
Appreciate the thoughts dusky, but you are missing some stuff there. I'm not going to explain that here, as it is a security issue you are putting us at risk for...
Either the toolbar or referral log is the general consensus.
MC Hammer - any comment?