Forum Moderators: Robert Charlton & goodroi

Google Indexed My Site Search Results Pages

bouncybunny

11:44 am on Feb 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a site search engine script (one of those nice cheap and cheerful perl scripts that indexes site text and allows users to search your site).

On a whim, I used Google's site search feature – "site:example.com..." – to see how it responded. The result was that Google appeared to have been performing keyword searches using the search form and indexing the results.

The reason I know that Google is 'performing keyword searches' is that the URLs generally display the searched keyword in them. For example, a search for 'widgets' would look as follows:

www.example.com/cgi-bin/searchdirectory/searchscript.cgi?keyword=widgets
And these are the URLs that are in Google's index - hundreds of them.
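
For anyone wanting to go through a list of these indexed URLs and see which keywords were used, here is a minimal sketch using only Python's standard library. The function name search_keyword is made up for illustration, and the parameter name "keyword" is simply taken from the example URL above; a different script may use another name.

```python
from urllib.parse import urlsplit, parse_qs

def search_keyword(url, param="keyword"):
    """Return the first value of the search parameter in a URL, or None."""
    # URLs copied from a results listing often lack a scheme; add one so
    # that urlsplit() separates the host, path and query correctly.
    if "://" not in url:
        url = "http://" + url
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None

indexed = [
    "www.example.com/cgi-bin/searchdirectory/searchscript.cgi?keyword=widgets",
]
for url in indexed:
    print(search_keyword(url))
```

Running this over the full list makes it easy to see whether the keywords are all single words and whether any are off topic.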

Now, I haven't checked my user stats to see if many of these 'pages' have ever been returned in Google's SERPs and brought any visitors. But I doubt it, as they all have the same title tag and description and the results would be fairly short snippets of text, based on the results.

My main question is, firstly, how is Google doing this? Most of the 'URLs' indexed are ones using keywords that are very relevant to my site and the only way to view these URLs is to use the search form.

And secondly, should I be excluding Googlebot from indexing these 'pages' with a robots file? Is it giving a benefit to my site? And could it be doing any potential harm in the future?

Any thoughts?

tedster

3:21 pm on Feb 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your observation parallels another current thread:

Google indexing large volumes of (unlinked?) dynamic pages [webmasterworld.com]

It's not yet clear whether this originates with Google, or perhaps with a third party who makes those search result URLs available on some page somewhere.

In answer to your second question, my practice is to exclude Google from site search results pages, usually with robots.txt. As you said, "the results would be fairly short snippets of text" and not likely to be of value to a searcher.

It just feels wrong to me to let Google be flooded with that kind of URL. I'd rather see the source pages referred to on those search results given the traffic. Also, if Google starts getting a whole host of "no results" pages for a site, I think that could be a problem.

Receptional Andy

4:32 pm on Feb 26, 2008 (gmt 0)



At a guess bouncybunny, I'd say the results were all for single words, and that they exist somewhere on your site? (As per the thread linked by Tedster)

Mods: I don't know if some thread joining might be in order?

WiseWebDude

4:42 pm on Feb 26, 2008 (gmt 0)

10+ Year Member



User-agent: *
Disallow: /cgi-bin/searchdirectory/searchscript.cgi

in your robots.txt and fast...
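
It is worth checking a rule before relying on it. Disallow rules match from the start of the URL path, so a rule that names only the script file will not cover a script that lives under /cgi-bin/searchdirectory/ as in the example URL earlier in the thread. Below is a minimal sketch using Python's standard urllib.robotparser. The helper name blocks and the use of "Googlebot" as the test agent are assumptions for illustration, and real crawlers may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Example search URL from earlier in the thread.
SEARCH_URL = (
    "http://www.example.com/cgi-bin/searchdirectory/"
    "searchscript.cgi?keyword=widgets"
)

def blocks(robots_txt, agent, url):
    """Return True if robots_txt forbids agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)

# A rule naming only the script file, without its directory.
SHORT_RULE = "User-agent: *\nDisallow: /searchscript.cgi\n"
# A rule giving the full path from the site root.
FULL_RULE = "User-agent: *\nDisallow: /cgi-bin/searchdirectory/searchscript.cgi\n"

print(blocks(SHORT_RULE, "Googlebot", SEARCH_URL))
print(blocks(FULL_RULE, "Googlebot", SEARCH_URL))
```

Only the full-path rule reports the search URL as blocked. Because matching is by prefix, it also covers every keyword variation of the query string.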

bouncybunny

10:57 pm on Feb 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the comments, I'll get onto the existing discussion.

Indeed, this might be better merged.

In answer to your second question, my practice is to exclude Google from site search results pages, usually with robots.txt. As you said, "the results would be fairly short snippets of text" and not likely to be of value to a searcher.

Interestingly, this does ring true for me in that a number of results for recent searches I carried out have pointed me towards results pages on other web sites. So perhaps this is a new thing.

In practice this wasn't a completely useless user experience for me, as I then had a choice of a few places to go (I was on the tech support pages on IT company websites).

But I think I will exclude these pages in robots.txt. However, I will still allow the AdSense robot access, as this appears to be returning very relevant results and so I can't see any harm in allowing this. Any thoughts?
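
One way this might be set up, sketched under some assumptions: the AdSense crawler identifies itself as Mediapartners-Google, so it can be given its own robots.txt record with an empty Disallow line, while every other robot is kept out of the search script. The path is taken from the example URL earlier in the thread, the helper name allowed is made up for illustration, and the check uses Python's standard urllib.robotparser, which may not behave exactly like Google's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: AdSense crawler unrestricted, all other
# robots excluded from the site search script.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /cgi-bin/searchdirectory/searchscript.cgi
"""

SEARCH_URL = (
    "http://www.example.com/cgi-bin/searchdirectory/"
    "searchscript.cgi?keyword=widgets"
)

def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

for agent in ("Mediapartners-Google", "Googlebot"):
    print(agent, allowed(agent, SEARCH_URL))
```

With this file the AdSense crawler can still read the search results pages to target ads, while Googlebot falls under the catch-all record and is kept away from them.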

bouncybunny

11:07 pm on Feb 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



At a guess bouncybunny, I'd say the results were all for single words, and that they exist somewhere on your site? (As per the thread linked by Tedster)

Not only do they exist, they are 95% spot-on results for important keywords in the section that my search engine deals with (the search script only deals with a specific knowledge-base section of the site). From that point of view, what Google has done is brilliantly targeted. In fact, of the several dozen results that I have looked at, only one keyword has been off topic.

What appears to be more brilliant is that it has used not only obvious keywords (which Google will know from the general subject of my site), but also really obscure technical (but extremely relevant) keywords from my knowledge base.

I'm actually quite bowled over by the whole thing. It almost seems a shame to exclude the robot.