Why does Google index search-results urls?

Forum Moderators: Robert Charlton & goodroi

Aimee

3:14 am on Mar 22, 2006 (gmt 0)

I'm very surprised to see Googlebot crawling URLs like this one on our site:

/DG/cecdg_do_search.php3?keywords=8810152342&wherekey=USER_ID&orderby=approve_date+desc

That's a search result page.

I don't know how Googlebot found this URL.

konrad

4:06 am on Mar 22, 2006 (gmt 0)

Yes! I'd like to know that too! The same happens to me.

frox

8:20 am on Mar 22, 2006 (gmt 0)

How many of these pages are there?

If it's just a few random ones, then it might be that someone (maybe outside your site) is linking to that specific search result.

whatcartridge

8:54 am on Mar 22, 2006 (gmt 0)

Search engines find them in referrer logs too.

Aimee

9:01 am on Mar 22, 2006 (gmt 0)

How many of these pages are there?

About 85% or more.

Aimee

9:11 am on Mar 22, 2006 (gmt 0)

How many of these pages are there?

Some numbers, to be clear.

March 21:
Total Googlebot fetches: 20,111
URLs like the one mentioned above: 19,943 (about 99.2%)

March 20:
Total Googlebot fetches: 21,851
URLs like the one mentioned above: 21,614 (about 98.9%)
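
For anyone who wants to check the percentages, here is a minimal Python sketch of the arithmetic. The counts are copied from the post; the names `crawl_counts` and `search_share` are made up for illustration.

```python
# Figures copied from the post above:
# day -> (fetches of search-result URLs, total Googlebot fetches).
crawl_counts = {
    "March 21": (19_943, 20_111),
    "March 20": (21_614, 21_851),
}


def search_share(search_hits: int, total_hits: int) -> float:
    """Return the percentage of fetches that hit search-result URLs."""
    return 100.0 * search_hits / total_hits


for day, (search_hits, total_hits) in crawl_counts.items():
    share = search_share(search_hits, total_hits)
    print(f"{day}: {share:.1f}% of Googlebot fetches were search-result URLs")
```

Either way, nearly the whole crawl on both days went to search-result URLs.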

kaled

11:29 am on Mar 22, 2006 (gmt 0)

I don't know how Googlebot found this URL.

That would be the Google Spybar.

If the PR tool is enabled, every page you visit is tracked. How Google uses this data is questionable, but they have this data nonetheless. This may be one of the reasons Google fought the DoJ so hard in court recently. If forced to hand over search records, they might be forced to hand over this data too - that would be even more contentious.

Incidentally, if you use the PR extension for Firefox, Google still gets the same data - this is probably the reason that Google tolerates its use.

Kaled.

zCat

11:36 am on Mar 22, 2006 (gmt 0)

If the PR tool is enabled, every page you visit is tracked.

This is why I have a separate Firefox profile (with the official toolbar) which I only use on "special" occasions. (It also has the added benefit of sparing me from "green bar angst".)

BillyS

11:56 am on Mar 22, 2006 (gmt 0)

>>That would be the Google Spybar.

This was a mistake I made early on. I kept that darn spybar on all the time when working on my site and testing pages. What a nightmare.

Now I just use the Firefox extension and only visit pages I want Google to know about.

Web_speed

12:07 pm on Mar 22, 2006 (gmt 0)

Sounds like the Google Spybar.

Aimee

12:46 am on Mar 23, 2006 (gmt 0)

Any more information?
Google Spybar? How does it work?
I don't think someone would search for that "keyword" on our site.

g1smd

12:53 am on Mar 23, 2006 (gmt 0)

Always put a <meta name="robots" content="noindex"> tag on every page of a site that is not supposed to be indexed. As long as the bot can fetch the page and read the tag, the page should stay out of the index.

See also the related forums [webmasterworld.com] discussion.
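
As a rough illustration of the idea, here is a minimal Python sketch that decides which URLs should carry the tag. It assumes the search script path from the opening post is the only one to exclude; `NOINDEX_PATHS` and `robots_meta_for` are hypothetical names, not part of any real framework.

```python
from urllib.parse import urlsplit

# Hypothetical list of script paths whose pages should stay out of the index.
# The single entry is taken from the URL in the opening post; adjust for your site.
NOINDEX_PATHS = ("/DG/cecdg_do_search.php3",)

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def robots_meta_for(url: str) -> str:
    """Return the noindex tag for search-result URLs, or an empty string otherwise."""
    path = urlsplit(url).path  # drops the query string, keeps only the script path
    return NOINDEX_TAG if path in NOINDEX_PATHS else ""


search_url = (
    "/DG/cecdg_do_search.php3"
    "?keywords=8810152342&wherekey=USER_ID&orderby=approve_date+desc"
)
print(repr(robots_meta_for(search_url)))     # tag for the search-result page
print(repr(robots_meta_for("/index.html")))  # empty string for a normal page
```

The page template would then print the returned string inside the head section of each page.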

tedster

1:00 am on Mar 23, 2006 (gmt 0)

I don't think someone would search for that "keyword" on our site.

From monitoring site searches done on a few sites, it's clear to me that some users will search on anything at all, thinking that it's web search. Not all users are clear that the box will only search the site.

kaled

1:30 am on Mar 23, 2006 (gmt 0)

If all search results appear to live in a single directory (according to the URL), then excluding that directory using robots.txt is perfectly viable. My robots.txt file excludes all bots from the cgi-bin - that's a pretty common strategy, I think.

Kaled.
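
A minimal sketch of that approach, using Python's standard `urllib.robotparser` to check what a given set of rules would block. The robots.txt content here is an assumption for illustration: the cgi-bin rule follows the description above, and the second rule is guessed from the URL in the opening post.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for illustration only; not the actual file from any site
# discussed in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /DG/cecdg_do_search.php3
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url_path: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules above let the given agent fetch url_path."""
    return parser.can_fetch(agent, url_path)


for path in (
    "/DG/cecdg_do_search.php3?keywords=8810152342",
    "/cgi-bin/search.cgi",
    "/articles/1.html",
):
    print(path, "->", "allowed" if allowed(path) else "blocked")
```

One thing to keep in mind: a page blocked in robots.txt is never fetched, so a bot cannot see a noindex meta tag on it. It usually makes sense to pick one method per URL.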

Aimee

1:33 am on Mar 23, 2006 (gmt 0)

The problem is that the pages worth crawling (the home page, sub-pages and article pages) are not being crawled at all.
So what does Googlebot crawl? Forum user profiles and the search results for forum users' posts. That's all.

What can I do?

g1smd

1:40 am on Mar 23, 2006 (gmt 0)

See my post above for one solution to the problem.

abates

4:31 am on Mar 23, 2006 (gmt 0)

Does it not make sense to use method="post" instead of method="get" for search forms? That way the parameters will never appear in the URL and never appear in anyone's log files for Googlebot to find.
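
To make the difference concrete, here is a small Python sketch that builds (but never sends) a GET and a POST request for the same search. The host `www.example.com` is a placeholder; the script path and parameter names come from the URL in the opening post.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder host; the path and parameters are from the opening post.
SEARCH_URL = "http://www.example.com/DG/cecdg_do_search.php3"
params = {"keywords": "8810152342", "wherekey": "USER_ID"}

# GET: the parameters become part of the URL, so they show up in
# links, bookmarks and referrer logs.
get_request = Request(f"{SEARCH_URL}?{urlencode(params)}")

# POST: the parameters travel in the request body; the URL stays bare.
post_request = Request(SEARCH_URL, data=urlencode(params).encode("ascii"))

# Nothing is sent over the network; we only inspect the request objects.
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

With POST, every search shares the same bare URL, so there is no per-query URL left for a crawler to pick up.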

pageoneresults

4:37 am on Mar 23, 2006 (gmt 0)

Does it not make sense to use method="post" instead of method="get" for search forms?

It sure does and thanks for bringing it up. The above along with this (as mentioned above by g1smd)...

<meta name="robots" content="none">

...will keep your search results page out of the index. I've done it, and I've done it many times successfully.

There is one drawback to using POST: those search results cannot be bookmarked, emailed, etc. For the basic user anyway. ;)

And, if you really want to get granular with blocking the bots...

<meta name="googlebot" content="noindex, nofollow, noarchive">

<meta name="msnbot" content="noindex, nofollow">

In theory, and in practice, the standard...

<meta name="robots" content="none">

...should be sufficient to block all bots that honor the robots meta tag ("none" is shorthand for "noindex, nofollow"). There may be times, though, when you just want to block Googlebot and/or MSNBot.
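
A quick way to check which bots a page's meta tags would keep out is sketched below in Python, using the standard `html.parser` module. The rule it applies is a simplification and an assumption: it just takes the union of the generic "robots" tag and the bot-specific tag, whereas real crawlers may resolve conflicts differently. `RobotsMetaParser` and `blocks_indexing` are names invented for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> set of directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if not name:
            return
        values = {v.strip() for v in content.split(",") if v.strip()}
        if "none" in values:
            # "none" is shorthand for "noindex, nofollow".
            values |= {"noindex", "nofollow"}
        self.directives.setdefault(name, set()).update(values)


def blocks_indexing(html: str, bot: str) -> bool:
    """Return True if the generic or the bot-specific tag says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    generic = parser.directives.get("robots", set())
    specific = parser.directives.get(bot.lower(), set())
    return "noindex" in (generic | specific)


page_all = '<head><meta name="robots" content="none"></head>'
page_msn = '<head><meta name="msnbot" content="noindex, nofollow"></head>'

print(blocks_indexing(page_all, "googlebot"))  # generic tag applies to every bot
print(blocks_indexing(page_msn, "msnbot"))     # bot-specific tag applies to msnbot
print(blocks_indexing(page_msn, "googlebot"))  # googlebot is not named here
```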