Why does Google index search-results urls?

     
3:14 am on Mar 22, 2006 (gmt 0)

10+ Year Member



I'm very surprised to see that Googlebot is crawling URLs like this one on our site:
/DG/cecdg_do_search.php3?keywords=8810152342&wherekey=USER_ID&orderby=approve_date+desc

That's a search result page.

I don't know how Googlebot found this URL.

4:06 am on Mar 22, 2006 (gmt 0)

10+ Year Member



Yes! I'd like to know that too! The same happens to me.
8:20 am on Mar 22, 2006 (gmt 0)

10+ Year Member



How many of these pages are there?

If it's just a few random ones, then it might be that someone (maybe outside your site) is linking to that specific search result.

8:54 am on Mar 22, 2006 (gmt 0)

5+ Year Member



Search engines find them in referrer logs too.
9:01 am on Mar 22, 2006 (gmt 0)

10+ Year Member



How many of these pages are there?

About 85% or more.

9:11 am on Mar 22, 2006 (gmt 0)

10+ Year Member



How many of these pages are there?

Some numbers, to be clear.
March 21:
Total Googlebot requests: 20,111
URLs like the one mentioned above: 19,943 (about 99.2%)
March 20:
Total Googlebot requests: 21,851
URLs like the one mentioned above: 21,614 (about 98.9%)

11:29 am on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member kaled is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I don't know how Googlebot found this URL.

That would be the Google Spybar.

If the PR tool is enabled, every page you visit is tracked. How Google uses this data is open to question, but they have it nonetheless. This may be one of the reasons Google fought the DoJ so hard in court recently. If forced to hand over search records, they might be forced to hand over this data too - that would be even more contentious.

Incidentally, if you use the PR extension for Firefox, Google still gets the same data - this is probably the reason that Google tolerates its use.

Kaled.

11:36 am on Mar 22, 2006 (gmt 0)

10+ Year Member



If the PR tool is enabled, every page you visit is tracked.

This is why I have a separate Firefox profile (with the official toolbar) which I only use on "special" occasions. (It also has the added benefit of sparing me from "green bar angst".)

11:56 am on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>>That would be the Google Spybar.

This was a mistake I made early on. I kept that darn spybar on all the time when working on my site and testing pages. What a nightmare.

Now I just use the Firefox extension and only visit pages I want Google to know about.

12:07 pm on Mar 22, 2006 (gmt 0)



Sounds like the Google Spybar.
12:46 am on Mar 23, 2006 (gmt 0)

10+ Year Member



Any more information?
The Google Spybar? How does it work?
I don't think anyone would search for that "keyword" on our site.
12:53 am on Mar 23, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Always put a <meta name="robots" content="noindex"> tag on every page of a site that is not supposed to be indexed. As long as the bot is able to fetch the page and see the tag, the page will stay out of the index.

See also the related forums [webmasterworld.com] discussion.

1:00 am on Mar 23, 2006 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I don't think anyone would search for that "keyword" on our site.

From monitoring site searches done on a few sites, it's clear to me that some users will search on anything at all, thinking that it's web search. Not all users are clear that the box will only search the site.

1:30 am on Mar 23, 2006 (gmt 0)

WebmasterWorld Senior Member kaled is a WebmasterWorld Top Contributor of All Time 10+ Year Member



If all search results have the appearance of existing in a single directory (according to the url) then excluding it using robots.txt is perfectly viable. My robots.txt file excludes all bots from the cgi-bin - that's a pretty common strategy I think.

Kaled.
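
A robots.txt rule of this kind can be checked before it goes live. Here is a minimal sketch using Python's standard urllib.robotparser. The Disallow rule targets the search script from the first post; the host example.com and the /articles/ path are placeholders, not taken from the site in question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all bots away from the search script.
ROBOTS_TXT = """\
User-agent: *
Disallow: /DG/cecdg_do_search.php3
"""


def is_allowed(url, agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# example.com is a placeholder host; the query string is from the first post.
search_url = (
    "http://example.com/DG/cecdg_do_search.php3"
    "?keywords=8810152342&wherekey=USER_ID&orderby=approve_date+desc"
)
article_url = "http://example.com/articles/1.html"  # assumed path

print("search page allowed: ", is_allowed(search_url))
print("article page allowed:", is_allowed(article_url))
```

The rule is a plain path prefix, so every query-string variation of the search script is covered by the single Disallow line, while ordinary pages stay crawlable.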

1:33 am on Mar 23, 2006 (gmt 0)

10+ Year Member



The problem is that all the pages worth crawling (the home page, sub-pages and article pages) are not being crawled by Google.
So what does Googlebot crawl? Forum user profiles and the search results for forum users' posts. That's all.

What can I do?

1:40 am on Mar 23, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



See my post above for one solution to the problem.
4:31 am on Mar 23, 2006 (gmt 0)

10+ Year Member



Does it not make sense to use method="post" instead of method="get" for search forms? That way the parameters will never appear in the URL and never appear in anyone's log files for Googlebot to find.
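
To illustrate the difference, here is a small sketch using Python's standard urllib. It builds the same search as a GET and as a POST request without sending anything. The parameters come from the URL in the first post; example.com is a placeholder host.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Search parameters taken from the URL in the first post.
params = {
    "keywords": "8810152342",
    "wherekey": "USER_ID",
    "orderby": "approve_date desc",
}
query = urlencode(params)  # form encoding: the space becomes "+"

# example.com is a placeholder host.
action = "http://example.com/DG/cecdg_do_search.php3"

# method="get": the parameters are part of the URL itself.
get_request = Request(action + "?" + query)

# method="post": the parameters travel in the request body instead.
post_request = Request(action, data=query.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url)
```

With POST the URL is just the script address, so there is no parameterised URL to end up in a link or a log file.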
4:37 am on Mar 23, 2006 (gmt 0)

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Does it not make sense to use method="post" instead of method="get" for search forms?

It sure does and thanks for bringing it up. The above along with this (as mentioned above by g1smd)...

<meta name="robots" content="none">

...will keep your search results page out of the index. I've done it, and I've done it many times successfully.

There is one drawback: those search results cannot be bookmarked, emailed, etc., at least not by the basic user. ;)

And, if you really want to get granular with blocking the bots...

<meta name="googlebot" content="noindex, nofollow, noarchive">

<meta name="msnbot" content="noindex, nofollow">

In theory, and in practice, the standard...

<meta name="robots" content="none">

...should be sufficient to block all bots. There may be times, though, when you only want to block Googlebot and/or MSNBot.
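
To confirm that a page really carries the tag you think it does, something like the following sketch can help. It uses Python's standard html.parser; the class and function names are made up for this example, and it only reads meta tags, not HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the comma-separated content values of every <meta name=...> tag."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lower-cased meta name -> set of directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if not name or not content:
            return
        values = {part.strip() for part in content.split(",") if part.strip()}
        if "none" in values:
            # "none" is shorthand for "noindex, nofollow".
            values = (values - {"none"}) | {"noindex", "nofollow"}
        self.directives.setdefault(name, set()).update(values)


def is_noindexed(html, bot="googlebot"):
    """True if the generic robots tag or the bot-specific tag says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives.get("robots", set()) | parser.directives.get(bot, set())
    return "noindex" in found


page_all = '<head><meta name="robots" content="none"></head>'
page_google = '<head><meta name="googlebot" content="noindex, nofollow, noarchive"></head>'

print(is_noindexed(page_all, "msnbot"))
print(is_noindexed(page_google, "googlebot"))
print(is_noindexed(page_google, "msnbot"))
```

The generic robots tag applies to every bot, while the googlebot tag only affects Googlebot, which is why the last check comes back negative for MSNBot.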

 
