Webmaster Tools shows insane number of links - from site search template

johnnie

12:31 pm on Jun 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Okay, I have two sites (A and B) that I, of course, like to interlink. And so I did. For B, this means I have included a link to B in A's template. Now, in Google Webmaster Tools, the number of links from A to B is insane (2500+)! This is because Google (thank you, toolbar) is actually indexing site A's arbitrary search pages (like mysite.com/search.php?q=searchquery). Since the link is site-wide, all these search result pages show up as pages linking to B. Will Googlebot consider this link spam? If so, how do I prevent Google from indexing arbitrary search result pages? Should I just make site A's search result page noindex?

tedster

1:52 pm on Jun 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you don't want to see your site searches in the index, I'd say a noindex is a good way to go. If you do want them indexed, you could show the site search results on a sparser template, without all the interlinking and possibly with next to no other navigation, either.
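
For reference, noindex is usually set with a `<meta name="robots" content="noindex">` tag in the page head, and Google also honours an equivalent `X-Robots-Tag` HTTP response header. The Python sketch below only illustrates the decision of which pages get that header. The function name `robots_headers` and the `SEARCH_PATH` constant are made up for this example, and the path is assumed from the sample URL given earlier in the thread. Note that a crawler has to be able to fetch the page to see a noindex, so it does not combine with a robots.txt block on the same URLs.

```python
from urllib.parse import urlsplit

# Assumption: site A's search script lives at this path, as in the
# example URL mysite.com/search.php?q=searchquery from the thread.
SEARCH_PATH = "/search.php"


def robots_headers(url: str) -> dict[str, str]:
    """Return extra response headers for a URL.

    Search result pages get a noindex directive; every other page
    gets no extra header. Hypothetical helper, not part of any framework.
    """
    if urlsplit(url).path == SEARCH_PATH:
        return {"X-Robots-Tag": "noindex"}
    return {}


if __name__ == "__main__":
    for url in (
        "http://mysite.com/search.php?q=searchquery",
        "http://mysite.com/about.html",
    ):
        print(url, robots_headers(url))
```

On a PHP site like the one described, the same effect would come from sending that header (or printing the meta tag) from the search script itself.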

Receptional Andy

2:16 pm on Jun 23, 2008 (gmt 0)



The behaviour is most likely not a result of the toolbar, but of Googlebot's new "form crawling" behaviour. See Googlebot Now Crawls via HTML Forms [webmasterworld.com].

You can also block the crawling using wildcards in robots.txt. E.g. to block any URL containing a question mark (not suitable for all sites!), use:

User-agent: Googlebot
Disallow: /*?
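
Before deploying a wildcard rule, it can help to check which URLs it would actually catch. The Python sketch below is a hand-rolled approximation of Googlebot-style matching (prefix match from the start of the path, `*` for any run of characters, a trailing `$` to anchor the end), not Google's actual matcher. The function name `rule_matches` and the candidate patterns are illustrative assumptions, and the test path comes from the example URL earlier in the thread.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Disallow pattern would match a URL path.

    `path` is everything after the host name, including the query
    string, e.g. "/search.php?q=searchquery".
    """
    # Escape the pattern literally, then turn '*' back into a wildcard.
    regex = re.escape(pattern).replace(r"\*", ".*")
    # A trailing '$' anchors the rule to the end of the URL.
    if pattern.endswith("$"):
        regex = regex[: -len(r"\$")] + "$"
    # re.match anchors at the start, which gives the prefix behaviour.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    path = "/search.php?q=searchquery"
    # Candidate patterns to compare (illustrative only).
    for pattern in ("/*?", "/search.php", "/search.php?q", "/?q"):
        verdict = "blocks" if rule_matches(pattern, path) else "does not block"
        print(f"Disallow: {pattern} {verdict} {path}")
```

Because rules match from the start of the path, a pattern without a wildcard has to begin with the script name to catch `/search.php?q=...`; a pattern such as `/?q` only applies to query strings on the site root.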

johnnie

9:20 pm on Jun 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm.. I think I might go for the robots.txt option. Disallow: /search.php?q should do nicely. I don't see the merit in indexing search results anyways.

Robert Charlton

1:19 am on Jun 24, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I don't see the merit in indexing search results anyways.

Google doesn't like them either. From Matt Cutts' blog...

Search results in search results
[mattcutts.com...]

The new webmaster guideline that you’ll see on that page says "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines."