www.example.com/?s=keyword&submit=Search .
Often the keywords in question are just random words that can be found in the text of the site, or even misspelled words. Could this be a competitor somehow trying to get my sites in trouble? The fact that the words are often so obscure leads me to believe this. There seem to be no links pointing to these strange search strings. I don't understand how they got into the index.
[edited by: Robert_Charlton at 8:12 am (utc) on Feb. 23, 2008]
[edit reason] Used example.com. It can never be owned. [/edit]
Google indexing large volumes of (unlinked?) dynamic pages [webmasterworld.com]
That thread pretty much looks at these two possibilities:
- Someone linking to these pages for whatever reason
- Inexplicable spidering behaviour by Google.
Receptional_Andy
I've seen some examples of this behavior too, although not in large volumes. It certainly can be a competitor, but now I'm beginning to lean toward experimental spidering behavior from Google. Could also be some malicious automated behavior from an unknown person, a competitor or just an experimenter.
Have you considered blocking the search results pages with robots.txt? At some point it doesn't matter HOW Googlebot is getting these URLs. If they resolve, then they can cause problems, so fixing the issue becomes the priority.
We've got a section here called the Hot Topics area [webmasterworld.com], which is always pinned to the top of this forum's index page. In there is a post about WordPress that you may also find helpful:
WordPress And Google: Avoiding Duplicate Content Issues [webmasterworld.com]
Damn, you try to make your site user-friendly with a search box and it comes back to bite you!
Regarding getting the pages removed and stopping any future pages of this nature from finding their way into Google: would
Disallow: /?s=*
be the correct line to add to robots.txt? And how long would I be looking at for those pages to disappear?
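For anyone wanting to sanity-check a rule like this before deploying it, here is a minimal sketch in Python. It assumes Google-style pattern matching, where a rule is a prefix match on path plus query string, `*` matches any run of characters, and a trailing `$` anchors the end. The helper names `rule_to_regex` and `is_blocked` are made up for illustration, and other crawlers may interpret wildcards differently. Note also that a Disallow line only takes effect inside a group that starts with a User-agent line.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule_path):
    """Turn a robots.txt path pattern into a compiled regex.

    Assumption: Google-style matching ('*' is a wildcard, a trailing
    '$' anchors the end, everything else is a literal prefix).
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url, disallow_rules):
    """Return True if any Disallow pattern matches the URL's path + query."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, which gives the prefix behaviour.
    return any(rule_to_regex(rule).match(target) for rule in disallow_rules)


rules = ["/?s=*"]
print(is_blocked("http://www.example.com/?s=keyword&submit=Search", rules))
print(is_blocked("http://www.example.com/about/", rules))
print(is_blocked("http://www.example.com/blog/?s=keyword", rules))
```

Under these assumptions the trailing `*` is redundant, since `/?s=` alone is already a prefix match. The third URL shows that the rule only covers searches run from the site root; a search box living under a subdirectory would need its own rule.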