Googlebot looking for strange non-existent URLs

   
11:46 pm on Jan 8, 2011 (gmt 0)



The URLs on my website are of the format /show/example.com. But googlebot is making these kinds of requests intermittently:

/show/ACRS%20What%20does%20ACRS%20stand%20for%3F
/show/Financial%20Planning%20Corporate%20Budgets

and other TOTALLY unrelated queries.

Even though I am 404ing them, I am a little concerned about why GB is looking for them. Could these cause a loss in SERP?
12:13 am on Jan 9, 2011 (gmt 0)



Some more queries:

/show/Foundation%20Officers%20The%20Builders%20Foundation
/show/Underwriter%20Spreads%20on%20Eurobond%20Issues%20of
/show/Argentine%20Peso%20Exchange%20Rates%20Argentine%20Peso
/show/Beyond%20Basics%3A%20FMP(Fixed%20Maturity%20Plan)%20Vs

Strange!
9:51 pm on Jan 10, 2011 (gmt 0)



You need to check where the links to these URLs are coming from. Are they from your site or from an external site?
If they are from your site, then they can cause a drop in SERP.
5:36 pm on Jan 11, 2011 (gmt 0)



I submitted some URLs in the sitemap, the content for which is generated from a db. I searched the db with LIKE '% %' and there's no entry whatsoever like that. And yes, my site's SERP *is falling* :/

GB still continues to make those weird requests:

Corporate%20governance%20rules%20mandatory%20for%20state-run
devaluation%20(finance)%20Britannica%20Online%20Encyclopedia
ISO9000Council%3A%20Quality%20Information%20on%20ISO%209000

etc. Strangely, all of the strange queries have first SERP websites. Wonder if Google's messing up somewhere, or if it's a new hack to attack someone's SERP. I could not find any website linking those queries to my website.
6:22 pm on Jan 11, 2011 (gmt 0)



Report from Webmaster Tools (Crawl Errors)

HTTP (23) - All those weird requests!
In Sitemaps (293) - Expected values

Looks like something is feeding GB with garbage for my website.
6:23 pm on Jan 11, 2011 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Have you checked your sitemap to make sure there are no spaces, commas and other irregular characters in your urls?
6:28 pm on Jan 11, 2011 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The "%20" is an encoding for the space character. If "/show/" is a form input of some kind on your website, then googlebot may be testing that form to see if it lets them find deep or hidden content on your site.
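
To make the encoding concrete, here is a minimal sketch that decodes two of the paths quoted earlier in this thread using Python's standard library. The helper name `decode_path` is only for illustration; besides "%20" for a space, "%3F" decodes to a question mark.

```python
from urllib.parse import unquote


def decode_path(path: str) -> str:
    """Return the path with percent-escapes such as %20 turned back into characters."""
    return unquote(path)


# Two of the requests reported earlier in the thread.
requested = [
    "/show/ACRS%20What%20does%20ACRS%20stand%20for%3F",
    "/show/Financial%20Planning%20Corporate%20Budgets",
]

for path in requested:
    print(decode_path(path))
```

Decoded, the requests read like page titles or search phrases rather than the domain names the site expects after /show/.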

If you don't have a /show/ directory, then returning a 404 is the right thing to do. This should not cause you any ranking problems, as long as you are not linking to those URLs yourself, either internally or in a sitemap.
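
As a rough sketch of the "404 anything that isn't a real page" idea: the original poster said valid URLs look like /show/example.com, so one possible check is whether the decoded value after /show/ looks like a domain name. The function name `status_for_show_path` and the domain pattern are assumptions for illustration, not how the poster's site actually works, and a real site would more likely look the value up in its database.

```python
import re
from urllib.parse import unquote

# Assumed pattern: dot-separated labels of letters, digits and hyphens,
# ending in an alphabetic TLD. This is a simplification of real hostname rules.
DOMAIN_RE = re.compile(
    r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}",
    re.IGNORECASE,
)


def status_for_show_path(path: str) -> int:
    """Return 200 for /show/<domain> and 404 for everything else."""
    prefix = "/show/"
    if not path.startswith(prefix):
        return 404
    value = unquote(path[len(prefix):])
    # Spaces, question marks, colons etc. never appear in a domain name,
    # so the odd googlebot requests fall through to 404.
    return 200 if DOMAIN_RE.fullmatch(value) else 404


print(status_for_show_path("/show/example.com"))
print(status_for_show_path("/show/Financial%20Planning%20Corporate%20Budgets"))
```

The point of the sketch is that the bad requests get a real 404 status rather than a redirect to the homepage.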
6:53 pm on Jan 11, 2011 (gmt 0)



@goodroi, sitemap is absolutely fine, no chance of errors.

@tedster, I am redirecting /show/ (it is a form indeed) to homepage! Terrible mistake? I guess I better 404 /show/
7:03 pm on Jan 11, 2011 (gmt 0)

WebmasterWorld Administrator webwork is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Interesting thread. I was seeing the same results on a personal site. It was as if the bot was looking for something "that IT (the bot) was generating" (I sure wasn't) and, not finding "those pages" it was reporting a crawling error.

FWIW, there's no site map.

I haven't taken any steps as I was unable to determine how I was the source of . . the problem.

(Argh. I'm no SEO and this almost seems like G making me an even worse . . SEO. I mean, c'mon. Find my REAL pages, please. ;P)
8:08 pm on Jan 11, 2011 (gmt 0)

WebmasterWorld Senior Member bwnbwn is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I am seeing them as well on a couple of sites. It looks like the bot is indexing the URLs incorrectly and reporting it in sitemaps as the problem. All the bad URLs are hitting the custom 404 page because I don't allow any spaces in our URLs. These URLs are not in the sitemap, but this is where Google is saying the problem is. I am doing nothing because there really is nothing I can do.

It is best to 404 them.

I think the bot has hiccups and adds the spaces when it gets a case of the hiccups, because it is only a few sites with the issue and not all of them.
9:20 pm on Jan 11, 2011 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The crawling of forms has been ramping up for a few years now. We've had a number of threads about it in recent years. This was the first public mention I recall:

In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn't find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form.

  • For text boxes, our computers automatically choose words from the site that has the form;

  • For select menus, check boxes, and radio buttons on the form, we choose from among the values...

    [googlewebmastercentral.blogspot.com...]
    You can take consolation in the fact they only do this on what they feel is a "high-quality site", I suppose.

    At the first PubCon in Austin, I had a chat with Matt Cutts about this behavior. He confirmed that content found only through form crawling gets established as a "virtual URL" on Google's back end, and that link equity, such as PR, can pass through.

    He also reinforced the idea that Google does not want to index site search results - so those search forms should be disallowed from googlebot.
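
For anyone who wants to sanity-check a disallow rule before deploying it, here is a small sketch using Python's standard `urllib.robotparser`. The `/search` path and the example.com URLs are placeholders I made up, so substitute your own search form's path. Be careful not to disallow a path that also serves real pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keeps googlebot out of an assumed /search form.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Site search results are blocked for googlebot...
search_allowed = parser.can_fetch("Googlebot", "http://example.com/search?q=foo")
# ...while ordinary pages remain crawlable.
page_allowed = parser.can_fetch("Googlebot", "http://example.com/show/example.com")

print("search allowed:", search_allowed)
print("page allowed:", page_allowed)
```

This only tests how a standards-following parser reads the rules; it says nothing about how quickly Google picks up the change.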
    2:29 am on Jan 12, 2011 (gmt 0)



    Hey guys, thanks for your insights. It's consoling to note that it's a known "issue" for "high-quality" sites.

    At the same time, if I were to give G some feedback regarding the "guess algorithm", it would be this: it fails. GB is looking for something it will NEVER find on a website like mine. It needs a lot more work on discovering the context of the website. Good luck with the effort, though.
    2:32 pm on Jan 12, 2011 (gmt 0)

    WebmasterWorld Senior Member bwnbwn is a WebmasterWorld Top Contributor of All Time 5+ Year Member



    ted, you could be right, but if this is the culprit, why is Google telling us we have sitemap errors when it's the bot creating the discovery URLs? I got a bunch of them in Webmaster Tools saying the errors are in the sitemap when in fact the sitemap is fine.
    3:37 pm on Jan 12, 2011 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I have encountered strange things with my internal search in the past. Google Bot tried random keywords from my websites and used them as search strings in my search form.

    like this:
    advanced_search.php?query=randomkeyword1
    advanced_search.php?query=randomkeyword2
    advanced_search.php?query=randomkeyword3
    advanced_search.php?query=randomkeyword3+randomkeyword1

    I first thought someone must be linking to my website and Google would only follow, but this was not the case. The crawler used random words from my website and submitted them.

    Are the keywords unrelated to your website or do they appear somewhere on your website? If they appear on your website then it is probably this.
    4:49 pm on Jan 12, 2011 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    MonkeyFace, did you just buy the domain from new? Or does it have history with other owners?
    4:55 pm on Jan 12, 2011 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I know it is not exactly the same thing, but I was looking at page popularity in Google Analytics recently and down at the bottom of the table there were quite a lot of pages that just don't exist, nor afaict have ever existed. Quite strange.
    10:04 pm on Jan 19, 2011 (gmt 0)



    It's an old domain I have been using for over a year. No history of earlier registration.