
Forum Moderators: Robert Charlton & aakk9999 & andy langton & goodroi


Googlebot looking for strange non-existent URLs

     
11:46 pm on Jan 8, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:July 24, 2010
posts: 58
votes: 0


The URLs on my website are of the format /show/example.com, but Googlebot is intermittently making requests like these:

/show/ACRS%20What%20does%20ACRS%20stand%20for%3F
/show/Financial%20Planning%20Corporate%20Budgets

and other TOTALLY unrelated queries.

Even though I am returning 404 for them, I am a little concerned about why GB (Googlebot) is looking for them. Could these cause a loss in SERP rankings?
12:13 am on Jan 9, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:July 24, 2010
posts: 58
votes: 0


Some more queries:

/show/Foundation%20Officers%20The%20Builders%20Foundation
/show/Underwriter%20Spreads%20on%20Eurobond%20Issues%20of
/show/Argentine%20Peso%20Exchange%20Rates%20Argentine%20Peso
/show/Beyond%20Basics%3A%20FMP(Fixed%20Maturity%20Plan)%20Vs

Strange!
9:51 pm on Jan 10, 2011 (gmt 0)

New User

5+ Year Member

joined:Jan 10, 2011
posts:35
votes: 0


You need to check where the links to these URLs are coming from: are they from your site or from an external site?
If they are from your site, they can cause a drop in the SERPs.
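
A rough way to check the "from your site" case is to scan your own pages for links into /show/ whose decoded path contains a space. This is only an illustrative sketch: it assumes you can load each page's HTML as a string, it uses Python's standard html.parser, and it only sees plain <a href> links, not forms or links built by JavaScript. The names LinkCollector and suspicious_show_links are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def suspicious_show_links(html):
    """Return /show/ links on a page whose decoded path contains a space."""
    collector = LinkCollector()
    collector.feed(html)
    bad = []
    for href in collector.hrefs:
        path = urlparse(href).path
        if path.startswith("/show/") and " " in unquote(path):
            bad.append(href)
    return bad


page = ('<a href="/show/example.com">ok</a> '
        '<a href="/show/Financial%20Planning">odd</a>')
print(suspicious_show_links(page))  # ['/show/Financial%20Planning']
```

An empty list for every page would suggest the odd URLs are not coming from your own internal links.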
5:36 pm on Jan 11, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:July 24, 2010
posts: 58
votes: 0


The URLs I submitted in the sitemap have their content generated from a database. I searched it with LIKE '% %' and there is no entry like that whatsoever. And yes, my site's SERP ranking *is falling* :/
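
For reference, a LIKE '% %' check matches any value containing a literal space, because % is SQL's wildcard for any run of characters. A minimal sketch with Python's built-in sqlite3 module, assuming a hypothetical table entries with a name column standing in for whatever the real database uses:

```python
import sqlite3


def entries_with_spaces(conn):
    """Return every entry name containing a space ('%' matches any run of characters)."""
    rows = conn.execute("SELECT name FROM entries WHERE name LIKE '% %'")
    return [name for (name,) in rows]


# Hypothetical in-memory table standing in for the site's real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (name TEXT)")
conn.executemany("INSERT INTO entries (name) VALUES (?)",
                 [("example.com",), ("example.org",)])
print(entries_with_spaces(conn))  # []
```

Assuming the /show/ URLs are built only from that column, an empty result means the database is not where the space-containing URLs come from.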

GB still continues to make those weird requests:

Corporate%20governance%20rules%20mandatory%20for%20state-run
devaluation%20(finance)%20Britannica%20Online%20Encyclopedia
ISO9000Council%3A%20Quality%20Information%20on%20ISO%209000

etc. Strangely, all of these strange queries match websites ranking on the first page of the SERPs. I wonder if Google is messing up somewhere, or if it's a new hack to attack someone's SERP rankings. I couldn't find any website linking to those queries on my website.
6:22 pm on Jan 11, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:July 24, 2010
posts: 58
votes: 0


Report from Webmaster Tools (Crawl Errors)

HTTP (23) - All those weird requests!
In Sitemaps (293) - Expected values

Looks like something is feeding GB with garbage for my website.
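
To see exactly which of these URLs were requested, and with what referer (if any), one option is to filter the server's access log. The sketch below assumes the common Apache/nginx "combined" log format; the function name googlebot_404s and the sample log lines are made up. Matching "Googlebot" in the user-agent string does not prove a request really came from Google, since that string can be spoofed.

```python
import re
from urllib.parse import unquote

# Assumed "combined" format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines, prefix="/show/"):
    """Return (decoded path, referer) for Googlebot-labelled requests that got a 404."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines in some other format
        if (m["status"] == "404" and "Googlebot" in m["agent"]
                and m["path"].startswith(prefix)):
            hits.append((unquote(m["path"]), m["referer"]))
    return hits


BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    '66.249.66.1 - - [11/Jan/2011:18:00:00 +0000] '
    '"GET /show/ACRS%20What%20does%20ACRS%20stand%20for%3F HTTP/1.1" 404 512 "-" "' + BOT + '"',
    '66.249.66.1 - - [11/Jan/2011:18:00:05 +0000] '
    '"GET /show/example.com HTTP/1.1" 200 2048 "-" "' + BOT + '"',
]
print(googlebot_404s(sample))  # [('/show/ACRS What does ACRS stand for?', '-')]
```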
6:23 pm on Jan 11, 2011 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3114
votes: 98


Have you checked your sitemap to make sure there are no spaces, commas, or other irregular characters in your URLs?
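
A quick way to run that check, assuming the sitemap is a standard sitemaps.org XML file: parse every <loc> and flag URLs containing an encoded space (%20) or raw characters such as spaces and commas. The set of "irregular" characters below is an assumption for illustration, not an official rule, and irregular_sitemap_urls is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Characters treated as "irregular" here are an illustrative choice.
IRREGULAR = set(' ,"<>{}|\\^`')


def irregular_sitemap_urls(xml_text):
    """Return <loc> URLs containing an encoded space or an irregular character."""
    root = ET.fromstring(xml_text)
    bad = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if "%20" in url or any(ch in IRREGULAR for ch in url):
            bad.append(url)
    return bad


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/show/example.com</loc></url>
  <url><loc>http://example.com/show/Financial%20Planning</loc></url>
</urlset>"""
print(irregular_sitemap_urls(sitemap))  # ['http://example.com/show/Financial%20Planning']
```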
6:28 pm on Jan 11, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


The "%20" is an encoding for the space character. If "/show/" is a form input of some kind on your website, then googlebot may be testing that form to see if it lets them find deep or hidden content on your site.

If you don't have a /show/ directory, then returning a 404 is the right thing to do. This should not cause you any ranking problems, as long as you are not linking to those URLs yourself, either internally or in a sitemap.
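
Decoding a few of the requested paths from the first posts with Python's urllib.parse.unquote shows that they are plain phrases, which fits the idea of words being entered into the /show/ form and submitted:

```python
from urllib.parse import unquote

requested = [
    "/show/ACRS%20What%20does%20ACRS%20stand%20for%3F",
    "/show/Financial%20Planning%20Corporate%20Budgets",
    "/show/Beyond%20Basics%3A%20FMP(Fixed%20Maturity%20Plan)%20Vs",
]

# unquote() reverses percent-encoding: %20 -> space, %3F -> "?", %3A -> ":"
decoded = [unquote(path) for path in requested]
for path in decoded:
    print(path)
# /show/ACRS What does ACRS stand for?
# /show/Financial Planning Corporate Budgets
# /show/Beyond Basics: FMP(Fixed Maturity Plan) Vs
```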
6:53 pm on Jan 11, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:July 24, 2010
posts: 58
votes: 0


@goodroi, the sitemap is absolutely fine; there's no chance of errors.

@tedster, I am redirecting /show/ (it is indeed a form) to the homepage! Is that a terrible mistake? I guess I'd better 404 /show/.
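
As a sketch of the "404 instead of redirect" option, the handler logic reduces to one decision: serve the page if the /show/ value is a known entry, otherwise return 404 rather than redirecting to the homepage. KNOWN_ENTRIES and show_status are hypothetical names; on a real site the set would come from the same database that builds the sitemap, and the status would be sent by whatever server or framework the site uses.

```python
from urllib.parse import unquote

# Hypothetical set of valid /show/ values; on a real site this would come
# from the database that also generates the sitemap.
KNOWN_ENTRIES = {"example.com", "example.org"}


def show_status(path, known=KNOWN_ENTRIES):
    """Return the HTTP status for a /show/... request.

    Unknown values (including a bare /show/) get 404 instead of a
    redirect to the homepage.
    """
    prefix = "/show/"
    if not path.startswith(prefix):
        raise ValueError("this sketch only handles /show/ paths")
    value = unquote(path[len(prefix):])
    return 200 if value in known else 404


print(show_status("/show/example.com"))           # 200
print(show_status("/show/Financial%20Planning"))  # 404
print(show_status("/show/"))                      # 404
```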
7:03 pm on Jan 11, 2011 (gmt 0)

Moderator

WebmasterWorld Administrator webwork is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 2, 2003
posts:7868
votes: 25


Interesting thread. I was seeing the same results on a personal site. It was as if the bot was looking for something "that IT (the bot) was generating" (I sure wasn't) and, not finding "those pages," it was reporting a crawling error.

FWIW, there's no site map.

I haven't taken any steps as I was unable to determine how I was the source of . . the problem.

(Argh. I'm no SEO and this almost seems like G making me an even worse . . SEO. I mean, c'mon. Find my REAL pages, please. ;P)
8:08 pm on Jan 11, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member bwnbwn is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 25, 2005
posts:3509
votes: 11


I am seeing them as well on a couple of sites. It looks like the bot is indexing the URLs incorrectly and reporting it in Sitemaps as the problem. All the bad URLs are hitting the custom 404 page because I don't allow any spaces in our URLs. These URLs are not in the sitemap, but that is where Google says the problem is. I am doing nothing because there really is nothing I can do.

It is best to 404 them.

I think the bot adds the spaces when it gets a case of the hiccups, because only a few sites have the issue, not all of them.
9:20 pm on Jan 11, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


The crawling of forms has been ramping up for a few years now. We've had a number of threads about it in recent years. This was the first public mention I recall:

In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn't find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form.

  • For text boxes, our computers automatically choose words from the site that has the form;

  • For select menus, check boxes, and radio buttons on the form, we choose from among the values...

[googlewebmastercentral.blogspot.com...]

You can take consolation in the fact that they only do this on what they feel is a "high-quality site", I suppose.

At the first PubCon in Austin, I had a chat with Matt Cutts about this behavior. He confirmed that content found only through form crawling gets established as a "virtual URL" on Google's back end, and that link equity, such as PR, can pass through.

He also reinforced the idea that Google does not want to index site search results, so those search forms should be disallowed for Googlebot in robots.txt.
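
A minimal robots.txt sketch for that advice, checked with Python's standard urllib.robotparser. It assumes a hypothetical site search at /search. Plain Disallow rules match by path prefix, so a rule for /show/ on the site in this thread would also block the real /show/example.com pages listed in its sitemap.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes the site-search form submits to /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://example.com/search?q=argentine+peso",
            "http://example.com/show/example.com"):
    # can_fetch() applies the rules for the given user agent to the URL's path.
    print(url, parser.can_fetch("Googlebot", url))
# http://example.com/search?q=argentine+peso False
# http://example.com/show/example.com True
```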
2:29 am on Jan 12, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:July 24, 2010
posts: 58
votes: 0


Hey guys, thanks for your insights. It's consoling to note that it's a known "issue" for "high-quality" sites.

At the same time, if I were to give G some feedback regarding the "guess algorithm", it would be this: it fails. GB is looking for something it will NEVER find on a website like mine. It needs a lot more work on discovering the context of the website. Good luck with the effort, though.
2:32 pm on Jan 12, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member bwnbwn is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 25, 2005
posts:3509
votes: 11


Ted, you could be right, but if this is the culprit, why is Google telling us we have sitemap errors from the bot creating discovery URLs? I have a bunch of them in Webmaster Tools saying the errors are in the sitemap when in fact the sitemap is fine.
3:37 pm on Jan 12, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 8, 2003
posts:1141
votes: 0


I have encountered strange things with my internal search in the past. Google Bot tried random keywords from my websites and used them as search strings in my search form.

Like this:
advanced_search.php?query=randomkeyword1
advanced_search.php?query=randomkeyword2
advanced_search.php?query=randomkeyword3
advanced_search.php?query=randomkeyword3+randomkeyword1

I first thought someone must be linking to my website and Google would only follow, but this was not the case. The crawler used random words from my website and submitted them.

Are the keywords unrelated to your website, or do they appear somewhere on your website? If they appear on your website, then it is probably this.
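
One way to answer that question systematically is to decode each requested /show/ path and list any of its words that never occur in your site's text. This is a rough sketch: site_text is assumed to be the concatenated visible text of your pages, the tokenizer (lower-cased runs of letters and digits) is an arbitrary choice, and unmatched_query_words is a hypothetical name.

```python
import re
from urllib.parse import unquote


def words(text):
    """Lower-cased word tokens made of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unmatched_query_words(requested_path, site_text, prefix="/show/"):
    """Words in a decoded /show/ query that never occur in the site's text."""
    if not requested_path.startswith(prefix):
        raise ValueError(f"expected a path under {prefix}")
    query = unquote(requested_path[len(prefix):])
    return sorted(words(query) - words(site_text))


site_text = "Financial planning tools and corporate budgets explained."
print(unmatched_query_words("/show/Financial%20Planning%20Corporate%20Budgets", site_text))
# []
print(unmatched_query_words("/show/Argentine%20Peso%20Exchange%20Rates%20Argentine%20Peso", site_text))
# ['argentine', 'exchange', 'peso', 'rates']
```

An empty result is consistent with words being drawn from the site itself; a non-empty one means at least some of the words came from somewhere else.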
4:49 pm on Jan 12, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 25, 2003
posts:2527
votes: 0


MonkeyFace, did you just buy the domain from new? Or does it have history with other owners?
4:55 pm on Jan 12, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 15, 2001
posts:1436
votes: 0


I know it is not exactly the same thing, but I was looking at page popularity in Google Analytics recently, and down at the bottom of the table there were quite a lot of pages that just don't exist and, as far as I can tell, never have. Quite strange.
10:04 pm on Jan 19, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:July 24, 2010
posts: 58
votes: 0


It's an old domain I have been using for over a year. No history of earlier registration.