Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Googlebot looking for strange non-existent URLs
MonkeyFace




msg:4250831
 11:46 pm on Jan 8, 2011 (gmt 0)

The URLs on my website are of the format /show/example.com, but Googlebot is intermittently making these kinds of requests:

/show/ACRS%20What%20does%20ACRS%20stand%20for%3F
/show/Financial%20Planning%20Corporate%20Budgets

and other TOTALLY unrelated queries.

Even though I am 404ing them, I am a little concerned about why GB is looking for them. Could these cause a loss in SERP?

 

MonkeyFace




msg:4250839
 12:13 am on Jan 9, 2011 (gmt 0)

Some more queries:

/show/Foundation%20Officers%20The%20Builders%20Foundation
/show/Underwriter%20Spreads%20on%20Eurobond%20Issues%20of
/show/Argentine%20Peso%20Exchange%20Rates%20Argentine%20Peso
/show/Beyond%20Basics%3A%20FMP(Fixed%20Maturity%20Plan)%20Vs

Strange!

Jonny6




msg:4251547
 9:51 pm on Jan 10, 2011 (gmt 0)

You need to check where the links to these URLs are coming from: are they from your site or from an external site?
If they are from your site, then they can cause a drop in SERP.

MonkeyFace




msg:4251985
 5:36 pm on Jan 11, 2011 (gmt 0)

I submitted some URLs in the sitemap; their content is generated from a db. I searched the db for LIKE '% %' and there's no entry whatsoever like that. And yes, my site's SERP *is falling* :/

GB still continues to make those weird requests:

Corporate%20governance%20rules%20mandatory%20for%20state-run
devaluation%20(finance)%20Britannica%20Online%20Encyclopedia
ISO9000Council%3A%20Quality%20Information%20on%20ISO%209000

etc. Strangely, all of the strange queries have first SERP websites. Wonder if Google's messing up somewhere, or if it's a new hack to attack someone's SERP. I couldn't find any website linking those queries to my website.

MonkeyFace




msg:4251998
 6:22 pm on Jan 11, 2011 (gmt 0)

Report from Webmaster Tools (Crawl Errors)

HTTP (23) - All those weird requests!
In Sitemaps (293) - Expected values

Looks like something is feeding GB with garbage for my website.

goodroi




msg:4251999
 6:23 pm on Jan 11, 2011 (gmt 0)

Have you checked your sitemap to make sure there are no spaces, commas and other irregular characters in your urls?
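One way to run the check suggested above is to scan every <loc> entry in the sitemap for spaces, commas, or their percent-encoded forms. This is only a sketch: the sample sitemap, the www.example.com host, and the helper name suspicious_urls are made up for illustration, and the pattern for "irregular" characters is an assumption you may want to widen.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Assumption: "irregular" here means whitespace, commas, or their
# percent-encoded forms. Extend the pattern as needed.
SUSPECT = re.compile(r"[\s,]|%20|%2C", re.IGNORECASE)


def suspicious_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL that contains a suspect character."""
    root = ET.fromstring(sitemap_xml)
    found = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if SUSPECT.search(url):
            found.append(url)
    return found


# Hypothetical sitemap with one clean URL and one containing an encoded space.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/show/example.com</loc></url>
  <url><loc>http://www.example.com/show/Financial%20Planning</loc></url>
</urlset>"""

for url in suspicious_urls(sample):
    print("check:", url)
```

An empty result would support the claim that the sitemap is not the source of the odd requests.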

tedster




msg:4252001
 6:28 pm on Jan 11, 2011 (gmt 0)

The "%20" is an encoding for the space character. If "/show/" is a form input of some kind on your website, then googlebot may be testing that form to see if it lets them find deep or hidden content on your site.

If you don't have a /show/ directory, then returning a 404 is the right thing to do. This should not cause you any ranking problems, as long as you are not linking to those URLs yourself, either internally or in a sitemap.
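To see what Googlebot is actually asking for, the percent-encoded paths can be decoded with Python's standard library. The two paths are copied from the first post in this thread; the helper name decode_path is just for illustration.

```python
from urllib.parse import unquote

# Request paths copied from the first post in this thread.
paths = [
    "/show/ACRS%20What%20does%20ACRS%20stand%20for%3F",
    "/show/Financial%20Planning%20Corporate%20Budgets",
]


def decode_path(path: str) -> str:
    """Return the path with percent-escapes such as %20 and %3F decoded."""
    return unquote(path)


for p in paths:
    print(decode_path(p))
# /show/ACRS What does ACRS stand for?
# /show/Financial Planning Corporate Budgets
```

Decoded, the requests read like page titles or search phrases rather than the /show/example.com style of URL the site uses.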

MonkeyFace




msg:4252015
 6:53 pm on Jan 11, 2011 (gmt 0)

@goodroi, sitemap is absolutely fine, no chance of errors.

@tedster, I am redirecting /show/ (it is indeed a form) to the homepage! Terrible mistake? I guess I'd better 404 /show/
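A rough sketch of the "404 instead of redirect" idea: decide the status code from the requested path, returning 200 only for entries that really exist. The slug pattern (domain-like names, based on the /show/example.com format mentioned earlier), the known_slugs set, and the function name status_for are all assumptions for illustration, not how the poster's site actually works.

```python
import re
from http import HTTPStatus

# Assumption: valid entries under /show/ look like domain names (example.com).
VALID_SLUG = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def status_for(path: str, known_slugs: set[str]) -> HTTPStatus:
    """Return 200 for a known /show/<domain> page and 404 for anything else."""
    prefix = "/show/"
    if not path.startswith(prefix):
        return HTTPStatus.NOT_FOUND
    slug = path[len(prefix):]
    if VALID_SLUG.match(slug) and slug in known_slugs:
        return HTTPStatus.OK
    # Unknown or malformed entries get a real 404, not a redirect to the homepage.
    return HTTPStatus.NOT_FOUND


known = {"example.com"}  # hypothetical set of entries in the database
print(int(status_for("/show/example.com", known)))
print(int(status_for("/show/Financial%20Planning%20Corporate%20Budgets", known)))
```

The first call yields 200 and the second 404, so the crawler gets a clear "not found" signal for the guessed URLs.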

Webwork




msg:4252030
 7:03 pm on Jan 11, 2011 (gmt 0)

Interesting thread. I was seeing the same results on a personal site. It was as if the bot was looking for something "that IT (the bot) was generating" (I sure wasn't) and, not finding "those pages" it was reporting a crawling error.

FWIW, there's no site map.

I haven't taken any steps as I was unable to determine how I was the source of . . the problem.

(Argh. I'm no SEO and this almost seems like G making me an even worse . . SEO. I mean, c'mon. Find my REAL pages, please. ;P)

bwnbwn




msg:4252078
 8:08 pm on Jan 11, 2011 (gmt 0)

I am seeing them as well on a couple of sites. It looks like the bot is indexing the URLs incorrectly and reporting it in sitemaps as the problem. All the bad URLs are hitting the custom 404 page because I don't allow any spaces in our URLs. These URLs are not in the sitemap, but this is where Google is saying the problem is. I am doing nothing because there really is nothing I can do.

It is best to 404 them.

I think the bot has hiccups and adds the spaces when it gets a case of the hiccups, because it is only a few sites with the issue and not all of them.

tedster




msg:4252101
 9:20 pm on Jan 11, 2011 (gmt 0)

The crawling of forms has been ramping up for a few years now. We've had a number of threads about it in recent years. This was the first public mention I recall:

In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn't find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form.

  • For text boxes, our computers automatically choose words from the site that has the form;

  • For select menus, check boxes, and radio buttons on the form, we choose from among the values...

    [googlewebmastercentral.blogspot.com...]

You can take consolation in the fact they only do this on what they feel is a "high-quality site", I suppose.

At the first PubCon in Austin, I had a chat with Matt Cutts about this behavior. He confirmed that content found only through form crawling gets established as a "virtual URL" on Google's back end, and that link equity, such as PR, can pass through.

He also reinforced the idea that Google does not want to index site search results - so those search forms should be disallowed from googlebot.
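As a sketch of what disallowing a search form might look like, the snippet below checks a candidate robots.txt with Python's standard urllib.robotparser before it goes live. The /search path, the www.example.com host, and the helper name allowed are hypothetical; substitute whatever URL your own search form submits to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of site-search result pages.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets this agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/search?q=widgets"))
print(allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/show/example.com"))
```

The first check comes back False (blocked) and the second True, so the rule covers search results without shutting the crawler out of regular pages.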

    MonkeyFace




    msg:4252185
     2:29 am on Jan 12, 2011 (gmt 0)

    Hey guys, thanks for your insights. It's consoling to note that it's a known "issue" for "high-quality" sites.

    At the same time, if I were to give G some feedback regarding the "guess algorithm", it would be this - it fails, GB is looking for something it will NEVER find on a website like mine. Needs a lot more work in the context discovery of the website. Good luck with the effort, though.

    bwnbwn




    msg:4252375
     2:32 pm on Jan 12, 2011 (gmt 0)

    ted, you could be right, but if this is the culprit, why is Google telling us we have sitemap errors caused by the bot creating discovery URLs? I got a bunch of them in Webmaster Tools saying the errors are in the sitemap when in fact the sitemap is fine.

    jecasc




    msg:4252399
     3:37 pm on Jan 12, 2011 (gmt 0)

    I have encountered strange things with my internal search in the past. Google Bot tried random keywords from my websites and used them as search strings in my search form.

    like this:
    advanced_search.php?query=randomkeyword1
    advanced_search.php?query=randomkeyword2
    advanced_search.php?query=randomkeyword3
    advanced_search.php?query=randomkeyword3+randomkeyword1

    I first thought someone must be linking to my website and Googlebot was only following those links, but this was not the case. The crawler used random words from my website and submitted them.

    Are the keywords unrelated to your website or do they appear somewhere on your website? If they appear on your website then it is probably this.
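The question above can be checked roughly in code: decode a requested path and measure how many of its words occur anywhere in the site's own text. This is only a sketch; the site text strings and the helper names words and share_found are made up, and in practice the site text would come from your own pages or database.

```python
import re
from urllib.parse import unquote


def words(text: str) -> set[str]:
    """Lower-cased alphanumeric words found in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def share_found(requested_path: str, site_text: str) -> float:
    """Fraction of words in the decoded last path segment that occur in site_text."""
    query_words = words(unquote(requested_path.rsplit("/", 1)[-1]))
    if not query_words:
        return 0.0
    return len(query_words & words(site_text)) / len(query_words)


path = "/show/Financial%20Planning%20Corporate%20Budgets"  # from the first post
print(share_found(path, "Whois and hosting details for example.com"))  # 0.0
print(share_found(path, "Financial planning tips"))  # 0.5
```

A share near zero across the logged requests would suggest the words are not being picked from the site itself, which is what the original poster reports.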

    internetheaven




    msg:4252436
     4:49 pm on Jan 12, 2011 (gmt 0)

    MonkeyFace, did you just buy the domain new? Or does it have a history with other owners?

    Mark_A




    msg:4252439
     4:55 pm on Jan 12, 2011 (gmt 0)

    I know it is not exactly the same thing, but I was looking at page popularity in Google Analytics recently, and down at the bottom of the table there were quite a lot of pages that just don't exist and, afaict, never have existed. Quite strange.

    MonkeyFace




    msg:4255424
     10:04 pm on Jan 19, 2011 (gmt 0)

    It's an old domain I have been using for over a year. No history of earlier registration.

