Forum Moderators: Robert Charlton & goodroi

WMT requests for bad url


g1smd

3:37 pm on Apr 25, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Loving the long list of requests for
com.google.crawl.wmconsole.fe.util.gxp.UrlItem$2@nnnnnnn

shown as returning a 404 status and listed in the Google WMT Crawl Errors report (where nnnnnnn is a 6- or 7-character hexadecimal number).
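That string has the shape of a Java object printed without a toString() override: Java's default Object.toString() yields the class name, an @, and the hash code in hex, and the $2 marks the second anonymous inner class compiled inside UrlItem. A minimal sketch (the class here is illustrative, not Google's actual code):

```java
// Illustrative only: Java's default Object.toString() produces
// ClassName$N@<hex hashCode>, the same shape as the string in the
// crawl errors report. Not Google's actual code.
public class UrlItem {
    public static void main(String[] args) {
        Object first  = new Object() { };  // compiles to UrlItem$1
        Object second = new Object() { };  // compiles to UrlItem$2
        // Default toString() is getClass().getName() + "@"
        //   + Integer.toHexString(hashCode())
        System.out.println(second);        // e.g. UrlItem$2@6f496d9f
    }
}
```

In other words, the report entry looks like an object reference was written into a page or report template verbatim instead of the item's actual URL, which would explain why it then shows up as a requested (and 404ing) path.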

WMT appears to be broken again (or do I find ever more inventive ways to break it?).

goodroi

5:07 pm on Apr 25, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I think Google confirmed last week that this is a known display issue.

g1smd

5:17 pm on Apr 25, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I missed that announcement. Thanks!

lucy24

9:43 pm on Apr 25, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Every time I see a new google 404 in logs I wait excitedly for it to show up as an Error in wmt so I can see whether it's linked from some other garbage site or simply a product of their own fevered imagination. With a side order of "Oh, ###, I didn't mistype a link somewhere did I?"

Their favorite with me is the opposite of yours: they offer up truncated versions of legitimate names. No, google, I do not have a page named /custom_ with trailing lowline and no extension. And since you can't point to a single place that links to it, let's just drop the whole thing shall we?

g1smd

9:52 pm on Apr 25, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I see so many badly truncated URLs in the WMT crawl errors report that I've given up looking at those particular entries.

They are very infrequently worth reclaiming.

matrix_jan

10:21 pm on Apr 25, 2012 (gmt 0)

10+ Year Member



I have thousands of 404 errors in WMT. There should be a button next to "Mark as fixed" saying "yes, yes, and yes, it IS a real 404; no need to fix it!"

Robert Charlton

10:35 pm on Apr 25, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



This thread might be helpful to those who haven't heard Google's "explanation"....

Google Following URLs Without Hyperlinks
http://www.webmasterworld.com/google/4389424.htm [webmasterworld.com]

To once again quote John Mueller about this...
I realize that this can lead to a somewhat cluttered crawl errors section in Webmaster Tools, so we're looking into ways of making that a bit clearer....

garyr_h

11:24 pm on Apr 25, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Eh, all they have to do is separate it out: one bucket for errors found through unlinked URLs, another for "should be 404", and there you go.

Most of the truncated URL errors are from smaller search engines placing the truncated URL beneath the result. So the larger your site, the more of those errors you will see. I had around 1000 of those errors show up from one small search engine, for example.

Robert Charlton

5:56 am on Apr 27, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Actually, if the WMT team would try to anticipate how non-engineers might react to some of their reports and build in a cheat-sheet for the average webmaster, they might head off a lot of angst.

matrix_jan

8:40 am on Apr 27, 2012 (gmt 0)

10+ Year Member



At least they worked on their new favicon :-/

enigma1

12:10 pm on Apr 27, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't see any 404 errors whatsoever in GWT, despite the thousands of invalid requests made to the server. Make sure the site doesn't display invalid links and that the server sends the right response to the client.

A typical problem is sites doing a blanket internal domain redirect that forwards the requested path and query string, i.e. a 301 that lands on a 404, and that's an error.
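A minimal sketch of that pattern (hostnames, paths, and the page table are made up for illustration): the old host blindly 301s every path to the canonical host, including paths the canonical host can't serve, so the crawler records a 301 followed by a 404 instead of a direct 404.

```java
import java.util.Set;

// Hypothetical sketch of the redirect-to-404 chain described above.
// The page list and "hosts" are stand-ins, not a real server.
public class RedirectSketch {
    static final Set<String> PAGES = Set.of("/", "/contact");

    // The old hostname blindly 301s every request to the canonical
    // host, carrying the path and query string along.
    static int oldHost(String path) {
        return 301;
    }

    // The canonical host can only serve pages it actually has.
    static int canonicalHost(String path) {
        return PAGES.contains(path) ? 200 : 404;
    }

    // What a crawler following the hop ultimately records for the URL.
    static int crawl(String path) {
        return oldHost(path) == 301 ? canonicalHost(path) : oldHost(path);
    }
}
```

Answering 404 directly on the old host for unknown paths avoids handing the crawler a redirect that dead-ends.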

Another is broken links that the site's pages expose to spiders, which are often hard to track down without some logging mechanism in place.

From what I see, Googlebot is far more sensitive to incorrect server responses now than previously.

lucy24

5:57 pm on Apr 27, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



At least they worked on their new favicon :-/

Any idea what it's supposed to be? On my Bookmarks list it looks unnervingly like an eyeball peering out at me but in the browser's address bar the identical image looks more like a smiling suitcase. Huh?

Am I the only person who wishes that when they have a "Why not?" link it would lead to information about the specific category they don't have information on? If it's next to Crawl Errors, can't they just say "Woo hoo, ain't none today"?

And, uhm, why have they only just seen fit to complain about my front page's meta description tag, which I believe has been identical for at least a year? Well, OK, I changed the word "seven" to "eight". But what kind of computerized doodad can that possibly have triggered? Is it supposed to be proportional to the size of the page? (I also added some 20 words or so to the visible text.) Hm.

Oops. Drifting a bit there.

matrix_jan

6:33 pm on Apr 27, 2012 (gmt 0)

10+ Year Member



OK. The thing is, I built my own framework. It's human-made, and humans make mistakes; that's why there are updates. One of those mistakes generated a ghost page: the requested URL wasn't checked against the database, so the template was returned with no (or bogus) content. I fixed the bug about a year ago, but Google seems to like the ghost page: even though it has returned 404 for about a year, G still follows the links on it, which of course point to other ghost pages that have also returned 404 for a year.

I don't know how to tell G that the bug is fixed: the 1000+ 404s are intentional, they should be 404s, and there's no need to tell me about them.
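The fix described above can be sketched like this (the slug table and handler are hypothetical stand-ins for the framework): look the requested URL up before rendering, and send a real 404 instead of serving the template with empty content.

```java
import java.util.Set;

// Hypothetical sketch of the ghost-page fix: validate the requested
// slug against known content before rendering. KNOWN_SLUGS stands in
// for the database lookup.
public class Router {
    static final Set<String> KNOWN_SLUGS = Set.of("widgets", "about");

    // Returns the status code the framework should send.
    static int handle(String slug) {
        if (!KNOWN_SLUGS.contains(slug)) {
            // Before the fix: 200 plus an empty template (a "ghost
            // page" whose stray links Googlebot kept following).
            // After the fix: an honest 404.
            return 404;
        }
        return 200; // render the real template
    }
}
```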

Any idea what it's supposed to be?

It looks like a suitcase with a croissant in it... ggg... I know it's a wrench, but the croissant fits better, smh.

About ghost pages: most of them were invented by G itself, by trying to access
category1/[category2 subcategory which should not be here] pages.

g1smd

11:19 pm on Apr 27, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



WMT reports for one site listed many Crawl Errors a few weeks ago (mostly 401 responses I think).

However, just days after the site was redirected to another domain and a removal request completed for the original domain, the Crawl Errors report then said "No errors in the last 90 days. Nice!"

g1smd

2:11 am on May 2, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is possible that the error mentioned in the original post occurs when the "Fetch as Googlebot" function is used within Webmaster Tools. Can anyone else confirm that?

lucy24

7:12 am on May 2, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ouch! I tried one page at random-- well, not really random, I've recently recoded it so may as well bring it to g###'s attention-- and when I verified the Fetch, I discovered I'd forgotten to include one image. By the time I'd got that sorted out, it was a bit harder to find the googlebot visit. Came through as a simple 200, so we'll see what it comes up with next.

And it really is the identical UA to a normal googlebot visit, unlike the Instant Preview charade, so no hanky-panky there.