
Crawl Errors in Webmaster Tools for URLs that don't exist

     
11:46 pm on Apr 5, 2011 (gmt 0)

Preferred Member

5+ Year Member

joined:Mar 20, 2011
posts:544
votes: 0


Today in Webmaster Tools I noticed that I'm getting thousands of crawl errors for URLs which don't exist, and have never existed on my site. They all return 404s.

The URLs look like real URLs, but they're all slightly broken: each is missing one vital piece that would make it work.

Is there any way for me to know why/how/where Google is getting these URLs? And is this something I should be concerned about?
1:03 am on Apr 6, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Any clues about the source under the "Linked From" column on the right? Or are they all listed as "unavailable"?

At any rate, no matter how googlebot is picking up those URLs, if they really should be 404s, you shouldn't have any worries as long as the source isn't your own website.
6:25 am on Apr 6, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 3, 2010
posts:92
votes: 0


There was an issue with this back in February, with URLs appearing in the crawl errors due to hotlinking to images.


[webmasterworld.com...]

I had a few of these bogus errors appear and then vanish soon after. In the last week, they have reappeared with the same date stamp as before... so I'm not sure if WMT is reporting these correctly.

This is also being discussed over on the Google Help Forum.
7:01 am on Apr 6, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Mar 9, 2010
posts:1806
votes: 9


I did see this and reported it here two weeks back. I hadn't seen this kind of non-existent URL being reported as an error before the Panda update.
7:49 am on Apr 6, 2011 (gmt 0)

Preferred Member

5+ Year Member

joined:Mar 20, 2011
posts:544
votes: 0


The sources are all listed as "unavailable".

But I did some digging and I think I figured out where they're coming from...

It's a problem with multiple layers...

First... Google is, for some reason, indexing the wrong version of URLs on my site. The versions it is indexing also work and take you to the post rather than a 404, but they are the wrong version of the URL, and I have no idea how Google is getting them.

Second... those working (but wrong) versions of the URLs contain relative links to other pages on the site. Those relative links break when they are resolved against the wrong, working URLs... because those pages aren't supposed to exist at that location.

Third... Google then tries to crawl the broken URLs it finds on the wrong, working pages it is already erroneously indexing... but can't, because the links don't work. That's where the 404s come from.
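To make the relative-link part concrete, here's a minimal sketch (Python 3, standard library only, with purely hypothetical URLs) of how the same relative href resolves to two different targets depending on which version of the URL the page was crawled under:

```python
from urllib.parse import urljoin

# Purely hypothetical URLs, for illustration only.
relative_href = "other-widget-post"          # relative link found in the page body

correct_url = "http://www.example.com/posts/my-widget-post"   # the real URL
wrong_url   = "http://www.example.com/my-widget-post"         # the indexed variant; still serves the post

# Resolved against the real URL, the link points at a page that exists;
# resolved against the wrong variant, it points at a URL that 404s.
print(urljoin(correct_url, relative_href))   # http://www.example.com/posts/other-widget-post
print(urljoin(wrong_url, relative_href))     # http://www.example.com/other-widget-post  -> 404
```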

----

I have already, in theory, fixed the first issue (the wrong, working URLs being indexed) by putting canonicals on them, but this doesn't stop the second problem: the new errors being caused by the original error.
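If it helps anyone sanity-check that kind of fix, here's a rough sketch (Python 3, standard library only; the page source and URLs are made up) that pulls the rel=canonical out of a page so you can confirm the wrong, working variants all point back at the right URL:

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag it sees."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel", "").lower() == "canonical" and self.canonical is None:
            self.canonical = attrs.get("href")

# Made-up source for a wrong, working URL variant.
page_source = """
<html><head>
<link rel="canonical" href="http://www.example.com/posts/my-widget-post">
</head><body>...</body></html>
"""

finder = CanonicalFinder()
finder.feed(page_source)
print(finder.canonical)   # http://www.example.com/posts/my-widget-post
```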

So I wonder how damaging these 404s are to me?

It's worth noting that I NEVER got these 404 errors in my Webmaster Tools until a few days ago. And nothing on my site has changed at all.
7:53 am on Apr 7, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 3, 2010
posts:92
votes: 0


Just an update today to say that my old crawl errors have now gone from WMT.

Anyone else seeing this?
6:45 pm on Apr 7, 2011 (gmt 0)

Senior Member

joined:May 13, 2010
posts: 1054
votes: 0


Today we had over 150 crawl errors disappear from WMT! None seemed to be legit problems.
7:27 pm on Apr 7, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 13, 2004
posts:826
votes: 10


Just a reminder, and it's weird to have to say this: make sure you "view source" on the links Google displays in WMT. Many times there are non-printing or invisible characters in the href in the source code.

An example:
http://www.example.com/widget-widget-repair.htm
actually looks like this when you view source of the WMT page:
http://www.example.com/widget-widget-%E2%80%8Brepair.htm

Even the anchor text has hidden characters:
WMT displays
http://www.example.com/widget-widget-repair.htm
Actually in the source:
http://www.example.com/widget-widget-​repair.htm
I think this is some kind of line wrapping problem. I researched it in the past but forget what I found.
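If you want to catch these without squinting at view-source, here's a quick sketch (Python 3, standard library only; the URL is the hypothetical example above) that decodes a URL and flags any zero-width characters hiding in it:

```python
from urllib.parse import unquote

# Zero-width / invisible code points that commonly sneak into copied hrefs.
INVISIBLE = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE (BOM)",
}

def find_hidden(url):
    """Return (position, name) for every invisible character in a possibly percent-encoded URL."""
    decoded = unquote(url)
    return [(i, INVISIBLE[ch]) for i, ch in enumerate(decoded) if ch in INVISIBLE]

# %E2%80%8B is the percent-encoded UTF-8 form of U+200B, the zero width space.
print(find_hidden("http://www.example.com/widget-widget-%E2%80%8Brepair.htm"))
# -> [(37, 'ZERO WIDTH SPACE')]
```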

In this case this is an example of a mystery 400 error. The links look fine in WMT until you view source!

Just added confusion.
6:29 am on Apr 8, 2011 (gmt 0)

Senior Member

joined:May 13, 2010
posts: 1054
votes: 0


& this morning we are back to 130 errors!
12:01 pm on Apr 8, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 3, 2010
posts:92
votes: 0


same here - my old errors are back. ho hum.
12:17 pm on Apr 8, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 20, 2004
posts:2377
votes: 0


Check your source code. We had a similar problem. Come to find out, there was a broken link with no text/description in the source of a page, so we could not see it!

Try Xenu. It can crawl and report such broken links.
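For anyone who would rather script it than run Xenu, here's a bare-bones sketch in the same spirit (Python 3, standard library only; the page source is a made-up example) that flags anchors with no visible text:

```python
from html.parser import HTMLParser

class EmptyLinkFinder(HTMLParser):
    """Flags <a href=...> tags whose anchor text is empty or whitespace only."""
    def __init__(self):
        super().__init__()
        self._href = None
        self._text = ""
        self.empty_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = ""

    def handle_data(self, data):
        if self._href is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if not self._text.strip():
                self.empty_links.append(self._href)
            self._href = None

# Made-up page source containing one invisible (text-less) link.
page_source = '<p>Widgets <a href="/widgets/repair">repair info</a> and <a href="/missing-page"></a></p>'

finder = EmptyLinkFinder()
finder.feed(page_source)
print(finder.empty_links)   # ['/missing-page']
```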
12:30 pm on Apr 8, 2011 (gmt 0)

Senior Member

joined:May 13, 2010
posts: 1054
votes: 0


/\ But how would that explain them coming & going? The dates on ours are from months ago.
12:35 pm on Apr 8, 2011 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14650
votes: 94


I find this problem mostly comes from buggy kiddie scripts dumping out scraped auto-generated content. Some make all my URLs lowercase, others replace the "&" in parameters with "~", and so on and so forth.

I'm actually starting to wonder if some of this nonsense isn't some new form of lame BH anti-SEO technique designed to drive sites down in the index based on mass quantities of bad IBLs. Maybe not, but it's happening way too much to be pure chance, and can there really be that many bad programmers cranking out broken auto-gen sites?
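For what it's worth, when the mangling is that mechanical you can sometimes map the junk requests back to the real page and redirect instead of 404ing. A rough sketch under that assumption (Python 3; the lowercasing and "&" to "~" transforms are the ones described above, the URLs themselves are hypothetical):

```python
# Known-good URLs on the site (hypothetical examples).
real_urls = {
    "/Widgets/Repair.php?id=7&color=blue",
    "/Widgets/List.php?page=2&sort=name",
}

def mangle(url):
    """Apply the transforms the buggy scrapers seem to use: lowercase, '&' -> '~'."""
    return url.lower().replace("&", "~")

# Index the real URLs by their mangled form so broken requests can be matched.
mangled_index = {mangle(u): u for u in real_urls}

def recover(requested_path):
    """Return the real URL a mangled request probably meant, or None if there is no match."""
    return mangled_index.get(mangle(requested_path))

print(recover("/widgets/repair.php?id=7~color=blue"))   # /Widgets/Repair.php?id=7&color=blue
print(recover("/widgets/unknown.php"))                   # None
```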
2:04 pm on Apr 8, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 20, 2004
posts:2377
votes: 0


I'm actually starting to wonder if some of this nonsense isn't some new form of lame BH anti-SEO technique designed to drive sites down in the index based on mass quantities of bad IBLs.


I used to think that was the case for me, until I found some bugs in my code. My best advice: if you've combed over your site for bugs 10 times already, do it 10 more times. If you have a large site, you will be surprised at what you can find.
3:07 pm on Apr 8, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:June 23, 2004
posts:56
votes: 0


How about these 'not followed' crawl errors? They appeared yesterday. I've obviously edited the site name and containing subfolder, but the actual 'pages' are verbatim. There's nothing in my robots.txt, and they don't exist anywhere on the site.

www.mysite.com/mysubfolder/uqhjphcfeqirpr.html
www.mysite.com/mysubfolder/uddwkraqi.html
www.mysite.com/mysubfolder/tcsnlpnglaicwmb.html
4:58 pm on Apr 8, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Mar 9, 2010
posts:1806
votes: 9


@Prudence are you seeing 404s for them in GWT?

incrediBILL, what you describe has been happening for a long time. I saw it even in 2007.

But these new errors in GWT, post-Panda, seem interesting, as I now see more and more people reporting them. Are these seen only on Panda-affected sites, or do people see them on sites that aren't affected by Panda too?

The errors for pages that never existed on the site are something new, and I see them only after Panda. They seem to have introduced something new in the way they crawl. I am not sure whether these are bugs in what they have introduced, or whether they are testing something by trying to crawl these non-existent URLs.

[edited by: indyank at 5:38 pm (utc) on Apr 8, 2011]

4:59 pm on Apr 8, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 13, 2004
posts:826
votes: 10


They are all script kiddie junk sites. Pages with lots of incorrectly coded outbound links. Plus, they are hot linking to my photos in Google's gstatic domain. I don't think they're getting that code right either.

Why does Google let people hot link to Google's copies of our images?

[t0.gstatic.com...]

My back deck!

It really only takes one bad coder to make thousands of bad websites!
5:01 pm on Apr 8, 2011 (gmt 0)

Senior Member

joined:May 13, 2010
posts: 1054
votes: 0


WMT is now showing 5 errors! The utility mentioned above (Xenu) works well, but it found no errors on our sites.
5:10 pm on Apr 8, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 13, 2004
posts:826
votes: 10


Actually some may be legitimate, but hacked, sites with thousands of junk pages tacked on to advertise ... whatever.

One site on HostGator was flagged and had a comment "Webmaster, please contact support".

The URL was cgi-sys/suspendedpage.cgi, so hopefully the ISPs are slowly cleaning this up.
5:20 pm on Apr 8, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Mar 9, 2010
posts:1806
votes: 9


They scrape your images and they let others scrape them. How noble!
11:54 pm on Apr 8, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:June 23, 2004
posts:56
votes: 0


@indyank, thanks for the response. They're under the 'not followed' tab in crawl errors, although there seem to be fewer since I last posted. What could they be? I've never seen the like before in GWT.
 
