Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Crawl Errors in Webmaster Tools for URLs that don't exist
Shatner




msg:4292778
 11:46 pm on Apr 5, 2011 (gmt 0)

Today in Webmaster Tools I noticed that I'm getting thousands of crawl errors for URLs which don't exist, and have never existed on my site. They all return 404s.

The URLs look like real URLs but are slightly broken: each one is missing a vital piece that would make it work.

Is there any way for me to know why/how/where Google is getting these URLs? And is this something I should be concerned about?

 

tedster




msg:4292797
 1:03 am on Apr 6, 2011 (gmt 0)

Any clues about the source under the "Linked From" column on the right? Or are they all listed as "unavailable"?

At any rate, no matter how Googlebot is picking up those URLs, if they really should be 404, you shouldn't have any worries as long as the source isn't your own website.

Pudders




msg:4292890
 6:25 am on Apr 6, 2011 (gmt 0)

There was an issue like this back in February, with URLs appearing in the crawl errors because of hot linking to images.


[webmasterworld.com...]

I had a few of these bogus errors appear and then vanish soon after. In the last week, they have reappeared with the same date stamp as before... so not sure if WMT is reporting these correctly.

This is also being discussed over on the Google Help Forum

indyank




msg:4292903
 7:01 am on Apr 6, 2011 (gmt 0)

I did see this and reported it here two weeks back. I hadn't seen these kinds of non-existent URLs being reported as errors before the Panda update.

Shatner




msg:4292957
 7:49 am on Apr 6, 2011 (gmt 0)

The sources are all listed as "unavailable".

But did some digging and I think I figured out where they're coming from...

It's a problem with multiple layers.

First, Google is for some reason indexing the wrong version of URLs on my site. Those versions also work and take you to the post rather than a 404, but they are the wrong version of the URL and I have no idea how Google is getting them.

Second, those wrong (but working) versions of the URLs contain relative links to other pages on the site. The relative links are broken on those versions because the page isn't supposed to exist at that address.

Third, Google then tries to crawl those broken links from the wrong, working pages it is already erroneously indexing, but can't, because the links don't work.
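To illustrate how a working-but-wrong URL can turn a good relative link into a 404, here is a minimal Python sketch. The URLs and the trailing-slash difference are assumptions made up for the example, not the actual URLs from my site; the point is only that the same relative href resolves differently depending on the base URL it is found on.

```python
from urllib.parse import urljoin

# Hypothetical URLs: the intended address of a post, and an alternate
# address (missing the trailing slash) that serves the same page.
intended_url = "http://www.example.com/news/123/my-post/"
alternate_url = "http://www.example.com/news/123/my-post"

# A relative link as it might appear in the page's HTML.
relative_href = "comments/"

def resolve(base: str, href: str) -> str:
    """Resolve href against base the way a crawler or browser would."""
    return urljoin(base, href)

good = resolve(intended_url, relative_href)
bad = resolve(alternate_url, relative_href)

print(good)  # http://www.example.com/news/123/my-post/comments/
print(bad)   # http://www.example.com/news/123/comments/
```

The second result is a real-looking URL with one piece missing, which is the pattern showing up in the crawl errors. Root-relative or absolute links would not shift with the base URL in this way.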

----

In theory I have already fixed the first issue (indexing of the wrong, working URLs) by putting canonicals on them, but that doesn't stop Google from reporting the new errors caused by the original one.

So I wonder how damaging to me these 404s are?

It's worth noting that I NEVER got these 404 errors in Webmaster Tools until a few days ago. And nothing on my site has changed at all.

Pudders




msg:4293596
 7:53 am on Apr 7, 2011 (gmt 0)

Just an update today to say that my old crawl errors have now gone from WMT.

Anyone else seeing this?

ohno




msg:4293809
 6:45 pm on Apr 7, 2011 (gmt 0)

Today we had over 150 crawl errors disappear from WMT! None seemed to be legit problems.

bumpski




msg:4293827
 7:27 pm on Apr 7, 2011 (gmt 0)

Just a reminder, and it's weird to have to say this: make sure you "view source" on the links Google displays in WMT. Many times there are non-printing or invisible characters in the href in the source code.

An example:
http://www.example.com/widget-widget-repair.htm
actually looks like this when you view source of the WMT page:
http://www.example.com/widget-widget-%E2%80%8Brepair.htm

Even the anchor text has hidden characters:
WMT displays
http://www.example.com/widget-widget-repair.htm
Actually in the source:
http://www.example.com/widget-widget-​repair.htm
I think this is some kind of line-wrapping problem. I researched it in the past but have forgotten what I found.

In this case this is an example of a mystery 400 error. The links look fine in WMT until you view source!

Just added confusion.
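If you would rather not eyeball the source, a short Python sketch along these lines can flag invisible characters in a URL copied from the WMT page source. The function name `hidden_characters` is made up for this example; it percent-decodes the URL and reports any Unicode format or control characters, such as the zero-width space that %E2%80%8B decodes to.

```python
import unicodedata
from urllib.parse import unquote

def hidden_characters(url: str) -> list:
    """Return (position, code point, Unicode name) for each invisible
    format (Cf) or control (Cc) character in the percent-decoded URL."""
    decoded = unquote(url)  # %E2%80%8B decodes to U+200B
    found = []
    for i, ch in enumerate(decoded):
        if unicodedata.category(ch) in ("Cf", "Cc"):
            name = unicodedata.name(ch, "UNNAMED")
            found.append((i, f"U+{ord(ch):04X}", name))
    return found

clean = "http://www.example.com/widget-widget-repair.htm"
dirty = "http://www.example.com/widget-widget-%E2%80%8Brepair.htm"

print(hidden_characters(clean))  # []
print(hidden_characters(dirty))  # one entry, for U+200B ZERO WIDTH SPACE
```

This only covers format and control characters; other look-alike problems (for example, non-breaking spaces or homoglyphs) would need extra checks.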

ohno




msg:4294040
 6:29 am on Apr 8, 2011 (gmt 0)

& this morning we are back to 130 errors!

Pudders




msg:4294098
 12:01 pm on Apr 8, 2011 (gmt 0)

same here - my old errors are back. ho hum.

maximillianos




msg:4294111
 12:17 pm on Apr 8, 2011 (gmt 0)

Check your source code. We had a similar problem. Come to find out, there was a broken link with no text/description in the source of a page, so we could not see it!

Try Xenu. It can crawl and report such broken links.
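For a quick check of a single page, something like the following Python sketch can list links that have no visible text and so never show up when you look at the rendered page. It uses only the standard library; the class name `EmptyAnchorFinder` and the sample HTML are invented for the example, and it is no substitute for a full crawler.

```python
from html.parser import HTMLParser

class EmptyAnchorFinder(HTMLParser):
    """Collect href values of <a> elements that contain no visible content."""

    def __init__(self):
        super().__init__()
        self.empty_hrefs = []
        self._href = None   # href of the <a> currently open, if any
        self._content = ""  # text (or image marker) seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._content = ""
        elif tag == "img" and self._href is not None:
            self._content += "[img]"  # an image counts as visible content

    def handle_data(self, data):
        if self._href is not None:
            self._content += data

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if not self._content.strip():
                self.empty_hrefs.append(self._href)
            self._href = None

# Made-up page fragment: the second link has no text and is invisible on screen.
sample_html = """
<p><a href="/widgets/repair.htm">Widget repair</a></p>
<p><a href="/widgets/repiar.htm"></a></p>
<p><a href="/widgets/photo.htm"><img src="deck.jpg"></a></p>
"""

finder = EmptyAnchorFinder()
finder.feed(sample_html)
print(finder.empty_hrefs)  # ['/widgets/repiar.htm']
```

Each href it reports still has to be requested to see whether it actually returns a 404.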

ohno




msg:4294123
 12:30 pm on Apr 8, 2011 (gmt 0)

/\ But how would that explain them coming & going? The dates on ours are from months ago.

incrediBILL




msg:4294124
 12:35 pm on Apr 8, 2011 (gmt 0)

I find mostly this problem comes from buggy kiddie scripts dumping out scraped auto-gen content. Some make all my URLs lower case, others replace the "&" in parameters with "~" and so on and so forth.
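As a rough way to confirm that a reported 404 is one of these mangled copies, a sketch like this can map it back to the real URL. The URL list and the function name `match_mangled` are hypothetical, and it only undoes the two manglings mentioned above (lower-casing, and "~" in place of "&").

```python
from typing import Iterable, Optional

# Hypothetical list of real URLs on the site.
known_urls = [
    "http://www.example.com/Widgets/View.php?id=7&color=Blue",
    "http://www.example.com/About.htm",
]

def match_mangled(reported: str, known: Iterable[str]) -> Optional[str]:
    """Return the real URL a reported 404 was probably derived from,
    or None if it matches nothing after undoing the known manglings."""
    index = {u.lower(): u for u in known}  # case-insensitive lookup
    lowered = reported.lower()
    # Try the URL as-is (case-folded), then with "~" turned back into "&".
    for candidate in (lowered, lowered.replace("~", "&")):
        if candidate in index:
            return index[candidate]
    return None

print(match_mangled("http://www.example.com/widgets/view.php?id=7&color=blue", known_urls))
print(match_mangled("http://www.example.com/Widgets/View.php?id=7~color=Blue", known_urls))
print(match_mangled("http://www.example.com/no-such-page.html", known_urls))  # None
```

The first two calls both return the real View.php URL. Anything that comes back as None was mangled some other way, or never came from a real URL at all.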

I'm actually starting to wonder if some of this nonsense isn't some new form of lame BH anti-SEO technique designed to drive sites down in the index based on mass quantities of bad IBLs. Maybe not, but it's happening way too much to be just stupid chance, and can there really be that many bad programmers cranking out broken auto-gen sites?

maximillianos




msg:4294170
 2:04 pm on Apr 8, 2011 (gmt 0)

I'm actually starting to wonder if some of this nonsense isn't some new form of lame BH anti-SEO technique designed to drive sites down in the index based on mass quantities of bad IBLs.


I used to think that was the case for me, until I found some bugs in my code. My best advice: if you have combed over your site for bugs 10 times already, do it 10 more times. If you have a large site, you will be surprised what you can find.

Prudence




msg:4294200
 3:07 pm on Apr 8, 2011 (gmt 0)

How about these 'not followed' crawl errors? They appeared yesterday. I've obviously edited the site name and containing subfolder, but the actual 'pages' are verbatim. There's nothing in my robots.txt and they don't exist anywhere on the site.

www.mysite.com/mysubfolder/uqhjphcfeqirpr.html
www.mysite.com/mysubfolder/uddwkraqi.html
www.mysite.com/mysubfolder/tcsnlpnglaicwmb.html

indyank




msg:4294264
 4:58 pm on Apr 8, 2011 (gmt 0)

@Prudence are you seeing 404s for them in GWT?

incrediBILL, what you describe has been happening for a long time. I have seen it even in 2007.

But these new errors in GWT, post-Panda, seem interesting because I now see more and more people reporting them. Are these seen only on Panda-affected sites, or do people see them on sites that aren't affected by Panda too?

The errors for pages that never existed on the site are something new, and I have seen them only after Panda. Google seems to have introduced something new in the way it crawls. I am not sure whether these are bugs in what they have introduced or whether they are testing something by trying to crawl these non-existent URLs.

[edited by: indyank at 5:38 pm (utc) on Apr 8, 2011]

bumpski




msg:4294265
 4:59 pm on Apr 8, 2011 (gmt 0)

They are all script kiddie junk sites. Pages with lots of incorrectly coded outbound links. Plus, they are hot linking to my photos in Google's gstatic domain. I don't think they're getting that code right either.

Why does Google let people hot link to Google's copies of our images?

[t0.gstatic.com...]

My back deck!

It really only takes one bad coder to make thousands of bad websites!

ohno




msg:4294266
 5:01 pm on Apr 8, 2011 (gmt 0)

WMT's now showing 5 errors! That utility mentioned above works well but found no errors on our sites.

bumpski




msg:4294274
 5:10 pm on Apr 8, 2011 (gmt 0)

Actually some may be legitimate, but hacked, sites with thousands of junk pages tacked on to advertise ... whatever.

One site on HostGator was flagged and had a comment "Webmaster, please contact support".

The URL was cgi-sys/suspendedpage.cgi, so hopefully the ISPs are slowly cleaning this up.

indyank




msg:4294282
 5:20 pm on Apr 8, 2011 (gmt 0)

They scrape your images and they let others scrape them. How noble!

Prudence




msg:4294449
 11:54 pm on Apr 8, 2011 (gmt 0)

@indyank. Thanks for the response. They're under the "Not followed" tab in crawl errors, although there seem to be fewer since I last posted. What could they be? I've never seen the like before in GWT.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved