| 1:03 am on Apr 6, 2011 (gmt 0)|
Any clues about the source under the "Linked From" column on the right? Or are they all listed as "unavailable"?
At any rate, no matter how Googlebot is picking up those URLs, if they really should be 404s, you shouldn't have any worries as long as the source isn't your own website.
| 6:25 am on Apr 6, 2011 (gmt 0)|
There was an issue about this back in February, with URLs appearing in the crawl errors due to hotlinking to images.
I had a few of these bogus errors appear and then vanish soon after. In the last week, they have reappeared with the same date stamp as before... so not sure if WMT is reporting these correctly.
This is also being discussed over on the Google Help Forum
| 7:01 am on Apr 6, 2011 (gmt 0)|
I saw this and reported it here two weeks back. I never saw these kinds of non-existent URLs reported as errors before the Panda update.
| 7:49 am on Apr 6, 2011 (gmt 0)|
The sources are all listed as "unavailable".
But I did some digging and I think I figured out where they're coming from...
It's a problem with multiple layers...
First: for some reason, Google is indexing the wrong version of URLs on my site. These versions also work and take you to the post, not a 404, but they are the wrong version of the URL, and I have no idea how Google is getting them.
Second: those working (but wrong) versions of URLs contain relative links to other pages on the site. The relative links are broken on the wrong versions, because those pages aren't supposed to exist.
Third: Google then tries to index the broken URLs it finds on the wrong, working pages it is already erroneously indexing, but it can't, because the links don't work.
I have, theoretically, already fixed the first issue by putting canonical tags on the wrong, working URLs, but that doesn't stop the new errors being caused by the original ones.
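The relative-link breakage described above is easy to reproduce. Here's a quick sketch using Python's standard `urllib.parse.urljoin`; the URLs are made up for illustration, not the actual site in question:

```python
from urllib.parse import urljoin

# Hypothetical URLs -- the real site's paths are unknown.
correct_page = "http://www.example.com/widgets/blue-widget/"
wrong_page = "http://www.example.com/widget/blue-widget"  # the wrong, working variant

relative_link = "../red-widget/"  # a relative link in the page body

# Resolved against the correct URL, the link points where intended:
print(urljoin(correct_page, relative_link))
# -> http://www.example.com/widgets/red-widget/

# Resolved against the wrong variant, it points at a page that never existed:
print(urljoin(wrong_page, relative_link))
# -> http://www.example.com/red-widget/
```

The same relative href produces a valid URL on one base and a 404 on the other, which is exactly how one wrong-but-working URL can spawn a whole family of crawl errors.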
So I wonder how damaging to me these 404s are?
It's worth noting that I NEVER got these 404 errors in Webmaster Tools until a few days ago, and nothing on my site has changed at all.
| 7:53 am on Apr 7, 2011 (gmt 0)|
Just an update today to say that my old crawl errors have now gone from WMT.
Anyone else seeing this?
| 6:45 pm on Apr 7, 2011 (gmt 0)|
Today we had over 150 crawl errors disappear from WMT! None seemed to be legit problems.
| 7:27 pm on Apr 7, 2011 (gmt 0)|
Just a reminder, and it's weird to have to say this: make sure you "view source" on the links Google displays in WMT. Many times there are non-printing or invisible characters in the href in the source code.
A link that looks normal in WMT can contain hidden characters when you view the source of the WMT page; even the anchor text has hidden characters in it.
I think this is some kind of line-wrapping problem. I researched it in the past but forget what I found.
In this case it explains a mystery 400 error. The links look fine in WMT until you view source!
Just added confusion.
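A hidden character is easy to miss by eye but easy to find programmatically. Here's a small sketch (the `bad_href` is a made-up example, not one of the actual WMT links) that flags non-printing and format characters in a URL string:

```python
import unicodedata

def find_hidden_chars(url):
    """Return (index, codepoint, name) for each non-printing, format,
    or non-standard space character hiding in a URL string."""
    hidden = []
    for i, ch in enumerate(url):
        # Cf = format chars (zero-width space etc.), Cc = control chars,
        # Zs = space separators other than a plain ASCII space.
        if unicodedata.category(ch) in ("Cf", "Cc", "Zs") and ch != " ":
            hidden.append((i, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN")))
    return hidden

# A link that looks normal but has a zero-width space and a soft hyphen in it:
bad_href = "http://www.example.com/\u200bwidget\u00adpage.html"
for pos, cp, name in find_hidden_chars(bad_href):
    print(pos, cp, name)
```

Run that over any href copied out of WMT's page source and the invisible junk that causes a 400 shows up with its position and Unicode name.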
| 6:29 am on Apr 8, 2011 (gmt 0)|
And this morning we are back to 130 errors!
| 12:01 pm on Apr 8, 2011 (gmt 0)|
Same here; my old errors are back. Ho hum.
| 12:17 pm on Apr 8, 2011 (gmt 0)|
Check your source code. We had a similar problem. Come to find out, there was a broken link with no text/description in the source of a page, which we could not see!
Try Xenu. It can crawl and report such broken links.
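A link with no anchor text, like the one described above, is invisible in the browser but still gets crawled. If you'd rather check this yourself than run Xenu, here's a minimal sketch using Python's standard `html.parser` that pulls out every href and flags the ones with no visible text (the sample HTML is made up):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect every href so each can be checked for a broken target,
    including links with no anchor text (invisible on the page)."""
    def __init__(self):
        super().__init__()
        self.links = []          # list of (href, had_visible_text)
        self._open_href = None
        self._saw_text = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._open_href = dict(attrs).get("href")
            self._saw_text = False

    def handle_data(self, data):
        if self._open_href and data.strip():
            self._saw_text = True

    def handle_endtag(self, tag):
        if tag == "a" and self._open_href:
            self.links.append((self._open_href, self._saw_text))
            self._open_href = None

html = '<p><a href="/good">ok</a><a href="/broken-no-text"></a></p>'
p = LinkExtractor()
p.feed(html)
print(p.links)  # the second link has no text, so you'd never see it in a browser
```

Feed it each page's source and any `(href, False)` entry is a link you can't see on the rendered page but that a crawler will still follow.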
| 12:30 pm on Apr 8, 2011 (gmt 0)|
/\ But how would that explain them coming & going? The dates on ours are from months ago.
| 12:35 pm on Apr 8, 2011 (gmt 0)|
I find this problem mostly comes from buggy kiddie scripts dumping out scraped, auto-generated content. Some make all my URLs lowercase, others replace the "&" in parameters with "~", and so on and so forth.
I'm actually starting to wonder if some of this nonsense isn't some new form of lame BH anti-SEO technique designed to drive sites down in the index based on mass quantities of bad IBLs. Maybe not, but it's happening way too much to be pure chance, and can there really be that many bad programmers cranking out broken auto-gen sites?
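If the mangling patterns are predictable, as the lowercasing and "&"-to-"~" examples above suggest, one defensive option is to map the mangled inbound URL back to a real one and 301 it instead of serving a 404. A sketch, assuming only those two specific manglings and a made-up set of site paths:

```python
def unmangle(path_qs, known_paths):
    """Try to map a scraper-mangled URL back to a real one so it can be
    301-redirected instead of 404ing. Assumes the two mangling patterns
    described above: case lowered, and '&' replaced with '~'."""
    candidates = [
        path_qs,
        path_qs.replace("~", "&"),   # undo the '&' -> '~' substitution
    ]
    # Index the real paths by lowercase form so lowercased variants match.
    lowered = {p.lower(): p for p in known_paths}
    for c in candidates:
        if c in known_paths:
            return c
        if c.lower() in lowered:
            return lowered[c.lower()]
    return None

# Hypothetical site path for illustration:
known = ["/Widgets/Index.php?cat=2&page=3"]
print(unmangle("/widgets/index.php?cat=2~page=3", known))
# -> /Widgets/Index.php?cat=2&page=3
```

Whether redirecting these is worth it depends on how much junk you're getting; for a handful of bad IBLs, letting them 404 is harmless anyway.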
| 2:04 pm on Apr 8, 2011 (gmt 0)|
|I'm actually starting to wonder if some of this nonsense isn't some new form of lame BH anti-SEO technique designed to drive sites down in the index based on mass quantities of bad IBLs.|
I used to think that was the case for me, until I found some bugs in my code. My best advice: if you've combed over your site for bugs 10 times already, do it 10 more times. If you have a large site, you will be surprised what you can find.
| 3:07 pm on Apr 8, 2011 (gmt 0)|
How about these 'not followed' crawl errors? They appeared yesterday. I've obviously edited the site name and containing subfolder, but the actual 'pages' are verbatim. There's nothing in my robots.txt, and they don't exist anywhere on the site.
| 4:58 pm on Apr 8, 2011 (gmt 0)|
@Prudence, are you seeing 404s for them in GWT?
incrediBILL, what you state has been happening for a long time. I have seen it even in 2007.
But these new errors in GWT, post-Panda, seem to be interesting, as now I see more and more people reporting them. Are these seen only on Panda-affected sites, or do people see them on sites that weren't affected by Panda too?
The errors for pages that never existed on the site are something new, and I see them only after Panda. They seem to have introduced something new in the way they crawl. I am not sure whether these are bugs in what they have introduced, or whether they are testing something by trying to crawl these non-existent URLs.
[edited by: indyank at 5:38 pm (utc) on Apr 8, 2011]
| 4:59 pm on Apr 8, 2011 (gmt 0)|
They are all script-kiddie junk sites: pages with lots of incorrectly coded outbound links. Plus, they are hotlinking to my photos in Google's gstatic domain. I don't think they're getting that code right either.
Why does Google let people hot link to Google's copies of our images?
My back deck!
It really only takes one bad coder to make thousands of bad websites!
| 5:01 pm on Apr 8, 2011 (gmt 0)|
WMT is now showing 5 errors! The utility mentioned above works well, but it found no errors on our sites.
| 5:10 pm on Apr 8, 2011 (gmt 0)|
Actually, some may be legitimate but hacked sites, with thousands of junk pages tacked on to advertise ... whatever.
One site on HostGator was flagged and had a comment "Webmaster, please contact support".
The URL was cgi-sys/suspendedpage.cgi, so hopefully the ISPs are slowly cleaning this up.
| 5:20 pm on Apr 8, 2011 (gmt 0)|
They scrape your images and they let others scrape them. How noble!
| 11:54 pm on Apr 8, 2011 (gmt 0)|
@indyank, thanks for the response. They're under the 'not followed' tab in crawl errors, although there seem to be fewer since I last posted. What could they be? Never seen the like before in GWT.