How to narrow down "Not Found" Site Errors?
I only want to see external "link from" URLs
So, my domain is almost 9 years old. When I first launched my ecommerce store, I used specific shopping cart software, but I have since relaunched on a brand new setup.
The problem is that when I go to Google Webmaster Tools and view the 404 crawl errors, some of them are old URLs such as.
My new setup doesn't use /?orderby=price or any other parameter. But when I click to view "link from", there is no source showing where Googlebot found this URL.
Is it possible that I submitted an old sitemap to Google and it is somehow still looking for those dead URLs? I just don't understand why it keeps trying to find files like this.
Google never stops looking for URLs it has seen, unfortunately. If a URL once existed, no longer exists anywhere, and now returns a 404, that is normal and just part of housekeeping. When you make changes in the future, 301 redirects to the replacement URLs can avoid this, but for URLs that are already gone it is what it is. Google treats these errors as "just letting you know", and they do not cause any problems for a site.
If you want to sort through them, you can download the list and look through it. I wouldn't worry about the "link from" source for long-gone URLs; they keep old sitemaps cached somewhere and pull them out now and then. If you see something current in the "link from" field, that is something you might want to act on.
You could also go into the parameters area of WMT and tell it that a given parameter doesn't affect page content. That won't stop Google from requesting nonexistent URLs, but it might stop it from requesting huge numbers of them.
:: wandering off to check my own wmt, now that I'm reminded I haven't even looked in a few months ::
@not2easy... so basically, if Google crawled an old page of mine five years ago called /blog, which is now /our-company-blog, all of those dead links that show up as /blog are just Google "letting me know", right?
So... I guess my thinking is this: if the "link from" for an old /blog URL is an external URL and may count as a backlink, obviously I want to redirect it. BUT if the "link from" URL is just another /blog URL, and all /blog URLs are gone, my hope is that Google somehow catches on that /blog is gone, so I don't have to redirect 100+ /blog URLs. Does that make sense?
@luck24 How do I go about parameter changes? Say I want to ignore "?orderby=price", but there are also name, price high to low, low to high, popularity, newness, etc. Can I set something up like "?orderby=*" to apply to all of them?
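On the 100+ /blog URLs: a single prefix rule can cover all of them instead of one redirect per page. The sketch below only illustrates the mapping logic, assuming the old posts kept the same slugs under the new prefix (which may not hold for your site); on a real server you would express the same rule in the web server or cart configuration. The rule table and function name are hypothetical.

```python
# Hypothetical rule table: old path prefix -> new path prefix.
# "/blog" and "/our-company-blog" are the example paths from this thread.
REDIRECT_PREFIXES = {
    "/blog": "/our-company-blog",
}


def redirect_target(path):
    """Return the 301 target for an old path, or None if no rule applies."""
    for old, new in REDIRECT_PREFIXES.items():
        # Match "/blog" itself or anything under "/blog/", but not "/blogroll".
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    return None


for old_path in ["/blog", "/blog/2015/spring-sale", "/blogroll"]:
    print(old_path, "->", redirect_target(old_path))
```

Checking the path boundary ("/blog" vs. "/blog/...") keeps the rule from catching unrelated pages that merely start with the same letters.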
Also, do you usually get a list of parameters from your shopping cart software, etc?
Thanks for your responses.
Google showed me old URLs this week that were removed from my site over 5 years ago. They have not been in any sitemap or menu or anywhere on the site for all that time, but they keep telling me they got 404 errors when they tried to crawl them. I mark them "Fixed" again and just resign myself to the fact that they will not ever stop looking for those pages.
In GWT, if you download the links in either CSV or Excel format, you can sort through the list to see which URL parameters you might want to turn off, or whether some URLs have inbound links important enough to work on.
NOTE: I have found that the only way to make them stop crawling (and indexing) some parameters is to forget the parameters tool and use robots.txt to block them.
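To see which parameters show up most in the downloaded list, you can tally query parameter names across the exported URLs. This is a minimal sketch that assumes the CSV has a column named "URL" holding the full crawled address; the real export's column headers may differ, so adjust url_column to match.

```python
import csv
import io
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_parameter_names(csv_text, url_column="URL"):
    """Count how many exported 404 URLs use each query parameter name."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        query = urlsplit(row[url_column]).query
        # parse_qsl yields (name, value) pairs; only the names are tallied,
        # and each name is counted at most once per URL.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Made-up sample rows for illustration only.
sample = """URL,Response Code
http://www.example.com/shop/?orderby=price,404
http://www.example.com/shop/?orderby=popularity&page=2,404
http://www.example.com/blog/old-post,404
"""
print(count_parameter_names(sample).most_common())
```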
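Google's robots.txt matching understands "*" (any run of characters) and a trailing "$" (end of URL), so rules such as "Disallow: /*?orderby=" and "Disallow: /*&orderby=" would cover orderby whether it is the first parameter or a later one. Python's standard urllib.robotparser does plain prefix matching, so the sketch below hand-rolls a simplified wildcard matcher to check which URLs those rules would block before you deploy them. It ignores Allow rules and Google's longest-match precedence, and the rule list is a hypothetical example.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


# Hypothetical Disallow rules for the orderby parameter.
DISALLOW = ["/*?orderby=", "/*&orderby="]


def is_blocked(path_and_query):
    # re.match anchors at the start, mirroring robots.txt prefix matching.
    return any(robots_pattern_to_regex(p).match(path_and_query) for p in DISALLOW)


for path in ["/shop/?orderby=price", "/shop/?page=2&orderby=newness", "/shop/?page=2"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```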
BUT at the end of the day, it's really just annoying; it isn't hurting anything. I have a broken links plugin on my blog, which should tell me about broken internal links. As for broken external links, is there no way to narrow these down specifically to errors whose "link from" URLs are on external sites? Those are the broken links I care about.
I've tried your idea of exporting to a spreadsheet. It gives me the dead URL, etc., but not a list of what links to it. As it is, I'd have to click on each URL one by one to view that.
If you were to click each link downloaded from your GWT 404 errors, it would only go to your site's 404 error page. Maybe try an inurl search?
I have read that this doesn't work anymore, but I have also seen it listed with results - so I suggest it in case it might work for you.
Your records should show any referring sites over time that you might want to look into keeping. You are not likely to have any good backlinks sending traffic to a URL that is a cart URL or query.
GWT's parameters area looks at the name of a parameter, not its value. That's the whole point of excluding a parameter. So if you tell it that "orderby" doesn't matter, it doesn't matter whether it's "orderby=carbonfootprint" or "orderby=shippingweight" or "orderby=returnpercentage" or et cetera.
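If your server keeps access logs, they can answer the "external link from" question directly: a 404 request whose Referer header points at another site means a visitor clicked a broken link somewhere else. This sketch assumes the common Apache/Nginx "combined" log format; the hostnames are placeholders. It only sees real clicks from visitors, so it complements the GWT list rather than replacing it.

```python
import re
from urllib.parse import urlsplit

# Combined log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)


def external_404s(lines, own_hosts):
    """Yield (path, referer) for 404s whose referer is on another site."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("status") != "404":
            continue
        referer = m.group("referer")
        host = urlsplit(referer).hostname  # None for "-" or empty referers
        if host and host not in own_hosts:
            yield m.group("path"), referer


# Placeholder hostnames for your own site.
OWN_HOSTS = {"example.com", "www.example.com"}
```

Usage: open your access log and pass its lines to external_404s(log_file, OWN_HOSTS), then redirect only the paths that show up with referers you care about.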
First step is to look in the parameters area and see what they've already got listed. They probably already have everything.
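To picture why "?orderby=*" isn't needed: ignoring a parameter by name means every value of it collapses to the same page. The sketch below mimics that idea by stripping ignored parameter names from a URL regardless of value. It is only an illustration of the concept, not how Google implements its parameter handling, and the IGNORED_PARAMS set is a hypothetical example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter *names* treated as not affecting page content (hypothetical list).
IGNORED_PARAMS = {"orderby"}


def canonicalize(url, ignored=IGNORED_PARAMS):
    """Drop ignored parameters by name, whatever their value."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in ignored]
    # urlunsplit omits the "?" entirely when the remaining query is empty.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(canonicalize("http://www.example.com/shop/?orderby=price"))
print(canonicalize("http://www.example.com/shop/?orderby=popularity&page=2"))
```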