Forum Moderators: Robert Charlton & goodroi


WMT not found errors back again


Shepherd

1:12 pm on Aug 4, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Got another notice from WMT about not-found errors: about 13,000 of them, again. We cleared them last year and now they're back. These are pages that don't exist and aren't linked from anywhere on our site, so I'm not sure why Google keeps crawling them.

I'm thinking about just leaving them this time around instead of clearing them from WMT; maybe if we leave them alone Google will stop crawling them. Clearing them is a bit of a pain in the rear anyway, since we can only clear 1,000 a day.

not2easy

2:27 pm on Aug 4, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It really is better to clear them, if only to let Google know that you are aware those URLs are not there. In mid-July I saw them start again, crawling old URLs that have not existed on that site for over five years. The worst problem I have with their 404 lists is that it is obvious they do not use current sitemaps, or they would not be finding so many 404s. I jump through the hoops of resubmitting sitemaps, and it keeps the numbers down - for a short time anyway. (Sigh.)

lucy24

3:32 pm on Aug 4, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Alternative answer: Ignore them and they'll go away. Clear the current list just so you'll notice more easily when they add new ones.

Does g### even know that you've cleared "errors"? That is, of course they can know, but do they gain anything by acting on that knowledge?

it is obvious they do not use current sitemaps

Their approach to sitemaps seems to be cumulative. If the URL has ever been listed on any sitemap anywhere, they'll remember it.

Shepherd

3:55 pm on Aug 4, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ignore them and they'll go away.


If only that were true. They keep coming back, though. I'm thinking maybe if I leave the error notices in WMT they might stop crawling the listed URLs. I don't know; I'm really just tired of dealing with it.

not2easy

3:56 pm on Aug 4, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



True, it is an exercise in futility and a PITA to boot, with their "daily limit". Since Google claims the errors are only listed to let you know that the page is not found, I try to let them know that it's OK. Unfortunately, I do not think they take any action at all when the 404s are acknowledged; the same errors will be back the next time they take a notion to check. They should offer people a method to disavow old URLs. As mentioned, though, clearing them makes it easier to notice any that should not be returning a 404.

lucy24

8:28 pm on Aug 4, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They should offer people a method to disavow old URLs.

They do. Or rather, the http standard does. It's called a 410 ;)
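The distinction is easy to act on server-side. Here's a minimal sketch of the idea (the paths are hypothetical, not from anyone's actual site): keep a list of URLs you've deliberately retired and serve those a 410, while everything else unknown gets an ordinary 404.

```python
# Minimal sketch with hypothetical paths: choose which status code to
# serve for a requested URL. 410 Gone tells crawlers the removal is
# deliberate and permanent; 404 Not Found leaves the door open.

# URLs we have deliberately, permanently removed.
GONE_PATHS = {"/old-page.html", "/retired-section/index.html"}

def status_for(path, live_paths):
    """Return the HTTP status code to serve for a requested path."""
    if path in live_paths:
        return 200  # page exists, serve it normally
    if path in GONE_PATHS:
        return 410  # Gone: removed on purpose, don't come back
    return 404      # Not Found: unknown URL, crawlers may retry it
```

On Apache you can get the same effect without any code, e.g. a `Redirect gone /old-page.html` line in the config or .htaccess.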

Now, if only search engines (all of 'em) could learn to recognize 404s on URLs that have never existed -- doubly so if they were never linked from your own site. The ones that I tend to dismiss as having no existence outside of google's* fevered imagination.


* Specifically Google, because Bing has a rare and special talent for finding typos in links that I fixed within five minutes of creation, i.e. one minute after Bing first crawled the URL. But they do finally seem to have come to grips with the fact that I never had, and never will have, a page called innuuniq.html. Whew.

Robert Charlton

9:49 pm on Aug 4, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



A resurgence of old 404s often happens around the time of a big change in the index. My interpretation is that Google resurfaces its 404 lists to compare them against the present situation, eventually generating clean lists over time.

I think Google believes they're doing us a favor by letting us know about them.

See my comments prompted by an interview with the Google Sitemaps Team, noted in my last post on this thread...

17 May 2013 - GWT Sudden Surge in Crawl Errors for Pages Removed 2 Years Ago?
http://www.webmasterworld.com/google/4575982.htm [webmasterworld.com]

The thread covers some of the same ground we're covering now. Maybe this suggests that a big update is coming.

Shepherd

12:51 am on Aug 5, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Maybe this suggests that a big update is coming.


That would be welcome. You're right, now that I think about it: it seems like there have been significant updates in the past soon after we've seen massive 404 notices.

Here's something interesting: WMT is telling me that Google found over 4,000 (of the 13,000) 404 URLs on 8/1 - 8/2, but according to the crawl stats they only crawled 4,075 pages total on those two days.

It's more than 4,000; I just don't know how many more, because we can only see 1,000 a day. So basically they're telling me they found more bad pages than they actually crawled. Maybe the crawl that found the 404s is different from the normal crawl. If that's the case, Robert's theory of an impending update is even more likely.

Awarn

12:55 pm on Aug 5, 2014 (gmt 0)

10+ Year Member



They will keep bringing back 410 pages too, so that doesn't work either (even when the listing shows the page returned a 410). It is like Google is in an infinite loop, and their rankings look like that too. I have at times wondered if this is a form of negative SEO, where another site purposely links to pages that are dead, or creates links to nonexistent pages, in an effort to generate 404s on a site. You could use the removal tool, but that would take a long time.

iammeiamfree

1:37 pm on Aug 5, 2014 (gmt 0)

10+ Year Member



They probably just check the URLs in case someone accidentally re-uploaded the data, so they have it in case there is anything interesting on those pages.

<snip>

[edited by: goodroi at 1:51 pm (utc) on Aug 5, 2014]
[edit reason] Let's keep the focus on SEO in the Google SEO section :) [/edit]

Planet13

3:21 pm on Aug 5, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They probably just check the URLs in case someone accidentally re-uploaded the data, so they have it in case there is anything interesting on those pages.


Could be.

I know that they have stated they try to do things to prevent webmasters from shooting themselves in the foot. Re-checking old URLs might be one of them.

Shepherd

3:42 pm on Aug 5, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm not sure there is any benefit to Google or anyone else in crawling and reporting non-existent URLs over and over. It seems like an incredible waste of resources, theirs and mine.

lucy24

6:57 pm on Aug 5, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They probably just check the URLs in case someone accidentally re-uploaded the data

This is a bit silly of them, because if you did restore the URL, there would be fresh links to it and search engines would "discover" it all over again. They wouldn't need to search their database for material that was last seen in 2007.

Google, unlike certain other search engines, definitely understands the 410 response: they stop requesting the URL a lot sooner. Even with 404s they give up after a while, again compared to That Other Search Engine. But that's a matter of overall request frequency, not wiped-from-memory-for-all-time-never-to-be-seen-again.

Shepherd

7:29 pm on Aug 5, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It would be nice to hear from Google why they continue to crawl unlinked 404 URLs.

denisl

8:31 pm on Aug 6, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



What I find strange are the 404s they're showing again as detected in the last week, where the "linked from" page has not existed for several years. So how did they just re-discover it?

Shepherd

9:15 pm on Aug 6, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So how did they just re-discover it?


I've got to figure they're crawling from an internal database, not doing a regular crawl from links.