
Forum Moderators: Robert Charlton & aakk9999 & andy langton & goodroi


Is Google Building a new Index?

7:39 am on Sep 5, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 19, 2004
posts:355
votes: 1


Hi,

Over the last 7 days I've seen a validated Googlebot requesting thousands of URLs that haven't been on my website for years.

These all return 404s now and have done for a long time; some of the URLs being requested are up to 4 years old.

Ok, some could be from external links, but the quantity of these requests leads me to believe that there is something else going on.

My assumptions are:

a) Google's building new indices.
b) A major change is coming to the algorithm in the next few weeks.
c) My site's in deep do-do.

We've got sitemap files, so why ask for these URLs all of a sudden?

Anyone else seeing this type of activity?

Vimes.

8:17 am on Sept 5, 2008 (gmt 0)

New User

5+ Year Member

joined:Aug 19, 2008
posts:23
votes: 0


Hi there,

I posted the same thing at almost the same time as you.

It's true. In my Webmaster Tools I can see 94 404s for URLs that I disallowed a long time ago and removed from the index with the URL removal tool.

Jos

8:31 am on Sept 5, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 19, 2004
posts:355
votes: 1


Hi,
Well, these requests aren't for disallowed URLs, either via robots.txt or the URL removal tool.

The removal tool, if I'm correct, only lasts for a limited period (I think 6 months) before Google tries to re-index them.

The URLs being requested haven't existed on my server in years. It looks like they are re-crawling every known URL they have ever had for my site.

I've checked the site and it's clean; it's not coming from me.

Vimes.

8:35 am on Sept 5, 2008 (gmt 0)

New User

5+ Year Member

joined:Aug 19, 2008
posts:23
votes: 0


So if anybody has information on his website that he wants completely removed, never to come back, he will never succeed, because Google (or any other engine) will always keep some information stored internally?
9:21 am on Sept 5, 2008 (gmt 0)

Preferred Member

5+ Year Member

joined:Dec 19, 2007
posts:404
votes: 0


Yes, they must keep a record of everything they have learnt about any indexed website, but that's not the same as showing that information again. I guess it's possible, though. The point, I would think, is that at some point you made that information publicly available, so they indexed it legitimately.
10:37 am on Sept 5, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 19, 2004
posts:355
votes: 1


If nobody else is getting this, then I can only assume it's just my site.

Yes, it's definitely a deep crawl, but with deep crawls in the past I've never had thousands of old 404 URLs requested like I'm having at present.

It makes me nervous to see this abnormal crawling happening.

Vimes

1:56 pm on Sept 5, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


I'm seeing a few of these 404s as well. However, I have very very few URLs that have ever changed or been removed (maybe a couple of dozen over 12 years), so the number of possible 404s caused by "old stale links" to my sites is very low.

Anyway, there's a second "sample" for your theory -- I doubt that we're the only ones, but maybe just among the first posting here who've noticed it.

Jim

2:39 pm on Sept 5, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 15, 2003
posts:927
votes: 21


I've seen this happen from time to time for at least the past year, so I don't think it's anything really new. I've been particularly aware of it because I moved a directory from one site (where the pages didn't really ever belong) to a subdomain on an appropriate site of mine, which I then moved to its own domain last August. Google had discovered some test pages on the original site, and a few on the old subdomain, which had been deleted 2+ years ago. Only one or two of these pages had ever been linked to, even indirectly, but I still see the 404s from Googlebot's attempts to crawl them from time to time.
Both of these sites perform as I'd expect, so I don't worry about these stray 404s. And in general, I don't think even thousands of 404s would cause any problems if your sitemap doesn't include the URLs and there are no internal links to them. Search engines have to allow for an occasional restructuring on a large site without any negative implications.
8:55 pm on Sept 5, 2008 (gmt 0)

Preferred Member

5+ Year Member

joined:Dec 19, 2007
posts:404
votes: 0


Well, why not put up a 410? A 404 doesn't really say a file is gone forever.
8:58 pm on Sept 5, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Technically that makes some sense, but Google treats a 410 and a 404 identically. They will continue to check for the URL on a declining schedule far into the future.
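For anyone who wants to try the 410 route anyway, here's a minimal Apache sketch; the paths are hypothetical examples, not ones from this thread:

```apache
# Mark a whole removed directory as 410 Gone with mod_rewrite's [G] flag
RewriteEngine On
RewriteRule ^old-section/ - [G,L]

# Or, for a single removed page, mod_alias is enough
Redirect gone /old-page.html
```

Whether Googlebot treats it differently from a 404 is, per the post above, doubtful, but the 410 at least documents your intent in the config.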
9:05 pm on Sept 5, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Right, and it appears that "they're back" checking for removed URLs that are so old that I pulled the 410s several years ago just to save a little code space. That is all that was noticeable about what I saw in my logs.

Jim

12:18 am on Sept 6, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 22, 2003
posts:1230
votes: 0


I checked out what you said, Vimes, and Google did do a complete roll-through of my site 12 hours ago. They also did that last month. I wondered why they did it then, because nothing was updated in the caches. My theory is they are realizing their crawlers are missing a lot with incremental crawls. I'm probably wrong, though; go with answer C.
10:12 am on Sept 7, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 19, 2004
posts:355
votes: 1


Well, I seem to experience a deep crawl on the site just before my GWT data updates itself for links etc., so I wasn't surprised to see it, as that normally happens for me at the end of each month. The only difference this time was the number of old URLs that were requested; that's why I left it a week before mentioning anything. Normally this activity dies off after a day or three, things return to normal, and life goes on. But this activity has only just dropped off: over the last 10 days I've had just over 25,000 404s requested by Googlebot. 95% of these are URLs that disappeared off my site after we restructured the entire website just over 4 years ago. Yes, we had redirects on these for a while; those were probably removed about 2 years ago.

I recently added a 301 redirect to my root, stopping any issues with www.example.com./ (trailing dot). I wouldn't have thought that this would have caused any huge requests for URLs redirecting to 404 pages, as a page that isn't there just isn't there.
But I guess it might have channelled the bot to recheck. I'm checking my 301 redirect logs now and so far haven't come across any www.example.com./ 301 redirects landing on a 404 page.
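The trailing-dot canonicalization being described might look something like this in .htaccess; this is a sketch only, and www.example.com stands in for the actual hostname:

```apache
# 301 any request whose Host header ends in a trailing dot
# to the canonical hostname without it
RewriteEngine On
RewriteCond %{HTTP_HOST} \.$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```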

I really get nervous when Googlebot does funky stuff like this; for me it's never been a good sign.

Vimes.

11:34 am on Sept 7, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


You remove a page from your site.

Google sees the 404 and stops showing the URL in their results.

They test the URL again, from time to time, to see if it gets re-used.

Months later, they find a link to that page from a page they had never spidered before. What to do? Is this a new link to you, because your page has now come back? Is this an old link they hadn't previously noticed?

Whatever, once a URL "exists" it will be checked from time to time, forever, in case the status of the URL has changed in any way.

.

I don't think 410 can mean "forever".

Think about it.

I 410 www.domain.com/index.html and 5 years later let the domain lapse.

Someone else buys it a year or two later. Should the "410 Gone Forever" still apply?

No. Of course not.

Discovery of new links to a previously 404 or 410 URL may lead this process, as may change of ownership information.

.

This is why from Day Zero you should not let your website respond to *any* "stray" URL requests.

.

Now, say someone has linked to you as www.domain.com/index.hmtl, then that URL "exists", and will be internally indexed as a 404.

Google has to keep a record of that URL and the fact that it is "bad", otherwise they will have to go on discovery every time they spider the page the duff link is on.

What if that page has a large number of such duff links? Do you think they might have a routine to mark *that* page as bad instead/as well, and save some crawler work?

[edited by: Robert_Charlton at 7:44 am (utc) on Sep. 10, 2008]
[edit reason] fixed example per poster [/edit]

2:15 pm on Sept 7, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 1, 2004
posts:3181
votes: 0


As long as a link is out there to a gone page, it's probably a good idea to return a 410 or 404. I agree with g1smd.
2:28 pm on Sept 7, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


I recently added a 301 redirect to my root, stopping any issues with www.example.com./ (trailing dot). I wouldn't have thought that this would have caused any huge requests for URLs redirecting to 404 pages, as a page that isn't there just isn't there.
But I guess it might have channelled the bot to recheck. I'm checking my 301 redirect logs now and so far haven't come across any www.example.com./ 301 redirects landing on a 404 page.

This would not have "caused" Googlebot to do anything, since Gbot would have to request example.com./ in order to "discover" this redirect. Otherwise, the addition of this redirect is invisible to Gbot, since your server only responds to client requests -- there is nothing in a server that will "send a notice" to search engine spiders about such changes; how would the server know whom to notify?

More likely Gbot is just checking through its historical "dead link" data for each of our sites, and in a few cases, might have found an obsolete link out on the Web somewhere.

It's interesting to me that they're doing this all at once in a noticeably-large "batch" -- So possibly there is some kind of clean-up or archiving process taking place.

Jim

3:28 pm on Sept 7, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Google used to reindex their Supplemental database several times per year. At that time I would see a lot of changes. The fresh tags and Supplemental tags, as well as the cache date, were key to spotting what was going on - and Google has now hidden most of that data.

At the moment I don't have any sites that I could track to look for that same pattern. However, in any case, the Supplemental refresh is supposed to be a much more frequent and ongoing thing, if I correctly understood what Matt Cutts said about that topic a year or so ago.

12:29 am on Sept 10, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Feb 16, 2007
posts: 846
votes: 0



Now, say someone has linked to you as www.domain.com/index.hmtl, then that URL "exists", and will be internally indexed as a 404.

So what to do if someone links to you as www.example.com/keywor/index.html - would it be sensible to create a 301 redirect to /keyword/index.html ?

[edited by: Robert_Charlton at 7:45 am (utc) on Sep. 10, 2008]
[edit reason] updated reference to earlier example [/edit]

1:57 am on Sept 10, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> would it be sensible to create a 301 redirect to /keyword/index.html ?

Yes, sensible and advisable... Recover the traffic and recover the PageRank.

Jim

6:49 am on Sept 10, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


I would redirect to www.domain.com/keyword/ with a trailing / on the end.

The index.html part is redundant. Omit it.

That redirect, being "specific", would be placed before all of the other redirects.
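Putting the advice in this post together, a hypothetical .htaccess sketch might look like this; /keywor/index.html and example.com are placeholders taken from the discussion's examples:

```apache
RewriteEngine On

# Specific rule first: catch the misspelled inbound link,
# fix the directory name, and drop the redundant index.html
RewriteRule ^keywor/index\.html$ http://www.example.com/keyword/ [R=301,L]

# More general rules (e.g. canonical-host redirects) would follow below,
# so the specific fix always matches before they do.
```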
