Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Is Google Building a new Index?
Vimes

10+ Year Member



 
Msg#: 3738885 posted 7:39 am on Sep 5, 2008 (gmt 0)

Hi,

Over the last 7 days I've seen a validated Googlebot requesting thousands of URLs that haven't been on my website for years.
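
The post does not say how the bot was validated. One widely used check is a reverse DNS lookup on the requesting IP, followed by a forward lookup to confirm the hostname maps back to the same IP. Below is a minimal Python sketch of that idea. The function name `is_verified_googlebot` and the injectable lookup parameters are illustrative assumptions, and the forward lookup used here handles IPv4 only.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for an IP claiming to be Googlebot.

    The lookup functions are parameters (an assumption of this sketch)
    so the logic can be exercised without touching the network.
    """
    try:
        hostname = reverse_lookup(ip)[0].rstrip(".").lower()
    except OSError:
        # No reverse DNS record: cannot be confirmed.
        return False
    # The reverse name must sit under one of Google's crawler domains.
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    # Forward confirmation: the name must resolve back to the same IP.
    return ip in addresses
```

A user-agent string alone proves nothing, since anyone can send "Googlebot" in the header; the forward confirmation is what makes the check hard to spoof.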

These are all returning 404s now and have been for a long time. Some of the URLs being requested are up to 4 years old.

Ok, some could be from external links, but the quantity of these requests leads me to believe that there is something else going on.

My assumptions are:

a) Google's building new indices.
b) A major change is coming to the algorithm in the next few weeks.
c) My site's in deep doo-doo.

We've got sitemap files, so why ask for these URLs all of a sudden?

Can anyone else report this type of activity?

Vimes.

 

bolognese

5+ Year Member



 
Msg#: 3738885 posted 8:17 am on Sep 5, 2008 (gmt 0)

Hi there,

I posted the same thing at almost the same time as you.

It's true. In my Webmaster Tools I can see 94 404s for URLs that I disallowed a long time ago and removed from the index with the URL removal tool.

Jos

Vimes

10+ Year Member



 
Msg#: 3738885 posted 8:31 am on Sep 5, 2008 (gmt 0)

Hi,
Well, these requests aren't for disallowed URLs, either via robots.txt or the URL removal tool.

The removal tool, if I'm correct, only applies for a limited period of time, I think 6 months, before Google tries to re-index the URLs.

The URLs being requested on my server haven't existed in years. It looks like they are re-crawling every URL they have ever known for my site.

I've checked the site and it's clean; it's not coming from me.

Vimes.

bolognese

5+ Year Member



 
Msg#: 3738885 posted 8:35 am on Sep 5, 2008 (gmt 0)

So if anybody has information on his website that he wants completely removed, never to come back, he will never succeed, because Google (or any other search engine) will always keep some information stored internally?

santapaws

5+ Year Member



 
Msg#: 3738885 posted 9:21 am on Sep 5, 2008 (gmt 0)

Yes, they must keep a record of everything they have learnt about any indexed website, but that's not the same as thinking they would show that information again. I guess it's possible, though. The point, I would think, is that at some point you made that information publicly available, so they indexed it legitimately.

Vimes

10+ Year Member



 
Msg#: 3738885 posted 10:37 am on Sep 5, 2008 (gmt 0)

If nobody else is getting this, then I can only assume it's my site only.

Yes, it's definitely a deep crawl, but with deep crawls in the past I've never had thousands of old 404 URLs requested like I'm having at present.

It makes me nervous to see this abnormal crawling happening.

Vimes

jdMorgan

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3738885 posted 1:56 pm on Sep 5, 2008 (gmt 0)

I'm seeing a few of these 404s as well. However, I have very very few URLs that have ever changed or been removed (maybe a couple of dozen over 12 years), so the number of possible 404s caused by "old stale links" to my sites is very low.

Anyway, there's a second "sample" for your theory -- I doubt that we're the only ones, but maybe just among the first who post here to have noticed this.

Jim

rainborick

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3738885 posted 2:39 pm on Sep 5, 2008 (gmt 0)

I've seen this happen from time to time for at least the past year, so I don't think it's anything really new. I've been particularly aware of it because I moved a directory from one site (where the pages didn't really ever belong) to a subdomain on an appropriate site of mine that I moved to its own domain last August. Google had discovered some test pages on the original site, and a few on the old subdomain, which had been deleted 2+ years ago. Only one or two of these pages had ever been linked to, even indirectly, but I still see the 404's from Googlebot's attempts to crawl them from time to time.
Both of these sites perform as I'd expect, so I don't worry about these stray 404's. And in general, I don't think even thousands of 404's would cause any problems if your sitemap doesn't include the URLs and there are no internal links to them. Search engines have to allow for an occasional restructuring on a large site without any negative implications.

santapaws

5+ Year Member



 
Msg#: 3738885 posted 8:55 pm on Sep 5, 2008 (gmt 0)

Well, why not stick up a 410? A 404 doesn't really say a file is gone forever.
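
As a rough illustration of the distinction being suggested, here is a minimal Python sketch that answers 410 Gone for URLs known to have been removed on purpose and 404 Not Found for everything else that does not exist. The path sets and the function name `status_for` are hypothetical placeholders, not anything taken from the sites discussed in this thread.

```python
from http import HTTPStatus

# Hypothetical data: pages that currently exist, and pages that were
# deliberately retired and are not expected to return.
LIVE_PATHS = {"/", "/keyword/"}
REMOVED_PATHS = {"/old/page1.html", "/old/page2.html"}


def status_for(path):
    """Pick a response status for a requested path."""
    if path in LIVE_PATHS:
        return HTTPStatus.OK
    if path in REMOVED_PATHS:
        # 410 Gone: the resource existed and was intentionally removed.
        return HTTPStatus.GONE
    # 404 Not Found: no statement about whether it ever existed.
    return HTTPStatus.NOT_FOUND
```

The cost of this approach is that the list of removed paths has to be kept around for as long as the 410s should be served.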

tedster

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3738885 posted 8:58 pm on Sep 5, 2008 (gmt 0)

Technically that makes some sense, but Google treats a 410 and a 404 identically. They will continue to check for the url on a declining schedule far into the future.

jdMorgan

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3738885 posted 9:05 pm on Sep 5, 2008 (gmt 0)

Right, and it appears that "they're back" checking for removed URLs that are so old that I pulled the 410s several years ago just to save a little code space. That is all that was noticeable about what I saw in my logs.

Jim

outland88

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3738885 posted 12:18 am on Sep 6, 2008 (gmt 0)

I checked out what you said Vimes and Google did do a complete roll-through of my site 12 hours ago. They also did that last month. I wondered why they did it then because nothing was updated in the caches. My theory is they are realizing their crawlers are missing a lot with incremental crawls. I'm probably wrong and go with answer C.

Vimes

10+ Year Member



 
Msg#: 3738885 posted 10:12 am on Sep 7, 2008 (gmt 0)

Well, I seem to experience a deep crawl on the site just before my GWT updates itself for links etc., so I wasn't surprised to see it, as that normally happens for me at the end of each month. The only difference this time was the number of old URLs that were requested. That's why I left it a week before mentioning anything; normally this activity dies off after a day or three, things return to normal and life goes on. But this activity has only just dropped off. Over the last 10 days I've had just over 25,000 404s requested by Googlebot. 95% of these are URLs that disappeared from my site after we restructured the entire website just over 4 years ago. Yes, we had redirects on these for a while; they were probably removed about 2 years ago.
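
For anyone wanting to produce a similar tally from their own logs, the following is a minimal Python sketch that counts 404 responses per path for requests whose user agent claims to be Googlebot. It assumes the server writes the common "combined" access log format; the sample lines, IP addresses and the function name `googlebot_404s` are made up for illustration.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "METHOD path PROTO" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count 404s per path for requests with "Googlebot" in the user agent."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") == "404" and "Googlebot" in m.group("agent"):
            counts[m.group("path")] += 1
    return counts


# Invented sample data, using documentation-range IP addresses.
GBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    '192.0.2.10 - - [05/Sep/2008:07:39:00 +0000] "GET /old/page1.html HTTP/1.1" 404 512 "-" "%s"' % GBOT,
    '192.0.2.10 - - [05/Sep/2008:07:40:00 +0000] "GET /old/page1.html HTTP/1.1" 404 512 "-" "%s"' % GBOT,
    '192.0.2.10 - - [05/Sep/2008:07:41:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "%s"' % GBOT,
    '198.51.100.7 - - [05/Sep/2008:07:42:00 +0000] "GET /old/page2.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
]
counts = googlebot_404s(sample)
```

Matching on the user agent only shows what claims to be Googlebot; the requesting IPs would still need to be validated separately before drawing conclusions.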

I recently added a 301 redirect to my root to stop any issues with www.domain.com./ . I wouldn't have thought that this would cause a huge number of requests for URLs redirecting to 404 pages, as a page that isn't there just isn't there.
But I guess it might have channeled the bot to recheck. I'm checking my 301 redirect logs now and so far haven't come across any www.domain.com./ 301 redirects landing on a 404 page.
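
The kind of redirect described here can be sketched roughly as follows in Python. This is only an illustration of the logic, not the poster's actual server configuration: the function name `canonical_redirect`, the `http://` scheme and the use of `www.domain.com` as the canonical host are assumptions.

```python
# Assumed canonical hostname, taken from the placeholder used in the post.
CANONICAL_HOST = "www.domain.com"


def canonical_redirect(host, path):
    """Return (301, location) if the Host header has a trailing dot, else None."""
    # Drop any port and normalise case before comparing.
    hostname = host.split(":", 1)[0].lower()
    if hostname == CANONICAL_HOST:
        return None  # already canonical, serve the request normally
    if hostname.rstrip(".") == CANONICAL_HOST:
        # Same host written with a trailing dot: redirect, keeping the path.
        return 301, "http://" + CANONICAL_HOST + path
    return None  # some other host; out of scope for this sketch
```

Because the path is carried over unchanged, a request for a missing page on the trailing-dot host would get a 301 and then a 404 at the canonical URL, which is the pattern being looked for in the redirect logs.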

I really get nervous when Googlebot does funky stuff like this; for me it's never been a good sign.

Vimes.

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3738885 posted 11:34 am on Sep 7, 2008 (gmt 0)

You remove a page from your site.

Google sees the 404 and stops showing the URL in their results.

They test the URL again, from time to time, to see if it gets re-used.

Months later, they find a link to that page from a page they had never spidered before. What to do? Is this a new link to you, because your page has now come back? Is this an old link they hadn't previously noticed?

Whatever, once a URL "exists" it will be checked from time to time, forever, in case the status of the URL has changed in any way.

.

I don't think 410 can mean "forever".

Think about it.

I 410 www.domain.com/index.html and 5 years later let the domain lapse.

Someone else buys it a year or two later. Should the "410 Gone Forever" still apply?

No. Of course not.

Discovery of new links to a previously 404 or 410 URL may lead this process, as may change of ownership information.

.

This is why from Day Zero you should not let your website respond to *any* "stray" URL requests.

.

Now, say someone has linked to you as www.domain.com/index.hmtl, then that URL "exists", and will be internally indexed as a 404.

Google has to keep a record of that URL and the fact that it is "bad", otherwise they will have to go on discovery every time they spider the page the duff link is on.

What if that page has a large number of such duff links? Do you think they might have a routine to mark *that* page as bad instead/as well, and save some crawler work?

[edited by: Robert_Charlton at 7:44 am (utc) on Sep. 10, 2008]
[edit reason] fixed example per poster [/edit]

BillyS

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3738885 posted 2:15 pm on Sep 7, 2008 (gmt 0)

As long as a link is out there to a gone page, it's probably a good idea to return a 410 or 404. I agree with g1smd.

jdMorgan

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3738885 posted 2:28 pm on Sep 7, 2008 (gmt 0)

> I recently added a 301 redirect to my root to stop any issues with www.example.com./ . I wouldn't have thought that this would cause a huge number of requests for URLs redirecting to 404 pages, as a page that isn't there just isn't there.
> But I guess it might have channeled the bot to recheck. I'm checking my 301 redirect logs now and so far haven't come across any www.example.com./ 301 redirects landing on a 404 page.

This would not have "caused" Googlebot to do anything, since Gbot would have to request example.com./ in order to "discover" this redirect. Otherwise, the addition of this redirect is invisible to Gbot, since your server only responds to client requests -- There is nothing in a server that will "send a notice" to search engine spiders about such changes; How would the server know who to notify?

More likely Gbot is just checking through its historical "dead link" data for each of our sites, and in a few cases, might have found an obsolete link out on the Web somewhere.

It's interesting to me that they're doing this all at once in a noticeably-large "batch" -- So possibly there is some kind of clean-up or archiving process taking place.

Jim

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3738885 posted 3:28 pm on Sep 7, 2008 (gmt 0)

Google used to reindex their Supplemental database several times per year. At that time I would see a lot of changes. The fresh tags and Supplemental tags, as well as the cache date, were key to spotting what was going on - and Google has now hidden most of that data.

At the moment I don't have any sites that I could track to look for that same pattern. However, in any case, the Supplemental refresh is supposed to be a much more frequent and ongoing thing if I correctly understood what Matt Cutts said about that topic some time about a year or so ago.

caribguy

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3738885 posted 12:29 am on Sep 10, 2008 (gmt 0)


> Now, say someone has linked to you as www.domain.com/index.hmtl, then that URL "exists", and will be internally indexed as a 404.

So what to do if someone links to you as www.example.com/keywor/index.html - would it be sensible to create a 301 redirect to /keyword/index.html ?

[edited by: Robert_Charlton at 7:45 am (utc) on Sep. 10, 2008]
[edit reason] updated reference to earlier example [/edit]

jdMorgan

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3738885 posted 1:57 am on Sep 10, 2008 (gmt 0)

> would it be sensible to create a 301 redirect to /keyword/index.html ?

Yes, sensible and advisable... Recover the traffic and recover the PageRank.

Jim

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3738885 posted 6:49 am on Sep 10, 2008 (gmt 0)

I would redirect to www.domain.com/keyword/ with a trailing / on the end.

The index.html part is redundant. Omit it.

That redirect, being "specific", would be placed before all of the other redirects.
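
The ordering point can be illustrated with a small Python sketch of a first-match-wins redirect table, with the specific fix for the misspelled link listed ahead of a general rule that strips a redundant index.html. The rule list and the function name `redirect_target` are hypothetical; a real site would normally express this in its server configuration instead.

```python
import re

# Rules are tried in order and the first match wins, so the specific
# rule must come before the general one (hypothetical rule set).
REDIRECT_RULES = [
    # Specific: the misspelled inbound link goes straight to the real folder.
    (re.compile(r"^/keywor/(?:index\.html)?$"), "/keyword/"),
    # General: drop a redundant index.html from any directory URL.
    (re.compile(r"^(?P<dir>/(?:[^/]+/)*)index\.html$"), r"\g<dir>"),
]


def redirect_target(path):
    """Return the 301 target for a path, or None if no redirect applies."""
    for pattern, replacement in REDIRECT_RULES:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return None
```

If the general rule were listed first, a request for /keywor/index.html would be redirected to /keywor/ and would then need a second redirect to reach /keyword/; putting the specific rule first resolves it in a single hop.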

