Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Google is continually discovering old website content
Sgt_Kickaxe - msg:4316100 - 2:36 am on May 23, 2011 (gmt 0)

When I use the "as_qdr=d7" parameter in a Google search to see what Google has discovered on my site over the past 7 days, I'm seeing some old content treated as new. In some cases the content is over 3 years old, yet Google declares it "found" within the past 7 days.

example of a webmasterworld search: [google.com...]

What's interesting is that these pages have been dormant for some time and received no traffic, until being "discovered".

I checked two old sites of mine, neither of which is the type that requires updating, and both show several entries of old content being treated as new. Try it on a site that you haven't updated in some time and see if Google "just found" some of your content.

Possible use: this might be interesting if you are getting links (internal or external) to a page that has received no Google traffic since publication and want to see whether Google discovers the page after seeing the new incoming links.
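For anyone who wants to automate the check described above, here is a minimal sketch of building that query URL in Python. The helper name is my own, and it assumes the standard google.com/search endpoint with the as_qdr=dN syntax for an N-day window:

```python
from urllib.parse import urlencode

def qdr_site_search_url(domain: str, days: int = 7) -> str:
    """Build a Google site: search URL restricted to content Google
    says it found in the last `days` days (as_qdr=d7 = past week)."""
    params = {"q": f"site:{domain}", "as_qdr": f"d{days}"}
    return "https://www.google.com/search?" + urlencode(params)

print(qdr_site_search_url("example.com"))
```

Open the resulting URL for a site you haven't touched in weeks; anything listed is content Google claims to have found inside that window.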

 

Samizdata - msg:4316107 - 3:25 am on May 23, 2011 (gmt 0)

I tried your link and got the default ten results under the heading "Past Week".

Odd that the most recent ("Active Post List... 5 minutes ago") was in tenth position.

...

Sgt_Kickaxe - msg:4316123 - 5:10 am on May 23, 2011 (gmt 0)

"as_qdr=d7" tells Google to display only content it has found in the past 7 days while doing a site search, so your results sound right. Try it on a site that has had no updates in a week: did Google find a page during the week anyway?

I'm thinking Google already knew about the page(s) but it didn't "qualify" to be ranked, probably because it was buried deep within my archives and had no incoming links.

sanjuu - msg:4316231 - 1:00 pm on May 23, 2011 (gmt 0)

What should that query be showing?

I've done the same query on a site I'm working on, and it's showing pages in the results that Google indexed ages ago, and these pages haven't changed for months and months.

Some seem to be new pages it's found, whereas others are pages that were indexed and ranking a long time ago.

Broadway - msg:4316246 - 1:15 pm on May 23, 2011 (gmt 0)

I don't really understand the results I see.
WMT (for months on end) says all of the URLs in my sitemap are in the index.
Yet the "past week" search mentioned here lists 7 "new" URLs, at least some of which I have visited within the last week (so they're not no-traffic pages).
Possibly I've updated the content (however minor the change) on all of these pages since their last indexing.
Maybe that is what Google considers "new" about these pages.

londrum - msg:4316248 - 1:23 pm on May 23, 2011 (gmt 0)

i think that google has actually been crawling links from one of their old indexes, or from old pages that they've archived.

that is because during the last month or two i have been getting bazillions of new 404 errors in WMT. they all appeared out of the blue, and they're for very old pages that no longer exist on my site -- they've been deleted for 12 months or more. there is no way that google could have suddenly "discovered" them now. i tried to visit the links, just to be sure, and they don't exist.

and they are not the kind of pages to attract backlinks either, so it can't be because they've followed a link from another site.

the only reason that i can think of is that they've been crawling an old version of their index, or old pages that they've archived.
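One way to sanity-check reports like this is to pull the crawl-error export out of WMT and filter it by first-detected date. The sketch below is illustrative only: the column names (URL, Issue, Detected) are assumptions of mine, not the exact headers of the real WMT export.

```python
import csv
import io

def old_404s(csv_text: str, cutoff: str = "2011-04-11"):
    """Return 'Not found' URLs first detected before `cutoff`.
    ISO-format dates compare correctly as plain strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["URL"] for row in reader
            if row["Issue"] == "Not found" and row["Detected"] < cutoff]

# Hypothetical sample in the assumed export shape:
sample = """URL,Issue,Detected
/old-page.html,Not found,2011-02-28
/live-page.html,Server error,2011-03-05
/ancient.html,Not found,2011-05-20
"""
print(old_404s(sample))
```

If most of the "new" 404s cluster around a single detection date, that points at a backend refresh rather than genuine fresh discovery.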

indyank - msg:4316268 - 2:01 pm on May 23, 2011 (gmt 0)

It probably suggests that there is some huge movement happening in the back end infrastructure.

Broadway - msg:4316283 - 2:10 pm on May 23, 2011 (gmt 0)

Londrum, I noticed the exact same thing within the last week with the 404s reported in WMT: out-of-the-blue 404s for long-gone content (at both ends of the link, source and destination).

sanjuu - msg:4316309 - 3:26 pm on May 23, 2011 (gmt 0)

I've noticed the WMT 404s for ancient URLs (long gone and 'deindexed') since before Panda hit the UK (11 April), and they're still coming in.

indyank - msg:4316310 - 3:28 pm on May 23, 2011 (gmt 0)

sanjuu, this happened for many people after Feb 24. Were you seeing these errors between Feb 24 and April 11? That might provide some clue as to whether these were really triggered by Panda, or whether Google was just updating the WMT backend.

StoutFiles - msg:4316341 - 4:24 pm on May 23, 2011 (gmt 0)

Discovering? You mean thieving, right?

sanjuu - msg:4316381 - 5:09 pm on May 23, 2011 (gmt 0)

sanjuu, this happened for many people after Feb 24. Were you seeing these errors between Feb 24 and April 11? That might provide some clue as to whether these were really triggered by Panda, or whether Google was just updating the WMT backend.


They might have started in February (can't be sure), so yes, they might coincide with Panda when it first started in the States.

indyank - msg:4316383 - 5:15 pm on May 23, 2011 (gmt 0)

sanjuu, you can see the discovery dates in GWT. If you look for the discovery dates of the earliest ones, it will provide a clue.

londrum - msg:4316435 - 6:42 pm on May 23, 2011 (gmt 0)

this might be part of the reason why sites are getting "punished" by panda (...albeit a very small part)

google obviously believes that these pages are still part of our site, or they wouldn't suddenly be appearing like this. if google is punishing thin content, then having hundreds of 404s and hundreds of non-returnables is bound to have an effect.

walkman - msg:4316461 - 7:19 pm on May 23, 2011 (gmt 0)


this might be part of the reason why sites are getting "punished" by panda (...albeit a very small part)

google obviously believes that these pages are still part of our site, or they wouldn't suddenly be appearing like this. if google is punishing thin content, then having hundreds of 404s and hundreds of non-returnables is bound to have an effect.


I doubt it; they're probably trying to re-index the whole web again. I posted here a few days ago about very old (deleted) pages being requested by Googlebot.

Hopefully we'll see some improvement after this.

tedster - msg:4316473 - 7:54 pm on May 23, 2011 (gmt 0)

if google is punishing thin content, then having hundreds of 404s ...

Only if YOUR site links to those 404 URLs.
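tedster's point, that the 404s only count against you if your own pages still link to them, is something you can check yourself. A rough sketch using Python's stdlib HTML parser (the helper names are my own):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect site-relative hrefs from one page's HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith("/"):
                    self.links.append(value)

def dead_internal_links(html: str, dead_urls):
    """Which of the known-dead URLs does this page still link to?"""
    parser = LinkCollector()
    parser.feed(html)
    dead = set(dead_urls)
    return [u for u in parser.links if u in dead]

page = '<p><a href="/old.html">old</a> <a href="/new.html">new</a></p>'
print(dead_internal_links(page, ["/old.html", "/gone.html"]))
```

Run it over your live pages with the 404 list from WMT as `dead_urls`; if it comes back empty everywhere, the 404 reports are coming from Google's own archive, not from your current site.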

londrum - msg:4316480 - 8:05 pm on May 23, 2011 (gmt 0)

Only if YOUR site links to those 404 URLs.


but if they're factoring in all these old pages, then it would. they are crawling the links on an old version of your page.

walkman - msg:4316584 - 10:52 pm on May 23, 2011 (gmt 0)

but if they're factoring in all these old pages, then it would. they are crawling the links on an old version of your page.


They are NOT, dude. Google keeps a list of really, really old links and rechecks them every few months.

sanjuu - msg:4316876 - 2:26 pm on May 24, 2011 (gmt 0)


sanjuu, you can see the discovery dates in GWT. If you look for the discovery dates of the earliest ones, it will provide a clue.


Some of them go back to 2008; some show yesterday as the discovery date. In both cases the URLs are linked to from a page that was removed from the index months ago (successfully, in that the removed page no longer appears in any searches, so it seems de-indexed; but unsuccessfully, in that Google still seems to be using it as a source of internal links).

kellyman - msg:4317087 - 8:51 pm on May 24, 2011 (gmt 0)

I think Google is a bit buggy lately. Its bot was visiting old URLs of mine that were replaced in June 2010, and I was wondering whether I had let some old pages loose. WMT will show any real issues; just ignore it, I'm sure Google will fix it soon.

indyank - msg:4317271 - 3:53 am on May 25, 2011 (gmt 0)

Kellyman, they aren't buggy. They have all your pages on their servers, and this includes older versions of your pages. They don't need any links to get to those pages, as they have all of them on their servers.

I strongly believe that they are validating whatever they have and trying to create a new database (I don't mean a database in the technical sense) for all the sites. They are surely doing a massive backend exercise, trying to get a fresh copy of every site.

Broadway - msg:4317279 - 4:19 am on May 25, 2011 (gmt 0)

I was checking my custom 404 page.
I'm using Google's widget on this page, which recommends the closest matching URL.

As an exercise, looking for the non-existent page: www.example.com/pagee1.htm

It suggests you go to:
www.example.com/Page1.htm

Years ago, while on a Windows server, I made a lot of capitalization errors with URLs. I corrected this over 7 years ago.
I stayed on that Windows server for years, so no redirect was needed, but last year I switched to a *nix server and did set up a 301 for the mis-capitalized URLs.

So the page with that capitalization hasn't existed for 7 years, and there's been a 301 for the mis-capitalization for almost a year, yet that URL is what Google selects from its index instead of the existing, lowercase version of the page.

That just seems strange to me.
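A generic way to handle the mis-capitalization case Broadway describes is a blanket lowercase redirect rather than per-URL rules. The sketch below illustrates the idea as WSGI middleware; it is not the actual server configuration discussed above, just an assumption-laden demonstration of the pattern.

```python
def lowercase_redirect(app):
    """301 any request whose path contains uppercase letters
    (e.g. /Page1.htm) to the all-lowercase path."""
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path != path.lower():
            start_response("301 Moved Permanently",
                           [("Location", path.lower())])
            return [b""]
        return app(environ, start_response)
    return middleware
```

A request for /Page1.htm gets a 301 with Location: /page1.htm, while an already-lowercase path passes straight through to the wrapped app. This only makes sense on a site whose canonical URLs are all lowercase, which is the situation described in the post.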

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved