Google is continually discovering old website content

Sgt_Kickaxe

2:36 am on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member



When using the "as_qdr=d7" parameter in a Google search to see what Google has discovered on my site over the past 7 days, I'm seeing some old content treated as new. In some cases the content is over 3 years old, yet Google declares it "found" within the past 7 days.

example of a webmasterworld search: [google.com...]

What's interesting is that these pages have been dormant for some time and received no traffic, until being "discovered".

I checked two old sites of mine, neither of which is the type that requires updating, and both show several entries of old content being treated as new. Try it on a site that you haven't updated in some time and see if Google "just found" some of your content.

Possible use: this might be interesting if you set about getting links (internal or external) to a page that has received no Google traffic since publication and want to see whether Google discovers the page after seeing the new incoming links.
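For anyone who wants to reproduce the check, the search URL can be assembled like this (a minimal sketch; `example.com` is a placeholder for your own domain):

```python
from urllib.parse import urlencode

def qdr_search_url(domain: str, days: int = 7) -> str:
    """Build a Google search URL restricted to results Google
    found within the last `days` days (the as_qdr parameter)."""
    params = {
        "q": f"site:{domain}",   # restrict results to one site
        "as_qdr": f"d{days}",    # d7 = past 7 days, d1 = past day, etc.
    }
    return "https://www.google.com/search?" + urlencode(params)

print(qdr_search_url("example.com"))
# → https://www.google.com/search?q=site%3Aexample.com&as_qdr=d7
```

Opening that URL in a browser shows only results Google claims to have found in the chosen window.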

Samizdata

3:25 am on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I tried your link and got the default ten results under the heading "Past Week".

Odd that the most recent ("Active Post List... 5 minutes ago") was in tenth position.

...

Sgt_Kickaxe

5:10 am on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member



"as_qdr=d7" tells Google to display only content it has found in the past 7 days while doing a site search, so your results sound right. Try it on a site that has had no updates in a week: did Google find a page during the week anyway?

I'm thinking Google already knew about the page(s), but they didn't "qualify" to be ranked, probably because they were buried deep within my archives and had no incoming links.

sanjuu

1:00 pm on May 23, 2011 (gmt 0)



What should that query be showing?

I've done the same query on a site I'm working on, and it's showing pages in the results that Google indexed ages ago, pages that haven't changed for months and months.

Some seem to be new pages it's found, whereas others are pages that were indexed and ranking a long time ago.

Broadway

1:15 pm on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't really understand the results I see.
WMT (for months on end) says all of the URLs in my sitemap are in the index.
Yet the "past week" search mentioned here lists 7 "new" URLs, some of which I have visited within the last week (so they're not no-traffic pages).
Possibly I've updated the content (however minor the change) on all of these pages since their last indexing.
Maybe that is what they consider "new" about these pages.

londrum

1:23 pm on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I think that Google has actually been crawling links from one of their old indexes, or from old pages that they've archived.

That's because during the last month or two I have been getting bazillions of new 404 errors in WMT. They all appeared out of the blue, and they're for very old pages that no longer exist on my site -- they've been deleted for 12 months or more. There is no way that Google could have suddenly "discovered" them now. I tried to visit the links, just to be sure, and they don't exist.

And they are not the kind of pages to attract backlinks either, so it can't be because they've followed a link from another site.

The only explanation I can think of is that they've been crawling an old version of their index, or old pages that they've archived.
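Visiting each reported link by hand gets tedious; a batch check along these lines confirms whether the WMT-reported URLs really are gone (a hypothetical sketch; the URL list is a placeholder for whatever WMT reports):

```python
import urllib.error
import urllib.request

def status_of(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, catching the
    HTTPError that urlopen raises for 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.getcode()
    except urllib.error.HTTPError as err:
        return err.code

def is_gone(status: int) -> bool:
    # 404 (Not Found) and 410 (Gone) both mean the page no longer exists.
    return status in (404, 410)

if __name__ == "__main__":
    # Placeholder list: substitute the URLs WMT reports as crawl errors.
    old_urls = ["https://example.com/deleted-page.html"]
    for url in old_urls:
        print(url, status_of(url))
```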

indyank

2:01 pm on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



It probably suggests that there is some huge movement happening in the back end infrastructure.

Broadway

2:10 pm on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Londrum, I noticed the exact same thing within the last week with WMT-reported 404s: out-of-the-blue 404s for long-gone content (at both ends of the link, source and destination).

sanjuu

3:26 pm on May 23, 2011 (gmt 0)



I've noticed the WMT 404s for ancient URLs (long gone and 'deindexed') since before Panda hit the UK (11 April), and they're still coming in.

indyank

3:28 pm on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



sanjuu, this happened for many sites after Feb 24. Were you seeing these errors between Feb 24 and April 11? That might provide some clue as to whether they were really triggered by Panda or whether Google was just updating the WMT backend.

StoutFiles

4:24 pm on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Discovering? You mean thieving, right?

sanjuu

5:09 pm on May 23, 2011 (gmt 0)



sanjuu, this happened for many sites after Feb 24. Were you seeing these errors between Feb 24 and April 11? That might provide some clue as to whether they were really triggered by Panda or whether Google was just updating the WMT backend.


They might have started in February (can't be sure), so yes, they might coincide with Panda when it first started in the States.

indyank

5:15 pm on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



sanjuu, you can see the discovery dates in GWT. If you look for the discovery dates of the earliest ones, it will provide a clue.

londrum

6:42 pm on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



This might be part of the reason why sites are getting "punished" by Panda (...albeit a very small part).

Google obviously believes that these pages are still part of our sites, or they wouldn't suddenly be appearing like this. If Google is punishing thin content, then having hundreds of 404s and hundreds of non-returnable pages is bound to have an effect.

walkman

7:19 pm on May 23, 2011 (gmt 0)




This might be part of the reason why sites are getting "punished" by Panda (...albeit a very small part).

Google obviously believes that these pages are still part of our sites, or they wouldn't suddenly be appearing like this. If Google is punishing thin content, then having hundreds of 404s and hundreds of non-returnable pages is bound to have an effect.


I doubt it; they're probably trying to re-index the whole web again. I posted here a few days ago about very old (deleted) pages being requested by Googlebot.

Hopefully we'll see some improvement after this.

tedster

7:54 pm on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



If Google is punishing thin content, then having hundreds of 404s ...

Only if YOUR site links to those 404 URLs.

londrum

8:05 pm on May 23, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Only if YOUR site links to those 404 URLs.


But if they're factoring in all these old pages, then it would. They are crawling the links on an old version of your page.

walkman

10:52 pm on May 23, 2011 (gmt 0)



But if they're factoring in all these old pages, then it would. They are crawling the links on an old version of your page.


They are NOT, dude. Google has a B-list of really, really old links that they re-check every few months.

sanjuu

2:26 pm on May 24, 2011 (gmt 0)




sanjuu, you can see the discovery dates in GWT. If you look for the discovery dates of the earliest ones, it will provide a clue.


Some of them go back to 2008; some show yesterday as the discovery date. In both cases the URLs are linked from a page that was removed from the index months ago (successfully, in that the removed page no longer appears in any searches and seems de-indexed; but unsuccessfully in that Google still appears to use it as a source of internal links).

kellyman

8:51 pm on May 24, 2011 (gmt 0)



I think Google is a bit buggy lately; its bot was visiting my old site URLs that were replaced in June 2010, and I was thinking I had let loose some old pages. WMT will show any real issues, so just ignore it -- I'm sure Google will fix it soon.

indyank

3:53 am on May 25, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Kellyman, they aren't buggy. They have all your pages on their servers, including old versions of your pages. They don't need any links to get to those pages, as they already have all of them on their servers.

I strongly believe that they are validating whatever they have and trying to create a new database (I don't mean a database in the technical sense) for all the sites. They are surely doing a massive backend exercise and trying to get a fresh copy of every site.

Broadway

4:19 am on May 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was checking my custom 404 page.
I'm using Google's widget on this page, which recommends the closest matching URL.

As an exercise, looking for the non-existent page www.example.com/pagee1.htm, it suggests you go to:
www.example.com/Page1.htm

Years ago, while on a Windows server, I made a lot of capitalization errors with URLs. I corrected this over 7 years ago.
I stayed on that Windows server for years, so no redirect was needed, but last year I switched to a *nix server and did set up a 301 for the mis-capitalized URLs.

So the page with that capitalization hasn't existed for 7 years, and there's been a 301 for the mis-capitalization for almost a year, yet that URL is what Google selects from its index instead of the existing, lower-case version of the page.

That just seems strange to me.
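Broadway's blanket fix -- a 301 from any mis-capitalized URL to its lower-case path -- can be sketched as a minimal WSGI handler (hypothetical; most sites would do this in the server config rather than in application code):

```python
def lowercase_redirect_app(environ, start_response):
    """Minimal WSGI sketch: 301-redirect any request path that
    contains uppercase letters to its all-lowercase equivalent."""
    path = environ.get("PATH_INFO", "/")
    if path != path.lower():
        # e.g. /Page1.htm -> 301 -> /page1.htm
        start_response("301 Moved Permanently",
                       [("Location", path.lower())])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]
```

A request for /Page1.htm gets a 301 with Location /page1.htm, which is the behavior Broadway describes; even so, Google can keep serving the old-cased URL from its index for a long time after the redirect is in place.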