Google crawls and indexes Archive.org

     
2:53 pm on Jan 13, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 13, 2003
posts:701
votes: 0


site:archive.org today gives 9.75 million pages in G.

How does your duplicate content on those pages affect your site's G ranking?

We don't know; we just hope G doesn't ever get it wrong.

I've used G alerts since they began, but I've never seen an archive.org url cited in an alert... until today.

It's not an exact match of the phrase, just parts of it. Nevertheless, the point is that it is an
[archive.org...] url and so must be incorporated in the Google index at some level.

Deliberate or a slip up?

Another very good reason to get your sites pulled from archive.org as we have done.

20+ sites, but it wasn't too laborious a process.
Completed in 3-4 days. Gone from archive.org, Wayback, and the dreaded Alexa :)

4:23 pm on Jan 13, 2009 (gmt 0)

Senior Member

joined:Jan 27, 2003
posts:2534
votes: 0


Interesting that you had something show in Google alerts, but note that web.archive.org (where all the web pages are stored) is excluded via robots exclusion:

[web.archive.org...]

The URLs can still get URL-only listings if people link to them, but I've not seen anything else.

Was it a URL-only listing you had in Google alerts? (Note that sometimes such listings have a title of the link text pointing to them.)

The only other possibility is if archive.org are accidentally exposing their listings.
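To make the robots exclusion point concrete, here is a minimal sketch of how a well-behaved crawler would evaluate a URL against a robots.txt file. The robots.txt body, the example URL, and the helper name `is_crawl_allowed` are assumptions for illustration only; the rules are not a copy of the live web.archive.org file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: a blanket disallow for every crawler.
# Illustrative assumption only, not the actual file served by web.archive.org.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical archived-page URL used only as sample input.
sample_url = "http://web.archive.org/web/2008/http://example.com/"
blocked_result = is_crawl_allowed(ASSUMED_ROBOTS_TXT, "Googlebot", sample_url)
open_result = is_crawl_allowed("", "Googlebot", sample_url)
```

Under these assumed rules the page content cannot be fetched, which fits the URL-only listings described above: the URL can be known from links, but not crawled.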

6:27 pm on Jan 13, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 22, 2003
posts:1230
votes: 0


As I reported in another thread, Google is now allowing all sorts of duplicate content from various search engines and domain lookup services. Quite a bit of it is now competitive with your natural results, or will be soon.

1:13 pm on Jan 14, 2009 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member 10+ Year Member

joined:June 12, 2003
posts:723
votes: 17


[google.com...] returns only two results for me. The main www.archive.org is different, no? Isn't web.archive.org where the dup issues may come from?

2:18 am on Jan 15, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 13, 2003
posts:701
votes: 0


The Google Alert displays a heading, the searched-for terms, and an archive.org url like this:

www.archive.org/stream/<rest of url>

The G SERP shows a normal title, url, and snippet.

The target page in www.archive.org is cached in G, and the cache shows the G Alert terms.

The G cache url is formatted like this:

[209.85.175.132...]<rest of url>

It looks to me like trouble at Google, and for everyone still unfortunate enough to be in archive.org.

3:00 am on Jan 15, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


And as Receptional_Andy already posted, all the archived versions of websites are served from web.archive.org - NOT www.archive.org

There is no problem here.

1:39 am on Jan 16, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 13, 2003
posts:701
votes: 0


I understand, tedster, but there are indeed 9.75 million pages in G for the query site:archive.org, >>all copies of our content<<, and the Alert shows they are in the mix. That is the point.

3:36 am on Jan 16, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


And there is just one result for site:web.archive.org/web/ - that's where all the Wayback machine copies are served. And even that one page is a url-only result, thanks to the robots.txt file.

Did the alert you received point to a copy of one of your web pages?
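The distinction between the two hosts can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `in_site_scope` and `is_wayback_copy` and the sample URLs are hypothetical, and the sketch assumes a site: query on a bare domain also covers its subdomains.

```python
from urllib.parse import urlsplit


def in_site_scope(url: str, domain: str) -> bool:
    """True if url's host is the domain itself or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def is_wayback_copy(url: str) -> bool:
    """True only for URLs under web.archive.org/web/, where archived pages are served."""
    parts = urlsplit(url)
    return parts.hostname == "web.archive.org" and parts.path.startswith("/web/")


# Hypothetical sample URLs: one hosted document page, one archived web page.
stream_url = "http://www.archive.org/stream/some-item"
wayback_url = "http://web.archive.org/web/2008/http://example.com/"

both_in_scope = in_site_scope(stream_url, "archive.org") and in_site_scope(wayback_url, "archive.org")
stream_is_copy = is_wayback_copy(stream_url)
wayback_is_copy = is_wayback_copy(wayback_url)
```

Both sample URLs fall within site:archive.org, but only the second is a Wayback copy of a website, so a large site:archive.org count says little by itself about archived copies being indexed.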

 
