| 4:23 pm on Jan 13, 2009 (gmt 0)|
Interesting that you had something show in Google alerts, but note that web.archive.org (where all the web pages are stored) is excluded via robots exclusion:
The URLs can still get URL-only listings if people link to them, but I've not seen anything else.
Was it a URL-only listing you had in Google alerts? (Note that sometimes such listings have a title of the link text pointing to them.)
The only other possibility is if archive.org are accidentally exposing their listings.
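For anyone who wants to check this sort of thing themselves, here is a minimal sketch of how a robots exclusion is evaluated, using Python's standard urllib.robotparser. The robots.txt text below is a made-up blanket disallow for illustration, not a copy of the actual archive.org file, and is_crawlable is just a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that excludes every crawler from the whole host.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    url = "http://web.archive.org/web/"
    print(is_crawlable(SAMPLE_ROBOTS, "Googlebot", url))
```

Keep in mind that a disallow only stops the page from being crawled. As noted above, the URL itself can still turn up as a URL-only listing if other pages link to it.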
| 6:27 pm on Jan 13, 2009 (gmt 0)|
As I reported in another thread, Google is now allowing all sorts of duplicate content from various search engines and domain lookup services. Quite a bit of it is now competitive with your natural results, or soon will be.
| 1:13 pm on Jan 14, 2009 (gmt 0)|
[google.com...] returns only two results for me. The main www.archive.org is different, no? Isn't web.archive.org where the dup issues may come from?
| 2:18 am on Jan 15, 2009 (gmt 0)|
The Google Alert displays a heading, the searched-for terms, and an archive.org URL like this:
www.archive.org/stream/<rest of url>
The G SERP shows a normal title, URL and snippet.
The target page in www.archive.org is cached in G, and the cache shows the G Alert terms.
The G cache url is formatted like this:
[188.8.131.52...]<rest of url>
It looks to me like trouble at Google, and for everyone still unfortunate enough to be in archive.org.
| 3:00 am on Jan 15, 2009 (gmt 0)|
And as Receptional_Andy already posted, all the archived versions of websites are served from web.archive.org - NOT www.archive.org
There is no problem here.
| 1:39 am on Jan 16, 2009 (gmt 0)|
I understand, tedster, but there are indeed 9.75 million pages in G for the term site:archive.org, >>all copies of our content<<, and the Alert shows they are in the mix. That is the point.
| 3:36 am on Jan 16, 2009 (gmt 0)|
And there is just one result for site:web.archive.org/web/ - that's where all the Wayback Machine copies are served. And even that one page is a URL-only result, thanks to the robots.txt file.
Did the alert you received point to a copy of one of your web pages?