Receptional Andy

msg:3825409 | 4:23 pm on Jan 13, 2009 (gmt 0) |
Interesting that you had something show up in Google Alerts, but note that web.archive.org (where all the archived web pages are stored) is excluded via robots exclusion: [web.archive.org...] The URLs can still get URL-only listings if people link to them, but I've not seen anything else. Was it a URL-only listing you had in Google Alerts? (Note that such listings sometimes take their title from the text of a link pointing to them.) The only other possibility is that archive.org are accidentally exposing their listings.
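To illustrate the robots exclusion point, here is a minimal sketch in Python of how a crawler could check whether a URL is blocked. The robots.txt content below is an assumption made for illustration, not a copy of the real web.archive.org file, and the helper name `is_crawl_allowed` and the example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content, for illustration only. The real file on
# web.archive.org may differ; this one simply blocks every crawler.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical archived-page URL.
    url = "http://web.archive.org/web/example"
    print(is_crawl_allowed(ASSUMED_ROBOTS_TXT, "Googlebot", url))
```

A blocked URL only stops the crawler from fetching the page. It does not stop the bare URL from being listed when other pages link to it, which is why URL-only listings can still appear.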
|
outland88

msg:3825501 | 6:27 pm on Jan 13, 2009 (gmt 0) |
As I reported in another thread, Google is now allowing all sorts of duplicate content from various search engines and domain lookup services. Quite a bit of it is now competitive with your natural results, or soon will be.
|
rustybrick

msg:3826101 | 1:13 pm on Jan 14, 2009 (gmt 0) |
[google.com...] returns only two results for me. The main www.archive.org is different, no? Is web.archive.org where the dup issues may come from?
|
Angonasec

msg:3826684 | 2:18 am on Jan 15, 2009 (gmt 0) |
The Google Alert displays a heading, the searched-for terms, and an archive.org URL like this: www.archive.org/stream/<rest of url> The G SERP shows a normal title, URL and snippet. The target page on www.archive.org is cached in G, and the cache shows the G Alert terms. The G cache URL is formatted like this: [209.85.175.132...] of url> It looks to me like trouble at Google, and for everyone still unfortunate enough to be in archive.org.
|
tedster

msg:3826699 | 3:00 am on Jan 15, 2009 (gmt 0) |
And as Receptional_Andy already posted, all the archived versions of websites are served from web.archive.org, NOT www.archive.org. There is no problem here.
|
Angonasec

msg:3827555 | 1:39 am on Jan 16, 2009 (gmt 0) |
I understand, tedster, but there are indeed 9.75 million pages in G for the term site:archive.org, >>all copies of our content<<, and the Alert shows they are in the mix. That is the point.
|
tedster

msg:3827578 | 3:36 am on Jan 16, 2009 (gmt 0) |
And there is just one result for site:web.archive.org/web/ - that's where all the Wayback Machine copies are served. And even that one page is a URL-only result, thanks to the robots.txt file. Did the alert you received point to a copy of one of your web pages?
|