Wikipedia's 9 Million 404 Links Recovered Thanks to Internet Archive

     
2:46 pm on Oct 3, 2018 (gmt 0)

Administrator from GB 

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month

joined:May 9, 2000
posts:25609
votes: 773


Wikipedia is like any large site in that it accumulates broken links over time: link rot.

In collaboration with the Internet Archive, over 9 million of those links have been fixed.
So grateful for the extraordinary work our friends at @internetarchive are doing to fight 404s and digitally preserve millions of links to websites and sources Wikipedians cite, as they build the world's largest encyclopedia.

[twitter.com...]

And for the past 3 years, we have been running a software robot called IABot on 22 Wikipedia language editions looking for broken links (URLs that return a ‘404’, or ‘Page Not Found’). When broken links are discovered, IABot searches for archives in the Wayback Machine and other web archives to replace them with. Restoring links ensures Wikipedia remains accurate and verifiable and thus meets one of Wikipedia’s three core content policies: ‘Verifiability’.

[blog.archive.org...]
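
To make the quoted description concrete, here is a minimal sketch of the check-then-replace idea: find links that return 404 and swap in an archived copy when one exists. It is not IABot's actual code. The function name `repair_links` and the two callables `get_status` and `find_archive` are hypothetical stand-ins for a real HTTP check and a real archive lookup.

```python
from typing import Callable, Optional


def repair_links(
    links: list[str],
    get_status: Callable[[str], int],
    find_archive: Callable[[str], Optional[str]],
) -> dict[str, str]:
    """Map each link to itself, or to an archived copy if it returns 404.

    get_status and find_archive are hypothetical stand-ins (assumptions),
    not part of any real IABot or Wayback Machine API.
    """
    repaired: dict[str, str] = {}
    for url in links:
        if get_status(url) == 404:
            archived = find_archive(url)
            # Keep the dead link when no archive exists, so nothing is dropped silently.
            repaired[url] = archived if archived is not None else url
        else:
            # Working links are left untouched.
            repaired[url] = url
    return repaired


# Usage with in-memory fakes instead of real network calls.
fake_statuses = {"http://a.example/x": 404, "http://b.example/y": 200}
fake_archives = {"http://a.example/x": "http://archive.example/a-x"}
result = repair_links(list(fake_statuses), fake_statuses.__getitem__, fake_archives.get)
```

Passing the status check and archive lookup in as functions keeps the sketch testable without touching the network.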
3:54 pm on Oct 3, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Sept 13, 2018
posts:147
votes: 31


What if the page was removed because the content was inaccurate or wrong? Then Wikipedia will continue to spread false information, arguing that at some point in the past, someone, somewhere, for some reason wrote it?
4:11 pm on Oct 3, 2018 (gmt 0)

Administrator from GB 

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month

joined:May 9, 2000
posts:25609
votes: 773


Well, that's a little negative: it's more likely the resource has simply gone.
As far as accuracy is concerned, that's where the editors come in to make corrections. You can also use the procedure noted at Wikipedia's Accuracy and dispute information. [en.wikipedia.org...]
8:21 pm on Oct 3, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12786
votes: 878


Better check and see if they also "recovered" my plagiarized articles, which I had to jump through hoops to get them to remove.
8:41 pm on Oct 3, 2018 (gmt 0)

Administrator from GB 

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month

joined:May 9, 2000
posts:25609
votes: 773


Better check and see if they also "recovered" my plagiarized articles


If the link from Wikipedia is gone, then no problem. This only applies to existing links.
8:41 pm on Oct 3, 2018 (gmt 0)

Senior Member from FR 

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 15, 2004
posts:7139
votes: 410


If you blocked archive.org's crawlers, you should be OK..
8:43 pm on Oct 3, 2018 (gmt 0)

Senior Member from FR 

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 15, 2004
posts:7139
votes: 410


Ah, no. I just realised that archive.org's crawlers have crawled Wikipedia for years. Did Wikipedia 410 your copied stuff, or just 404 it?
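
For anyone unsure why the 404 versus 410 distinction matters: 404 (Not Found) says nothing about whether the page might come back, while 410 (Gone) signals a deliberate, permanent removal. A minimal sketch of telling them apart, where the function name `removal_kind` and its return labels are made up for illustration:

```python
from http import HTTPStatus


def removal_kind(status_code: int) -> str:
    """Classify a response code by what it says about a missing page.

    The labels returned here are illustrative, not a standard vocabulary.
    """
    if status_code == HTTPStatus.GONE:  # 410: removed on purpose, permanently
        return "gone-permanently"
    if status_code == HTTPStatus.NOT_FOUND:  # 404: missing, cause unknown
        return "not-found"
    return "other"
```

A crawler that honours the difference could treat "gone-permanently" as a signal to stop retrying, whereas "not-found" alone gives no such hint.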
9:19 pm on Oct 3, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12786
votes: 878


If you blocked archive.org's crawlers, you should be OK..
Just a bit of background... after several months of emailed removal requests, DMCA submissions, C&D notices and attempted phone calls (when they were in San Francisco) without any responsive action on their part whatsoever, I came to the conclusion that I had to use alternative methods to get my intellectual property (articles written by myself) removed from their "encyclopedia", which BTW was ranking higher than my page for the same article (go figure).

I became an editor and, one by one, either removed the articles entirely or reduced the plagiarized content to a more acceptable "fair use" level where I decided to leave or include the citation w/ link to my site. I also witnessed firsthand the sanctimonious temperament of some of the other editors. Some (probably not all) feel their site is the all-time gift to mankind and that they should be exempt from standard laws of respect for property.

This was years before they added the utility to remove your site. When this feature was installed, I added my sites to their exclude list.

After the 2016 election, for fear of internet instability, Archive.org mirrored its data to Canada. Since then, their bots attempt (they're all blocked) to scrape my content at least once a week, despite being disallowed in robots.txt. They change IP ranges, change User Agents, pretend to be humans with normal browsers, ad infinitum.
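
For reference, here is a minimal sketch of how a well-behaved crawler would read a disallow rule, using Python's standard `urllib.robotparser`. The user-agent token `archive.org_bot`, the sample rules and the helper name `is_allowed` are assumptions for illustration, not taken from any real site's configuration.

```python
from urllib.robotparser import RobotFileParser

# Sample rules (assumed): shut out one crawler by name, allow everyone else.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named crawler is refused; any other agent falls through to the "*" rule.
blocked = is_allowed(ROBOTS_TXT, "archive.org_bot", "https://example.com/article.html")
others = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/article.html")
```

Note that robots.txt is purely advisory. A bot that changes its User Agent or simply ignores the file is not stopped by it, which is why blocking has to happen on the server side as well.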

I'm not a fan of this so-called "archive", and over the years I have been transparent about my experiences when voicing my opinion on whether it should be allowed as a citation in student work at the several schools where I've held positions.

Even if your content is not openly plagiarized & your pages are not published in their archive, you can bet they've attempted to scrape your property and in all likelihood have it somewhere on their servers.

Some content owners see a benefit having their webpages cached (copied) on a remote site. I am not one of them.

Related discussion: [webmasterworld.com...]
 
