
Forum Moderators: not2easy


Chilling Effects DMCA Archive Deletes Self From Google

     
1:20 am on Jan 13, 2015 (gmt 0)

Administrator from JP 

WebmasterWorld Administrator bill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 12, 2000
posts:15157
votes: 170


https://www.techdirt.com/articles/20150112/06545529675/chilling-effects-chilling-effects-as-dmca-archive-deletes-self-google.shtml [techdirt.com]

Chilling Effects On Chilling Effects As DMCA Archive Deletes Self From Google

Over the weekend, TorrentFreak noted that the website Chilling Effects had apparently removed itself from Google's search index after too many people complained.
This week, however, we were no longer able to do so. The Chilling Effects team decided to remove its entire domain from all search engines, including its homepage and other informational and educational resources.
1:27 am on Jan 13, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 5, 2001
posts:5885
votes: 117


Seems to be a bit of confusion about this. From the same article:
Meanwhile, Chilling Effects founder, Wendy Seltzer, seems to insist that this was an implementation mistake and that the team never meant to remove the whole domain:
1:35 am on Jan 13, 2015 (gmt 0)

Administrator

WebmasterWorld Administrator rogerd is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 2, 2000
posts:9687
votes: 1


I've seen this happen more than once... a development site has indexing blocked, and then gets pushed live without anyone noticing. (If the second comment in the article is correct.)
2:54 am on Jan 13, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15762
votes: 828


<meta name="ROBOTS" content="NOODP" />


I took my best guess about the domain name and then looked at some random pages.

Psst! ChillingEffects! HTML doesn't require a closing slash.
9:45 am on Jan 13, 2015 (gmt 0)

Senior Member from KZ 

WebmasterWorld Senior Member lammert is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 10, 2005
posts: 2952
votes: 35


Looking at the robots.txt I wouldn't call this an implementation mistake by some developers. Everything except the /pages subtree where the about page resides is deliberately blocked for all bots. It's easy to reverse but they didn't do it until now, so they probably want it this way.

Does anyone know what the Google-Legal-Removals bot is doing, BTW? Is that bot used to scrape the DMCA notices from the Chilling Effects site?

User-agent: Google-Legal-Removals
Disallow:

User-agent: Googlebot
Noindex: /
Allow: /pages

User-agent: *
Disallow: /
Allow: /pages
3:35 pm on Jan 13, 2015 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4397
votes: 314


No need to guess at the domain name; it is linked to from here in the Charter: [webmasterworld.com...]

Idly wondering if their content is being "blocked" from any other crawlers?
7:31 pm on Jan 13, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15762
votes: 828


User-agent: Googlebot
Noindex: /
Allow: /pages

wtf? Does Allow: also mean "override a Noindex: directive"? Does anyone other than Google use this?

Everything except the /pages subtree

... and that's only if the crawler understands the Allow: formulation. They don't have to; so far there's no robots.txt 2.0 standard.

When a robots.txt file mentions user-agents other than yourself or *, doesn't it tend to mean that everyone sees the same file?
3:51 am on Jan 14, 2015 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191



User-agent: Googlebot
Noindex: /
Allow: /pages

wtf? Does Allow: also mean "override a Noindex: directive"? Does anyone other than Google use this?


I have used "Allow: some-url-pattern" before. It is Google specific: Block URLs with robots.txt [support.google.com]

Noindex: /

Never heard of this within robots.txt. As far as I know, noindex is a directive that is declared only in meta robots or is returned within X-Robots-Tag as part of HTTP response headers.

So if they were to block everything apart from /pages then I would expect their robots.txt to look like this instead:

User-agent: Googlebot
Disallow: /
Allow: /pages


But would not expect to see Noindex: / syntax. Have I missed something new?
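
For reference, the two places named above where noindex is normally declared (a meta robots tag and the X-Robots-Tag response header) can be checked with a short script. This is only a rough sketch: the names RobotsMetaParser and has_noindex are made up for illustration, and it ignores the user-agent-specific form of the header (e.g. "googlebot: noindex").

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(headers, html):
    """Return True if noindex appears in X-Robots-Tag or in meta robots.

    headers: dict of HTTP response headers (hypothetical input shape).
    html: the page source as a string.
    """
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_tokens = {t.strip().lower() for t in header_value.split(",")}

    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_tokens or "noindex" in parser.directives
```

Fed the tag quoted earlier in this thread, <meta name="ROBOTS" content="NOODP" />, has_noindex returns False, since NOODP says nothing about indexing.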
4:09 am on Jan 14, 2015 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4397
votes: 314


The Noindex: / directive in a robots.txt is meaningless - unless it is a recent secret change not mentioned anywhere - it does nothing. Most of that file would only confuse Googlebot imho.

Official standards haven't been updated since 1997 (for HTML 4.01) but Google and Bing both recognize and follow the Allow: directive.

I have read that they compare the length of the path that follows Disallow: and Allow: to decide which rule applies.. sort of iffy.
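
A rough sketch of that length comparison, assuming the commonly described behavior: the longest matching path wins, and Allow wins a tie. It uses plain prefix matching only, with no * or $ wildcards, and the is_allowed function and RULES list are made up for illustration rather than taken from any crawler's code.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled under longest-match precedence.

    rules: list of (directive, pattern) tuples, e.g. ("Disallow", "/").
    path: the URL path to test, e.g. "/pages/about".
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, pattern in rules:
        if pattern == "":
            continue  # an empty pattern (as in "Disallow:") matches nothing
        if not path.startswith(pattern):
            continue
        is_allow = directive.lower() == "allow"
        # The longer pattern wins; on equal length, Allow wins.
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len = len(pattern)
            allowed = is_allow
    return allowed


# The "block everything except /pages" rules discussed in this thread.
RULES = [("Disallow", "/"), ("Allow", "/pages")]
```

With RULES as above, is_allowed(RULES, "/pages/about") is True because "/pages" is longer than "/", while any path outside /pages only matches "/" and is blocked. A crawler that does not understand Allow: at all would see only the Disallow line.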
5:38 am on Jan 14, 2015 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11774
votes: 225


At one time Google recognized the Noindex: robots.txt directive as an undocumented feature.
7:47 am on Jan 14, 2015 (gmt 0)

Senior Member from KZ 

WebmasterWorld Senior Member lammert is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 10, 2005
posts: 2952
votes: 35


Noindex: validates in Google's robots.txt checker, so it should be OK.
 
