
Completely removing site from Google index

     
6:17 am on Jan 20, 2019 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 30, 2000
posts:520
votes: 4


Hi, I am having problems getting AdSense approved for one of my sites, with the reason given as 'scraped content'. I have fixed some of it by letting the companies listed submit their own descriptions, photos, etc., since a lot of them had just cut and pasted.

There is another section of this site about travelling to this particular country. For it, I simply reused the articles my wife wrote about the country on another site of mine. I closed that site down 3 months ago; it was a Drupal site and I put it in maintenance mode. If I search for bits of this text, that site still comes up, with no cache and a maintenance-mode message if clicked through.

My question is: how do I completely remove all traces of this from Google's index? Do I have to submit each page as a removal request in the webmasters console? Does anyone know how long it takes for a site to disappear completely on its own?
11:59 am on Jan 20, 2019 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12225
votes: 361


Hi maccas, as I see it, you've got more than the usual number of issues involved when you remove these pages. I can answer some, but not all of your questions... In particular, I'm not familiar with the Drupal module, and I'm not an IT programmer. That said, I'm thinking the Drupal module returning a 200 response rather than a 503 is a potential cause of the issue.

You wrote...
If I search for bits of this text this site still comes up, no cache and a maintenance mode message if clicked through.

Drupal maintenance mode
It struck me, when you mentioned Drupal maintenance mode, that the module might not be configured to return the necessary 503s. A 503, among other things, should tell Googlebot when to come back and recheck the server.
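
To illustrate, here's a bare-bones PHP sketch of what a correct maintenance response looks like at the HTTP level. To be clear, this is just an illustration of the behavior, not the actual Drupal module:

<?php
// maintenance.php - illustrative sketch only, not the Drupal module.
// A 503 tells crawlers the outage is temporary, and Retry-After
// suggests when they should come back and recheck.
http_response_code(503);            // Service Unavailable
header('Retry-After: 3600');        // ask bots to retry in about an hour
header('Content-Type: text/html; charset=utf-8');
echo '<h1>Down for maintenance, back soon</h1>';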

Searching for [drupal maintenance mode 503], I ended up at this page...

Maintenance 200
https://www.drupal.org/project/maintenance200 [drupal.org].

I think the 200 response in the chain could be hiding the removed pages from Google. Lots of CMSs mess up at this point. I would expect a 200 response from IIS, but Drupal surprises me, as they appear to know better. Drupal's argument makes sense, except that in this case it doesn't....
The Maintenance 200 module allows a site to return a Status code of 200 rather than the default 503 (Service Unavailable) code.

"But wait," you ask, "why would I want that? The site is truly in a 503 state and should report that." The reason you'd want to return a 200 is so that your CDN or caching layer (e.g., Varnish) will cache the maintenance page and serve it to new requests rather than passing the request down to your origin server.

Admittedly, this is kind of a double-edged sword, since once the page is in cache you'll have to flush your cache to bring the site back up...

IMO, this "double-edged sword" could be part of your problem.... While you're removing your unwanted page content, your server should be set to return 503 responses to let bots know you're busy. The 503 also tells them to come back and check when your maintenance is finished.

Then, when Googlebot later returns after the unwanted pages are removed, Google should see the 404 or 410 responses you've set up for the removed urls. Without this step, though, I'm thinking that Google perhaps hasn't seen the 404/410 responses for the deleted pages and doesn't know that the content is gone.

I'm not familiar with the Drupal setup, but the module page quoted above suggests that the 200 response that's diverting Googlebot from the real status needs to be removed, as it could be preventing Google from seeing the 404s or 410s.
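
Once that 200 override is out of the way, a quick way to confirm what your server is actually sending is to read the raw status line for one of the removed urls. A rough PHP check (the url is just a placeholder, swap in one of your own pages):

<?php
// prints the status line a crawler would see, e.g. "HTTP/1.1 410 Gone"
// (needs allow_url_fopen enabled; a curl -I from the command line
// would show the same thing)
$headers = get_headers('https://old-site.example/removed-page');
echo $headers[0], "\n";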


Don't use Google GSC removal requests
As for your questions about using the GSC to remove pages...
My question is how do I completely remove all traces of this in Google's index? Do I have to submit each page in removal request in webmasters console?

All the Google spokespeople I've seen are emphatic that you should NOT use the URL removal tool in the Search Console for changes like this. Here's a writeup from John Mueller, which covers his preferred methods of handling site changes....

Bulk Content Removal
[productforums.google.com...]

- The URL removal tool is not meant to be used for normal site maintenance like this. This is part of the reason why we have a limit there.
- The URL removal tool does not remove URLs from the index, it removes them from our search results. The difference is subtle, but it's a part of the reason why you don't see those submissions affect the indexed URL count.


And...
...if you have the ability to use a 410 for content that's really removed, that's a good practice.

For large-scale site changes like this, I'd recommend:
- don't use the robots.txt
- use a 301 redirect for content that moved
- use a 410 (or 404 if you need to) for URLs that were removed
- make sure that the crawl rate setting is set to "let Google decide" (automatic), so that you don't limit crawling
- use the URL removal tool only for urgent or highly visible issues.
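
To make the 301/410 items in that list concrete, here's a rough PHP sketch of how those responses might be wired up in a front controller. The paths and domain are hypothetical, purely for illustration:

<?php
// illustrative front controller: 301 for pages that moved,
// 410 for pages that are permanently gone
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

$moved = ['/old-guide' => '/travel/guide'];        // old path => new path
$gone  = ['/scraped-page-1', '/scraped-page-2'];   // permanently removed

if (isset($moved[$path])) {
    header('Location: https://www.example.com' . $moved[$path], true, 301);
    exit;
}
if (in_array($path, $gone, true)) {
    http_response_code(410);                       // Gone
    exit('This page has been permanently removed.');
}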

It's worth considering how long to keep your old site up before taking it down, to allow for slow spidering if it's an obscure site. I'd recommend at least a couple of months. If you've got a spider tool and can check the status of your old pages, I'd do that before taking them down.

2:16 pm on Jan 20, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4159
votes: 262


I am not a Drupal user, but isn't there a way to let Google crawl the pages and find a "noindex, noarchive" directive for its robots, either via meta tags or X-Robots-Tag headers?
3:12 pm on Jan 20, 2019 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Sept 26, 2018
posts:53
votes: 15


I'd use the 410 gone response. I'm not sure I'd use meta noindex in this situation. I'd want my old site to be considered 'gone' rather than 'not appearing in SERPs'. So if you continue to see your old site in SERPs then you know it hasn't 'gone'.

But who knows what will please the AdSense bot: the old site merely removed from SERPs, or the old site fully removed at a deeper database level?

Maybe temporarily remove the travel articles from the new site too; whatever it takes to get rid of the 'scraped content' stigma in one clean move. And delay reapplying to AdSense until everything has been cleaned up.
3:25 am on Jan 21, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11566
votes: 182


whether drupal was configured to return a 200 or a 503, neither response will remove those urls from the index.

the preferred methods to remove urls from the index include:
  1. a Client Error (4xx) class status code
    The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This (404) status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.

    The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed.

    (from https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4)

  2. supply a noindex meta directive in the response, using either an X-Robots-Tag noindex HTTP response header or a meta robots noindex element in the <head> of the html document supplied in the response.
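
for illustration, here's a minimal php sketch of both forms (either one alone is enough; the header form also covers non-html resources such as pdfs). the markup and values are just examples:

<?php
// http response header form of the noindex directive
header('X-Robots-Tag: noindex');
?>
<!-- ...or the equivalent meta element, inside the page's <head>: -->
<meta name="robots" content="noindex">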


the url removal tool in GSC will remove a url from search results for 90 days but it won't permanently remove the url from the index.

more here:
https://support.google.com/webmasters/answer/1663419#make-permanent

you should also create web properties in google search console for the various versions of your legacy site (http/https, www/non-www) so you can see what's happening with crawling and indexing on that domain.
7:58 am on Jan 21, 2019 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 30, 2000
posts:520
votes: 4


Thanks everyone. I made it a 410 and will wait until I can no longer find these articles in search before reapplying.
 
