
Forum Moderators: Robert Charlton & aakk9999 & andy langton & goodroi


Best way to *remove* pages from Google

Nice 404 page, or 301 to home page?

     
2:04 am on Jun 8, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 10, 2003
posts:410
votes: 0


We tried an experiment last winter that resulted in a whole lot of new pages for a site that was rather large already. The new pages are thin and really don't do anything for users, and based on traffic, Google seems to agree; pages are indexed, but no real SERPs.

We would like to remove these pages and restore focus to the quality content on the site.

We could return a 404 "Not Found" or a 301 "Moved Permanently". In either case, the resulting page could know how to get users close to what they were looking for by examining the URL (e.g. "WidgetMaster 2003 is no longer available, here are other widgets from Widgets, Inc.").

Any suggestions?

Thanks!

11:37 am on June 8, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


I went through the same process.
Don't do a 404. Google will keep requesting the pages, pages related to them will go supplemental, and Google referrals will plummet. Not good.

Here is what you do:

Disallow the removed pages in robots.txt.
Use a 301 via .htaccess to direct any referrals to an alternate page.

Submit your robots.txt file to the Google URL removal tool; it has an option where you submit your robots.txt.
Within 48 hours those pages will be wiped from Google.
Those URLs will not be requested again for six months (or maybe ever), so write off those filenames for future use.
Make sure you validate your robots.txt first; anything disallowed that appears in the index will be removed.
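The two steps above might look something like this. A sketch only: the /old-widgets/ and /widgets/ paths and the filename are made up for illustration, and the .htaccess part assumes Apache.

```
# robots.txt - disallow the removed pages so the URL removal
# tool will accept the request
User-agent: Googlebot
Disallow: /old-widgets/

# .htaccess (Apache, mod_alias) - send stray visitors and
# referral traffic to a live alternate page
Redirect 301 /old-widgets/widgetmaster-2003.html /widgets/
```

Note that once Googlebot is disallowed it will no longer see the 301; in this scheme the redirect is mainly for human visitors arriving from old links.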

12:40 pm on June 8, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 10, 2003
posts:410
votes: 0


Reid --

Thanks very much! I am glad I asked and appreciate your reply.

sublime1

3:23 pm on June 8, 2005 (gmt 0)

New User

10+ Year Member

joined:Apr 11, 2005
posts:11
votes: 0


I requested removal of two directories from my China travel website, but I ran into the "Significant" Google update; only 300 URLs are left in Google, and all the page content disappeared. I am not sure whether it was caused by my removal request...

Hope my English is good enough to understand - I am not a native English speaker. :-)

8:09 pm on June 8, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


yanyading - when you remove files or directories by submitting robots.txt, you can see which files have been removed by looking at the 'options' page.
There will be a list of files requested for removal and their status.
Status will be:
Pending - requested for removal but not done yet
Complete - files removed
Request denied - if you change your robots.txt before the removal so that the files are no longer disallowed, your request for removal will be denied.

If you make a mistake and disallow too much, for example:

user-agent: googlebot
disallow: /

this robots.txt will cause your whole website to be removed. So be careful and make sure you know your robots.txt exactly.

If you make a mistake and there are files pending that you don't want removed, then change your robots.txt immediately, before the removal bot comes:

user-agent: *
disallow:

This (or an empty robots.txt) will cause nothing to be removed, and all your requests will be denied. Once requests are 'complete', it is too late to change them.

8:35 pm on June 9, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 31, 2005
posts:34
votes: 0


Make your web server return response code 410 HTTP_GONE for those pages. This will cause googlebot not to request them anymore, and will also delete them from the database.
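On Apache, one way to do this is with mod_alias in .htaccess (a sketch; the page paths are invented for illustration):

```
# .htaccess - tell clients, including googlebot, that these
# pages are gone for good (HTTP 410 rather than 404)
Redirect gone /experiment/thin-page-1.html
Redirect gone /experiment/thin-page-2.html
```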

9:18 pm on June 9, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 10, 2003
posts:410
votes: 0


Grippo --

Are you certain that Google will respond correctly to 410? That sure sounds like the way to go if it does.

Thanks!

10:12 pm on June 9, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 1, 2004
posts:1987
votes: 0


You could put adsense on them...
From my experience those pages will be gone from the SERPs within the next update.

Ohhh, and make the titles similar to the filename.

11:14 pm on June 9, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 31, 2005
posts:34
votes: 0



Grippo --

Are you certain that Google will respond correctly to 410? That sure sounds like the way to go if it does.

Thanks!

Yes, 100%. For ages I had thousands of pages which responded with a 302 redirect, and later a 301, just because I decided to move foo.org/dir to dir.foo.org; the foo.org/dir/* pages stayed listed for years (most of them without titles) until I managed to respond with 410 HTTP_GONE. The beauty of all this is that it's just common sense.

12:00 am on June 10, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 1, 2004
posts:3181
votes: 0


Make your web server return response code 410 HTTP_GONE for those pages. This will cause googlebot not to request them anymore, and will also delete them from the database.

Technically, this is saying...

Yup, the page used to be here, but now it's gone.

I've used this technique before and both Yahoo and Google handle it correctly.

12:37 am on June 10, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


You could put adsense on them...
From my experience those pages will be gone from the SERPs within the next update.
Ohhh, and make the titles similar to the filename.


Please save the sarcasm for less technical threads like PR or the last update. Some people have a hard time with this stuff and are easily confused.

I have heard from others on WW that a 410 works too. I was just in 'remove pages from Google' mode.

Either method will work, but robots.txt will do it almost instantly; I'm not sure how long a 410 takes before the URL is actually removed.
A robots meta tag on the page - some say yes, some say no - will also remove the page, but it takes a month or so.

8:45 am on June 10, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 15, 2005
posts:380
votes: 0


In my experience, using a 301 is a good way, but slow. Even when I put a link to the old URL on a frequently spidered page, it takes at least a few weeks.

The URL Console is fast; I often use it to remove pages returning 404, and that takes a few days. But there is a downside: after six months, removed URLs tend to reappear in the index, despite the fact that they have been returning 404 ever since removal. Currently, I am having a lot of trouble removing, again, the outdated pages I removed in November.

9:09 am on June 10, 2005 (gmt 0)

Full Member

10+ Year Member Top Contributors Of The Month

joined:June 3, 2005
posts:298
votes: 12


Guys, I want to remove all the pages from my "MoviePrints" folder (considered spam, but it's a decent affiliate shop). Does this look OK?

User-agent: *
Disallow: /MoviePrints/
Disallow: /images/
Disallow: /banners/
Disallow: /products/
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.pdf$
Disallow: /*.avi$

10:37 am on June 10, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 7, 2002
posts:175
votes: 0


Well, I think you can just put one line in your .htaccess file:

RedirectMatch gone /MoviePrints/.*

and everything inside that folder will be gone.
Maybe I am wrong - I am not a UNIX guru - but that's how I did it, and it has worked so far.

4:16 pm on June 10, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


User-agent: *
Disallow: /MoviePrints/
Disallow: /images/
Disallow: /banners/
Disallow: /products/
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.pdf$
Disallow: /*.avi$


This will remove all MoviePrints, images, banners, products, .gif, .jpg, .pdf and .avi files.

A 404 will keep coming back unless you disallow the URL with robots.txt or return a 410.
Google does not interpret a 404 as a removed page; it treats it as a temporary error.

The proper way is to return a 410 with .htaccess, and then use robots.txt to remove the pages quickly if you like.
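That combination might be sketched like this, assuming Apache (the directory name follows the example quoted above):

```
# .htaccess - permanent removal signal for everything
# under the folder
RedirectMatch gone ^/MoviePrints/

# robots.txt - lets the Google URL removal tool take the
# pages out of the index right away
User-agent: Googlebot
Disallow: /MoviePrints/
```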

4:30 pm on June 10, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


Wait, there is a better way.

If you are still getting traffic to those URLs (404s), then what I like to do is a 301 redirect to pick up stray traffic. Disallow them in robots.txt and remove them from Google. Leave it that way for 3-4 months until there are no more requests for those URLs; there are other search engines, and the disallow should eventually cause them to remove the pages too.
After the requests peter out, change the 301 to a 410 and remove the entries from your robots.txt file.

6:26 pm on June 11, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 12, 2002
posts:565
votes: 0


I have a number of pages to remove also, so how do you 'return a 410 with .htaccess' anyway?

And what should you do if you have lots of individual pages that no longer exist, but other files in the same directory still do, so that you can't just block the whole directory?

8:44 pm on June 11, 2005 (gmt 0)

New User

10+ Year Member

joined:June 2, 2005
posts:5
votes: 0


We had several hundred pages, maybe even more, that we just switched to 404s because they were old and in some cases had duplicate content - we updated our site to a new look and still had the old site up. I take it this was the wrong way? We set the 404s about two evenings ago.

I am feeling rather concerned right now. :/

Also, I am confused a little about where this "'options' page" is that we can watch if we submit a robots.txt file. Where would we submit that?

Thanks!

Ellie

5:56 am on June 12, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


so how do you 'return a 410 with .htaccess' anyway?
And what should you do if you have lots of individual pages that no longer exist, but other files in the same directory still do, so that you can't just block the whole directory?


How to do a 410 - you should go to the Apache forum to learn about .htaccess. Never just cut and paste stuff you don't understand into .htaccess; know what you are doing, and you will find help there. Other servers have different methods of sending a 410; their respective forums will help.

Each individual page would have to be disallowed, unless you want to remove the whole directory.
If you can get them returning a 410, then you could just let it ride; they will be removed. .htaccess can do wildcards, if that helps.
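For the scattered-pages case, an .htaccess wildcard can target just the retired pages while leaving the live files in the same directory alone. A sketch, assuming Apache; the directory name and the `old-` filename pattern are invented for illustration:

```
# .htaccess - 410 only the retired pages matching the
# pattern; everything else in /widgets/ keeps serving
RedirectMatch gone ^/widgets/old-.*\.html$
```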

We had several hundred, maybe even more, pages we just switched to 404's because they were old and in some cases had duplicate content because we updated our site to a new look and still had the old site. I take it this was the wrong way? We set the 404's about 2 evenings ago.
I am feeling rather concerned right now. :/


You should be concerned - Google chokes on 404s. With several hundred of them, your website could be in the supplemental index within a month or two.

Also, I am confused a little about where this "'options' page" is that we can watch if we submit a robots.txt file. Where would we submit that?


[services.google.com:8882...]

Once you sign up, you get into the 'options' page, where you are given three options.
The first option is to submit your robots.txt file.
There is a large grey area on the right side of the 'options' page; that is where your requests and their status will appear.

Before you submit your robots.txt file, it is critical that you understand it and validate it. This tool is able to remove your entire domain from Google for six months (if you disallow: /).

3:14 pm on June 12, 2005 (gmt 0)

New User

10+ Year Member

joined:June 2, 2005
posts:5
votes: 0


Thanks Reid, I appreciate the help.

5:18 pm on June 12, 2005 (gmt 0)

New User

10+ Year Member

joined:June 2, 2005
posts:5
votes: 0


Ok, robots.txt file all set up and uploaded. Hopefully it's not completely too late but at least it's there now.

We followed Google's instructions exactly for:

"Remove part of your website"

Thanks again for the tips!

Ellie

10:45 pm on June 13, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 12, 2002
posts:565
votes: 0


Thanks Reid! I have set up the 410s and haven't yet decided whether I will also do the robots/URL removal thing.

12:44 am on June 14, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


confused - you should see on the 'options' page which files Google intends to remove. If it's wrong, you have a little time (24-48 hrs) to edit robots.txt before the removal actually happens. If you change robots.txt to allow everything, then all your requests will be denied, and you can try again.

Trish - even just returning a 410 is good enough; I'm just not sure how long it takes. I would guess within a crawl, but if not, it would happen on the next update.

11:10 am on June 14, 2005 (gmt 0)

Full Member

10+ Year Member Top Contributors Of The Month

joined:June 3, 2005
posts:298
votes: 12


Thanks, guys, but do note that Windows servers do NOT use .htaccess.

Anyway, I rewrote it because not all spiders handle wildcards:


User-agent: *
Disallow: /MoviePrints/
Disallow: /images/
Disallow: /banners/
Disallow: /products/

User-agent: Googlebot-Image
Disallow: /*.gif$
Disallow: /*.jpg$

Source: [google.com...]

5:13 pm on June 14, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


Only Apache uses .htaccess; other servers can return a 410, but through different tools - check the forum for your server for the method.

user-agent: * should be the last group in robots.txt, because all robots (or most) will follow the directives of their own group or of *, whichever comes first.
In the robots.txt above, Googlebot-Image may follow the user-agent: * directives without ever seeing user-agent: Googlebot-Image.
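Following that advice, the file quoted earlier could be reordered so the specific group comes first (same rules, only the group order changes):

```
User-agent: Googlebot-Image
Disallow: /*.gif$
Disallow: /*.jpg$

User-agent: *
Disallow: /MoviePrints/
Disallow: /images/
Disallow: /banners/
Disallow: /products/
```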

 
