
Forum Moderators: ergophobe


Removing pages from site - best practices?

     
7:58 am on Aug 4, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:July 9, 2005
posts: 53
votes: 0


I am removing pages from my site. Do I need to include them in my robots.txt file, or will Google eventually drop them from its index?

What would you do? Disallow them in robots.txt, or leave it to Google and let time take care of it?
8:52 am on Aug 4, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 19, 2002
posts:3171
votes: 8


serve a 410 status
4:30 pm on Aug 17, 2011 (gmt 0)

Moderator This Forum

WebmasterWorld Administrator ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 25, 2002
posts:8139
votes: 103


Yes, robots.txt merely restricts the crawl, it doesn't stop indexing.

If you disallow them in robots.txt, Google can't crawl them, can't get the 410 server response, and can't drop them from the index.

robots.txt = "page exists, but you're not allowed to go there Google."
410 = "page is gone forever, take it out of your index Google."
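In practice the 410 is usually set in the server config. A minimal sketch for Apache, assuming mod_alias and mod_rewrite are available and using made-up paths rather than anyone's actual URLs:

# .htaccess (Apache, mod_alias): return 410 Gone for one removed page
Redirect gone /old-catalog/page413.html

# Or with mod_rewrite, mark a whole removed directory as gone
RewriteEngine On
RewriteRule ^old-catalog/ - [G,L]

Googlebot still has to be allowed to crawl those URLs to see the 410, so don't also disallow them in robots.txt.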
11:45 am on Sept 13, 2011 (gmt 0)

New User

joined:Sept 13, 2011
posts: 6
votes: 0


RE:
robots.txt = "page exists, but you're not allowed to go there Google."
410 = "page is gone forever, take it out of your index Google."


I also wondered this - sorry for the stupid question, I'm very new to web stuff. How do you write it?
User-agent: googlebot
allow:catalogid413.com/410
Disallow:

or an .htaccess file like:
<Limit GET HEAD POST>
order allow, deny
deny from www.domain.com/catalog413.html410
allow from all
</LIMIT>

Or are the codes in a Rewrite form? I used the URL removal tool once... then saw it's for urgent URLs.
8:45 pm on Sept 13, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12720
votes: 244


For a given definition of "urgent". URL removal has to be instead of, not in addition to, a 410 ("gone") or 301 ("permanent redirect"). Or at least a "disallow" in robots.txt.

There is any number of different ways to keep humans and/or robots out of files and/or directories. Which one you use depends on the exact circumstances and on what else you already have going on in your htaccess or config file. You rarely need to identify your domain name, since that's implicit in the location of the file.
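For example, taking the catalog page from the earlier post as a hypothetical filename, the two approaches would look roughly like this; note that neither one mentions the domain name:

# robots.txt (blocks crawling, but does not remove the page from the index)
User-agent: *
Disallow: /catalog413.html

# .htaccess (Apache, mod_rewrite): tell crawlers the page is gone (410)
RewriteEngine On
RewriteRule ^catalog413\.html$ - [G,L]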
10:37 pm on Sept 13, 2011 (gmt 0)

New User

joined:Sept 13, 2011
posts: 6
votes: 0


Thanks, Lucy24. Guess I'll disallow the dead pages until they go away. My urgency about URL removal came when I saw old pages in the Wayback Machine and Gigablast full of red-x'd boxes from deleted pics. I was horrified! Sad thing is, those SEs rarely visit and won't see the status code that would tell them to update.