
Content Management Forum

    
Removing pages from site - best practices?
jack38

Msg#: 4347592 posted 7:58 am on Aug 4, 2011 (gmt 0)

I am removing pages from my site. Do I need to include them in my robots.txt file, or will Google eventually drop them from its index?

What would you do: disallow them in robots.txt, or leave it to Google and let time take care of it?

 

topr8

Msg#: 4347592 posted 8:52 am on Aug 4, 2011 (gmt 0)

Serve a 410 (Gone) status.
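
In Apache, assuming mod_alias is loaded and taking /old-page.html as a stand-in for whatever URL you deleted, a single line in .htaccess does it:

# return 410 Gone for the removed page (example path)
Redirect gone /old-page.html

Googlebot gets the 410 on its next visit and drops the page.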

ergophobe

Msg#: 4347592 posted 4:30 pm on Aug 17, 2011 (gmt 0)

Yes, robots.txt merely restricts the crawl; it doesn't stop indexing.

If you disallow them in robots.txt, Google can't crawl them, can't get the 410 server response, and can't drop them from the index.

robots.txt = "page exists, but you're not allowed to go there Google."
410 = "page is gone forever, take it out of your index Google."

KaraScene1
Msg#: 4347592 posted 11:45 am on Sep 13, 2011 (gmt 0)

RE:
robots.txt = "page exists, but you're not allowed to go there Google."
410 = "page is gone forever, take it out of your index Google."


I also wondered this - sorry for the stupid question, I'm very new to web stuff. How do you write it?
User-agent: googlebot
allow: catalogid413.com/410
Disallow:

or an .htaccess file like:
<Limit GET HEAD POST>
order allow, deny
deny from www.domain.com/catalog413.html410
allow from all
</LIMIT>

Or are the codes in a Rewrite form? I used the URL removal tool once...then saw it's for urgent URLs.

lucy24

Msg#: 4347592 posted 8:45 pm on Sep 13, 2011 (gmt 0)

For a given definition of "urgent". URL removal has to be instead of, not in addition to, a 410 ("gone") or 301 ("permanent redirect"). Or at least a "disallow" in robots.txt.

There are any number of different ways to keep humans and/or robots out of files and/or directories. Which one you use depends on the exact circumstances and on what else you already have going on in your .htaccess or config file. You rarely need to identify your domain name, since that's implicit in the location of the file.
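
To answer the question above: neither of those, as written. Assuming Apache with mod_rewrite enabled, a minimal sketch for your example page (the filename is just a placeholder) would be:

RewriteEngine On
# serve 410 Gone for the deleted page; note that no domain name appears anywhere
RewriteRule ^catalog413\.html$ - [G]

The [G] flag is short for "gone" and makes the server answer with a 410 status.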

KaraScene1
Msg#: 4347592 posted 10:37 pm on Sep 13, 2011 (gmt 0)

Thanks, Lucy24. Guess I'll disallow the dead pages until they go away. My urgency about URL removal came when I saw old pages in the Wayback Machine and Gigablast full of red-X'd boxes from deleted pics. I was horrified! Sad thing is, those SEs rarely visit and won't see a status code telling them to update.
