
Content Management Forum

    
Removing pages from site - best practices?
jack38
msg:4347594
7:58 am on Aug 4, 2011 (gmt 0)

I am removing pages from my site. Do I need to disallow them in my robots.txt file, or will Google eventually drop them from its index?

What would you do? Disallow them in robots.txt, or leave it to Google and let time take care of it?

 

topr8
msg:4347611
8:52 am on Aug 4, 2011 (gmt 0)

Serve a 410 (Gone) status.
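
On Apache, for instance, that can be a couple of lines in .htaccess (the paths here are just placeholders, not real URLs from your site):

# Hypothetical removed pages -- both directives answer with 410 Gone
Redirect gone /removed-page.html
RedirectMatch 410 ^/old-section/

You can confirm it with curl -I http://www.example.com/removed-page.html and look for "HTTP/1.1 410 Gone" in the response.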

ergophobe
msg:4352762
4:30 pm on Aug 17, 2011 (gmt 0)

Yes, robots.txt merely restricts crawling; it doesn't stop indexing.

If you disallow them in robots.txt, Google can't crawl them, can't see the 410 server response, and can't drop them from the index.

robots.txt = "page exists, but you're not allowed to go there, Google."
410 = "page is gone forever, take it out of your index, Google."
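
To make that concrete, the two look something like this (example path only, and assuming an Apache server for the 410 part):

# robots.txt -- the page still exists, Google just may not fetch it
User-agent: *
Disallow: /removed-page.html

# .htaccess -- the page is gone; the 410 lets Google drop it from the index
Redirect gone /removed-page.html

And if both are in place at once, the robots.txt line wins, in the sense that Googlebot never gets to see the 410.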

KaraScene1
msg:4361871
11:45 am on Sep 13, 2011 (gmt 0)

RE:
robots.txt = "page exists, but you're not allowed to go there, Google."
410 = "page is gone forever, take it out of your index, Google."

I also wondered this. Sorry for the stupid question; I'm very new to web stuff. How do you write it?
User-agent: googlebot
allow:catalogid413.com/410
Disallow:

or an .htaccess file like:
<Limit GET HEAD POST>
order allow, deny
deny from www.domain.com/catalog413.html410
allow from all
</LIMIT>

Or are the codes written in a Rewrite form? I used the URL removal tool once...then saw it's for urgent URLs.

lucy24
msg:4362109
8:45 pm on Sep 13, 2011 (gmt 0)

For a given definition of "urgent". URL removal has to be instead of, not in addition to, a 410 ("gone") or 301 ("permanent redirect"). Or at least a "disallow" in robots.txt.

There are any number of different ways to keep humans and/or robots out of files and/or directories. Which one you use depends on the exact circumstances and on what else you already have going on in your .htaccess or config file. You rarely need to identify your domain name, since that's implicit in the location of the file.
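
For example, taking the catalog413.html page from your post as a stand-in, either of these .htaccess sketches would send a 410 without the domain name appearing anywhere (assuming Apache; the second is the "Rewrite form" you mentioned):

# mod_alias style
Redirect gone /catalog413.html

# mod_rewrite style -- the [G] flag also answers 410 Gone
RewriteEngine On
RewriteRule ^catalog413\.html$ - [G]

Neither line mentions www.domain.com, and a robots.txt disallow likewise starts at the path (Disallow: /catalog413.html).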

KaraScene1
msg:4362152
10:37 pm on Sep 13, 2011 (gmt 0)

Thanks, Lucy24. Guess I'll disallow the dead pages until they go away. My urgency with URL removal came when I saw old pages in the Wayback Machine and Gigablast full of red-x'd boxes from deleted pics. I was horrified! The sad thing is, those SEs rarely visit and won't see a code telling them to update it.
