I am removing pages from my site. Do I need to include them in my robots.txt file, or will Google eventually drop them from its index?
What would you do? Disallow in robots.txt, or leave it to Google and let time take care of it?
topr8
8:52 am on Aug 4, 2011 (gmt 0)
serve a 410 status
ergophobe
4:30 pm on Aug 17, 2011 (gmt 0)
Yes. robots.txt merely restricts crawling; it doesn't stop indexing.
If you disallow the pages in robots.txt, Google can't crawl them, so it can't get the 410 server response and can't drop them from the index.
robots.txt = "page exists, but you're not allowed to go there Google." 410 = "page is gone forever, take it out of your index Google."
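The reasoning above can be sketched in a few lines of Python. This is only an illustration of how a well-behaved crawler is assumed to act, not Google's actual logic: it checks robots.txt first, and if the path is disallowed it never requests the page, so the 410 is never seen. The path /catalog413.html, the SERVER_STATUS table, and the crawl function are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows a removed page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog413.html
"""

# Hypothetical server state: the removed page answers 410 Gone.
SERVER_STATUS = {"/catalog413.html": 410, "/index.html": 200}


def crawl(path, robots_txt):
    """Return what a rule-abiding crawler learns about `path`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch("Googlebot", path):
        # Disallowed: the crawler never requests the page,
        # so it never finds out the page is gone.
        return "blocked: status never seen"
    status = SERVER_STATUS.get(path, 404)
    return f"fetched: status {status}"


if __name__ == "__main__":
    print(crawl("/catalog413.html", ROBOTS_TXT))  # disallowed in robots.txt
    print(crawl("/catalog413.html", ""))          # no robots.txt rules at all
```

With the Disallow rule in place the crawler is stopped before it can see the 410; with no rule it fetches the page and gets the "gone" signal.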
KaraScene1
11:45 am on Sep 13, 2011 (gmt 0)
RE: robots.txt = "page exists, but you're not allowed to go there Google." 410 = "page is gone forever, take it out of your index Google."
I also wondered this. Sorry for the stupid question, I'm very new to web stuff. How do you write it? User-agent: googlebot allow:catalogid413.com/410 Disallow:
or an .htaccess file like: <Limit GET HEAD POST> order allow, deny deny from www.domain.com/catalog413.html410 allow from all </LIMIT>
Or are the codes written in a Rewrite form? I used the URL removal tool once... then saw it's for urgent URLs.
lucy24
8:45 pm on Sep 13, 2011 (gmt 0)
For a given definition of "urgent". URL removal has to be in addition to, not instead of, a 410 ("gone") or 301 ("permanent redirect"). Or at least a "disallow" in robots.txt.
There is any number of different ways to keep humans and/or robots out of files and/or directories. Which one you use depends on the exact circumstances and on what else you already have going on in your htaccess or config file. You rarely need to identify your domain name, since that's implicit in the location of the file.
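Whichever mechanism is used, the end result is the same: the removed URL answers with status 410. Here is a minimal sketch of that behaviour using Python's standard http.server, purely for illustration since the thread is about Apache. The path /catalog413.html and the names GONE_PATHS, status_for, GoneHandler and make_server are hypothetical. Note that only the path is matched; the domain name never appears.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical set of removed pages: paths only, no domain name.
GONE_PATHS = {"/catalog413.html"}


def status_for(path):
    """Return 410 for removed pages, 200 otherwise (query string ignored)."""
    bare_path = path.split("?", 1)[0]
    return 410 if bare_path in GONE_PATHS else 200


class GoneHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if status_for(self.path) == 410:
            # 410 Gone: the page was removed on purpose.
            self.send_error(410)
            return
        body = b"still here\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), GoneHandler)


if __name__ == "__main__":
    server = make_server(8000)
    server.serve_forever()
```

On Apache itself the equivalent is usually a single line in .htaccess, for example Redirect gone /catalog413.html (mod_alias), assuming that module is enabled on the host.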
KaraScene1
10:37 pm on Sep 13, 2011 (gmt 0)
Thanks, Lucy24. Guess I'll disallow the dead pages until they go away. My urgency in URL removal was when I saw old pages in the Wayback Machine and Gigablast full of red x'd boxes from deleted pics. I was horrified! Sad thing is, those SEs rarely visit and won't see a code to update it.