
Google SEO News and Discussion Forum

    
Best way to tell Googlebot a page doesn't exist anymore
zerillos
msg:4261094 - 11:45 am on Feb 1, 2011 (gmt 0)

What do you think is the best way to tell Googlebot that a page does not exist anymore? Simply delete it and let it 404 until Googlebot gets bored and stops trying, or return a 410 code every time it tries to download it?

Thanks for your opinion

 

goodroi
msg:4261128 - 12:55 pm on Feb 1, 2011 (gmt 0)

It depends on the situation. If you don't care, have no link juice to lose, and aren't worried about your crawl budget, then you can 404 it. This also assumes you don't care about possibly flooding your 404 log file with new entries. But if you are really in a situation where you have nothing to lose, I doubt you would be visiting WebmasterWorld.

When a page of mine no longer exists, I would:
1) make sure all of my internal links pointing to it are changed to another URL or deleted.

2) contact all external sites linking to it and ask them to change the link to another one of my URLs.

3) add a 301 redirect to take care of any external links that couldn't be updated (see the sketch after this list).

4) make sure the URL is not blocked by robots.txt or a noindex tag, so Google sees the 301 redirect.

5) sit back and wait for Googlebot to crawl the page and notice that it went bye-bye.
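
As an illustration of step 3, a 301 can also be sent from a server-side script such as PHP before any output goes out. This is only a rough sketch, and the URLs are made-up placeholders:

<?php
// Hypothetical sketch: permanently redirect a retired URL to its replacement.
// Must run before the script sends any other output.
header('Location: http://www.example.com/new-page/', true, 301);
exit;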

deadsea
msg:4261132 - 1:03 pm on Feb 1, 2011 (gmt 0)

Very few sites use the 410 response code. If I recall correctly, Google has said that they don't treat it any differently from a 404.

pageoneresults
msg:4261137 - 1:19 pm on Feb 1, 2011 (gmt 0)

Very few sites use the 410 response code. If I recall correctly, Google has said that they don't treat it any differently from a 404.


I might have to disagree with that. Google appears to handle 410 Gone exactly as it says on the tin. I've implemented it recently and have seen pages "Gone" within 24-48 hours.

I'm sure there are others who will chime in with similar findings. If a document does not exist anymore and there is no viable replacement for a 301, then 410 Gone is the suggested server response. A 404 is too vague and Googlebot will continue to request the document forever as long as there are external links to it.

This is where finite error reporting comes into play. You should have few, if any, 404s. At some point, you'll capture those repetitive 404s and redirect them to an appropriate document. If no replacement exists, drop a 410 Gone in there.

I'd like to point out that a 410 Gone is probably your last resort. You'll want to preserve whatever equity may have been associated with the document that no longer exists, especially if there are inbound links that you have little to no control over. You'll of course 301 in this instance to the most appropriate document.
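
To make that concrete, here is a rough PHP sketch of the capture-and-respond idea: 301 where a replacement exists, 410 Gone where there is none. The paths and targets are hypothetical:

<?php
// Hypothetical map of retired URLs: a target means 301, null means 410 Gone.
$retired = array(
    '/old-widgets.html' => '/widgets/',  // replacement exists: 301
    '/old-contest.html' => null,         // nothing to redirect to: 410
);

$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

if (array_key_exists($path, $retired)) {
    if ($retired[$path] !== null) {
        header('Location: http://www.example.com' . $retired[$path], true, 301);
    } else {
        header($_SERVER['SERVER_PROTOCOL'] . ' 410 Gone');
    }
    exit;
}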

Note: I was surprised when I did the 410 implementation a little while back. Within 24-48 hours Google removed those pages from its index; it acted just like a URL Removal request without all the paperwork. :)

tedster
msg:4261189 - 2:58 pm on Feb 1, 2011 (gmt 0)

Yes - Google used to handle 404 and 410 the same way, but they changed that last year. There was even a public mention of the change from a Googler, I think John Mueller. I'll see if I can find the link.

tedster
msg:4261199 - 3:13 pm on Feb 1, 2011 (gmt 0)

This wasn't the first mention of the change, but here, on 2010-04-15 JohnMu does confirm different handling:

If you are certain that the URLs will no longer have content on them, you could also use a 410 HTTP result code, to signal that they are "gone" forever. We may still crawl the URLs (especially when we find new links), but we generally see a 410 HTTP result code as being more permanent than a 404 HTTP result code (which can be transient by definition).

[google.com...]

optimierung
msg:4261235 - 3:50 pm on Feb 1, 2011 (gmt 0)

If I cannot control the server, how can I implement a 410?
Is there an .htaccess / rewrite statement?

LunaC
msg:4261274 - 4:54 pm on Feb 1, 2011 (gmt 0)

.htaccess for 410s:

# 410 Permanently Removed
Redirect gone /filename.ext
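
(The Redirect directive comes from Apache's mod_alias; the argument is the URL path as requested, not a filesystem path, and because the status is "gone" no destination URL is given.)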

zerillos
msg:4261295 - 5:37 pm on Feb 1, 2011 (gmt 0)

A 301 redirect, yes, but to where? The page simply does not exist anymore.

In fact, this is about a section of a website that was recently removed. It had no link juice and was simply attracting all kinds of spam, so we removed it.

I guess the 410 Gone is the more appropriate thing to do. However, I was wondering if any of you are aware of any negative impacts of using a 410 Gone on a website (from Googlebot's point of view, that is...)

TheMadScientist
msg:4261298 - 5:55 pm on Feb 1, 2011 (gmt 0)

410 Gone

# The Mod_Rewrite Version
RewriteEngine on
RewriteRule ^thepage\.ext$ - [G]
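
(The [G] flag makes mod_rewrite answer that request with a 410 Gone status; it also implies [L], so no further rules are processed.)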

tedster
msg:4261339 - 6:40 pm on Feb 1, 2011 (gmt 0)

any negative impacts of using a 410 Gone on a website

Only if you want to re-use the URL. Make sure it stays "gone".

rogerd
msg:4261853 - 7:59 pm on Feb 2, 2011 (gmt 0)

OK, here's a variation on the original question. I'm working with a site that, due to a malfunction, had wrong URLs in place for long enough that they got spidered. The malfunction was corrected, the correct URLs were put back in place, and the wrong ones were 301'd to the correct locations.

Oddly, though, Google is spidering the new pages but still has the old pages in its index. At one point, I even tried putting some prominent links to a portion of the old URLs, thinking that Google would follow the link, discover the 301, and drop the old URL. Hasn't happened, though, despite otherwise aggressive spidering. The old pages live on like zombies in Google's index.

Thoughts?

tedster
msg:4261856 - 8:07 pm on Feb 2, 2011 (gmt 0)

I've got a similar situation right now, Roger, with a new site launch that went technically wrong for more than a week. In the past, after a number of weeks, I'd only see the wrong URLs with a site: operator and never in an ordinary SERP. I'll see if that still holds, because I'm deep into a parallel situation right now.

The most critical thing is to remove ALL occurrences of the wrong URLs in the site. I would definitely not intentionally link to a URL that will redirect. That only compounds the chaos.

pageoneresults
msg:4261863 - 8:23 pm on Feb 2, 2011 (gmt 0)

Google is spidering the new pages but still has the old pages in its index.


That's one of those times where I might say noarchive comes into play. There's something with that cache and redirects that noarchive "appears" to address. It's just a hunch. ;)

thecoalman
msg:4261911 - 10:02 pm on Feb 2, 2011 (gmt 0)

If I cannot control the server, how can I implement a 410?
Is there an .htaccess / rewrite statement?


You can also do it with PHP and other server-side scripting languages, which is very useful for dynamic content.

[php.net...]
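
For example, a minimal sketch (it has to run before the script produces any other output):

<?php
// Tell clients (and Googlebot) that this dynamic page is gone for good.
header($_SERVER['SERVER_PROTOCOL'] . ' 410 Gone');
exit;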

tedster
msg:4261940 - 11:02 pm on Feb 2, 2011 (gmt 0)

That's one of those times where I might say noarchive comes into play.

I don't follow, p1r. If there's already a 301 for that URL, then Google would never see the noarchive. Or did you mean having noarchive there from the start?

pageoneresults
msg:4262014 - 2:40 am on Feb 3, 2011 (gmt 0)

If there's already a 301 for that URL, then Google would never see the noarchive.


Understood. It's just a hunch, tedster. Since implementing noarchive years ago across all the sites that I manage, many of the issues discussed in the fora don't affect us, from scraping to all sorts of other cache-related things that people discuss.

Something is wrong somewhere in the process for Google to hold onto old URIs when there is a 301 in place. Apparently it is not seeing the 301? Or is not processing it properly? I dunno. I just "think" that noarchive sends a different set of signals to Googlebot and causes things to happen faster and more efficiently. I may be totally off my SEO rocker too. ;)

tedster
msg:4262021 - 2:57 am on Feb 3, 2011 (gmt 0)

If you can ever pull together a pile of data on that I'd be very interested.

aakk9999
msg:4262039 - 4:34 am on Feb 3, 2011 (gmt 0)

I am quite interested in this area and have some questions.

@rogerd, can you see the old pages being spidered? I.e., has the 301 been discovered and the page still not dropped from the index, or is it a question of Google crawling the old pages, finding the 301, but still holding onto the page in its index? Also, if they are both in the index, which one is ranking more prominently, the new page or the old 301-ed page (or better, can you see the 301-ed page only with the site: command)?

@tedster, the same question with regard to spidering. From what you wrote above, I am presuming you are seeing both new and old URLs in regular SERPs? Are the old 301-ed pages ranking better than the new ones, or is it a mix?

I am also wondering if, with a brand new site, Google is even slower to drop 301-ed URLs - almost like "This is a new site, I am not sure if this is what you really want or whether you are still messing around with URLs.."

I have noticed that in the last 6 months or so (perhaps something to do with Caffeine going live), the whole process of dropping redirected URLs takes longer than before. It is almost as if to say "we have the capacity, so we can now hold onto the stuff in the index longer."

I could be wrong, but this is my observation.

tedster
msg:4262064 - 5:22 am on Feb 3, 2011 (gmt 0)

@aakk9999 - no, I don't usually see the 301'd URLs in the regular keyword rankings, although sometimes they do seem to get stuck in the site: operator results for a while. So I was hoping to clarify what rogerd meant when he described "still has the old pages in its index".

zerillos
msg:4262164 - 9:34 am on Feb 3, 2011 (gmt 0)

The 301 works only up to a certain point. I recently messed around with putting up a mobile version of my website. Because of my incompetence, Googlebot (and not Googlebot Mobile) got to spidering the mobile content, which resulted in my whole site having duplicate content. This took less than 48 hours!

I pulled out the mobile content and 301'd all the pages to their desktop versions. After 24 hours, around 80% of the duplicate content was out of the index. This happened around 3 months ago, and there are still around 10% of those pages in the index.

pageoneresults
msg:4262209 - 1:07 pm on Feb 3, 2011 (gmt 0)

If you can ever pull together a pile of data on that I'd be very interested.


Both you and rogerd have the opportunity to put it to the test. :)

rogerd
msg:4262213 - 1:20 pm on Feb 3, 2011 (gmt 0)

Ted, the pages in question show up using the site: operator but not in any of the keyword searches I tried. The issue is complicated by the fact that during the period of site malfunction that created the bad URLs, there was also a spam link injection hack. Hence, these nonexistent pages have unrelated spammy links, which could also explain why they don't show up for keyword searches. That's also why I'm anxious to get them out of the index, even though they likely get no search traffic.

A little more investigation on one content page shows that neither the new version nor the old, bogus version of the page is showing up even in exact phrase searches. Multiple pages linking to the correct URL are in the index, so I'd expect the new link to be spidered readily.

Most of the site is indexed correctly and is ranking for relevant keywords. I have to think that if Googlebot would just visit the URL in its index and find the 301 to the new URL, it would be fixed.

rogerd
msg:4262214 - 1:31 pm on Feb 3, 2011 (gmt 0)

I'm a believer in hunches based on experience, p1r. One experiment found that people playing a game with two decks of cards, one riskier and less profitable than the other, responded subconsciously (measured with biometrics) before they could consciously identify one deck as being worse than the other.

You are likely more in tune with your subconscious than the rest of us!

tedster
msg:4262360 - 5:56 pm on Feb 3, 2011 (gmt 0)

One of the challenges is that Google is (and needs to be) complicated in how they deal with 301 redirects. 301s used to be a spammer haven, so all kinds of trust checking needs to occur.

Also, technical errors are extremely common, so they can't just abandon a URL once they've verified that a 301 is in place - the website may easily change its mind. In other words, a "permanent" redirect is not truly permanent in the practical world of today's web.

So Google does have a challenge with 301s - but as long as the legacy URL only shows in the site: operator results, and it is the new URL that is ranking and getting the actual search traffic, it's not much of a practical problem.

aakk9999
msg:4262485 - 11:13 pm on Feb 3, 2011 (gmt 0)

@rogerd I have not had such an experience, but just a thought... you had a spam link injection to the wrong URLs, and then you redirected these wrong URLs to the correct ones. So perhaps the correct URLs would now "inherit" these spammy links via the 301 redirect? Could this be the reason why (if I understood correctly) the "good" URLs are not ranking either?

Normally a 301 is the solution for unwanted URLs that leaked out as the result of a technical error, but perhaps because of the spammy links this is not the best solution here? Mind you, I do not know if these wrong URLs gained other "good" links whilst they were exposed.

As tedster said above, legacy URLs that are 301-ed are not just abandoned after a while - which is something I have noticed even more in the last 6-8 months.

E.g. I have a case where a large number of URLs were redirected 2 years ago, and this redirection went really well: the old URLs were dropped from the index completely (no reports from the site: operator, no reference to them anywhere in WMT). But then another technical mistake was made in September and a small subset of the previously redirected URLs "lost" the redirect for a couple of weeks, even though they were NOT referenced from within the site and, as far as we could see, there were no links to them. Despite that, they were back in the index pronto. Fortunately, WMT reported duplicate titles and that is how we found out about them and re-installed the redirect. They have now mostly disappeared again (although it took longer than with the original redirect), and there are a few that still hang around somewhere at the end of the list that the site: operator produces.

I have noticed that in the last 6-8 months Google has been exposing to us via WMT a much larger set of the legacy data it knows about. My opinion (which may be wrong) is that whilst it had this data all along, maybe the data was "archived" somewhere because the old infrastructure might not have supported easy access to such a data volume, and the new infrastructure perhaps allows this data to be surfaced more readily. This is just speculation though.

xtremeux
msg:4269924 - 10:10 pm on Feb 20, 2011 (gmt 0)

Block the URL in the robots.txt file and submit the URL for removal in Google Webmaster Tools.
