But I am considering a new way of handling this because:
• We get a large volume of new user-generated content every day. In theory this adds up to a huge number of pages on my site over time (in theory, because in practice Google does not crawl and index all of it).
• Crawling all these "old" pages costs Google a lot of resources, which takes crawl resources away from other, more important things on the site.
• Google cannot or will not keep all these pages in its index and index all of our new content at the same time. We face that problem now.
• I guess the easiest way to solve this would be to remove the expired pages and return a 404 or 410 status to Google. The pages would then be removed from the index and not crawled again.
• But this way you will lose all the link juice that is created when people link to their own content from other sites.
• So my best guess is to do a 301 redirect to a search result page for a similar search. This way you keep the link juice within your site. When the user (coming from the search engine) lands on this page, you could e.g. show a small pop-up that says: "This content has expired but here are a lot of other pages with similar ......". The pop-up would of course appear only when the user was redirected from an expired page.
But I am a little worried that the large number of redirects will have a negative effect in Google, and I don't know if there are any other disadvantages. How should I handle this? What is your suggestion? Thanks in advance.
[edited by: tedster at 11:24 am (utc) on Aug. 28, 2009]
Because of all the above, it's very likely that the backlinks you are concerned about will begin to lose power quite quickly, even if the webmaster leaves them online after the expiration date. So "squeezing too hard" on the potential link juice is probably not a worthwhile prospect.
But there is something to it, and I understand why you would want to take a look at various approaches.
Have you considered a hybrid? The day after the content "expires", you add an extra notice to the page that the date has passed. Still keep the url live for another 30 or 60 days, and then return a 404 or 410 status. That would make extended use of the available link juice, even as it decays, but it would also avoid many problems.
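To make the hybrid concrete, here is a minimal Python sketch of the decision logic. The function name `status_for_listing` and the `GRACE_PERIOD` constant are made up for illustration, and 60 days is just one of the two windows mentioned above; how you wire this into your own CMS or server is up to you.

```python
from datetime import date, timedelta

# Assumption: 60 days, one of the two windows suggested above (30 or 60).
GRACE_PERIOD = timedelta(days=60)


def status_for_listing(expires_on: date, today: date) -> tuple[int, bool]:
    """Return (HTTP status, show_expired_notice) for a listing url.

    Hypothetical helper: the real page handler would call this and then
    either render the page (with or without the notice) or send a 410.
    """
    if today <= expires_on:
        return 200, False  # content is still current
    if today <= expires_on + GRACE_PERIOD:
        return 200, True   # expired: keep the url live, show the notice
    return 410, False      # grace period over: tell crawlers it is gone
```

Returning 410 rather than 404 in the last branch is a choice made for this sketch; either status fits the approach described above.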
Note that Google has made it clear that they don't want to see search results indexed in their search results - so I would not suggest going in that direction for your expired content. Offering a lot of urls that are only dynamically generated search results based on the referer could result in penalties after a while.
Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines.
Google Webmaster Guidelines [google.com]
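If you want to sanity-check a robots.txt rule before deploying it, Python's standard `urllib.robotparser` can do a rough test. This is only a sketch: the `/search` path and the example.com urls are assumptions, so substitute whatever path your own search results live under. Note that the standard-library parser does simple prefix matching and may not match Googlebot's behaviour in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Assumption: internal search results live under /search on this site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawl_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above let user_agent fetch url."""
    return parser.can_fetch(user_agent, url)
```

With these rules, `crawl_allowed("https://example.com/search?q=red+bikes")` is False while a normal content url such as `https://example.com/listing/123` stays crawlable.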
i'm lucky though, because as the events pass by the links to them drop off the calendar, so all my in-site links disappear.
that means they never get crawled, but still remain in place when people visit them.
i've got a little script that checks the date and attaches a message to the top saying it's out of date, and suggesting alternative pages for them to visit.
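for anyone wanting to try something similar, a script like that could look roughly like the Python sketch below. this is not the actual script described above, just an illustration: `expiry_notice`, the markup and the css class name are all made up.

```python
from datetime import date
from html import escape


def expiry_notice(event_date: date, today: date,
                  alternatives: list[tuple[str, str]]) -> str:
    """Return an HTML notice for a past event, or "" if it is still current.

    alternatives is a list of (title, url) pairs to suggest instead.
    """
    if today <= event_date:
        return ""
    # Escape titles and urls so they are safe to drop into the page.
    links = "".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in alternatives
    )
    return (
        '<div class="expired-notice">'
        f"<p>This event took place on {event_date.isoformat()} "
        "and is now out of date.</p>"
        f"<ul>{links}</ul>"
        "</div>"
    )
```

the page template would then print the returned string above the normal content, so the url keeps returning its usual page with the notice on top.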