My goal is to learn how robots, crawlers and spiders deal with content on pages that have a redirect or meta refresh. If you have any advice about this, and about how it is affected with and without exceptions in the robots.txt file, I would greatly appreciate it!
As a noob working with PHP, I'm guessing the ways of redirecting from one page to another are:
- a simple meta-refresh
- a PHP header change
When a crawler comes across these kinds of pages with redirects included, does it:
a) scan the page and then jump to the redirected page
* result - both original and redirected page are indexed
b) scan the page and then stop
* result - only original page gets indexed
c) ignore original page, jump to redirect page and scan
* result - only redirect page gets indexed
I am curious as I'm using redirects a bit this week and wondering how relevant content (for indexing) would be treated on these 'original' pre-redirect pages.
It would be worthwhile creating some pages with the types of redirects you have in mind, and seeing how spiders react to them. I'd also recommend a read of the HTTP/1.1 Redirection Status Codes [w3.org].
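If it helps to have something to point a crawler or browser at, here is a minimal sketch of such test pages using Python's standard library rather than PHP. The paths (/permanent, /temporary, /meta-refresh, /destination), the port and the names build_response and serve are all made up for illustration, so treat it as a starting point and not a finished setup.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical test pages; the paths and wording are illustrative only.
META_REFRESH_PAGE = (
    b"<html><head>"
    b'<meta http-equiv="refresh" content="0; url=/destination">'
    b"</head><body>Original page content</body></html>"
)
DESTINATION_PAGE = b"<html><body>Destination page content</body></html>"
HTML = {"Content-Type": "text/html; charset=utf-8"}


def build_response(path):
    """Return (status, headers, body) for one of the test paths."""
    if path == "/permanent":
        # Server-side permanent redirect: no page body is delivered.
        return 301, {"Location": "/destination"}, b""
    if path == "/temporary":
        # Server-side temporary redirect: no page body is delivered.
        return 302, {"Location": "/destination"}, b""
    if path == "/meta-refresh":
        # Normal 200 page whose HTML asks the client to move on.
        return 200, dict(HTML), META_REFRESH_PAGE
    if path == "/destination":
        return 200, dict(HTML), DESTINATION_PAGE
    return 404, {"Content-Type": "text/plain; charset=utf-8"}, b"Not found"


class RedirectTestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers, body = build_response(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve the test pages on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectTestHandler).serve_forever()
```

Calling serve() and then requesting each path lets you compare what a client receives in each case. To see real spider behaviour, the same three kinds of page would need to live on a publicly reachable site and be watched in the server logs.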
My expectation would be:
- 'meta refresh' redirects would be interpreted as an 'unknown' or temporary redirect (as if the page had delivered a 302 status code). Both the original and destination URLs can be indexed.
- Server-side redirection would be interpreted according to the status code. A 301 (permanent) means only one URL is indexed.
Note that server-side redirects (with an appropriate status code) don't deliver any page for search engines to spider. If a URL always redirects, there's no content to 'scan'. But that doesn't mean that search engines won't index that URL.
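The distinction above can be sketched as a small Python function that labels a response by the kind of redirect it carries. This is only an illustration of the reasoning, not how any search engine actually works. The name classify and the labels it returns are invented here, and it assumes the headers arrive as a plain dict with a "Location" key.

```python
from html.parser import HTMLParser


class _MetaRefreshFinder(HTMLParser):
    """Remember the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=/new-page".
        _, _, rest = attr.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.target = rest[4:].strip().strip("'\"")


def classify(status, headers, body=""):
    """Return (kind, target) for a response; the labels are illustrative."""
    location = headers.get("Location")
    if status in (301, 308) and location:
        return ("permanent", location)
    if status in (302, 303, 307) and location:
        return ("temporary", location)
    if status == 200:
        finder = _MetaRefreshFinder()
        finder.feed(body)
        if finder.target:
            # The page itself was delivered, so it has content of its own.
            return ("meta-refresh", finder.target)
    return ("none", None)
```

For example, classify(301, {"Location": "/new"}) gives ("permanent", "/new"), while a 200 page containing a meta refresh tag comes back as "meta-refresh" along with the URL it points to. Only in the second case is there a page body for a spider to read.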