robots and redirecting

Forum Moderators: goodroi

I am wondering how a robot would act on a page with a redirect

melanger

2:55 am on Mar 7, 2008 (gmt 0)

Hi all,

My goal is to learn how robots, crawlers and spiders deal with content on pages that have a redirect or meta refresh. If you have any advice about this, and about how it is affected with and without exceptions in the robots.txt file, I would greatly appreciate it!

As a noob working with PHP, I'm guessing the ways of redirecting from one page to another are:
- a simple meta refresh
- a PHP header change
- some kind of JavaScript redirect
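
For anyone following along, here is a rough sketch of what those three approaches look like from the crawler's side. It is written in Python rather than PHP purely for illustration. The function `build_redirect`, its `kind` labels and the `/new.php` target are made-up names for this example, not part of any real library. In PHP, calling `header('Location: ...')` gives the "header" style and sends a 302 unless another 3xx status has already been set.

```python
from html import escape
import json


def build_redirect(kind, target, status=302):
    """Return (status, headers, body) for one hypothetical redirect style."""
    if kind == "meta":
        # The page itself is served normally (200); the tag asks the
        # client to move on after 0 seconds.
        body = ('<html><head><meta http-equiv="refresh" content="0; url=%s">'
                '</head><body>Moved.</body></html>' % escape(target, quote=True))
        return 200, {"Content-Type": "text/html"}, body
    if kind == "header":
        # Server-side: the status code and Location header do all the work,
        # so no page content is sent.
        return status, {"Location": target}, ""
    if kind == "javascript":
        # Also a normal 200 page; only a client that runs scripts will move.
        body = ('<html><head><script>window.location.replace(%s);</script>'
                '</head><body>Moved.</body></html>' % json.dumps(target))
        return 200, {"Content-Type": "text/html"}, body
    raise ValueError("unknown redirect kind: %r" % kind)


for kind in ("meta", "header", "javascript"):
    status, headers, body = build_redirect(kind, "/new.php")
    print(kind, status, headers.get("Location"), len(body))
```

The thing to notice is that only the "header" style changes the status code. The other two return an ordinary 200 page that happens to contain an instruction to go elsewhere.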

When a crawler comes across these kinds of pages with redirects included, does it:

a) scan the page and then jump to the redirected page
* result - both original and redirected page are indexed

b) scan the page and then stop
* result - only original page gets indexed

c) ignore original page, jump to redirect page and scan
* result - only redirect page gets indexed

I am curious as I'm using redirects a bit this week and wondering how relevant content (for indexing) would be treated on these 'original' pre-redirect pages.

Thanks again

Receptional Andy

8:56 pm on Mar 8, 2008 (gmt 0)

Hi and welcome [webmasterworld.com], melanger :)

It would be worthwhile creating some pages with the types of redirects you have in mind, and seeing how spiders react to them. I'd also recommend a read of the HTTP/1.1 Redirection Status Codes [w3.org].

My expectation would be:

- 'Meta refresh' redirects would be interpreted as an 'unknown' or temporary redirect (as if the page delivered a 302 status code). Both the original and destination URLs can be indexed.
- Server-side redirection would be interpreted according to the status code. A 301 means only one URL (the destination) is indexed.
- It depends on your JavaScript, but JavaScript may not trigger a redirect at all. Spiders parse JavaScript, but don't usually execute it.
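
The expectations above could be sketched as a small classifier, useful if you build test pages and want to check what each one actually returns. This is only an illustration under assumptions: the spider reads status codes and HTML but does not run scripts, `classify` and its labels are hypothetical names, the `headers` dict is assumed to use the exact key "Location", and the regular expressions are a rough match for a meta refresh tag rather than a real HTML parser.

```python
import re

# Rough pattern for <meta http-equiv="refresh" ...>, in any attribute order.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?[^>]*>', re.IGNORECASE)
# Pulls the target out of content="0; url=/somewhere".
URL_PART = re.compile(r'url\s*=\s*["\']?([^"\'>\s;]+)', re.IGNORECASE)


def classify(status, headers, body):
    """Return (kind, target) as a non-script-executing spider might see it."""
    location = headers.get("Location")
    if status in (301, 302, 303, 307, 308) and location:
        # Server-side redirect: interpreted according to the status code.
        kind = "permanent" if status in (301, 308) else "temporary"
        return kind, location
    tag = META_REFRESH.search(body)
    if tag:
        url = URL_PART.search(tag.group(0))
        if url:
            # A 200 page with a refresh tag: treated like a temporary redirect.
            return "meta-refresh", url.group(1)
    # JavaScript redirects end up here, because nothing executes the script.
    return "none", None


page = ('<html><head><meta http-equiv="refresh" content="0; url=/new.php">'
        '</head></html>')
print(classify(301, {"Location": "/new.php"}, ""))
print(classify(200, {}, page))
print(classify(200, {}, '<script>window.location.replace("/new.php");</script>'))
```

A page that redirects only through JavaScript comes back as "none" here, which matches the point that the original page would simply be treated as an ordinary page.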

Note that server-side redirects (with an appropriate status code) don't deliver any page for search engines to spider. If a URL always redirects, there's no content to 'scan'. But that doesn't mean that search engines won't index that URL.