Restricted Web pages and Robots

This is an SEO/PHP/Apache question

anshul

6:40 am on May 12, 2005 (gmt 0)

10+ Year Member



This could be an SEO/PHP/Apache question, but I'll ask it here:
I have a directory of Web pages that require cookie/session/HTTP authentication. When bots/spiders visit, they can't index these pages. How can I identify these robots/spiders and allow them to index my Web pages, while normal visitors still have to log in? Please reply.

mcibor

3:21 pm on May 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There shouldn't be a way to let robots into your secure pages. What you can do instead is create a version of your page just for robots/spiders. If a spider gets to your login form, it submits it without any values (and disregards JavaScript validation). You can then show a spider-only page when user and pass are both empty. That page shouldn't contain any special or valuable data, because a user could do the same thing after disabling JavaScript.
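
A minimal sketch of the idea in PHP (the field names and the included file are placeholders for whatever your login handler uses):

<?php
// If the login form was submitted with empty credentials -- which is
// what a spider following the form does -- serve a stripped-down
// public page instead of the protected content.
$user = isset($_POST['user']) ? trim($_POST['user']) : '';
$pass = isset($_POST['pass']) ? trim($_POST['pass']) : '';

if ($user === '' && $pass === '') {
    // Likely a spider, or a user with JavaScript disabled: show only
    // non-sensitive teaser content.
    include 'spider_teaser.php';
    exit;
}

// ...normal authentication continues here...
?>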

Hope this helps you somehow
Michal Cibor

anshul

6:33 am on May 13, 2005 (gmt 0)

10+ Year Member



The idea is that if we know the HTTP_USER_AGENT of the bots, we can bypass login/authentication for them. Can someone tell me what it is for Google, MSN, Yahoo, AltaVista, and Alexa? I see things like Googlebot, Inktomi Slurp, and msnbot in my logs. Please discuss further.
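
Something like this sketch is what I have in mind (require_login() is a hypothetical placeholder for whatever authentication check is already in place):

<?php
// Skip authentication when the User-Agent header matches a known
// spider name. Note: this trusts a header the client controls.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$spiders = array('Googlebot', 'Slurp', 'msnbot');

$isSpider = false;
foreach ($spiders as $name) {
    if (stripos($agent, $name) !== false) {
        $isSpider = true;
        break;
    }
}

if (!$isSpider) {
    require_login();  // hypothetical helper enforcing the normal login
}
?>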

incywincy

7:05 am on May 13, 2005 (gmt 0)

10+ Year Member



anshul, it is simple to spoof the user agent, so a surfer could easily bypass your authentication if you adopt this policy.
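
To show how simple: the sketch below forges Googlebot's user agent using PHP's http stream wrapper (the URL is just a placeholder):

<?php
// The http stream wrapper sends the user_agent ini setting as the
// request's User-Agent header, so any client can claim to be Googlebot.
ini_set('user_agent', 'Googlebot/2.1 (+http://www.google.com/bot.html)');
$html = file_get_contents('http://www.example.com/protected/page.php');
?>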

You could cloak, I guess.

anshul

9:38 am on May 16, 2005 (gmt 0)

10+ Year Member



> it is simple to spoof the user agent

How is it that easy?
If Googlebot, Inktomi Slurp, and msnbot have fixed IPs, we can rethink this approach...

jatar_k

4:29 pm on May 16, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



They all have set IP ranges.

There are tons of spider IP lists out there that you could use.
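
A sketch of checking against such a list in PHP; the Googlebot range below is only an illustration, so maintain the real ranges from a published spider IP list:

<?php
// Treat the request as a spider only when REMOTE_ADDR falls inside a
// known spider IP range. ip2long() lets us compare IPv4 addresses
// numerically (works for ranges below 128.0.0.0 on 32-bit builds).
$spiderRanges = array(
    array('66.249.64.0', '66.249.95.255'),  // example Googlebot range
    // ...more ranges from your spider IP list...
);

function ipInRange($ip, $low, $high)
{
    $n = ip2long($ip);
    return $n !== false && $n >= ip2long($low) && $n <= ip2long($high);
}

$isSpider = false;
foreach ($spiderRanges as $range) {
    if (ipInRange($_SERVER['REMOTE_ADDR'], $range[0], $range[1])) {
        $isSpider = true;
        break;
    }
}

// Safest is to require both checks: skip the login only when the IP
// and the claimed user agent both look like a real spider.
?>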