Suppose I have a folder. It is called /admin/
In this folder I have another folder /admin/SecretSauce/
In my robots.txt I have a
Disallow: /admin/
so all the nice spiders do not go in there and bite things.
Now suppose I make a directory called /SS/
The sole purpose of this directory is to let someone who works on the SecretSauce get there by typing a short, easy-to-remember URL. But this URL is only a redirect.
<% Response.Redirect "/admin/SecretSauce/" %>
That is all that is on the default.asp page of /SS/
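For illustration, here is a rough Python stand-in for that one-line default.asp. Classic ASP's Response.Redirect answers with an HTTP 302 status and a Location header instead of serving the target page. The sketch assumes that behaviour and only mimics it with Python's standard http.server (it is not IIS, and the handler and helper names are hypothetical). A client that does not follow redirects gets back just the status and the Location header, so that header, which names the target path, is all /SS/ hands a spider.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class SSRedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for /SS/default.asp: anything under /SS/ gets a 302."""

    def do_GET(self):
        if self.path.startswith("/SS/"):
            # Roughly what Response.Redirect emits: a 302 plus a Location header.
            self.send_response(302)
            self.send_header("Location", "/admin/SecretSauce/")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_without_following(path):
    """Request path once, like a client that does not auto-follow redirects."""
    server = HTTPServer(("127.0.0.1", 0), SSRedirectHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        result = (resp.status, resp.getheader("Location"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_without_following("/SS/"))  # (302, '/admin/SecretSauce/')
```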
What would a spider gain from this directory? /SS/ with only a redirect?
Of course the folder /admin/SecretSauce/ has the most amazing authentication scheme known to humankind...
Just wondering if spiders care about directory names?
Paul
Well, the _nice_ spider finds a link to /SS/colic-and-chives.htm. First it carefully checks your robots.txt to see whether it's permitted to go there. Since /SS/ isn't mentioned, the spider requests the page.
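The robots.txt check a well-behaved spider does can be sketched with Python's standard urllib.robotparser. This assumes the file has a `User-agent: *` line above `Disallow: /admin/` (the original post only shows the Disallow line), and the helper name is made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a "User-agent: *" group; the post only shows the Disallow line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def polite_spider_may_fetch(url, agent="ExampleBot"):
    """Return True if a crawler that honours ROBOTS_TXT may request url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


for url in ("http://www.mysite.com/admin/SecretSauce/",
            "http://www.mysite.com/SS/",
            "http://www.mysite.com/SS/colic-and-chives.htm"):
    verdict = "allowed" if polite_spider_may_fetch(url) else "disallowed"
    print(url, "->", verdict)
```

With those rules, /admin/SecretSauce/ comes back disallowed while both /SS/ URLs come back allowed, because robots.txt matching is a simple path-prefix test and nothing in the file starts with /SS/.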
Now your server gets the /SS/ request, remaps it, and returns /admin/SecretSauce/colic-and-chives.htm, no problem. And the spider swallows it, presumably getting deathly ill.
Surely that wasn't what you wanted.
Anyone can type in www.mysite.com/robots.txt and find all the disallows. That is where all the "goodies" are.
I deliberately left /SS/ out of robots.txt to avoid giving away hints.
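To show how little effort that takes, here is a small Python sketch that pulls every Disallow path out of a robots.txt body. The sample file contents and the function name are hypothetical; in practice a visitor would just open http://www.mysite.com/robots.txt in a browser and read it.

```python
def disallowed_paths(robots_txt):
    """Collect every non-empty Disallow value from a robots.txt body."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file body matching the rules described in the thread.
SAMPLE = """\
User-agent: *
Disallow: /admin/
"""

print(disallowed_paths(SAMPLE))  # ['/admin/']
```

Since /SS/ is not listed, it does not show up in this harvest, which is the point of keeping it out of the file.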
It gets what the server sends it, that's all.
The server COULD return a 301 or 302 status code, admitting the data isn't really at the URL the spider requested, but if it doesn't, the spider has no way of knowing that the request was redirected.
This isn't like client-side redirection: either HTTP redirection, which the spider can know about, or JavaScript redirection, which the spider probably won't know about.
This redirection happens strictly within the server, and the server absolutely controls everything about what the spider sees.
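To make the distinction concrete, here is a toy model in plain Python with no real web server. PAGES, Reply, remap and the two handler functions are all hypothetical. http_redirect behaves like an HTTP redirect (such as the 302 from Response.Redirect), and internal_rewrite behaves like the server-side remapping described above.

```python
from dataclasses import dataclass, field

# Hypothetical content store: the real page only exists under /admin/SecretSauce/.
PAGES = {"/admin/SecretSauce/colic-and-chives.htm": "<html>secret sauce</html>"}


@dataclass
class Reply:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def serve(path):
    """Return the stored page, or a 404."""
    if path in PAGES:
        return Reply(200, {"Content-Type": "text/html"}, PAGES[path])
    return Reply(404)


def remap(path):
    """Map /SS/<rest> to /admin/SecretSauce/<rest>; leave other paths alone."""
    if path.startswith("/SS/"):
        return "/admin/SecretSauce/" + path[len("/SS/"):]
    return path


def http_redirect(path):
    """External redirect: a 302 tells the client where the content really lives."""
    target = remap(path)
    if target != path:
        return Reply(302, {"Location": target})
    return serve(path)


def internal_rewrite(path):
    """Server-side remap: the client gets a 200 and never sees the real path."""
    return serve(remap(path))


if __name__ == "__main__":
    requested = "/SS/colic-and-chives.htm"
    print(http_redirect(requested))
    print(internal_rewrite(requested))
```

Requesting /SS/colic-and-chives.htm from http_redirect gives back a 302 whose Location names the /admin/ path, so a polite spider would typically check that new URL against robots.txt before fetching it. internal_rewrite returns the page itself with a 200 and nothing that reveals where it came from.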