Spider Question


pelicanPaul

5:27 pm on May 20, 2003 (gmt 0)

10+ Year Member



Just wondering if spiders care about directory names?

Suppose I have a folder. It is called /admin/
In this folder I have another folder /admin/SecretSauce/

In my robots.txt I have a
Disallow: /admin/
so all the nice spiders do not go in there and bite things.
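A minimal sketch of how a well-behaved crawler might apply this rule, using Python's standard `urllib.robotparser`. It assumes a `User-agent: *` line above the Disallow (the post shows only the Disallow line, and a Disallow only takes effect inside a User-agent group); `www.mysite.com` is the placeholder host used later in this thread, and `is_allowed` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# The post's rule, placed in a "User-agent: *" group (assumed; a Disallow line
# that is not preceded by a User-agent line is ignored by the parser).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, agent="AnyBot"):
    """True if a crawler that honors robots.txt may request this path."""
    return parser.can_fetch(agent, "http://www.mysite.com" + path)


for path in ("/admin/", "/admin/SecretSauce/", "/SS/"):
    print(path, "allowed" if is_allowed(path) else "disallowed")
```

Matching is a plain prefix test on the path as requested, so /admin/SecretSauce/ is blocked while /SS/ passes: nothing in robots.txt says where /SS/ leads.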

Now suppose I make a directory called /SS/
The sole purpose of this directory is to send someone who works on the SecretSauce there when they type in a short, easy-to-remember URL. But this URL is only a redirect.

<% Response.Redirect "/admin/SecretSauce/" %>

That is all that is on the default.asp page of /SS/
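For reference, classic ASP's `Response.Redirect` does not hand back the other page's content under /SS/; it answers with an HTTP 302 status and a `Location` header, and the client (browser or spider) then makes a second request to the new URL. The Python sketch below imitates that exchange with the standard library. `RedirectOnlyHandler` and `fetch_without_following` are hypothetical names, and the local server only stands in for /SS/default.asp. `http.client` does not follow redirects by itself, so it shows exactly what a crawler receives for /SS/.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectOnlyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for /SS/default.asp: a redirect and nothing else."""

    def do_GET(self):
        if self.path.lower().startswith("/ss/"):
            # Approximates Response.Redirect "/admin/SecretSauce/":
            # a 302 status plus a Location header naming the real target.
            self.send_response(302)
            self.send_header("Location", "/admin/SecretSauce/")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch_without_following(path):
    """Request path once and return (status, Location header)."""
    server = HTTPServer(("127.0.0.1", 0), RedirectOnlyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        location = resp.getheader("Location")
        conn.close()
        return resp.status, location
    finally:
        server.shutdown()
        server.server_close()


print(fetch_without_following("/SS/"))
```

So a spider that requests /SS/ learns both the status code and the name of the directory it is being sent to.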

What would a spider gain from this directory? /SS/ with only a redirect?
Of course, the folder /admin/SecretSauce/ has the most amazing authentication scheme known to humankind...

Just wondering if spiders care about directory names?

Paul

hutcheson

9:17 pm on May 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>What would a spider gain from this directory? /SS/ with only a redirect?

Well, the _nice_ spider finds a link to /SS/colic-and-chives.htm. First it carefully checks your robots.txt to see whether it's permitted to go there. Since /SS/ isn't mentioned, the spider requests the page.

Now your server gets the /SS/ request, remaps it, and returns /admin/SecretSauce/colic-and-chives.htm, no problem. And the spider swallows it, presumably getting deathly ill.
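If /SS/ instead answers with the 302 that `Response.Redirect` sends, whether the spider ends up inside /admin/ depends on whether it re-checks robots.txt for the redirect target, and that varies by crawler. Below is a minimal Python sketch of a deliberately naive crawler that checks robots.txt only for the URL it was linked to and then lets `urllib.request` follow the 302 on its own. `SiteHandler`, `naive_crawl`, and the page body are all hypothetical, and the robots.txt is the one assumed in this thread.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: the thread's Disallow inside a "User-agent: *" group.
ROBOTS_TXT = "User-agent: *\nDisallow: /admin/\n"
SECRET_PAGE = b"secret sauce recipe"  # hypothetical admin content


class SiteHandler(BaseHTTPRequestHandler):
    """Hypothetical site: /SS/ redirects (302) into the disallowed /admin/ area."""

    def do_GET(self):
        if self.path == "/SS/":
            self.send_response(302)
            self.send_header("Location", "/admin/SecretSauce/")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/admin/SecretSauce/":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(SECRET_PAGE)))
            self.end_headers()
            self.wfile.write(SECRET_PAGE)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def naive_crawl(path):
    """Check robots.txt once for the linked URL, then fetch it, following redirects."""
    server = HTTPServer(("127.0.0.1", 0), SiteHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT.splitlines())
    # An opener that ignores proxy settings; redirect handling is on by default.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        if not robots.can_fetch("NaiveBot", base + path):
            return None  # the linked URL itself is disallowed
        # The 302 is followed automatically; robots.txt is NOT re-checked for the target.
        with opener.open(base + path) as resp:
            return resp.geturl(), resp.read()
    finally:
        server.shutdown()
        server.server_close()


print(naive_crawl("/SS/"))
```

A more careful crawler would turn off automatic redirects and run each `Location` target through `can_fetch` before requesting it.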

Surely that wasn't what you wanted.

pelicanPaul

9:54 pm on May 20, 2003 (gmt 0)


Actually, if they did get deathly ill, that would be fine. The concept is that by listing a directory in your robots.txt file, you are pointing people straight at the areas you really don't want them to go.

Anyone can type in www.mysite.com/robots.txt and see every Disallow. That is where all the "goodies" are.
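A minimal sketch of how little effort that takes: collecting every Disallow path from a robots.txt body is a few lines of Python. In practice the text would be fetched from www.mysite.com/robots.txt; here it is inlined, and both `disallowed_paths` and the sample file (with its made-up /private/ entry) are hypothetical.

```python
from typing import List


def disallowed_paths(robots_txt: str) -> List[str]:
    """Collect every Disallow path from a robots.txt body, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        # An empty "Disallow:" means "allow everything", so it reveals nothing.
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt: /admin/ is from the thread, /private/ is made up.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /private/  # not linked from anywhere

User-agent: FriendlyBot
Disallow:
"""

print(disallowed_paths(SAMPLE))
```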

I deliberately left /ss/ out of robots.txt so as not to give away any hints.

pelicanPaul

9:55 pm on May 20, 2003 (gmt 0)


Oh yeah. In /ss/ the only thing there is a redirect. That is important. No content whatsoever...

hutcheson

12:50 am on May 21, 2003 (gmt 0)


>In /ss/ the only thing there is a redirect. That is important. No content whatsoever...

Important to whom? The spider? It's no chitin off its pedipalps where the content actually lives. It picked up a URI it wasn't told to stay away from, it got data, and it moves on to the next link.

pelicanPaul

2:24 pm on May 21, 2003 (gmt 0)


hmmm,

The question is what does a spider get from a directory that is only a redirect? Does it record its name (/ss/) or does it record anything else?

So when someone goes to Google and types in SS, they get that directory, which would then do the redirect...

That is all that I am asking...

hutcheson

9:00 pm on May 21, 2003 (gmt 0)


>The question is what does a spider get from a directory that is only a redirect? Does it record its name (/ss/) or does it record anything else?

It gets what the server sends it, that's all.

The server COULD return a 301 or 302 status code, admitting that the data doesn't really live at the URL the spider requested ... but if it doesn't, the spider has no way of knowing that the request was redirected.

This isn't like client-side redirection: either HTTP redirection, which the spider can know about, or JavaScript redirection, which the spider probably won't know about.

This redirection happens strictly within the server, and the server absolutely controls everything about what the spider sees.
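To make the internal case concrete, here is a minimal Python sketch of a server that remaps /SS/ onto /admin/SecretSauce/ without sending any redirect. In classic ASP 3.0, `Server.Transfer` works along these lines, and URL-rewriting modules are another route. `RemappingHandler`, `fetch`, and the page body are hypothetical. The client gets a 200 for /SS/, no `Location` header, and the admin page's bytes, so nothing in the response reveals that the content lives under /admin/.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

SECRET_PAGE = b"secret sauce recipe"  # hypothetical body of /admin/SecretSauce/


class RemappingHandler(BaseHTTPRequestHandler):
    """Hypothetical server that maps /SS/ onto /admin/SecretSauce/ internally."""

    def do_GET(self):
        path = self.path
        if path.startswith("/SS/"):
            # Internal remap: no 301/302 is sent, so the client never sees the real path.
            path = "/admin/SecretSauce/" + path[len("/SS/"):]
        if path == "/admin/SecretSauce/":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(SECRET_PAGE)))
            self.end_headers()
            self.wfile.write(SECRET_PAGE)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch(path):
    """Request path once (no redirect following); return status, Location, body."""
    server = HTTPServer(("127.0.0.1", 0), RemappingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        location = resp.getheader("Location")
        conn.close()
        return resp.status, location, body
    finally:
        server.shutdown()
        server.server_close()


print(fetch("/SS/"))
```

From the spider's side, this response is indistinguishable from a real page stored at /SS/.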