Spider Question

Forum Moderators: open

Just wondering if spiders care about directory names?

pelicanPaul

5:27 pm on May 20, 2003 (gmt 0)

Just wondering if spiders care about directory names?

Suppose I have a folder. It is called /admin/
In this folder I have another folder /admin/SecretSauce/

In my robots.txt I have a
Disallow: /admin/
so all the nice spiders do not go in there and bite things.

Now suppose I make a directory called /SS/
The sole purpose of this directory is to give someone who works on the SecretSauce a short, easy-to-remember URL to type. This URL is only a redirect.

<% Response.Redirect "/admin/SecretSauce/" %>

That is all that is on the default.asp page of /SS/

What would a spider gain from this directory? /SS/ with only a redirect?
Of course the folder /admin/SecretSauce/ has the most amazing authentication scheme known to humankind...

Just wondering if spiders care about directory names?

Paul

hutcheson

9:17 pm on May 20, 2003 (gmt 0)

>>What would a spider gain from this directory? /SS/ with only a redirect?

Well, the _nice_ spider finds a link to /SS/colic-and-chives.htm. First it carefully checks your robots.txt to see whether it's permitted to go there. Since /SS/ isn't mentioned, the spider requests the page.

Now your server gets the /SS/ request, remaps it, and returns /admin/SecretSauce/colic-and-chives.htm, no problem. And the spider swallows it, presumably getting deathly ill.

Surely that wasn't what you wanted.
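
For what it's worth, the robots.txt check described above can be sketched in Python with the standard library's urllib.robotparser. This is only an illustration under assumptions: the thread gives just the Disallow line, so the User-agent line, the crawler name "ExampleBot", and the helper name allowed() are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: only the Disallow line comes from the thread;
# the User-agent line is added here so the record is complete.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def allowed(path, agent="ExampleBot"):
    """Return True if a well-behaved crawler may request `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/SS/colic-and-chives.htm", "/admin/SecretSauce/"):
        verdict = "allowed" if allowed(path) else "disallowed"
        print(path, "->", verdict)
```

Nothing under /SS/ matches the /admin/ rule, so a polite spider considers itself free to request it, while anything under /admin/ is refused.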

pelicanPaul

9:54 pm on May 20, 2003 (gmt 0)

Actually, if they did get deathly ill that would be fine. The concept is that by putting a directory in your robots.txt file you are inviting people into the very areas you wanted them to stay out of.

Anyone can type in www.mysite.com/robots.txt and find all the disallows. This is where all the "goodies" are.

I made it so that this /ss/ is not in the robots.txt to avoid giving away hints.

pelicanPaul

9:55 pm on May 20, 2003 (gmt 0)

Oh yeah. In /ss/ the only thing there is a redirect. That is important. No content whatsoever...

hutcheson

12:50 am on May 21, 2003 (gmt 0)

>In /ss/ the only thing there is a redirect. That is important. No content whatsoever...

Important to whom? The spider? It's no chitin off its pedipalps where the content actually lives. It requested a URI it wasn't asked not to access, it got data, and it goes on to the next link.

pelicanPaul

2:24 pm on May 21, 2003 (gmt 0)

hmmm,

The question is what does a spider get from a directory that is only a redirect? Does it record its name (/ss/) or does it record anything else?

So when someone goes into Google and types in SS they get that directory which would then do the redirect...

That is all that I am asking...

hutcheson

9:00 pm on May 21, 2003 (gmt 0)

>The question is what does a spider get from a directory that is only a redirect? Does it record its name (/ss/) or does it record anything else?

It gets what the server sends it, that's all.

The server COULD return a 301 or 302 status code admitting the data wasn't really from where the spider accessed ... but if it doesn't, the spider has no way of knowing that the request was redirected.

This isn't like client-side redirection -- either HTTP redirection, which the spider can know about, or JavaScript redirection, which the spider probably won't know about.

This redirection happens strictly within the server, and the server absolutely controls everything about what the spider sees.
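
Which of these cases applies to a given URL can be checked from the outside by looking at the raw response. The sketch below is only a rough illustration: the function names describe() and probe() and the set of status codes treated as redirects are choices made for this example, and the host you pass in is whatever site you want to test. It relies on Python's http.client, which does not follow redirects on its own, so it sees the same first response a spider would.

```python
import http.client

# Status codes treated as "the server admitted a redirect" in this sketch.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def describe(status, location):
    """Summarise what a crawler can learn from a single response."""
    if status in REDIRECT_STATUSES and location:
        return f"redirect announced: {status} -> {location}"
    return f"no redirect visible: {status}"


def probe(host, path):
    """Request `path` from `host` once, without following any redirect."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return describe(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

Calling probe("www.mysite.com", "/SS/") would report either an announced redirect, in which case the spider learns the target URL, or a plain status with no Location header, in which case it only ever sees the /SS/ address.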
