disallow all from subdirectory

         

Driven

6:22 pm on Mar 7, 2005 (gmt 0)

10+ Year Member



Hi all,

I have a dynamically generated site. All URLs are turned into SE-friendly URLs, like this:

[mysite.com...]

The folders before the /addfav folder could be ANYTHING, as the site is dynamically generated. The /addfav folder always appears 5 levels deep in the folder structure.

What I am after is for the SEs not to index the /addfav folder or anything below it.

From what I understand, wildcards cannot be used to specify subdirectories, only user-agents... so something like this would not work, correct?

User-agent: *
Disallow: /*/*/*/*/addfav/

What should my robots.txt look like to accomplish this?

Thanks in advance,
Dan

encyclo

6:33 pm on Mar 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As you've guessed, you can't do this in a robots.txt file as you would need to add a rule for every directory.
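
To illustrate why, here is a minimal sketch of prefix-only matching, which is how a crawler that follows just the original robots.txt convention would read a Disallow line. The function name and the sample paths are hypothetical, not taken from Dan's site.

```python
def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Prefix-only matching: block a path if it starts with any non-empty rule."""
    # An empty Disallow value means "allow everything", so empty rules are skipped.
    return any(bool(rule) and path.startswith(rule) for rule in disallow_rules)


# Hypothetical URL path, five folders deep, ending in /addfav/
page = "/a/b/c/d/addfav/"

# The "*" characters are treated as literal text, so the rule never matches.
print(is_disallowed(page, ["/*/*/*/*/addfav/"]))   # False

# Only a rule spelling out the real parent folders matches.
print(is_disallowed(page, ["/a/b/c/d/addfav/"]))   # True
```

With arbitrary parent folders, that second form would need one rule per article, which is what makes it impractical.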

As you say that the site is generated, would you be able to add a

<meta name="robots" content="noindex">
to the pages you want to exclude? I reckon that would be a simpler and safer strategy.

Driven

7:21 pm on Mar 7, 2005 (gmt 0)

10+ Year Member



Thank you for the reply. I have thought of this but unfortunately it is not possible since the /addfav "page" does not actually exist. It is simply a function that adds an article to the reader's favorites list then redirects the user to the article.

There is not even any content on the /addfav "page" for the SEs to index, but looking at the serps for my domain many of these "pages" are in the index.

Is my request even possible with robots.txt or am I stuck with these worthless pages being indexed by the SEs?

Thanks,
Dan

encyclo

7:31 pm on Mar 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What sort of redirect are you using? If it is a 302 redirect, then that would explain why the pages are being indexed (you are effectively "hijacking" your own article pages). A 301 redirect would help fix this.

As you are using Javascript to add to favorites in IE, could you make the href link a direct link to the article, but with an onclick event to action the bookmarking?
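
For the redirect question above, a rough sketch of what a 301 response from the /addfav handler could look like, written as a bare WSGI callable. Everything here is an assumption for illustration: the handler name, the idea that the article lives at the part of the path before "/addfav", and the omitted favorites logic. Note also that browsers may cache a 301, which matters for a URL that performs an action.

```python
def addfav_app(environ, start_response):
    """Hypothetical WSGI handler: record the favorite, then 301 to the article."""
    path = environ.get("PATH_INFO", "")

    # Assumption: the article URL is everything before the "/addfav" segment.
    article_path, found, _ = path.partition("/addfav")
    if not found:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]

    # (Saving the article to the reader's favorites list would happen here.)

    # A permanent redirect instead of the temporary 302.
    start_response("301 Moved Permanently", [("Location", article_path + "/")])
    return [b""]
```

It can be exercised without a server by calling it with a small environ dict and a stub `start_response` that records the status and headers.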

Driven

8:34 pm on Mar 7, 2005 (gmt 0)

10+ Year Member



Sorry 'bout that. The redirect does not add the article to their favorites within their browser, it is adding them to a list of their favorite articles on the webpage.

Go to my homepage listed in my profile, then look at one of the articles. You will see where you can add an article to your favorites list.

That was getting a little bit off topic but thought I should clarify.

Any other thoughts on getting those pages out of the serps?

Thanks,
Dan

Driven

5:16 am on Mar 10, 2005 (gmt 0)

10+ Year Member



Anybody have thoughts on how to do this with robots.txt?

Thanks.

ThomasB

12:20 pm on Mar 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Driven, what about putting the functionality in a new script and passing the parameters?

[example.com...]

Then you could easily exclude it like:
User-agent: *
Disallow: /addfav.php
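
A small sketch of why this works: once every add-to-favorites link points at one script, all of those links share the same leading path, so a single plain Disallow line covers them. The parameter name "article" and the IDs are made up for the example.

```python
from urllib.parse import urlencode

DISALLOW_RULE = "/addfav.php"  # the rule from the robots.txt above


def addfav_link(article_id: int) -> str:
    """Build a link to the single script; "article" is a hypothetical parameter."""
    return "/addfav.php?" + urlencode({"article": article_id})


links = [addfav_link(i) for i in (1, 42, 977)]
print(links[1])  # /addfav.php?article=42

# Every generated link starts with the rule, so plain prefix matching is enough.
print(all(link.startswith(DISALLOW_RULE) for link in links))  # True
```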

Besides that Google might support wildcards for your need. You might want to try:
User-agent: *
Disallow: /*addfav*

Though I'm honestly not too sure about the "*" at the end.
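
As a rough sketch of the wildcard idea, assuming "*" stands for any run of characters and the rule is still matched from the start of the path (the behaviour Google describes for Googlebot), a pattern can be turned into a regular expression like this. The helper name and test paths are hypothetical.

```python
import re


def matches_wildcard_rule(path: str, rule: str) -> bool:
    """Treat "*" as "any characters"; the rule must match from the start of the path."""
    # Escape the literal pieces and join them with ".*" in place of each "*".
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.match(pattern, path) is not None


page = "/a/b/c/d/addfav/123"

print(matches_wildcard_rule(page, "/*addfav*"))         # True
print(matches_wildcard_rule(page, "/*addfav"))          # True
print(matches_wildcard_rule(page, "/*/*/*/*/addfav/"))  # True
print(matches_wildcard_rule("/a/b/c/d/", "/*addfav*"))  # False
```

Under these assumptions the trailing "*" makes no difference, because the rule already matches any path that continues past "addfav". The same rule would also catch any other URL that happens to contain "addfav" anywhere in its path.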

Lord Majestic

12:49 pm on Mar 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Besides that Google might support wildcards for your need.

Might? Do they? I believe not, just like pretty much every other robot: the current de facto robots.txt standard requires a simple prefix match on the URL path, without any patterns or wildcards.

ThomasB

12:56 pm on Mar 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Lord Majestic, Google supports them:
[google.com...]

Lord Majestic

1:21 pm on Mar 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I stand corrected. However, the User-agent specified in the example was *, i.e. all robots, and most of them don't support wildcards (and aren't supposed to, according to the current standard). :)

ThomasB

1:26 pm on Mar 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Agreed, but Googlebot will understand it, and if other search engines support wildcards one day, you won't have to touch your robots.txt :)