Forum Moderators: goodroi

How to exclude Oracle Portal Help pages using robots.txt

Need to exclude Oracle Portal Help Pages from search engines

         

atul_iiit

4:53 pm on Jul 19, 2005 (gmt 0)

10+ Year Member



Hi,
We have Oracle Portal (Oracle Application Server 10g) installed, and we don't want web crawlers (robots) to index the portal help pages, which are at

[<sitename>...]

The portalHelp alias is defined in mod_oc4j.conf as

Oc4jMount /portalHelp OC4J_Portal
Oc4jMount /portalHelp/* OC4J_Portal

Kindly suggest what we need to exclude.

Regards
atul

jatar_k

4:58 pm on Jul 19, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Welcome to WebmasterWorld atul_iiit,

That's the web-based DB management stuff, isn't it?

If that's the case, I would use something more reliable, such as .htaccess with a username and password on that path, to keep crawlers out for sure.

The robots.txt syntax would be

User-agent: *
Disallow: /portalHelp/

atul_iiit

5:07 pm on Jul 19, 2005 (gmt 0)

10+ Year Member



Thanks for the prompt reply, but in that case shouldn't /portalHelp/ be an actual directory?

Does a virtual directory also work fine?

Since /portalHelp/ points to OC4J_Portal, I want to confirm that this won't disable indexing for other things in OC4J_Portal.

Also, can you please explain what this htaccess is?

Regards
Atul

jatar_k

5:38 pm on Jul 19, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



>> shouldn't /portalHelp/ be a actual directory

well, it just needs to be a directory from a crawler's perspective

atul_iiit

12:34 pm on Jul 20, 2005 (gmt 0)

10+ Year Member



This is just an alias, not an actual directory.

So can I still put this in robots.txt?

Will it work?

ThomasB

9:27 pm on Jul 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



atul_iiit, as already pointed out by jatar_k, as long as it looks like a directory (a "/" at the end in the URL) for a search engine, it won't matter.
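
A minimal sketch of the point made above, assuming Python's standard `urllib.robotparser` as a stand-in for a real crawler (actual search engines may differ in detail). It feeds the two-line rule set from this thread to the parser and checks a few URL paths. The path `/portal/page/home` is a hypothetical example of "other things" served by the same OC4J instance, not a path taken from the thread.

```python
from urllib.robotparser import RobotFileParser

# The rule set suggested earlier in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /portalHelp/
"""


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if the rules above let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


# Matching is done on the URL path as a plain prefix; the parser never
# looks at the server's filesystem, so an alias behaves like a directory.
for path in (
    "/portalHelp/",
    "/portalHelp/index.html",
    "/portal/page/home",  # hypothetical non-help page
    "/portalHelp",        # no trailing slash: not covered by the rule
):
    print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

With this parser, everything under `/portalHelp/` is blocked and the other paths stay crawlable. Note that the bare `/portalHelp` (no trailing slash) does not match the rule as written; using `Disallow: /portalHelp` would cover it too, at the cost of also matching any other path that begins with that string.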

atul_iiit

3:18 pm on Aug 4, 2005 (gmt 0)

10+ Year Member



Thanks a lot to all who replied. I have another question now.

I am putting my robots.txt in the DocumentRoot of my web server, but we have denied all access to the document root,

so when I request

[<hostname>.<domainname>...]
I see an access-denied page:

Forbidden
You don't have permission to access /robots.txt on this server.

Can robots read this file in this case? What should I do so that the Yahoo or Google robots can read it?

Regards

ThomasB

6:05 pm on Aug 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



atul_iiit, you have to allow users (and robots) to access this file, of course. Otherwise they can't read and honor it.
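
One way to confirm that the file is reachable is to request it the way a crawler would and look at the HTTP status. The following is a rough sketch using only Python's standard library; the function names `robots_txt_status` and `crawlers_can_read` are made up for illustration, and the simple "200 means readable" rule is an assumption, since each search engine decides for itself how to treat other status codes.

```python
import urllib.error
import urllib.request


def robots_txt_status(site_root):
    """Return the HTTP status code the server gives for /robots.txt."""
    url = site_root.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is still
        # exactly the information we are after.
        return err.code


def crawlers_can_read(status):
    """Assumed rule of thumb: the file is readable only when served as 200."""
    return status == 200
```

Calling `robots_txt_status("http://<hostname>.<domainname>")` against the server described above would be expected to return 403, matching the Forbidden page; once the access rule is relaxed for that one file, it should return 200.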

atul_iiit

9:12 am on Aug 5, 2005 (gmt 0)

10+ Year Member



Thanks, Thomas.