Forum Moderators: martinibuster


Blocked URL help

         

MrBaseball34

4:09 pm on Mar 30, 2011 (gmt 0)

10+ Year Member



We want bots to access only one directory on our site, which holds static pages that are rebuilt every day. Our robots.txt file currently looks like this:

User-agent: *
Crawl-delay: 10
Disallow: /
Allow: /bots


As a result, the AdSense bot is blocked from our main URL and ads are not showing on our site. How can I fix this?

It would be OK for the AdSense bot to have access to the whole site, but we do not want the regular bots to have access to anything except the bots directory.

DaStarBuG

4:18 pm on Mar 30, 2011 (gmt 0)

10+ Year Member



Add this to your robots.txt:

User-agent: Mediapartners-Google
Allow: /

However, make sure that what you are doing does not fall under "cloaking", which means showing bots different content than you show your users.

MrBaseball34

5:03 pm on Mar 30, 2011 (gmt 0)

10+ Year Member



Does that go before or after the original stuff in robots.txt?

MrBaseball34

5:07 pm on Mar 30, 2011 (gmt 0)

10+ Year Member



We essentially have to do this with static pages because we run an inventory listing service with 2.1 million unique items. The script we originally had, which worked for a long time, queried the database and built a table with "pages" of the data, 10,000 records each.

Something changed in the last couple of months: the bots began hitting that script far more often, and the database requests brought our server to its knees.

We finally had to disable the script and find another way. We now create static HTML pages of the data every day, just like the ones the old script produced, but they are stored in one directory that I let the bots access. I do not want the bots to access anything else on our site because of the database interaction.

Now could that be deemed cloaking?
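
For anyone following along, the daily static-page build described above might look roughly like the sketch below. This is only an illustration, not the actual script: the function names, the `page1.html` file naming, and the plain one-column table are all assumptions, and the records are passed in as strings rather than read from the real database.

```python
import html
from pathlib import Path
from typing import Iterable, Iterator, List

PAGE_SIZE = 10_000  # records per page, matching the figure quoted in the post


def chunked(records: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists holding at most `size` records each."""
    page: List[str] = []
    for record in records:
        page.append(record)
        if len(page) == size:
            yield page
            page = []
    if page:  # final, partially filled page
        yield page


def build_static_pages(records: Iterable[str], out_dir: Path,
                       size: int = PAGE_SIZE) -> int:
    """Write one HTML table per page into out_dir; return the number of pages."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for count, page in enumerate(chunked(records, size), start=1):
        rows = "\n".join(
            f"<tr><td>{html.escape(item)}</td></tr>" for item in page
        )
        # Hypothetical naming scheme: page1.html, page2.html, ...
        (out_dir / f"page{count}.html").write_text(
            f"<html><body><table>\n{rows}\n</table></body></html>\n",
            encoding="utf-8",
        )
    return count
```

At 10,000 records per page, 2.1 million items come out to 210 files, and the crawlers only ever read those files instead of triggering database queries.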

DaStarBuG

5:50 pm on Mar 30, 2011 (gmt 0)

10+ Year Member



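Order robots.txt from specific to generic: the Mediapartners-Google group first, then the catch-all group, with Allow before Disallow inside it, since some parsers stop at the first matching rule: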

User-agent: Mediapartners-Google
Allow: /

User-agent: *
Crawl-delay: 10
Allow: /bots
Disallow: /
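
One way to sanity-check a file like this before deploying it is to run it through Python's standard `urllib.robotparser`. The sketch below is a rough check only: `www.example.com` is a placeholder host, `ExampleBot` is a made-up name standing in for any crawler matched by `*`, and Python's parser is not guaranteed to behave the same way as Google's or any other real crawler.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above, embedded as a string for testing.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Allow: /

User-agent: *
Crawl-delay: 10
Allow: /bots
Disallow: /
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# (user agent, URL) pairs to check; host and "ExampleBot" are placeholders.
checks = [
    ("Mediapartners-Google", "http://www.example.com/"),
    ("ExampleBot", "http://www.example.com/"),
    ("ExampleBot", "http://www.example.com/bots/page1.html"),
]
for agent, url in checks:
    verdict = "allowed" if rules.can_fetch(agent, url) else "blocked"
    print(f"{agent:22} {url:42} {verdict}")

# Crawl-delay that applies to a generic crawler under the "*" group.
print("Crawl-delay for ExampleBot:", rules.crawl_delay("ExampleBot"))
```

Python's parser applies the first rule in a group that matches, so with `Disallow: /` listed before `Allow: /bots` it reports the bots directory as blocked. Other crawlers may resolve that conflict differently, which is one more reason to keep Allow first.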