Forum Moderators: goodroi

Robots.txt : Allow Two Subfolders in a Disallowed Folder

     
12:20 am on Aug 14, 2015 (gmt 0)

New User

joined:May 13, 2015
posts: 6
votes: 0


Hi All

I just noticed in Google Webmaster Tools that some image elements were being blocked due to a Disallow directive in Prestashop's default Robots.txt.
The default prestashop robots.txt contains the directive : Disallow: */modules/
However, this is causing the top banner and image slider images to be blocked. These images reside in sub folders of the theme configurator and image slider modules.
I searched many resources for a way to allow crawlers to access these subfolders while blocking all other content in the modules folder.
After many trials I found the Robots Text Generator on WebmasterWorld.
I had originally inserted two Allow directives before the Disallow directive for the modules folder.
It didn't work, but when I imported the file into the tool, I saw that these directives had been moved to the top of the file, which now looks like:

User-agent: *
Allow: */modules/themeconfigurator/img/
Allow: */modules/homeslider/images/
Disallow: /*orderby=
Disallow: /*orderway=
Disallow: /*tag=
Disallow: /*id_currency=
Disallow: /*search_query=
Disallow: /*back=
Disallow: /*n=
Disallow: /*controller=addresses
Disallow: /*controller=address
Disallow: /*controller=authentication
Disallow: /*controller=cart
Disallow: /*controller=discount
Disallow: /*controller=footer
Disallow: /*controller=get-file
Disallow: /*controller=header
Disallow: /*controller=history
Disallow: /*controller=identity
Disallow: /*controller=images.inc
Disallow: /*controller=init
Disallow: /*controller=my-account
Disallow: /*controller=order
Disallow: /*controller=order-opc
Disallow: /*controller=order-slip
Disallow: /*controller=order-detail
Disallow: /*controller=order-follow
Disallow: /*controller=order-return
Disallow: /*controller=order-confirmation
Disallow: /*controller=pagination
Disallow: /*controller=password
Disallow: /*controller=pdf-invoice
Disallow: /*controller=pdf-order-return
Disallow: /*controller=pdf-order-slip
Disallow: /*controller=product-sort
Disallow: /*controller=search
Disallow: /*controller=statistics
Disallow: /*controller=attachment
Disallow: /*controller=guest-tracking
Disallow: */modules/
Disallow: */classes/
Disallow: */config/
Disallow: */download/
Disallow: */mails/
Disallow: */translations/
Disallow: */tools/
Disallow: */tests/

However, after resubmitting the file in Google's robots.txt Tester and testing the blocked banner image URLs, they still show as blocked.

I'm at a loss and would be grateful for review and advice as to what I'm doing wrong, or if what I'm trying to do is actually achievable.

Thanks
deepee
1:22 am on Aug 14, 2015 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4391
votes: 310


Try it with the two "Allow" lines immediately following the related Disallow line as in:
Disallow: */modules/
Allow: */modules/themeconfigurator/img/
Allow: */modules/homeslider/images/

Rather than at the top of the list.
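
Whether the position of the Allow lines matters depends on the parser reading the file. As a rough illustration only: Python's standard urllib.robotparser applies the first rule that matches and does not interpret * wildcards, so the sketch below uses plain leading-slash paths instead of the */ form. The file name banner.jpg and the helper can_fetch are made up for the example. Google documents a different rule (the most specific match wins), so this does not show how Googlebot or the robots.txt Tester behaves.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_lines, path):
    """Parse the given robots.txt lines and ask whether any agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch("*", path)


# Hypothetical image path, used only for illustration.
image = "/modules/themeconfigurator/img/banner.jpg"

disallow_first = [
    "User-agent: *",
    "Disallow: /modules/",
    "Allow: /modules/themeconfigurator/img/",
]

allow_first = [
    "User-agent: *",
    "Allow: /modules/themeconfigurator/img/",
    "Disallow: /modules/",
]

# A first-match parser stops at the first rule whose path prefix matches.
print(can_fetch(disallow_first, image))  # False
print(can_fetch(allow_first, image))     # True
```

With a first-match parser like this one, the Allow line only helps when it comes before the broader Disallow line.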
2:28 am on Aug 14, 2015 (gmt 0)

New User

joined:May 13, 2015
posts: 6
votes: 0


Thanks for your reply.
I had tried that structure originally and tried again just now.
Unfortunately, after submitting one of the blocked URLs in the tester, it is still showing as blocked.
Wondering if anything looks wrong with my syntax?
3:27 am on Aug 14, 2015 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4391
votes: 310


There is a tool in the (GWT) Search Console to help you generate a robots.txt file. You might try using that if it is still not following your file correctly. They also offer advice here: [developers.google.com...]

There should not have been an issue with crawling the permitted directories when an Allow: line specifying them comes right after the Disallow line. Did you first fetch the new robots.txt file? If not, they may be using their cached copy. What do you see when you view the robots.txt Tester? You should be able to test the changes right in the Console, using the URLs they say are blocked and editing the file they are using. Is your file UTF-8 encoded? That is their preferred format, and they say the file may be parsed incorrectly if it is not.
6:27 am on Aug 14, 2015 (gmt 0)

New User

joined:May 13, 2015
posts: 6
votes: 0


Thanks for that but I've been using the Search Console to generate the file.
I get the same issue when I enter a URL from /modules/themeconfigurator/img/ in the test field. The tester highlights the "Disallow: */modules/" line in red.
Next to the URL test field, the "Test" button changes to "Blocked".

I also tried editing the file in Notepad++ with UTF-8 encoding selected, but I get the same error when I test it.
I'm baffled.
2:18 pm on Aug 14, 2015 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4391
votes: 310


Since I need to use a similar syntax, I went to look for reasons why yours might not be working as expected. The only thing I see (at the URL I shared above) is that the */ syntax is not used in their specifications. It explains how they use /* but not */, so unless you have a number of possible URLs for /modules/, you might just omit the leading * wildcard and start each path with /. Also check file permissions.
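
For checking what the leading-slash form would be expected to do, here is a minimal sketch that assumes the precedence Google documents: the longest matching path wins, and Allow wins a tie. It is not Google's parser and may not reproduce what the robots.txt Tester reports. The helper names (pattern_to_regex, is_allowed) and the test paths are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a regex anchored at the path start.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if path may be crawled under the given (directive, pattern) rules.

    Assumed precedence: the longest matching pattern wins; on equal length
    an Allow rule beats a Disallow rule. No matching rule means allowed.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


rules = [
    ("Disallow", "/modules/"),
    ("Allow", "/modules/themeconfigurator/img/"),
    ("Allow", "/modules/homeslider/images/"),
]

# Hypothetical paths, used only for illustration.
print(is_allowed(rules, "/modules/themeconfigurator/img/banner.jpg"))  # True
print(is_allowed(rules, "/modules/blockcart/ajax.php"))                # False
print(is_allowed(rules, "/index.php"))                                 # True
```

Under this model the order of the lines does not matter, because the longer Allow path outranks the shorter Disallow path.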
5:06 pm on Aug 14, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10110
votes: 1002


Just checking.... the banner and image sliders show on the website, correct? All you are concerned with is they are not indexed? If that is the problem, then one option is to put copies of those images in an accessible folder on your system and not worry about it.... but ONLY IF all you want is for the images to be indexed.

If the images are blocked to the user by G, then that's a whole different can of worms!
11:05 pm on Aug 14, 2015 (gmt 0)

New User

joined:May 13, 2015
posts: 6
votes: 0


Thanks to you both for your suggestions.
Putting it in perspective, tangor, I think you hit the nail on the head.
I guess I have been somewhat paranoid recently, when I see any errors in Webmaster Tools.
This follows a nightmare couple of months involving a major software version upgrade, the transition to site-wide SSL and my development site in a subdomain folder being indexed.
Duplicate content penalties and bad URL rewrites lost me a lot of indexing and things are only just starting to improve.
The blocked images show on the website so as long as multiple pages that contain these blocked resources don't incur any penalties from Google, I may just leave them alone.