Forum Moderators: goodroi

Message Too Old, No Replies

Issue: URL blocked by robots.txt.

Cause?


aherman

5:53 pm on Oct 31, 2014 (gmt 0)

10+ Year Member



I apologize if this new thread offends anyone, but after searching around it seemed like this would be my best option. Again: sorry for any offense, if any is taken.

Issue:

Verified a new, 5-page site with Google 2 days ago - no problem.

Then submitted sitemap.xml - it indexed the 5 pages and returned 5 warnings, all "URL blocked by robots.txt".

Checking back with WMT today, the site has now accumulated a total of 25 of the same warning.

Here's the pertinent part of the map:

User-agent: NerdyBot
Disallow: /

User-agent: *
Disallow: /ajax/
Disallow: /apps/
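For what it's worth, the group behavior of rules like these can be checked locally with Python's standard urllib.robotparser (example.com here is just a placeholder host, not the actual site):

```python
from urllib import robotparser

# The rules exactly as posted (example.com is a placeholder host).
rules = """\
User-agent: NerdyBot
Disallow: /

User-agent: *
Disallow: /ajax/
Disallow: /apps/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# NerdyBot matches its own group and is blocked everywhere;
# every other agent falls through to the * group.
print(rp.can_fetch("NerdyBot", "https://example.com/index.html"))   # False
print(rp.can_fetch("Googlebot", "https://example.com/index.html"))  # True
print(rp.can_fetch("Googlebot", "https://example.com/ajax/x.js"))   # False
```

So as written, these rules should only produce "blocked" warnings for NerdyBot everywhere, and for other crawlers only under /ajax/ and /apps/.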


The only difference between this site and all the others I've ever done - none of which has had this issue - is that I have 4 domain names connected to it:

.com is the anchor, and .net, .org, and .biz are all URL-redirected to .com.

And...

.com, .biz, and .org are all in the 4th day of a 5-day transfer-pending status.

Is .com being in transfer status the reason for the warning?

If no, any other comments are definitely most welcome.

Thank you.

not2easy

8:02 pm on Oct 31, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The snippet posted above is not part of your sitemap (I hope), because it belongs in your robots.txt file. Make sure that the robots.txt file is in the root directory, and avoid stray blank lines within a rule group (a blank line is what separates groups). The robots.txt file can be tested in your GWT account; these instructions are from Google at [support.google.com...]
Test your robots.txt:

1. From the Webmaster Tools Home page, choose the site whose robots.txt file you want to test.
2. Expand the Crawl heading on the left dashboard, and select the robots.txt Tester tool.
3. Make changes to your live robots.txt file in the text editor.
4. Scroll through the robots.txt code to locate the highlighted syntax warnings and logic errors. The number of syntax warnings and logic errors is shown immediately below the editor.
5. Type in an extension of the URL or path in the text box at the bottom of the page.
6. Select the user-agent you want to simulate in the dropdown list to the right of the text box.
7. Click the TEST button next to the dropdown user-agent list to run the simulation.
8. Check whether the TEST button now reads ACCEPTED or BLOCKED to find out if the URL you entered is blocked from Google's web crawlers.
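The ACCEPTED/BLOCKED check in the last two steps can be roughly approximated offline with Python's urllib.robotparser (a sketch only - the real Tester uses Google's own parser, which is not identical):

```python
from urllib import robotparser

def robots_test(rules_text: str, user_agent: str, url: str) -> str:
    """Rough local stand-in for the Tester's TEST button:
    returns "ACCEPTED" or "BLOCKED" for the given agent and URL."""
    rp = robotparser.RobotFileParser()
    rp.parse(rules_text.splitlines())
    return "ACCEPTED" if rp.can_fetch(user_agent, url) else "BLOCKED"

# Hypothetical rules and URLs for illustration:
rules = "User-agent: *\nDisallow: /ajax/\nDisallow: /apps/\n"
print(robots_test(rules, "Googlebot", "https://example.com/ajax/x.js"))   # BLOCKED
print(robots_test(rules, "Googlebot", "https://example.com/index.html"))  # ACCEPTED
```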

The pending Transfer status could be a cause of accessibility problems, but not the error message you are seeing.

lucy24

8:08 pm on Oct 31, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Here's the pertinent part of the map:

I hope that was a typo for "pertinent part of robots.txt". The robots.txt file can include a pointer to the sitemap (which, incidentally, you shouldn't even need on a five-page site), but not the other way around. (That is: sure, you can name robots.txt in your sitemap - but you don't want search engines to index it, do you?)

Do all four domains have the same robots.txt?

Does the transfer include moving to a different server? One possibility is that Google is looking in the wrong place, and that this wrong place includes a robots.txt excluding everyone.
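One way to check whether all four domains serve the same rules is to fetch /robots.txt from each host and compare the responses. A minimal sketch of the comparison step - the domain names, helper function, and file contents below are all hypothetical:

```python
def robots_consistent(robots_by_domain: dict) -> bool:
    """Hypothetical helper: given {domain: robots.txt text},
    return True only if every domain serves identical rules."""
    return len(set(robots_by_domain.values())) <= 1

# Stand-in contents, as if fetched from each TLD variant:
sample = {
    "example.com": "User-agent: *\nDisallow: /ajax/\n",
    "example.net": "User-agent: *\nDisallow: /ajax/\n",
    "example.org": "User-agent: *\nDisallow: /\n",  # this one blocks everyone
}
print(robots_consistent(sample))  # False
```

In a mid-transfer situation like this one, a mismatch like the third entry above would produce exactly the kind of blocked-URL warnings being described.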

Sometimes problems really do fix themselves if you just wait a few days. But what happens if you try "Fetch as Googlebot" or the robots.txt Tester in WMT?

aherman

3:09 pm on Nov 2, 2014 (gmt 0)

10+ Year Member



Thanks for the responses - I appreciate them.

The pertinent part of "the map" I exhibited is actually (and obviously) the robots.txt itself - I apologize for my amateurism.

After conferring with my host and being reassured that everything on my site looks fine to them, with no evidence whatsoever of what WMT is warning about, I'm fairly comfortable concluding that the cause is simply Google's crawling proclivities.

"Wait a few days" is great advice; maybe "wait a week" is even better sometimes?

Thanks again, folks. And thanks, WW.

penders

10:30 am on Nov 3, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Disallow: /ajax/


This presumably includes JavaScript files etc.? Google has recommended for quite a long time that you should not block JS and CSS files from Googlebot (https://www.youtube.com/watch?v=B9BWbruCiDc - Matt Cutts, March 2012), as it can use these to help index your site.

In fact, Google have made this "more official" very recently and updated their webmaster guidelines [googlewebmastercentral.blogspot.co.uk]:

Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings.
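Applied to the rules posted earlier in the thread, the effect on page assets can be illustrated with urllib.robotparser (the asset URLs are hypothetical examples):

```python
from urllib import robotparser

rules = "User-agent: *\nDisallow: /ajax/\nDisallow: /apps/\n"
rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Hypothetical asset URLs a page might reference:
assets = [
    "https://example.com/ajax/site.js",
    "https://example.com/apps/widget.css",
    "https://example.com/static/main.css",
]
# Any asset in this list is invisible to Googlebot's renderer:
blocked = [u for u in assets if not rp.can_fetch("Googlebot", u)]
print(blocked)  # the /ajax/ and /apps/ assets, but not /static/
```

If the site's JS or CSS actually lives under /ajax/ or /apps/, those Disallow lines would trigger exactly the rendering problem Google's updated guidelines warn about.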


So, I wonder if they have also updated their warnings in GWT?