Forum Moderators: goodroi
I am having some major issues with the URL removal tool in webmaster tools.
I have blocked all the directories and files I want removed from the SERPs in the robots.txt file (the syntax and structure of the file are fine according to various validation tools - I know myself it's fine, but this problem has had me doubting myself!)
The issue is that I keep getting removal requests DENIED even though the URL has been blocked in the robots file. I have tried many times over the past 4-5 weeks to no avail.
Another issue (possibly related) is that when I test the URL against the robots file using the "test robots.txt" function, it says the directories and files are allowed... I can't for the life of me figure out what is wrong.
I am at a complete loss here so all help greatly appreciated!
If you are using robots meta tags, make sure your robots.txt is not blocking access to those pages, since a crawler that cannot fetch a page never sees its meta tags. Make sure you are using an official robots.txt validator to ensure your robots.txt is doing what you want it to do. Good luck.
Thanks for replying to this. I will investigate further but I think all is in order regarding the issues you have suggested.
An interesting (weird!) thing has shown up in Google Webmaster Tools: when I test against the robots file, it does not block the files specified there when the setting is User-agent: * - yet when I change the setting to User-agent: Googlebot it responds correctly, i.e. it blocks the files and folders disallowed in the robots file... strange goings on indeed!
The problem could be one of syntax, structure, or user-agent-policy-record priority.
I can tell you that most robots.txt validators are flawed, and that none of them use the actual search engine's robots.txt parsing code to evaluate the file... I've found discrepancies in *all* major search engines' robots.txt validation tools.
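To illustrate the "user-agent-policy-record priority" point, here is a minimal sketch using Python's standard-library urllib.robotparser. It is yet another independent parser, not Googlebot's code, so it is subject to the same caveat as every other validator. The robots.txt contents and the paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one catch-all record and one Googlebot-specific record.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""


def allowed(robots_txt, agent, path):
    """Return True if Python's parser lets `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        for path in ("/private/page.html", "/drafts/page.html"):
            print(agent, path, allowed(ROBOTS_TXT, agent, path))
```

With this file, Python's parser lets Googlebot fetch /private/page.html: a robot that matches a record of its own uses only that record, and "*" covers just the robots that matched nothing else. That is also how the original robots.txt convention describes "*", so a stray Googlebot record elsewhere in a file can make the "*" rules look ignored.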
I am 100% certain the syntax etc. is fine (I have been creating robots files for years). There is nothing complex in the robots file - that is why the problem is driving me nuts...
As mentioned above - the robots file ignores Disallows issued after User-agent: * - but if I change this to User-agent: Googlebot it works perfectly... This seems to be the cause of the URL removals being denied, but I still want to know why it is ignoring the User-agent: * record.
> the robots file ignores Disallows issued after User-agent: *
The robots file ignores Disallows issued? AFAIK, the file just sits there on the server and gets fetched by robots, so this statement is unclear. I assume you mean that Googlebot appears to ignore Disallow directives in the "User-agent: *" policy record, but that it seems to obey them if the "User-agent:" name in that record is changed from "*" to "Googlebot".
Since we can't see the file, a few questions come to mind:
Is there more than one User-agent policy record that applies to Googlebot (i.e. "Googlebot" and "*")?
What is the position of the other UA policy records relative to the "User-agent: *" policy record?
Are these other UA policy records more or less "permissive" than the "User-agent: *" record?
Is there a completely-blank line after each UA policy record, including the last one in the file?
Are there any spurious blank lines, say, between a User-agent: line and a "Disallow:" line?
Do all comment lines begin with the required "#" character?
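The questions above can be turned into a rough mechanical check. The sketch below is a hypothetical helper (lint_robots is not part of any library) that flags the structural slips listed; whether a given crawler actually cares about each one is not something it can tell you.

```python
def lint_robots(text):
    """Return warnings for structural slips in a robots.txt string."""
    warnings = []
    lines = text.splitlines()
    agents_seen = []          # every User-agent value, lower-cased
    prev_field = None         # field name of the previous directive line
    blank_since_prev = False  # was there a blank line since that directive?

    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            blank_since_prev = True
            continue
        if line.startswith("#"):
            continue
        if ":" not in line:
            warnings.append(f"line {number}: no 'Field: value' form (comment missing '#'?)")
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        value = value.split("#", 1)[0].strip().lower()
        if field == "user-agent":
            if prev_field in ("allow", "disallow") and not blank_since_prev:
                warnings.append(f"line {number}: no blank line before a new User-agent record")
            agents_seen.append(value)
        elif field in ("allow", "disallow"):
            if not agents_seen:
                warnings.append(f"line {number}: rule appears before any User-agent line")
            elif prev_field == "user-agent" and blank_since_prev:
                warnings.append(f"line {number}: blank line between User-agent and its rules")
        prev_field = field
        blank_since_prev = False

    if lines and lines[-1].strip():
        warnings.append("no blank line after the last record")
    if "*" in agents_seen and any("googlebot" in agent for agent in agents_seen):
        warnings.append("both '*' and Googlebot records exist; check which one applies")
    return warnings
```

Calling lint_robots(open("robots.txt").read()) on a local copy returns an empty list when none of these patterns are present, which rules out only the items on this checklist and nothing more.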
It'd likely be easier to spot the problem with an example to look at, unless you prefer to continue to insist that Gbot is broken instead of making sure that no-one can spot *any* problem --actual or potential-- with your robots.txt file.
Anyway - see below:
The above doesn't work - but if User-agent is changed to Googlebot everything is fine.
The problem has been resolved in that I can now remove the files I want from the SERPs, but I would still like to know why the User-agent: * statement appears to be functioning incorrectly...
> Now you can see why I am sure everything is fine syntax wise :-)
I listed policy-record "structure" and "priority" above --in addition to syntax-- as things to check.
Only using one User-agent line and everything really does seem to be in order - I have had 3 others look at this file too.
I have just spoken to someone else about it and he suggested that maybe something is up on the server configuration side... I will look into this and report back.
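For the server-side angle, a few things can be checked from the outside. The sketch below is only a guess at what might be worth looking at (status code, Content-Type, a UTF-8 byte order mark before the first "User-agent" line, an HTML page served in place of the file); none of these is a confirmed cause here. Both function names are hypothetical, and the URL passed in would be your own site's robots.txt.

```python
import codecs
import urllib.error
import urllib.request


def diagnose_robots_response(status, content_type, body):
    """Return a list of possible problems with a fetched robots.txt."""
    problems = []
    if status != 200:
        problems.append(f"HTTP status is {status}, not 200")
    if not content_type.lower().startswith("text/plain"):
        problems.append(f"Content-Type is {content_type!r}, expected text/plain")
    if body.startswith(codecs.BOM_UTF8):
        problems.append("body starts with a UTF-8 byte order mark")
    if b"<html" in body[:512].lower():
        problems.append("body looks like an HTML page, not a robots.txt file")
    return problems


def fetch_and_diagnose(url):
    """Fetch `url` and run the checks above (performs a network request)."""
    try:
        with urllib.request.urlopen(url) as response:
            return diagnose_robots_response(
                response.status,
                response.headers.get("Content-Type", ""),
                response.read(),
            )
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions but still carry headers and a body.
        return diagnose_robots_response(
            err.code, err.headers.get("Content-Type", ""), err.read()
        )
```

An empty list from fetch_and_diagnose means only that these particular checks passed for whatever the server sent to this client; a server may still answer differently depending on the requesting user agent.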
Thanks for all the input Jim - you too goodroi.