Forum Moderators: goodroi
The firewalls in question are made by a company called SonicWALL. Part of their feature set is meant to protect your websites from "spybots" (the tech support's term), but as far as the firewall is concerned a spybot is ANYTHING that requests robots.txt, and there is no way to customize the settings to allow certain bots. To the firewall, anything that requests robots.txt is bad. Tech support had no clue what we were talking about and said nobody else had complained about the issue.
This post is to help other people trying to figure out what is stopping the spiders from accessing their sites: Googlebot, Yahoo's spiders, and other crawlers can't access websites behind a SonicWALL firewall with these settings enabled.
If you own a SonicWALL firewall, the settings you need to disable to allow spidering are:
SID 1600 - [software.sonicwall.com...]
SID 1601 - [software.sonicwall.com...]
Nice concept to integrate this into the firewall/IDS, but without any way to customize it your site won't get spidered. What these settings do is drop the packets of anyone requesting robots.txt. When you drop a packet, your server never even responds to the requesting host (i.e. Googlebot, Yahoo Slurp), so the spider concludes your site is offline.
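If you want to check whether this is hitting you, a quick test is to fetch robots.txt yourself and see whether the connection simply times out while a normal page answers fine. Here's a minimal sketch in Python; the host name is a placeholder for your own site:

import socket
import urllib.error
import urllib.request

HOST = "www.example.com"  # placeholder: use your own site

def check(path):
    try:
        with urllib.request.urlopen(f"http://{HOST}{path}", timeout=10) as resp:
            return f"{path}: HTTP {resp.status}"
    except urllib.error.HTTPError as e:
        return f"{path}: HTTP {e.code}"      # the server answered; even a 404 is an answer
    except (socket.timeout, urllib.error.URLError) as e:
        return f"{path}: no response ({e})"  # nothing came back; packets likely dropped

print(check("/"))            # a normal page should respond
print(check("/robots.txt"))  # if only this one times out, something upstream is eating it

If the homepage responds but robots.txt goes dead quiet, it's not your web server misbehaving; it's something in front of it.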
This has caused me a lot of pain and trouble; I hope this post helps other people. I wonder whether other firewalls restrict this too, or plan to.
In my case it wasn't a matter of waiting for spiders on a new site. These were well-indexed sites, and I noticed that all of a sudden, after an update, all the SERPs on the server dropped. Some engines drop a site faster than others, and some treat an inaccessible robots.txt and existing content in different ways. Many spiders were still showing up in the logs; there were just no requests for robots.txt from the ones that did :)
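For anyone who wants to look for the same symptom, here's a rough sketch that tallies, per spider, how many log lines mention robots.txt. The log path and user-agent substrings are assumptions; adjust them to your own server:

from collections import Counter

BOTS = ("Googlebot", "Slurp", "msnbot")  # UA substrings to watch

hits = Counter()
robots_hits = Counter()

with open("access.log") as log:          # placeholder path
    for line in log:
        for bot in BOTS:
            if bot in line:
                hits[bot] += 1
                if "robots.txt" in line:
                    robots_hits[bot] += 1

for bot in BOTS:
    print(f"{bot}: {hits[bot]} hits, {robots_hits[bot]} for robots.txt")

Spiders that keep showing up but never ask for robots.txt is exactly the pattern described above.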
Remember, there are a lot of people out there who don't know about SEO at all, so this won't become an issue for the majority of them unless they happen to notice it.
It gives you one more possibility to consider besides assuming the whole host has been blacklisted by Google :D
>>>Any site operating behind a SonicWALL or a Cisco device should have a network admin proficient in one of these devices.
In a beautiful world, everything that should happen will. Unfortunately, Norman Rockwell isn't painting this picture we call life, and what should maybe won't.
If this report is true, Sonicwall should be ashamed for such a blunder.
But I don't see how it is SonicWALL's fault. The device is doing exactly what it was installed to do: provide the utmost level of security for everything behind it. It's up to the consumer to hire a certified engineer to configure it properly.
Elite, was this a hosting company that did this without notifying you, or did you hire a company to install it? What version of the SonicWALL OS was it?
Thanks,
Chip-
Not sure of the OS version of it right now.
If this report is true, Sonicwall should be ashamed for such a blunder.
It's not a blunder. Far from it, it's just part of good practice.
The device is a firewall. It is meant to be the first line of defence when protecting a network.
This ain't a Windows workstation we're talking about...
These things come locked down! They're supposed to be like that so that users don't unwittingly leave open a big security hole.
When you buy a firewall, it's locked down so nothing can get through. Nothing. No traffic at all.
It's then the network administrator's responsibility to open up the device to let through the traffic that is needed for the network to operate.
When you install a firewall, nothing should work; that's the point! In this case, the network operator opened up the ports to allow normal web traffic, but he or she didn't let through the spider traffic. They didn't read the manual.
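You can actually observe this "locked down" behaviour from the outside: a closed port on a responding host gets refused outright, while a port the firewall silently drops just times out, the same way the robots.txt requests did. A rough sketch (the host is a placeholder):

import socket

HOST = "www.example.com"  # placeholder

for port in (80, 443, 8080):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(5)
    try:
        s.connect((HOST, port))
        print(f"port {port}: open")
    except ConnectionRefusedError:
        print(f"port {port}: closed (host refused)")   # the host answered; no drop
    except socket.timeout:
        print(f"port {port}: filtered (no response)")  # the firewall ate the packet
    finally:
        s.close()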
I think it's an open and shut case as to who made the blunder:
The firewall's job is to prevent access to a network.
The network administrator's job is to keep a network running smoothly.
a robots.txt isn't quite an exploit
I can think of a number of successful web site break-ins where the cracker started off by using information that a clueless webmaster stored in robots.txt.
Have a read of this...
A security site based in Estonia has uncovered the elementary mistake in the RIAA's robots.txt files which gave the crackers their back door.
[theregister.co.uk...]
The firewall's job is to protect against every network attack it can. If the robots.txt protection wasn't necessary for your situation, then it should have been disabled.
Have a read of this... A security site based in Estonia has uncovered the elementary mistake in the RIAA's robots.txt files which gave the crackers their back door.
Don't stop there. Read on:
"This organization must be employing a blind webmaster if he did not figure out that this very passwordless admin module at www.thatsite.org/admin was used to deface the website. There was also no filtering to prevent uploading mp3 files through the PDF upload section. That would also explain how illegal mp3 music files appeared on this anti-piracy site," explained Holmes smugly.
Blocking any UA (spider or browser) that accesses robots.txt is surely not a solution. RTFM? Yes, but you should be able to assume that a high-end firewall ships with reasonable default settings. Blocking anything that requests robots.txt is not reasonable; it's like cutting your phone line because you might receive prank calls.
Always remember the golden rule of IT security: nothing is 100% secure. A secure system is one that strikes the right balance between prey value, attack effort, countermeasure effort, and usability impediments.
[sonicwall.com...]
Firewalls that do not have the service activated (it's a pay-for add-on) are not "blocking" spiders from accessing the robots.txt files. And it can be disabled, as noted above.
Chip-
Whoaaaa! Check out all the marketing mumbo-jumbo and meaningless technobabble on that page!
Utilizing a configurable, high-performance deep packet inspection architecture
deep packets! WOW!
deep packet inspection engine
Hold on, a minute ago it was a deep packet architecture!
intelligent file-based virus and malicious code prevention
Intelligent eh? Would it pass the Turing test?
scanning packet payloads for worms
Packet payloads! Most people would scan the packets looking for payloads, but no, these guys are one step ahead of the game!
scanning in real-time for decompressed and compressed files containing viruses
This one makes me laugh. I can visualise computer-illiterate managers with their clipboards and pens jotting down notes from each of the firewall manufacturers web sites...
"ooh, this one supports decompressed files, none of the others say they support that, it must be good!"