Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
SonicWALL Firewall Blocks Spiders
Settings will block spiders from indexing site.
EliteWeb




msg:1527068
 9:41 pm on May 3, 2005 (gmt 0)

Wondering why your website isn't being spidered? Your firewall may be part of the problem. We just had a few servers housing a few hundred websites get de-listed from the search engines shortly after new firewalls were installed. At first we thought it was a server-wide ban; once we realized it wasn't, we had to figure out what had changed, and the only server change was the addition of the firewall.

The firewalls in question are made by a company called SonicWALL, and one of their settings is meant to protect your websites from "spybots" (what the tech support called them). However, these spybots are ANYTHING that requests robots.txt, and there is no way to customize the settings to allow certain bots. To the firewall, anything that requests robots.txt is bad. The tech support had no clue what we were talking about and said nobody else had complained about the issue.

This post is to help other people trying to figure out what is stopping the spiders from accessing their sites: spiders (Googlebot, Yahoo's spiders) won't spider and can't access websites blocked by the firewall (SonicWALL firewall).

If you own a SonicWALL Firewall the settings you need to disable to allow spidering are:

SID 1600 - [software.sonicwall.com...]
SID 1601 - [software.sonicwall.com...]

It's a nice concept to integrate this into the firewall/IDS; however, without customization your site won't be spidered. What these settings do is drop the packets of anyone requesting robots.txt. When a packet is dropped, the requesting host (e.g. Googlebot, Yahoo Slurp) never gets any response at all, so it thinks your site is offline.
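For anyone who wants to check for this symptom, here is a minimal Python sketch. It assumes you run it from a machine outside the firewall, and that a silently dropped request shows up as a timeout or connection error rather than an HTTP error page. The function names, the user-agent string, and www.example.com are placeholders for illustration, not anything SonicWALL provides.

```python
import urllib.error
import urllib.request


def probe(url, user_agent="Mozilla/5.0 (compatible; probe-sketch)", timeout=10):
    """Return 'ok', 'http-error', or 'no-response' for a single GET request."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return "ok"
    except urllib.error.HTTPError:
        # The server answered (for example with a 404), so nothing was dropped.
        return "http-error"
    except OSError:
        # Timeouts, resets, and refused connections: no usable reply came back.
        return "no-response"


def diagnose(home_result, robots_result):
    """Turn the two probe results into a hint about where to look."""
    if home_result == "no-response":
        return "site unreachable; the problem is not specific to robots.txt"
    if robots_result == "no-response":
        return "robots.txt gets no reply while the site is up; check firewall/IPS rules"
    return "robots.txt requests are answered; look elsewhere"


if __name__ == "__main__":
    site = "http://www.example.com"  # placeholder: replace with your own site
    home = probe(site + "/")
    robots = probe(site + "/robots.txt")
    print(home, robots, "->", diagnose(home, robots))
```

A 404 for robots.txt is still an answer from the server; the case described here is the one where the home page loads but robots.txt gets nothing back.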

This has caused me a lot of pain and trouble; I hope this post helps other people. I wonder whether other firewalls restrict this or plan to.

 

semjenn




msg:1527069
 10:20 pm on May 3, 2005 (gmt 0)

Thanks for the heads up with SonicWall. Totally smart to document that...will be interesting to see if spybot protection will be the new norm for firewall companies.

The Contractor




msg:1527070
 10:34 pm on May 3, 2005 (gmt 0)

Sounds like a pretty stupid way to go about things. The ones that don't read/obey robots.txt are the ones they should develop a blocking mechanism for ...

EquityMind




msg:1527071
 11:42 pm on May 3, 2005 (gmt 0)

This is major. In those instances where you are banging your head trying to figure out why you can't get crawled, this is another thing to check off the list of 'stuff that could go wrong'.

Brett_Tabke




msg:1527072
 11:55 pm on May 3, 2005 (gmt 0)

It is an issue, but on the other hand, any webmaster should notice the issue within a short while and resolve it before it gets too bad.

EliteWeb




msg:1527073
 1:40 am on May 4, 2005 (gmt 0)

However, most webmasters haven't any clue about firewalls; they build webpages, they don't administer networks, and they don't have access to the network equipment in question. In the past, firewalls haven't caused connectivity issues for spiders, other than spiders getting banned for making too many connections too quickly (flood issues). What I saw this time was the filters automatically rejecting them, and doing so in a poor manner (by default, in my case). Now this is just something else to add to the checklist when troubleshooting.

In my case it wasn't a matter of waiting for spiders on a new site. These sites were well indexed; what we noticed was that all of a sudden, with an update, all the SERPs on the server dropped. Some engines drop faster than others, and some treat an inaccessible robots.txt and existing content in different ways. Many spiders were still seen in the logs, just with no requests for robots.txt from the ones that did show :)
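The log symptom above (spiders present, but none of them fetching robots.txt) can be checked with a short script. This is only a sketch: it assumes access logs in the common Apache "combined" format, and the bot markers are illustrative substrings of user-agent strings, not a complete list.

```python
import re
from collections import Counter

# Assumed format: Apache/NCSA "combined" log lines.
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Illustrative user-agent substrings; extend for the spiders you care about.
BOT_MARKERS = ("googlebot", "slurp", "msnbot")


def robots_requests_by_bot(lines):
    """Return {marker: (total_hits, robots_txt_hits)} for each bot seen."""
    total = Counter()
    robots = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        agent = match.group("agent").lower()
        path = match.group("path").split("?")[0]
        for marker in BOT_MARKERS:
            if marker in agent:
                total[marker] += 1
                if path == "/robots.txt":
                    robots[marker] += 1
    return {marker: (total[marker], robots[marker]) for marker in total}


if __name__ == "__main__":
    # "access.log" is a placeholder path.
    with open("access.log", encoding="utf-8", errors="replace") as handle:
        for bot, (hits, robots_hits) in robots_requests_by_bot(handle).items():
            print(f"{bot}: {hits} requests, {robots_hits} for robots.txt")
```

A well-behaved spider with many page requests and zero robots.txt requests over a long period is a hint that something in front of the server is swallowing those requests.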

[edited by: EliteWeb at 1:43 am (utc) on May 4, 2005]

carguy84




msg:1527074
 1:42 am on May 4, 2005 (gmt 0)

Any site operating behind a SonicWALL or a Cisco device should have a network admin proficient in one of these devices. I don't foresee it causing any headaches when set up properly.

But, good tip for those who think they can plug and play one of these things like a Linksys or a Netgear though!

Chip-

EliteWeb




msg:1527075
 1:48 am on May 4, 2005 (gmt 0)

carguy84, what was great about this issue with the SonicWALL was that it was a default setting of the firewall protection, listed with all the other known exploits it protects against in its thousands of definitions. When we brought the issue to SonicWALL they hadn't any clue what it could be, because nobody had ever reported anything like it and, to them, anything accessing robots.txt is a spybot. Sure, you can go through the hundreds of thousands of items these devices protect against one by one, but you would figure that if it's a default, it's widely known and it's a cert'd issue or whatnot. In the end the system admin did indeed figure it out, but nothing was documented on it, so I figured I'd share my fun horror story :)

Remember, there are a lot of people who don't know about SEO at all, so it hasn't become an issue for the majority of people out there unless they notice it.

It gives you another thing to consider besides thinking the whole host is blacklisted by Google :D

martinibuster




msg:1527076
 2:37 am on May 4, 2005 (gmt 0)

>>>It is an issue, but on the other hand, any webmaster should notice the issue

>>>Any site operating behind a SonicWALL or a Cisco device should have a network admin sufficient in one of these devices.

In a beautiful world, everything that should happen will. Unfortunately, Norman Rockwell isn't painting this picture we call life, and what should maybe won't.

If this report is true, Sonicwall should be ashamed for such a blunder.

Easy_Coder




msg:1527077
 4:11 am on May 4, 2005 (gmt 0)

I like martinibuster's *Murphy's Law* analogy.

carguy84




msg:1527078
 5:51 am on May 4, 2005 (gmt 0)

martini, I understand what you're saying, but we're not talking about a $100 networking device that will be installed by a common user; these are serious machines performing serious tasks. Any certified SonicWALL/Cisco/Juniper/NetScreen admin will be trained enough to set it up correctly. And when dealing with something as important as a revenue channel, I wouldn't be playing around with devices I didn't know inside and out. I'm sure it is a problem for others, because as you stated, these "shoulds" are doubtful to be followed unfortunately :(.

But I don't see how it is SonicWALL's fault. The device is doing exactly what it is installed to do, provide the utmost level of security for everything behind it. It's up to the consumer to hire a certified engineer to configure it properly.

Elite, was this a hosting company that did this without notifying you, or did you hire a company to install it? What version of the SonicWALL OS was it?

Thanks,
Chip-

EliteWeb




msg:1527079
 6:02 am on May 4, 2005 (gmt 0)

carguy84, it was a private facility for the host. However, even if you are trained inside and out, the robots.txt filter sits right in the same spot as the ones for requesting /etc/passwd from the web or pulling a CMD, along with thousands of other well-known exploits. A robots.txt isn't quite an exploit.

Not sure of the OS version of it right now.

mrMister




msg:1527080
 8:34 am on May 4, 2005 (gmt 0)

The lesson to be learnt from this:

RTFM ;-)

[edited by: mrMister at 8:46 am (utc) on May 4, 2005]

mrMister




msg:1527081
 8:43 am on May 4, 2005 (gmt 0)

>>>If this report is true, Sonicwall should be ashamed for such a blunder.

It's not a blunder. Far from it, it's just part of good practice.

The device is a firewall. It is meant to be the first line of defence when protecting a network.

This ain't a Windows workstation we're talking about...

These things come locked down! They're supposed to be like that so that users don't unwittingly leave open a big security hole.

When you buy a firewall, it's locked down so nothing can get through. Nothing. No traffic at all.

It's then the network administrator's responsibility to open up the device to let through the traffic that is needed for the network to operate.

When you install a firewall nothing should work, that's the point! In this case, the network operator opened up the ports to allow normal web traffic, however he/she didn't let through the spider traffic. They didn't read the manual.

I think it's an open and shut case as to who made the blunder:

The firewall's job is to prevent access to a network.

The Network administrator's job is to keep a network running smoothly.

mrMister




msg:1527082
 8:56 am on May 4, 2005 (gmt 0)

>>>a robots.txt isn't quite an exploit

I can think of a number of successful web site break-ins where the cracker started off by using information that a clueless webmaster stored in robots.txt

Have a read of this...

a security site-based in Estonia, has uncovered the elementary mistake in RIAA's robots.txt files which gave the crackers their back door.

[theregister.co.uk...]

The firewall protects against all possible network hacks. If robots.txt wasn't necessary for your situation, then it should have been disabled.

[edited by: ThomasB at 1:32 pm (utc) on May 4, 2005]
[edit reason] linked to original article [/edit]

Hanu




msg:1527083
 11:23 am on May 4, 2005 (gmt 0)

Ahh, another security frenzy!

Have a read of this...

a security site-based in Estonia, has uncovered the elementary mistake in RIAA's robots.txt files which gave the crackers their back door.

Don't stop there. Read on:

"This organization must be employing a blind webmaster if he did not figure out that this very passwordless admin module at www.thatsite.org/admin was used to deface the website. There was also no filtering to prevent uploading mp3 files through the PDF upload section. That would also explain how illegal mp3 music files appeared on this anti-piracy site," explained Holmes smugly.

Blocking any UA (spider or browser) that accesses robots.txt is surely not a solution. RTFM? Yes, but you should be able to assume that a high-end firewall comes with reasonable default settings. Blocking anything that requests robots.txt is not reasonable. It's like cutting your phone line because you might receive prank calls.

Always remember the golden rule of IT security: Nothing is 100% secure. A secure system is one that has the right balance between prey value, attack effort, countermeasure effort and usability impediments.

alpine




msg:1527084
 11:25 am on May 4, 2005 (gmt 0)

We've used Sonicwall firewalls--three different models over the years--with no spidering issues. We did not have to configure manually.

Perhaps this is a newer model? Sorry, but I am slightly skeptical.

EliteWeb




msg:1527085
 3:50 pm on May 4, 2005 (gmt 0)

True, robots.txt leads to people listing URLs that shouldn't be spidered, but with this filter the spiders wouldn't even be able to read the file, let alone obey it.

I'm asking the admin for the version number and for how this became part of the security package :)

mrMister




msg:1527086
 6:31 pm on May 4, 2005 (gmt 0)

>>>Blocking any UA (spider or browser) that accesses robots.txt is surely not a solution.

I agree with you there. But it's no different from many other things that some of these "top end" firewalls "protect" you against.

They have to justify the extreme cost of these devices somehow!

SEOMike




msg:1527087
 6:59 pm on May 4, 2005 (gmt 0)

This is pretty ridiculous. Anyone that is familiar with robots.txt should know what it does and why it's important. Maybe they should have had the firewall READ the robots.txt and block spiders from the directories that were identified in it. Think that would work?
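As a rough illustration of that idea (not of anything SonicWALL actually does), Python's standard urllib.robotparser can read a robots.txt and answer whether a given user agent is allowed to fetch a URL. A filter built on it would only act on requests for disallowed paths, instead of dropping every request for robots.txt itself. The rules, URLs, and function names below are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this sketch.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_disallowed(parser, user_agent, url):
    """True when robots.txt asks this user agent to stay away from the URL."""
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(EXAMPLE_ROBOTS_TXT)
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/admin/login"):
        verdict = "disallowed" if is_disallowed(rules, "Googlebot", url) else "allowed"
        print(url, "->", verdict)
```

Whether blocking at the firewall is worthwhile is another matter: well-behaved spiders already stay out of those directories, and badly behaved ones don't identify themselves.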

mavherick




msg:1527088
 9:04 pm on May 4, 2005 (gmt 0)

Who's the target audience for your robots.txt? Well mannered bots right? We all know the bad bots won't listen anyway right? So I would consider that akin to banning users who visit your site map. Weird decision if you ask me.

carguy84




msg:1527089
 9:15 pm on May 4, 2005 (gmt 0)

OK, after talking to our SonicWALL guru: this ISN'T a feature of all SonicWALLs. It's only part of SonicWALLs with IPS (Intrusion Prevention Service) installed, which is an add-on service.

[sonicwall.com...]

Firewalls that do not have the service activated (it's a pay-for add-on) are not "blocking" spiders from accessing the robots.txt files. And it can be disabled, as noted above.

Chip-

mrMister




msg:1527090
 3:57 am on May 5, 2005 (gmt 0)

> [sonicwall.com...]

Whoaaaa! Check out all the marketing mumbo-jumbo and meaningless technobabble on that page!

Utilizing a configurable, high-performance deep packet inspection architecture

deep packets! WOW!

deep packet inspection engine

Hold on, a minute ago it was a deep packet architecture!

intelligent file-based virus and malicious code prevention

Intelligent eh? Would it pass the Turing test?

scanning packet payloads for worms

Packet payloads! Most people would scan the packets looking for payloads, but no, these guys are one step ahead of the game!

scanning in real-time for decompressed and compressed files containing viruses

This one makes me laugh. I can visualise computer-illiterate managers with their clipboards and pens jotting down notes from each of the firewall manufacturers' web sites...

"ooh, this one supports decompressed files, none of the others say they support that, it must be good!"

EliteWeb




msg:1527091
 8:17 pm on May 5, 2005 (gmt 0)

Thanks carguy84,
Yes it is part of that package of the SonicWALL system.

moltar




msg:1527092
 1:33 am on May 10, 2005 (gmt 0)

Interestingly enough, their own site is well spidered.

site:www.sonicwall.com shows 31 pages in Y and 4,390 in G :)

They also use robots.txt and I tried requesting it and didn't get banned.

artsky




msg:1527093
 3:16 pm on May 15, 2005 (gmt 0)

I'll check if our clients are experiencing this problem. We are currently selling these SonicWALL products.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved