I run a web-based newsletter, so I get feedback that I wouldn't get if I weren't in contact with my visitors. I suspect this problem affects my other sites as well, but visitors of other sites (not newsletters, message boards or otherwise membership-based sites) are not likely to contact me if the site is down. They would simply move on.
What makes the problem really strange is that people who report they can't access the site say they could access it in the past, and are usually able to access it again later on. They all have different ISPs and different browsers.
I can't find anything in common among those who e-mail me and say they couldn't open the page.
I'm not able to replicate/catch any problems myself. I've also tried setting up a script to hit that site from another location every couple of minutes and log the results. I've tried uptime monitoring services, and nothing shows any problems.
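For anyone wanting to reproduce that kind of polling, here is a minimal sketch of a single probe, assuming Python 3 and only the standard library. It is not the script actually used in this thread. The function name `probe`, the `opener` parameter (there only so the function can be exercised without a network) and the example URL are all hypothetical. It records how long each attempt took, since "failed at once" and "failed after a long wait" point at different causes.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Fetch url once and return a one-line log record (hypothetical helper)."""
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read()
            outcome = "status=%d bytes=%d" % (response.status, len(body))
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 500 or 503.
        outcome = "http_error=%d" % exc.code
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, reset, and so on.
        outcome = "failed=%r" % (exc,)
    elapsed = time.monotonic() - started
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
    return "%s %s %s elapsed=%.2fs" % (stamp, url, outcome, elapsed)
```

Calling something like `print(probe("http://www.mydomain.com/"))` from cron every couple of minutes and appending the output to a file gives a log that can be compared against the times readers report. A probe like this only sees the path from its own location, so a clean log does not rule out problems on other networks.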
Yet at least once every other week I get an e-mail from someone who says they can't access the site to read the newsletter.
I've checked everything from dns to web server to application server to database to firewall, and everything seems fine.
Does anyone have any idea what I could try to find out what the problem is?
At first, I kept dismissing such e-mail and simply told people to try later, but it's been like that for a year now, and if I do lose some visitors because something is wrong with that box, then I would like to know what it is and fix it.
Any ideas?
Are you doing anything cookie based? Any code that relies on cookies?
Another common problem could be invalid HTML markup that certain browsers are just not showing.
The best way to debug this is to somehow work directly with the people having the problem and troubleshoot various things. You might have to call them, or even invest in a WebEx type of session so you can see and control their screens.
I have a secure site that people can see, but when they login, it kicks them back out. This was caused by a Microsoft bug in the Content Advisor. When that was enabled, it blocked Secure Cookie-based sites.
I've even tried running a script to do dns lookups every minute and log errors. It ran fine.
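A rough sketch of what such a lookup logger might look like, assuming Python 3 and the standard `socket` module. It is not the script that was actually run. The name `check_dns` and the `resolver` parameter (which exists only so the function can be tested offline) are hypothetical.

```python
import socket
import time


def check_dns(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname once and return a one-line log record."""
    started = time.monotonic()
    try:
        infos = resolver(hostname, 80, proto=socket.IPPROTO_TCP)
        # Each entry ends with a sockaddr tuple whose first item is the IP.
        addresses = sorted({info[4][0] for info in infos})
        outcome = "ok " + ",".join(addresses)
    except socket.gaierror as exc:
        # Raised when the name cannot be resolved.
        outcome = "error %s" % exc
    elapsed = time.monotonic() - started
    return "%s %s elapsed=%.3fs" % (hostname, outcome, elapsed)
```

Note that `socket.getaddrinfo` goes through whatever resolver the operating system is configured to use, so the answer may come from a cache rather than from the authoritative server. A run of successful lookups from one machine therefore says little about what resolvers at other ISPs are seeing.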
Really weird.
As far as asking them to help troubleshoot, I doubt I would get much help from them. Imagine if someone asks you to install some software so they can "see" your desktop and what's going on?
I doubt someone would agree to do it.
[edited by: bcc1234 at 12:26 am (utc) on Jan. 6, 2007]
Have you tried using one of the 3rd party DNS checker sites that will come in from outside in the same way that a user would, eg dnsreport.com?
If it really isn't DNS then your hosting company may have terrible routing problems; get a few of your users that see the problems to run tracert/traceroute to your Web server.
Rgds
Damon
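If it helps to hand users something they can simply run, the traceroute suggestion above could be wrapped roughly as follows. This is a sketch assuming Python 3.7 or later, with `tracert` available on Windows and `traceroute` installed elsewhere. The function names are hypothetical. The `-d` and `-n` options tell the respective tools to skip reverse name lookups, which keeps the run short.

```python
import platform
import subprocess


def traceroute_command(host, system=None):
    """Return the traceroute command line for this (or the given) platform."""
    system = system or platform.system()
    if system == "Windows":
        return ["tracert", "-d", host]
    return ["traceroute", "-n", host]


def run_traceroute(host, timeout=120):
    """Run the platform's traceroute and return its text output.

    Raises FileNotFoundError if the tool is not installed and
    subprocess.TimeoutExpired if it runs longer than timeout seconds.
    """
    completed = subprocess.run(
        traceroute_command(host),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.stdout
```

The output is only useful if it is captured while the problem is happening, so the user would need to run it at the moment the page fails to load and send back the text.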
I just e-mailed one of them asking if they get an error right away or if it takes a bit of time while "loading" before they see that error. Maybe that would clue me in a bit.
I tried checking DNS from other locations. From my own boxes elsewhere and by using various on-line services.
A DNS issue will be obscured by the fact that once a good record is cached it'll last for your TTL, eg maybe a day or more, so it'll 'just work' for a bit.
So, as I'm sure you'll be doing, try to clear any caches and avoid going through any intermediate DNS resolvers before repeating each test.
Rgds
Damon
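One way to take intermediate resolvers out of the picture is to send the query straight to the zone's authoritative nameserver. The sketch below does that with a hand-built DNS packet over UDP, assuming Python 3 and only the standard library. All function names are hypothetical, the nameserver IP has to be supplied by the caller, and it deliberately ignores truncated responses, TCP fallback and IPv6. It also returns the TTL of each A record, which shows how long a cached answer would keep "just working".

```python
import random
import socket
import struct


def build_query(hostname, query_id):
    """Encode a standard DNS query for the A record of hostname."""
    # Header: id, flags (0 = recursion not requested), 1 question, no records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in hostname.rstrip(".").split(".")
    )
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def _skip_name(packet, offset):
    """Return the offset just past an encoded (possibly compressed) name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer, always two bytes
            return offset + 2
        offset += 1 + length


def parse_a_records(packet):
    """Return (rcode, [(ttl, ip), ...]) from a DNS response packet."""
    _id, flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    answers = []
    for _ in range(ancount):
        offset = _skip_name(packet, offset)
        rtype, _rclass, ttl, rdlength = struct.unpack(
            "!HHIH", packet[offset:offset + 10]
        )
        offset += 10
        if rtype == 1 and rdlength == 4:  # A record carrying an IPv4 address
            answers.append((ttl, socket.inet_ntoa(packet[offset:offset + 4])))
        offset += rdlength
    return flags & 0x000F, answers


def query_nameserver(hostname, nameserver_ip, timeout=5.0):
    """Ask one nameserver directly for the A records of hostname."""
    query_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname, query_id), (nameserver_ip, 53))
        packet, _addr = sock.recvfrom(4096)
    if struct.unpack("!H", packet[:2])[0] != query_id:
        raise ValueError("response id does not match query id")
    return parse_a_records(packet)
```

Running `query_nameserver` against each nameserver listed for the domain, one at a time, would show whether any single server is answering slowly, differently, or not at all. An rcode of 0 means no error; a timeout surfaces as `socket.timeout`.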
If you're doing that (and that puts you in the "power user" category), then the caching problem isn't confusing *you*, but it may still be muddying the symptoms seen and reported by your users.
But yes, sounds more like a routing problem.
Time to try as many 3rd-party traceroutes (there must be dozens of sites out there) from as many corners of the planet as possible!
I wonder if you could register your host/router/server with [internettrafficreport.com...] to have it do some free monitoring for you: I did it for one of my dodgy connections!
Rgds
Damon
but it may still be muddying the symptoms seen and reported by your users
In what kind of scenario? I can't think of anything.
If the servers authoritative for the com zone list my server as the authority for the mydomain.com zone, and my server returns an A record for www.mydomain.com -- how can caching bring in any problems? Assuming I don't have any weirdness in my zone like a retry time greater than refresh, not that it would matter.
As for a routing problem, EV1 is not the smallest company in the world. I'm sure if they were having problems routing with their peers, it would be known.
I wonder if you could register your host/router/server with [internettrafficreport.com...] to have it do some free monitoring for you: I did it for one of my dodgy connections!
I doubt that. I'm way too small to be on that map :)
The ones I can remember right now have AOL, wmconnect, verizon.
Some say that they were able to access the site in the past (read previous issues), but can't access it "now".
For some, the problem seems momentary. When I ask them to try again, they say that now (usually the next day or several hours later) everything is working.
Some say they hadn't been able to access the site for a "very long time". I actually had a few people ask me to remove them from the list because they can't see the newsletters when they click the link.
Yet others say they can't access it at all -- never could from the start.
This is really driving me crazy. I can't find anything similar among the people that complain. And I'm just afraid to think that for each one that actually e-mailed me there are many more who didn't.
Find a sensible EV1 techie, not some first-line droid who just wants to get you off the line.
Explain the issue as you have to us, and ask them what they might suggest so that you can demonstrate that it is or isn't a DNS or routing issue, and whatever other troubleshooting they suggest.
Point them at this thread!
Rgds
Damon
[edited by: DamonHD at 11:47 am (utc) on Jan. 7, 2007]