Forum Moderators: phranque


Server Error: Traffic increased by incredible amount

fast help needed


globay

9:45 am on Oct 18, 2003 (gmt 0)

10+ Year Member



I hope somebody can help me with this problem: using Amazon Web Services, I have integrated a lot of Amazon products into my websites, which are currently on a shared hosting server. I don't have a huge number of visitors, but enough to be happy. Two days ago I checked my site with the Lynx text browser for the first time, and instead of the expected content I found a lot of error messages produced by the file that converts the XML data into PHP variables. The same page seemed fine in MSIE, so I decided to worry about the problem later.

When I checked my server stats a day later, I could not believe what I saw: the amount of data transferred was more than 100 (!) times higher than on an average day, even though page impressions and visitors had remained the same. A closer look at the log files showed that instead of the usual page size of about 10 KB, responses had grown to as much as 64,000,000 bytes. But I cannot imagine anybody downloading 64 MB of error messages, or whatever it was.

When I checked the sites with Lynx again, everything seemed to be OK, so I contacted the server support. They replied that the traffic amount was calculated by Apache and is therefore correct.

I have not changed anything on the site in weeks, and yesterday's traffic was back to normal, but I still need to find out quickly what happened, because this amount of traffic is going to cost me quite a lot.

Thank you for your help!

BlueSky

11:25 am on Oct 18, 2003 (gmt 0)

10+ Year Member



Check your raw logs -- the answer lies there. Probably some sort of bot activity.

Since you say page impressions and visitors remained the same, my guess is that it could be regular spiders (if they don't count toward those two stats), someone downloading your images or other files that don't count as impressions (JS files, maybe), or perhaps hotlinking. Someone may have sucked down your whole site, but then overall page impressions should have gone way up. Don't discount error messages, either. When I caught one bot and fed him 403s, he pounded my site for a solid hour. Luckily, my 403 pages are very small, so it didn't amount to much bandwidth. Check your raw logs!

globay

11:49 am on Oct 18, 2003 (gmt 0)

10+ Year Member



Yes, I did look at the raw logs, and indeed there was some spider activity, but no more than usual. The only difference was that the spiders were served huge pages. The problem is that not only the spiders got these big pages; everybody did:

[IP][Time] "GET [page] HTTP/1.0" 200 8006002 "http://search.yahoo.com/search?[Query]" "Mozilla/4.0[...]"

This is an example line from my raw logs, and there are many more. Note the 8006002!
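To pull every oversized response out of a raw log, a short script helps. The sketch below assumes the log uses Apache's "combined" format (the example line ends with a referer and a user agent, which fits that format); the 100,000-byte threshold and the sample lines are made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def find_oversized(lines: Iterable[str], limit: int = 100_000) -> List[Tuple[str, str, int]]:
    """Return (host, request, bytes) for every response larger than `limit` bytes."""
    hits = []
    for line in lines:
        m = COMBINED.match(line)
        if not m or m.group("bytes") == "-":
            continue  # skip lines in another format and responses without a body size
        size = int(m.group("bytes"))
        if size > limit:
            hits.append((m.group("host"), m.group("request"), size))
    return hits

if __name__ == "__main__":
    sample = [
        '192.0.2.1 - - [16/Oct/2003:10:00:00 +0000] "GET /item.php?id=1 HTTP/1.0" 200 8006002 '
        '"http://search.yahoo.com/search?p=x" "Mozilla/4.0"',
        '192.0.2.2 - - [16/Oct/2003:10:00:05 +0000] "GET /item.php?id=2 HTTP/1.0" 200 10240 '
        '"-" "Lynx/2.8.4"',
    ]
    for host, request, size in find_oversized(sample):
        print(host, request, size)
```

Run against a full day's log, this shows whether the huge responses hit a few URLs or every page, and whether they cluster in one time window.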

When I go to [page] now, I get the page that is supposed to be there, with a file size of about 10 KB.

storevalley

12:03 pm on Oct 18, 2003 (gmt 0)

10+ Year Member



... currently on a shared hosting server ...

Have you discussed the problem with your hosting company? Shared hosting accounts are normally reasonably well supported.

globay

12:19 pm on Oct 18, 2003 (gmt 0)

10+ Year Member



Yes, I have contacted the server support, but so far they have only said that I had an elevated number of page requests (which is not true), and that Apache calculated the amount of traffic, and Apache is always right.

Yidaki

12:30 pm on Oct 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are you sure your Amazon XML-to-PHP script works as expected? Are you sure there are no infinite loops or something similar? Did you test the scripts before embedding them into your pages? Do they retrieve data based on dynamic queries (user input)? If so, do you limit the data you retrieve from Amazon?

Is it possible that somebody hacked your Amazon scripts and is making requests that are not logged?

globay

12:57 pm on Oct 18, 2003 (gmt 0)

10+ Year Member



I used a PHP script especially designed to 'read' XML that somebody recommended to me. I don't think it contains any infinite loops, since it has always worked perfectly.

Right now I have three hosting accounts with that hosting company, each containing the very same shop system and files, just with different data. Two of them are in English and are hosted on the same server. Both of these accounts showed an extreme increase in traffic. The third account contains the German Amazon data and is hosted on a different server. I did not encounter any problems there.

I can think of two possible causes so far:
- Amazon temporarily changed the XML format of the US data feed, which somehow caused the XML script to produce lots of errors, and changed it back some time later. This does not explain why only some of the page requests at that time produced errors.

- The server (PHP, Apache, MySQL, or whatever) broke down and was no longer able to process requests without errors. Unfortunately, I know very little about web servers, so I cannot judge how likely this is. Just over a year ago, I had a similar problem with another account at that hosting company: a sudden increase in traffic for just one day. At that time I did not investigate, because it was the last day of the month and I did not exceed the traffic limits. The page used very little code then and certainly had no infinite loops.
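Whichever cause it was, the script can be made to fail once per request instead of once per XML element. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming a feed with `Details`, `Asin`, and `ProductName` elements (hypothetical names chosen for illustration, not a verified description of Amazon's schema). A malformed or unexpectedly structured response yields `None`, and the page can then show a single short message.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

def parse_products(xml_text: str) -> Optional[List[Dict[str, str]]]:
    """Parse a product feed; return None if the response is unusable.

    The element names "Details", "Asin" and "ProductName" are hypothetical.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # malformed XML: one failure, not one error message per element
    products = []
    for details in root.iter("Details"):
        asin = details.findtext("Asin")
        name = details.findtext("ProductName")
        if asin is None or name is None:
            return None  # format changed under us: treat the whole response as bad
        products.append({"asin": asin.strip(), "name": name.strip()})
    return products
```

The design choice is to validate the whole response before any of it reaches the page, so a format change produces one fallback message rather than megabytes of warnings.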

globay

1:05 pm on Oct 18, 2003 (gmt 0)

10+ Year Member



Yes, the site retrieves data based on user input, but that input is either a valid product ID, which retrieves the product information, or an invalid ID, which produces an error message. Either way, there are no problems.

The site has been running for a few months now, and there have been no major changes during that time. I am not sure how to determine whether the site was hacked, but my FTP program shows that the files have not been edited in the last two weeks.

There is no limit on data output on my side, but I don't understand why the server did not stop sending these error messages, producing files of several MB.
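Even when invalid IDs are handled, it can help to reject them before any request goes out to the web service. A sketch, assuming product IDs are 10-character ASINs made of digits and uppercase letters (an assumption; the post does not say which ID format the shop uses); `normalize_product_id` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumption: product IDs are 10-character ASINs (digits and uppercase letters).
ASIN_PATTERN = re.compile(r"[0-9A-Z]{10}")

def normalize_product_id(raw: str) -> Optional[str]:
    """Return a cleaned product ID, or None if it cannot be a valid ID."""
    candidate = raw.strip().upper()
    return candidate if ASIN_PATTERN.fullmatch(candidate) else None
```

An ID that fails this check never triggers a web service call, so odd user input cannot pull unexpected data into the page.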
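One way to put a ceiling on the output is to buffer the page and stop writing once a size limit is reached. Below is a minimal sketch of the idea in Python; the `CappedOutput` class and the 200,000-byte default are hypothetical, and in the PHP script itself the same idea would be built on output buffering (`ob_start()`).

```python
from typing import List

class CappedOutput:
    """Collect page output, but stop accepting text once max_bytes would be exceeded."""

    def __init__(self, max_bytes: int = 200_000) -> None:
        self.max_bytes = max_bytes
        self.parts: List[str] = []
        self.size = 0            # bytes accepted so far (UTF-8)
        self.truncated = False

    def write(self, text: str) -> None:
        if self.truncated:
            return  # limit already hit: ignore everything else, e.g. repeated warnings
        length = len(text.encode("utf-8"))
        if self.size + length > self.max_bytes:
            self.truncated = True
            return
        self.parts.append(text)
        self.size += length

    def getvalue(self) -> str:
        body = "".join(self.parts)
        if self.truncated:
            body += "\n<!-- output truncated: size limit reached -->"
        return body
```

With a cap a few times larger than a normal 10 KB page, a runaway error loop costs at most that much bandwidth per request instead of several MB.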

Yidaki

1:17 pm on Oct 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>I used a PHP script especially designed to 'read' XML

Hm, so it's not especially designed to read Amazon's XML?
Personally, I would feel safe then ...

Does it retrieve and display reviews too? Just guessing here: could it be that the review feature was spammed on Amazon's site and therefore your script loaded tons of (spam) reviews? And after a while Amazon noticed the spam and removed all reviews for the affected products ...!?

globay

1:32 pm on Oct 18, 2003 (gmt 0)

10+ Year Member



I think I found the cause of the problems: Amazon.com introduced a new version of its web services at about the time the first error messages appeared. As I read in their forum, a lot of their users complained about errors and unreliable behavior, though nobody encountered the same problems as I did. After a few hours their engineers were able to partly fix the problem, so the number of error messages has been reduced.

Thank you, everybody, for your help. I will now talk to Amazon and my host to resolve the remaining problems!

Yidaki

1:34 pm on Oct 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Cool that you now know what caused the problem!

>Amazon.com introduced a new version of its webservices

Would you mind briefly explaining what the changes were?

BlueSky

1:44 pm on Oct 18, 2003 (gmt 0)

10+ Year Member



Oh, okay. Nice to see you found out what it was. You might want to think about modding the script to account for such a case again.

globay

1:47 pm on Oct 18, 2003 (gmt 0)

10+ Year Member



Sure, no problem:

- Encoding of the umlaut character (typically found in German language text) has been fixed.
- Encoding of some Japanese characters has been fixed.
- Availability messages for items in the German and Japanese locales are now returned in the proper language.
- A more appropriate SOAP fault string is now returned in case of an error on a BlendedSearch.
- The TextStreamSearch and BlendedSearch calls no longer return an error message when no results are found for the search.
- The results from a BlendedSearch now include a RelevanceRank value for each ProductLine element. An application can sort the results using this value in order to produce results that more closely resemble those found on the Amazon web site.
- The TextStreamSearch has been tuned to return results that are more relevant to the given text.
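The RelevanceRank change in that list could be used like this. The sketch assumes each `ProductLine` element carries a `RelevanceRank` child holding an integer, that lower values mean more relevant, and that a `Mode` child names the product line; all three are assumptions for illustration, not taken from Amazon's documentation.

```python
import xml.etree.ElementTree as ET
from typing import List

def sort_product_lines(xml_text: str) -> List[str]:
    """Return the (assumed) Mode of each ProductLine, most relevant first."""
    root = ET.fromstring(xml_text)
    ranked = []
    for line in root.iter("ProductLine"):
        rank_text = (line.findtext("RelevanceRank") or "").strip()
        # Assumption: lower rank = more relevant; lines without a usable rank go last.
        rank = int(rank_text) if rank_text.isdigit() else float("inf")
        ranked.append((rank, line.findtext("Mode", default="")))
    ranked.sort(key=lambda item: item[0])  # stable: ties keep their feed order
    return [mode for _, mode in ranked]
```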

I don't know the details yet, since I just discovered this, but I'll tell you when I know more.

globay

1:50 pm on Oct 18, 2003 (gmt 0)

10+ Year Member



You might want to think about modding the script to account for such a case again.

Yes, I definitely will ;)

I was just about to implement a caching system to serve the pages faster and more reliably, and this time with a better XML parser.
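A caching layer can also cover the failure case: if a fresh fetch fails, serve the last good copy instead of an error page. Below is a minimal in-memory sketch; `FeedCache`, the `fetch` callback (assumed to return parsed data, or `None` on any error), and the one-hour TTL are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class FeedCache:
    """Cache parsed feed data per product ID; fall back to the last good copy on failure."""

    def __init__(self, fetch: Callable[[str], Optional[dict]], ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.fetch = fetch            # hypothetical: returns parsed data, or None on error
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, product_id: str) -> Optional[dict]:
        now = self.clock()
        cached = self.entries.get(product_id)
        if cached and now - cached[0] < self.ttl:
            return cached[1]          # fresh enough: no request to the web service
        fresh = self.fetch(product_id)
        if fresh is not None:
            self.entries[product_id] = (now, fresh)
            return fresh
        # Fetch or parse failed: serve a stale copy rather than an error page.
        return cached[1] if cached else None
```

On a shared PHP host each request runs in a fresh process, so the entries would have to live on disk or in MySQL rather than in a dictionary; the fallback logic stays the same.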