When I checked my server stats a day later, I could not believe what I saw: the amount of data transferred was more than 100 (!) times larger than on an average day, though page impressions and visitors had remained the same. A closer look at the log files showed that, instead of the usual page size of about 10 KB, the logged size of a response had risen to as much as 64 000 000 bytes. But I cannot imagine anybody downloading 64 MB of error messages, or whatever it was.
When I checked the sites with the Lynx browser again, everything seemed to be OK, so I contacted the server support. They replied that the traffic amount is calculated by Apache and is therefore correct.
I have not changed anything on the site in weeks, and yesterday's traffic is back to normal, but I still need to find out quickly what happened, because that much traffic is going to cost me quite a lot.
Thank you for your help!
Since you say page impressions and visitors remained the same, my guess is it could be regular spiders (if they don't count against those two stats), someone downloading your images or other files that don't count as impressions (maybe JS), or perhaps hotlinking. Someone may have sucked down your whole site, but then the page impressions overall should have gone way up. Don't discount error messages. When I caught one bot and fed him 403s, he pounded my site for a solid hour. Luckily, my 403 pages are very small, so it didn't amount to much bandwidth. Check your raw logs!
[IP][Time] "GET [page] HTTP/1.0" 200 8006002 "http://search.yahoo.com/search?[Query]" "Mozilla/4.0[...]"
This is an example line from my raw logs, and there are many more. Note the 8006002!
When I now go to [page], I get the page that is supposed to be there, with a file size of about 10 KB.
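For anyone who wants to scan their raw logs for lines like the one above, here is a minimal sketch in Python. It assumes the log is in Apache's Combined Log Format, where the number after the status code is the response size in bytes. The sample lines, the 100 000 byte threshold, and the function names are made up for illustration and are not taken from the real log.

```python
import re
from collections import defaultdict

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Apache writes "-" when no body bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


def oversized_responses(lines, threshold_bytes=100_000):
    """Return parsed entries whose response size exceeds the threshold."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e and e["size"] > threshold_bytes]


def bytes_per_request(lines):
    """Sum the bytes sent per request string, largest total first."""
    totals = defaultdict(int)
    for line in lines:
        entry = parse_line(line)
        if entry:
            totals[entry["request"]] += entry["size"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample lines for illustration only.
sample = [
    '192.0.2.1 - - [10/Oct/2004:13:55:36 +0200] "GET /shop.php?id=1 HTTP/1.0" '
    '200 8006002 "http://search.yahoo.com/search?p=x" "Mozilla/4.0"',
    '192.0.2.2 - - [10/Oct/2004:13:56:01 +0200] "GET /shop.php?id=2 HTTP/1.0" '
    '200 10240 "-" "Mozilla/4.0"',
]

for entry in oversized_responses(sample):
    print(entry["time"], entry["request"], entry["size"])
```

Running `oversized_responses` over the real log file (for example `open("access.log")`, with whatever file name your host uses) should show whether the huge responses came from a few pages, a few visitors, or a short time window.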
Is it possible that somebody hacked your Amazon scripts and is making requests that are not logged?
Right now, I have three hosting accounts at that hosting company, each containing the very same shop system and files, just with different data. Two of them are in English and are hosted on the same server. Both of those accounts showed an extreme increase in traffic. The third account contains the German Amazon data and is hosted on a different server. I did not encounter any problems there.
I can think of two possible causes so far:
- Amazon temporarily changed the XML format for the US data feed, which somehow caused the XML script to produce lots of errors. Some time later, they changed it back. This does not explain why only some of the page requests at that time produced errors.
- The server (PHP, Apache, MySQL, or whatever) broke down and was no longer able to process the requests without errors. Unfortunately, I know very little about web servers, so I cannot judge how likely this is. Just over a year ago, I had a similar problem with another account at that hosting company, where I had a sudden increase in traffic for just one day. At that time, I did not investigate, because it was the last day of the month and I did not exceed the traffic limits. The page used very little code at that time, and for sure had no infinite loops.
The site has been running for a few months now, and there were no major changes during that time. I am not sure how to determine whether the site was hacked, but my FTP program shows that the files were not edited during the last two weeks.
There is no limit for data output on my side, but I don't understand why the server did not stop sending these error messages, thus producing files of several MB.
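A limit on data output could look roughly like the sketch below. It is written in Python purely for illustration and is not the shop's actual code. The class names, the `render_page` function, and the 200 000 byte cap are all hypothetical. The idea is that once a page has produced more than a fixed number of bytes, the script gives up and sends a short fallback page instead of megabytes of repeated error messages.

```python
class OutputLimitExceeded(Exception):
    """Raised when a page tries to emit more than the allowed number of bytes."""


class CappedOutput:
    """Collects page output and refuses to grow past max_bytes (hypothetical cap)."""

    def __init__(self, max_bytes=200_000):
        self.max_bytes = max_bytes
        self.chunks = []
        self.size = 0

    def write(self, text):
        data = text.encode("utf-8")
        if self.size + len(data) > self.max_bytes:
            raise OutputLimitExceeded(
                f"page would exceed {self.max_bytes} bytes"
            )
        self.chunks.append(data)
        self.size += len(data)

    def getvalue(self):
        return b"".join(self.chunks)


def render_page(items, out):
    """Hypothetical renderer: one paragraph per item, or a short fallback page."""
    try:
        for item in items:
            out.write(f"<p>{item}</p>\n")
    except OutputLimitExceeded:
        # Discard the runaway output and send something small instead.
        return b"<p>Temporarily unavailable.</p>\n"
    return out.getvalue()


# A normal page passes through unchanged.
print(render_page(["a", "b"], CappedOutput(max_bytes=100)))

# A flood of repeated warnings is replaced by the fallback page.
flood = ["Warning: parse error"] * 100_000
print(render_page(flood, CappedOutput(max_bytes=1000)))
```

This would not explain what went wrong, but it would keep a single broken request from sending several MB.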
Hm, so it's not specifically designed to read Amazon's XML?
Personally, I would feel safe then ...
Does it retrieve and display reviews too? Guessing here: could it be that the review feature was spammed on Amazon's site, and your script therefore loaded tons of (spam) reviews? After a while Amazon noticed the spam and removed all reviews for the affected products ...!?
Thank you everybody for your help. I will now talk to Amazon and my host to resolve the last problems!
- Encoding of the umlaut character (typically found in German language text) has been fixed.
- Encoding of some Japanese characters has been fixed.
- Availability messages for items in the German and Japanese locales are now returned in the proper language.
- A more appropriate SOAP fault string is now returned in case of an error on a BlendedSearch.
- The TextStreamSearch and BlendedSearch calls no longer return an error message when no results are found for the search.
- The results from a BlendedSearch now include a RelevanceRank value for each ProductLine element. An application can sort the results using this value in order to produce results that more closely resemble those found on the Amazon web site.
- The TextStreamSearch has been tuned to return results that are more relevant to the given text.
I don't know the details yet, since I just discovered this. But I'll tell you when I know more.
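As a rough illustration of the RelevanceRank note above, sorting could look like this in Python. The dictionaries are made-up stand-ins for parsed ProductLine elements, the `Mode` field is hypothetical, and it is only an assumption that a lower RelevanceRank means a more relevant result, so the direction is left as a parameter.

```python
def sort_by_relevance(product_lines, lowest_first=True):
    """Sort parsed ProductLine entries by their RelevanceRank value.

    Assumption: lower rank means more relevant. Pass lowest_first=False
    if the ranking turns out to work the other way round.
    """
    return sorted(
        product_lines,
        key=lambda line: int(line["RelevanceRank"]),
        reverse=not lowest_first,
    )


# Made-up stand-ins for parsed ProductLine elements.
results = [
    {"Mode": "music", "RelevanceRank": 3},
    {"Mode": "books", "RelevanceRank": 1},
    {"Mode": "dvd", "RelevanceRank": 2},
]

ordered = sort_by_relevance(results)
print([line["Mode"] for line in ordered])
```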