Forum Moderators: coopster


Excessive jump in traffic to my error page

         

puckparches

5:58 am on Sep 17, 2018 (gmt 0)

10+ Year Member Top Contributors Of The Month



I'm not sure if this is the correct forum, but here is the question:

Pages to my website have the following address:

https://www.example.com/category/page1.html
https://www.example.com/category/page2.html


If part of the link is missing: (e.g. https://www.example.com/category/) visitors get redirected to the error page.

Checking my Google Analytics, I notice an excessive jump in traffic to my error page, and it shows that the previous pages were, for example:

https://www.example.com/category/disch.com
https://www.example.com/category/nbc.com
https://www.example.com/category/espn.com


Why am I getting those links? Nowhere on my site do I have those links.

Should I make any changes to the .htaccess to redirect somewhere else?

Any ideas about what the problem is and how to resolve it would be much appreciated.

Thank you

[edited by: phranque at 9:39 am (utc) on Sep 17, 2018]
[edit reason] exemplified domain [/edit]

keyplyr

6:44 am on Sep 17, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi puckparches,

Most likely a bot script misfiring on bad paths. It's not following links, so there's nothing to correct on your end. Happens all the time.

TorontoBoy

2:35 pm on Sep 17, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



These are possibly probes looking for software vulnerabilities. This is called reconnaissance: attackers test your site for specific vulnerable software packages before they attempt a hack. This initial probing will give you a lot of 404s. I use it as a good indicator that someone is scouting my site, and then I ban them. This is very common.

I recommend looking at your raw access log, finding the IPs doing the probing, and banning them.
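On an Apache server, banning the IPs found in the raw access log can be done from .htaccess. A minimal sketch, assuming Apache 2.4 syntax (the addresses shown are documentation placeholders, not real offenders):

```apache
# .htaccess - deny specific probing IPs found in the raw access log
<RequireAll>
    Require all granted
    # Example placeholder addresses - replace with the IPs from your log
    Require not ip 192.0.2.15
    Require not ip 198.51.100.7
</RequireAll>
```

Blocked requests receive a 403 response. On older Apache 2.2 setups the equivalent would be `Order Allow,Deny` / `Deny from` directives instead.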

lucy24

4:08 pm on Sep 17, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Checking my Google Analytics, I notice an excessive jump in traffic
This will sound like a meaningless quibble, but it's important: Analytics (GA or third-party) does not show traffic. It shows requests for the analytics file. Only your server access logs will show actual requests for pages.

puckparches

2:11 am on Sep 18, 2018 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you for all the information guys, really helpful.

puckparches

3:48 am on Oct 11, 2018 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hello guys

Back again, with the same problem. After many attempts, I still can't figure out how to stop this. I checked the raw logs and, besides 2-3 IPs excessively hitting the site, I still can't fix the problem. I have already banned around 20 IPs that, according to the raw logs, are visiting those pages, but there are hundreds more IPs visiting them. Could those bots be coming from many IPs? Any other ideas to help stop this?

Thanks

not2easy

4:06 am on Oct 11, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Bot nets can run scripts on compromised machines, so the traffic may appear as 2 or 3 hits from one IP and then continue on another - and another. I would try to find them in your access logs and look for a common UA or other identifying part of the request (referer?) to cut back on it. See whether the IPs belong to actual ISPs or to hosting servers; if the latter, block the entire CIDR rather than individual IPs. This is especially likely when the IPs differ little, such as only in the c.d (of a.b.c.d), or sometimes just the last digit of the IP.

If the IPs are widely spread, it is more likely they are residential ISP IPs from malware on computers whose owners are not aware their machine has ever visited your domain.
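As a sketch of the hosting-server case above, assuming Apache 2.4 (the CIDR range and User-Agent string are hypothetical examples, not known offenders):

```apache
# Flag requests whose User-Agent matches a fragment seen in the logs
# ("suspicious-scanner" is a made-up example string)
SetEnvIfNoCase User-Agent "suspicious-scanner" block_ua

<RequireAll>
    Require all granted
    # Deny the whole hosting-provider range rather than single IPs
    # (203.0.113.0/24 is a documentation-only example CIDR)
    Require not ip 203.0.113.0/24
    # Deny anything flagged by the User-Agent match above
    Require not env block_ua
</RequireAll>
```

Blocking the full CIDR covers the "only the last octets differ" pattern described above with one directive instead of dozens.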

keyplyr

4:10 am on Oct 11, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Block the server farms, ignore the hits from the ISPs.

Read this:
Blocking Methods [webmasterworld.com]

lucy24

4:22 am on Oct 11, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not sure how I missed this in the first place, but:
If part of the link is missing ... visitors get redirected to the error page.
Do you mean literally redirected, i.e. a 300-class response telling them to make a fresh request for the error page? Or did you really mean that they get a 404 response, and you've got a particular page in place for those 404s?

puckparches

5:22 am on Oct 11, 2018 (gmt 0)

10+ Year Member Top Contributors Of The Month



Do you mean literally redirected, i.e. a 300-class response telling them to make a fresh request for the error page? Or did you really mean that they get a 404 response, and you've got a particular page in place for those 404s?


Let me see if I can explain this better:

If I get the following:
https://www.example.com/main-category/sub-category

.htaccess sends the request to:
https://www.example.com?main-category.php=sub-category

Then main-category.php searches the database for "sub-category". If "sub-category" is missing or not found in the database, it redirects to
https://www.example.com/error.php


I'm not an expert - am I doing something wrong there?

not2easy

6:02 am on Oct 11, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The server response for a resource that is not found/does not exist should be a 404 to indicate that no such URL exists. If you are redirecting it to an error page, the server may be responding with either a 'permanently moved' 301 response or a 302 'temporarily moved' response. Either response could be considered a 'soft 404' by Google because the requested page was not served and no error document was served - unless of course, your error.php serves a 404 header.

Many of us include htaccess code to direct visitors to a custom error document:
ErrorDocument 404 /404.php 
which could just as easily be named error.php - it just lets the server supply the 404 header and serve a nicer page than most default 404 pages. Google does not like 'soft 404's but 404s are a normal occurrence.

puckparches

4:04 pm on Oct 11, 2018 (gmt 0)

10+ Year Member Top Contributors Of The Month



@not2easy Thank you!

I already had this:
ErrorDocument 404 https://www.example.com/error.php

I think the big jump to my error page is due to the bots hitting my site. My biggest problem is not so much the error-page traffic, but how to stop the bots. Is it possible to stop the bots from hitting my site? If I can't stop them, will they ever stop?

not2easy

5:10 pm on Oct 11, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



There is an error in your implementation - the full URL for the page should not be used, only the root relative path of /error.php

I did that myself many years ago, using the full URL to the page. Visitors still end up on the error page, but without the corresponding 404 server response. A true 404 would eventually end the requests, since most sources stop as soon as they notice they are getting 404s. As long as they get a non-404 response, they may well continue.

If UAs, IPs or referers don't supply a realistic blocking method, they can't be blocked. I would suggest editing your ErrorDocument 404 directive to simply /error.php; that would help end the requests by ensuring the 404 response.
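For comparison, the before/after of the directive being discussed (assuming the error page lives at the document root):

```apache
# Wrong: a full URL makes Apache issue a 302 redirect instead of a 404
# ErrorDocument 404 https://www.example.com/error.php

# Right: a root-relative path lets Apache serve the page with a true 404 status
ErrorDocument 404 /error.php
```

The page the visitor sees is identical either way; only the status code the server sends changes.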

puckparches

5:21 pm on Oct 11, 2018 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you very much for your help! I hope that takes care of the problem, or at least decreases it.

lucy24

5:35 pm on Oct 11, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I already had this:
Well, that explains it. If your ErrorDocument directive--for any error--includes the protocol and hostname instead of starting with a bare /, then errors will not receive a 404 (or whatever number) response. They'll get a 302 redirect to the new URL.

This is in addition to and separate from the php code you described, which also sounds as if it's returning a Redirect response rather than a simple 404.

If I remember rightly, to return a 404 in php (it's been a few years since I last had to do it) you need to do two separate things: send the numerical response code, and also include the content of the 404 page. Robots--including search engines--obviously don't care, but humans will be alarmed if they don't see anything on their screen.

not2easy

7:35 pm on Oct 11, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



A php sitemap creator I use has some lines like:
{
    header("HTTP/1.0 404 Not Found");
    print "<h1>404 Not Found</h1>";
    exit();
}
as part of the process so that it doesn't just pretend to create a sitemap when there's nothing there.

lucy24

8:24 pm on Oct 11, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I pawed through some ancient directories and found this, which I undoubtedly cribbed from someone else, possibly on this very site:
if (function_exists('http_response_code'))
{ http_response_code(404); }
else
{ header($_SERVER['SERVER_PROTOCOL'] . " 404 Not Found"); }
include ($_SERVER['DOCUMENT_ROOT'] . "/missing.html");

puckparches

9:22 pm on Oct 11, 2018 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you for the information!