They access a page that should be mysite.com/mypage.php?code=#*$!
(code is a widget id)
They replace #*$! with another URL. The page doesn't redirect to them (as far as I know) but displays a relatively blank page. Should I be concerned? Can it do me any harm?
Today as an extra twist, I've seen the attached url ends with: Please_click_on_my_google_adds
Fortunately, I have my adsense set up to only display if the widget description on any particular page is over a certain length, so the page that results from the above does not display adsense.
I would like to block this unwanted access to my site, but the IP used varies (not even in the same block). I assume I can do this with .htaccess but I'm not sure how.
should I be concerned?
I'm going to say . . probably. Others will have better contributions, but...
They replace #*$! with another url. - can it do me any harm?
This is going to depend on the content of the "other URL." What does mypage.php do? Does it open a database? Can it access system components? Does this URL contain anything like "or 1=1" or possible email injection content? How secure is mypage.php? Does it filter input data?
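To make the "does it filter input data?" question concrete, here is a minimal whitelist-validation sketch. It is written in Python purely for illustration (the site in question runs PHP, where the same idea applies), and it assumes widget ids are short numeric strings, which the thread does not actually state. The names WIDGET_CODE_PATTERN and is_valid_widget_code are hypothetical.

```python
import re

# Assumption: a real widget id is 1 to 10 ASCII digits. Adjust to the real id format.
WIDGET_CODE_PATTERN = re.compile(r"[0-9]{1,10}")


def is_valid_widget_code(raw):
    """Return True only if the whole 'code' parameter looks like a widget id."""
    if raw is None:
        return False
    # fullmatch rejects anything with extra characters, such as a pasted URL
    # or an "or 1=1" fragment, instead of trying to clean it up.
    return WIDGET_CODE_PATTERN.fullmatch(raw) is not None
```

Rejecting anything that does not match the expected shape is usually simpler to reason about than trying to strip out the bad parts of a value.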
I've seen the attached url ends with: Please_click_on_my_google_adds
I am "presuming" you are displaying AdSense on your site; is this correct? As you (should) know, any request to click a site's ads via forum posts or other inquiries is strictly against AdSense policy and grounds for termination of the account. My concern would be that this is possibly a competitor or the like, hoping these requests get logged somewhere and the logs get picked up by Google, in which case your AdSense account would get suspended.
Don't let my ramblings alarm you, as I could be completely wrong, but this would be the only real reason I could see for sending your site such a request.
The variable in question is sanitised before it is used in a database query, so I am hopeful it cannot harm the database.
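For reference, one common way to keep a request parameter from altering a query is a parameterized (bound) query rather than string concatenation. The sketch below uses Python's built-in sqlite3 module only as a stand-in; the table name widgets, its columns, and the function fetch_widget are assumptions for illustration, not the poster's actual schema or code.

```python
import sqlite3


def fetch_widget(conn, code):
    """Look up one widget by its code using a bound parameter.

    The value of `code` is passed separately from the SQL text, so input
    such as "1 OR 1=1" is compared as a literal string and cannot change
    the structure of the query.
    """
    cursor = conn.execute(
        "SELECT code, description FROM widgets WHERE code = ?",
        (code,),
    )
    return cursor.fetchone()  # None when no widget has that code
```

A lookup that returns nothing is also the natural signal for answering with a 404 instead of an empty template.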
Yes I do run adsense on the site but it doesn't show unless the widget description is over a certain size - so will not show with the worrying urls.
The urls do not show on the page, only in the address bar - but then anyone can type what they want into the address bar.
I would be interested to know if there is a way to totally block this unwanted access
If the attempts don't pass through your script and are not logged publicly, you may not have anything to worry about.
If you don't know which HTTP status code your server returns for these requests, right about now is the time to install the Live HTTP Headers extension for Firefox and find out.
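If a script is more convenient than a browser extension, the status code can also be read directly. This is a small sketch using Python's standard http.client module; fetch_status is a hypothetical helper name, and the URL in the usage comment is only a placeholder built from the example in this thread. Unlike a browser, http.client does not follow redirects, so a 301 shows up as 301.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10.0):
    """Request `url` once and return (status_code, reason) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Rebuild the path plus query string exactly as requested.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        response = conn.getresponse()
        return response.status, response.reason
    finally:
        conn.close()


# Example (placeholder URL):
# status, reason = fetch_status("http://mysite.com/mypage.php?code=http://example.com/")
# print(status, reason)
```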
If it is '200 OK' then you are in deep deep trouble, because your site returns Infinite Duplicate Content, albeit with a 'relatively blank template page'.
If it is '404 Not Found' then there is little to worry about. You can leave it like that, or set up a 301 redirect to capture those requests and redirect the user to a valid URL that will return real content.
If it is '200 OK' then you have a range of options to fix it. One fix is to set up a 301 redirect to the correct URL, either by using a RewriteRule in the .htaccess file or by altering the PHP script, and the other fix is for the script to return a 404 HTTP Header to the browser. If none of the real URLs use parameters, then this can be more efficiently done in the .htaccess file instead.
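The script-side version of those two fixes can be sketched roughly as follows. This is Python for illustration only, not the PHP the site uses; KNOWN_WIDGETS, respond, and the rule "extra parameters get a 301 to the bare code URL" are all assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical stand-in for the database of real widgets.
KNOWN_WIDGETS = {"101": "Blue widget", "102": "Red widget"}


def respond(query_string, widgets=KNOWN_WIDGETS):
    """Return (status, headers) for a request to /mypage.php?<query_string>."""
    params = parse_qsl(query_string, keep_blank_values=True)
    codes = [value for key, value in params if key == "code"]
    code = codes[0] if len(codes) == 1 else None

    # Unknown, missing, or repeated code: say so with a 404 rather than
    # serving a near-empty template with 200 OK.
    if code not in widgets:
        return 404, {}

    # Known code but a non-canonical query string (extra parameters, odd
    # encoding): send a permanent redirect to the one correct URL.
    canonical = urlencode({"code": code})
    if query_string != canonical:
        return 301, {"Location": "/mypage.php?" + canonical}

    return 200, {}
```

With this shape, only URLs that map to a real widget can ever answer 200, which keeps the set of indexable URLs finite.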
If it is '200 OK' then you are in deep deep trouble, because your site returns Infinite Duplicate Content, albeit with a 'relatively blank template page'.
What's "infinite duplicate content"? Why would it put someone in deep deep trouble? And how would an error page be considered "duplicate content" by anyone? It seems to me that any website with some programming behind it could potentially generate an infinite number of "duplicate" pages that say "invalid request". I can't see how that can be anything but ok.
Try doing a search for "duplicate content." One of the more interesting thread titles you'll find here is "Duplicate content - Get it right or perish."
A bit of research will reveal that search engines take steps to identify "infinite URL-spaces" on sites. If they discover that requests for every or most URLs return a 200-OK, then they will arbitrarily limit the number of URLs that they crawl on your site as a policy of self-preservation and fairness. They will also limit the number of your URLs that they display in search results, primarily to keep their indexes free of junk, and likely give you several demerits on your site's "quality score" as well.
If you have any concern for usability and retention of off-site referral traffic, then the page content returned with the 404 should explain that the requested resource was not found or does not exist, and offer text links to your home page, major category pages, major site sub-sections, and your site search facility, as applicable.
Run a tight ship.
Jim
If it is you that needs this level of information, then in addition to the normal 404 response, you can add a 'debug switch' to the code to temporarily enable outputting diagnostic information on the page.
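A debug switch of that kind might look something like the sketch below. It is Python for illustration; the environment variable name SITE_DEBUG and the function render_not_found are made up for the example, and a real page would need the diagnostic text escaped for HTML.

```python
import os


def render_not_found(requested, debug=None):
    """Return (status, body) for a missing page, with optional diagnostics."""
    if debug is None:
        # Assumption: the switch is an environment variable set only while testing.
        debug = os.environ.get("SITE_DEBUG") == "1"

    body = "404 Not Found: the requested page does not exist."
    if debug:
        # repr() keeps odd characters visible; never leave this enabled in
        # production, since it echoes request data back onto the page.
        body += "\n[debug] rejected request: " + repr(requested)
    return 404, body
```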
Jim
Yes, it was giving a 200 response.
After some testing I have found exactly what I need to do.
This may explain a worry I had with another site where in Google WMT, every now and again the crawl stats show them crawling far more pages than they should.
Jim - as an alternative to the debug switch, I have some code on my 404 page that sends info to me in an email.
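The poster's actual code is not shown, but the general idea can be sketched with Python's standard email package. Everything named here (build_404_report, the addresses, the fields included) is an assumption for illustration. Request data goes only into the message body, never into headers, so a crafted URL cannot inject extra mail headers.

```python
from email.message import EmailMessage


def build_404_report(requested_url, remote_addr, user_agent,
                     to_addr="webmaster@example.com"):
    """Build (but do not send) a notification email about a 404 request."""
    msg = EmailMessage()
    # Fixed header values: nothing from the request is placed in a header.
    msg["Subject"] = "404 reported on site"
    msg["From"] = "noreply@example.com"
    msg["To"] = to_addr
    msg.set_content(
        "A request returned 404.\n"
        "URL: " + requested_url + "\n"
        "IP: " + remote_addr + "\n"
        "User-Agent: " + user_agent + "\n"
    )
    return msg
```

Sending the message (for example with smtplib) is left out, and on a busy site some rate limiting would be sensible so a flood of bad requests does not become a flood of mail.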
I have also put this on a site I monitor for a client to help me keep an eye on things.
On my site the ending URL and the heading text sometimes match, but the heading text is always pulled from the DB, so messing with different URL parameter text is only going to cause 404s, not different text on the page.
So now the question is, who wants to get you in trouble with AdSense, and why?
You can make some crazy stuff up and the website will parrot it back, with products seemingly categorised a long way from their proper location ([not real examples] e.g. watches in the underwear section, socks in the kitchen section, electric saws in the shoes section, and so on).