Searching Google for the query string shows thousands of results on different sites (root files, internal pages (dynamic and static), PDFs) but gives no clue as to where it comes from, and neither does searching for just "ArdSI".
Any ideas?
Trying to find the link is probably not worth the time since you just need to redirect and remove the query_string to fix any issue. I wouldn't worry about where it came from as much as how to get rid of it and keep it from happening again.
The only result in Google I can find for it (other than other sites where it is in the url) is this thread.
Searching for ArdSI reveals nothing.
Is this a Google glitch? Some other search site or aggregator? A scraper?
Personally, I don't worry as much about the why as I do about how to fix it and keep it from happening again, because I feel it's a better use of my time... The preceding is only one of the possible causes, and I don't know if that's what it is or not.
As this is a static site and this is the first occurrence, I've just added the query string to robots.txt so that I can then use the WMT URL removal tool.
Should work fine if it's just this one, but personally, I would recommend using mod_rewrite and redirecting all requests with query_strings to the same location without a query_string so it doesn't keep happening...
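# Turn the rewrite engine on
RewriteEngine on
# Match only requests that carry a non-empty query_string
RewriteCond %{QUERY_STRING} ^.+
# 301 to the same path; the trailing ? drops the query_string
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]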
Adding a ? with nothing after it removes the query_string.
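For anyone on a newer server, the same idea can be written without the trailing ? trick. This is only a minimal sketch, assuming Apache 2.4 or later (where the QSD flag is available) and the same placeholder host www.example.com used above; it has not been tested against the original poster's site.

```apache
# Assumes Apache 2.4+, where the QSD (query string discard) flag exists
RewriteEngine on
# Only act on requests that carry a non-empty query_string
RewriteCond %{QUERY_STRING} .
# Redirect to the same path; QSD drops the query_string, so no trailing ? is needed
RewriteRule ^(.*)$ http://www.example.com/$1 [QSD,R=301,L]
```

Both forms should end up sending the visitor to the same URL without the query_string; QSD just states the intent more explicitly.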
I'm not so sure I like the idea of 301-ing anything anyone might throw at a url. I mean, that htaccess effectively says "Yes, this is part of my site". I'd rather serve a 404 which says "Not me".
I'm not so sure I like the idea of 301-ing anything anyone might throw at a url.
All you're doing is removing any query_string they might put there, not redirecting the requested location to a different one. IMO it's actually good practice to not allow non-existent query_strings to be sent.
I'd rather serve a 404 which says "Not me".
Then you'll have to change your site to dynamic rather than static and use a scripting language to serve the 404... Try typing ?anything=whatever-you-want on any of your URLs and you'll see what I mean... A query_string is technically not part of the location requested, but rather information passed to the script at the location requested, which means no query_string will make your site serve a 404 as long as the location requested (/page.html) from your server is a valid resource.
The code doesn't do anything to change the location /page.html; it just removes the query_string. A request for /missing-page.html will still return a 404 just like you want. By not removing the query_string you are actually sending a stronger signal that the location with the query_string is part of your site than you are by removing it, because if you don't remove it the visitor receives a 200 OK for a request with any query_string, which means it is part of your site...
Personally, I'd rather show visitors (including search bots) the information they were looking for at the location requested with a query_string by serving them the resource (page) without the query_string than show them a 404 because someone linked to a page and included a query_string, especially when I know exactly what page they were looking for...
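For readers who would still prefer the "Not me" answer on a static site, mod_rewrite can also be asked to return a status code directly rather than redirect. The following is a hedged sketch, not something posted in this thread: it assumes Apache with mod_rewrite enabled and a site where no page legitimately takes a query_string, since it would refuse every request that carries one.

```apache
RewriteEngine on
# Only act on requests that carry a non-empty query_string
RewriteCond %{QUERY_STRING} .
# "-" means no substitution; R=404 makes Apache answer with 404 Not Found
RewriteRule ^ - [R=404,L]
```

The trade-off is the one discussed above: a visitor who follows a link with a stray query_string gets an error page instead of the page they were looking for.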