Anyone else getting this problem in Search Console? As of today (May 2022) I still have a bunch of parameter URLs in Search Console, even though the Google parameter tool was supposed to stop working at the end of April 2022.
Art
not2easy
7:43 pm on May 16, 2022 (gmt 0)
The parameter tool's deprecation has nothing to do with what you are seeing. That tool was supposed to be used to tell Google about URL parameters you did or did not want indexed or crawled. If you are having problems with URL parameters in GSC, you can learn how to handle them. A good article is at Search Engine Journal: https://www.searchenginejournal.com/technical-seo/url-parameter-handling/
Arturo99
8:11 pm on May 16, 2022 (gmt 0)
Thanks, that's a great link.
Arturo99
9:09 am on May 17, 2022 (gmt 0)
Boils down to this, I think: I need to stop Google crawling parameter strings on my site.
But I don't see how you do this in Search Console. The URLs look like www.example.com/widget1?qs=1&productenquiry=1, and I have 100 of them, all pointing to a 404.
robzilla
9:56 am on May 17, 2022 (gmt 0)
You can only stop them from crawling those URLs through robots.txt. That said, if those URLs return a 404 (make sure they actually return the 404 status code, not just an error page), they'll stop crawling them eventually anyway.
There's no way to do this from Search Console. Frankly, I would just ignore it once I'd confirmed the 404 status code.
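The "make sure they actually return the 404 status code" point is worth checking, because some sites serve a friendly error page with an HTTP 200 status (a "soft 404"), which keeps crawlers coming back. Here is a minimal, self-contained sketch of the distinction, using a local test server as a stand-in for a real site (the handler and paths are made up for illustration):

```python
# Sketch: a real 404 means the HTTP status code is 404, not just that
# the page *says* "not found". This local server returns 404 for any
# path other than "/", mimicking a site that handles parameter URLs
# correctly.
import http.server
import threading
import urllib.error
import urllib.request

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Only "/" exists; parameter URLs and anything else get a 404.
        if self.path == "/":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"home page")
        else:
            self.send_response(404)
            self.end_headers()
            self.wfile.write(b"not found")

    def log_message(self, *args):
        # Silence per-request logging.
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def status_of(path):
    """Return the HTTP status code the server sends for a path."""
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}") as r:
            return r.status
    except urllib.error.HTTPError as e:
        # urllib raises on 4xx/5xx; the code is on the exception.
        return e.code

print(status_of("/"))                               # 200
print(status_of("/widget1?qs=1&productenquiry=1"))  # 404
```

On a live site the same check is just `curl -s -o /dev/null -w "%{http_code}" <url>`; if that prints 200 for a "not found" page, Google sees a soft 404 and may keep crawling.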
not2easy
5:17 pm on May 17, 2022 (gmt 0)
In robots.txt you can add a line like
Disallow: /*?qs=
but that only covers the kind of URL you shared. If they all start with www.example.com/widget1?qs=, that rule will prevent all of them from being crawled. If you have other formats, you may need other rules too.
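Put together, a robots.txt along these lines would cover it. The second rule is hypothetical, only needed if parameter URLs show up that don't begin with ?qs= (Google's matching supports the * wildcard):

```
User-agent: *
# Block any URL whose query string begins with qs=
Disallow: /*?qs=
# Hypothetical extra rule for URLs where the parameter appears
# elsewhere in the query string
Disallow: /*?*productenquiry=
```

Note that robots.txt only stops crawling; it doesn't remove already-indexed URLs, which is why confirming the 404 status still matters.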
Arturo99
6:20 pm on May 17, 2022 (gmt 0)
Well, an independent crawler (not Google) is picking up these parameter strings: www.example.com/widget1?qs=1&productenquiry=1. There are 100 of them, and I don't understand how the site is generating them. It is as if every product search carried out on the site generates a parameter URL, which is then indexed by crawlers. So 100 such searches have created 100 parameter URLs, which then show up as 100 404s in Search Console.
not2easy
6:53 pm on May 17, 2022 (gmt 0)
404s don't matter, especially if they come from some other scraper site. Don't worry if Google isn't finding your parameters. Somebody is trying to make those URLs look like they matter to your site, but the 404s ensure they don't matter at all. Google will drop them after observing the 404 for a while; it's not your problem.
Arturo99
9:32 am on May 18, 2022 (gmt 0)
Can you help me understand why 404s keep getting generated from my search box? Every time a search is carried out, Google somehow gets the URL and adds it to my list of URLs, where it is then designated a 404, not indexed, and leads nowhere.
I don't know if Google is doing a test search itself, or a visitor is carrying out the search.
not2easy
12:13 pm on May 18, 2022 (gmt 0)
Google occasionally makes silly requests just to see how a site handles non-existent URLs. A 404 is the right answer.
Hundreds of them is more likely a scripted bot, and you should investigate your access logs to try to make it stop, if only to save your server from wasted effort. If there is no evidence of any of the searches in the logs, then someone is making them up by substituting your domain name into a list of nonsense searches.
Again, if the URLs do not exist and your site returns a 404, that is what is supposed to happen, and it can be ignored.
Arturo99
4:44 pm on May 18, 2022 (gmt 0)
Thank you for explaining this Art
Brett_Tabke
3:23 pm on May 30, 2022 (gmt 0)
Also, poke around your site and see whether a log file might be visible to the outside world. If it is, and Google crawls it, then they have all the URLs that have been hit on your site (even the 404s). Searching for log files on a competitor's site is a classic grey-hat trick for learning what your competitors are up to.
Arturo99
4:26 pm on May 30, 2022 (gmt 0)
What damage could they do? Do you mean get Google to keep crawling old 404s?
Brett_Tabke
2:38 am on Jun 25, 2022 (gmt 0)
Well, if your logs are open, then anyone can download them and see how much traffic you have and where some of that traffic comes from.
And no, I wouldn't want Google crawling 404s.
Sgt_Kickaxe
3:49 pm on Jan 3, 2023 (gmt 0)
The only 404s you need to worry about are the ones your own pages actually link to. You should fix those, or remove the links.
Martin Potter
7:11 pm on Jan 3, 2023 (gmt 0)
A vaguely related question: does this line in an .htaccess file return a proper 404 server error before executing the PHP file, after execution, or not at all?
ErrorDocument 404 /404.php
Thanks.
phranque
8:54 pm on Jan 3, 2023 (gmt 0)
Before. That directive does not get invoked unless there is already a 404 status code.
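In other words, Apache determines the 404 status first and then internally redirects to the error document; the 404 status is preserved unless the script itself overrides it. A sketch of the setup (the header call shown in the comment is the usual pitfall, not something from this thread):

```
# .htaccess
# Apache has already set the 404 status by the time this fires;
# /404.php is served as the response body for that status.
# Caution: if 404.php calls something like header("HTTP/1.1 200 OK"),
# the response becomes a soft 404, so leave the status alone.
ErrorDocument 404 /404.php
```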
Martin Potter
8:48 pm on Jan 5, 2023 (gmt 0)
@phranque , thank you! I am grateful that there are people here, like yourself, who understand the inner workings of these things. Thanks again.