
Forum Moderators: goodroi


Strange 404's showing up in GSC

     
1:57 pm on Apr 5, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts: 1863
votes: 470


For two days in a row now I am getting the same 404 error in GSC.

The errors are for the URLs /km and /mile. These coincide with the two positions of the toggle button on the main input form on my site, which is the first input on the form. This suggests to me that Googlebot is trying to crawl the form but is not succeeding.

The site has 5 pages indexed. I have not submitted a sitemap as there are only three relevant pages to index: the home page and the two input forms. The other two pages are ones I submitted manually, one for each form's result page. I submitted these so that Google could see a result without necessarily indexing an infinite number of potential outputs.

Clearly something is wrong here, or is it? How does one go about getting a dynamic website indexed? Do I need to set something up in GSC under URL Parameters?
6:20 pm on Apr 7, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts: 1863
votes: 470


Just as a side note, I think I figured out why Google was crawling the /km and /mile URLs even though they don't exist. On my page, the jQuery code has two vars that are set to these string values:


var x = '/km';
var y = '/mile';


I think that Googlebot is mistaking these for URLs. I moved the '/' outside the string literal. If this was the problem, that will solve it.
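For anyone wanting to try the same trick, one common approach is to build the path-like value by concatenation so the literal '/km' never appears in the source. This is just a sketch of that idea (the variable names here are made up, not the site's actual code), and it only helps if the crawler is pattern-matching on string literals in the JS source:

```javascript
// Before: var x = '/km';  -- the literal '/km' looks like a relative URL
// After: same runtime value, but no path-like literal in the source.
var SLASH = '/';
var kmPath = SLASH + 'km';     // evaluates to '/km' at runtime
var milePath = SLASH + 'mile'; // evaluates to '/mile' at runtime
console.log(kmPath, milePath);
```

The runtime behavior is unchanged; only the static text of the script differs.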

This doesn't resolve the underlying issue of not being indexed, but I guess it shows the naivety of the crawler.
7:23 pm on Apr 7, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11279
votes: 133


This is not a duplicate content issue.
...
The content is absolutely not faceted or filtered.

i am not assuming your site has either of these issues or that you are doing anything "wrong".
i am suggesting that you step back far enough to think about how it might LOOK to an algorithm that typically discovers content through links, crawls urls on a budget, indexes content, and ranks urls based on both content quality and user intent.

I haven't invented anything new. Fill in form -> submit -> get result on a result page. This seems like a pretty straightforward design pattern.

it's almost ideal for collecting information from visitors and presenting custom/personal/dynamic/premium content, none of which is typically interesting to googlebot.
googlebot wants to see what everyone else sees when they request a url and take no further action other than eventually clicking another link.

from a google perspective, it's still hard to imagine millions of pages of quality content.
for example will you have millions of unique titles?
7:31 pm on Apr 7, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:8167
votes: 609


Or g might recognize that a calculator producing results any calculator can do will not be a unique result for the visitor. Make the content on the form entry page more compelling and let that be your landing page. (Note, I wouldn't crawl such a site either, were I a search engine)
7:42 pm on Apr 7, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11279
votes: 133


i guess it shows the naivety of the crawler

i would call it voracious rather than naive:
"if that looks anything like a url or a path i'm going to note that and maybe get a taste later if i have time between bites"

"/km" fits the pattern of a relative url...
11:47 pm on Apr 7, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:11470
votes: 691


I don't pay much attention to the soft 404s in GSC. They are either typos in remote links, malformed database URLs from other SEs or directories, or your own generated parameters.

Regardless, they don't exist and Googlebot got the correct server response. End of story.
1:09 am on Apr 8, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11279
votes: 133


the soft 404s in GSC

"soft 404s" have a specific meaning that involves a 200 OK response rather than a 404, as in this case.

Official Google Webmaster Central Blog: Farewell to soft 404s:
https://webmasters.googleblog.com/2008/08/farewell-to-soft-404s.html [webmasters.googleblog.com]
Instead of returning a 404 response code for a non-existent URL, websites that serve "soft 404s" return a 200 response code.


more appropriate to this discussion - Do 404s hurt my site?:
https://webmasters.googleblog.com/2011/05/do-404s-hurt-my-site.html [webmasters.googleblog.com]
Q: Most of my 404s are for bizarro URLs that never existed on my site. What’s up with that? Where did they come from?
A: If Google finds a link somewhere on the web that points to a URL on your domain, it may try to crawl that link, whether any content actually exists there or not; and when it does, your server should return a 404 if there’s nothing there to find. These links could be caused by someone making a typo when linking to you, some type of misconfiguration (if the links are automatically generated, e.g. by a CMS), or by Google’s increased efforts to recognize and crawl links embedded in JavaScript or other embedded content; or they may be part of a quick check from our side to see how your server handles unknown URLs, to name just a few. If you see 404s reported in Webmaster Tools for URLs that don’t exist on your site, you can safely ignore them. We don’t know which URLs are important to you vs. which are supposed to 404, so we show you all the 404s we found on your site and let you decide which, if any, require your attention.
3:15 am on Apr 8, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:11470
votes: 691


Thanks phranque.

My point was, I ignore the GSC 404 report for the most part. I still look at every 404, but my attitude is - if the file does not exist, then Googlebot got the correct server response.

GSC presents that report as errors on *our* part that we need to fix ("mark these errors as fixed"). However, most of the time this is not true. The source of the error is usually remote and has nothing to do with our properties. Roughly 95% of 404s in my GSC reports are user typos & malformed backlinks.

Occasionally (and this might be the case here) Googlebot misreads some dynamic parameter our systems are using. In that case it is prudent to investigate and, if possible, find a work-around that Googlebot can live with. Sometimes the GSC URL Parameters tool can be useful.
5:38 am on Apr 8, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14709
votes: 613


I think your fingers just typed “soft 404” in place of “404” alone, because the phrase comes up so often.
5:49 am on Apr 8, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:11470
votes: 691


Likely supposition
1:40 pm on Apr 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:1863
votes: 470


I just wanted to give an update here as I found a solution to my problem.

At the same time as posting the question here, I posted a similar question in the Google Webmaster Help Forum. There I was given some advice that seems to have worked.

The fundamental issue was that there was no way to access any of the content without passing through the form. Googlebot can crawl a form, but generally it doesn't unless, I guess, you throw a bunch of links at the content behind the form first.

So at first I assumed that providing a sitemap would be sufficient to get Google to crawl the content; at least with the sitemap Googlebot could find the pages. But again, nothing.

So it was suggested that I add internal links to some of the content. At first this seemed a little counterintuitive (at least to me!), as it would amount to creating content specifically and only for Googlebot. The site consists of two calculator tools where users enter a series of inputs and a result is returned, so creating links to random result pages seemed unlikely to add much value. But with a bit of creative thinking I decided to publish some of the data in tabular format, so users can see the results for a series of inputs all on a single page. Each result in the table, which is just a number, is shown as a link to the specific result page, and the result page provides a lot more content than just a number. So the user gets an overview and an understanding of the goal of the calculator, and can access the result directly from the table (as can Googlebot).
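To illustrate the results-table idea: each precomputed value in the table doubles as a crawlable link to its full result page. The sketch below uses an invented URL scheme (/result?km=N) and an invented conversion helper; the actual site's URLs and markup are unknown:

```javascript
// Build HTML table rows where each converted value links to its
// result page, giving both users and Googlebot a crawlable path.
// The /result?km=N URL pattern is hypothetical.
function renderRows(kmValues, convert) {
  return kmValues.map(function (km) {
    var miles = convert(km);
    return '<tr><td>' + km + ' km</td>' +
           '<td><a href="/result?km=' + km + '">' +
           miles.toFixed(2) + ' miles</a></td></tr>';
  }).join('\n');
}

// Example conversion: kilometres to miles.
var kmToMiles = function (km) { return km * 0.621371; };
var html = renderRows([1, 5, 10], kmToMiles);
```

Whether the table is generated server-side or client-side, the important part for crawling is that the links exist as plain anchors in the delivered HTML.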

From the home page, I linked to these pages (there are two, one per calculator). I submitted the pages to the index and bingo! Those pages are indexed, and they show up in a site: search. Most important, the target keywords are now appearing in the Search Analytics reports. I have a long way to go before they generate any traffic, but that is a whole other issue.