| 6:51 pm on Dec 11, 2012 (gmt 0)|
A broken link is one where the status code in the HTTP header is "404". A direct link to a custom error page would show a "200" status. So a direct link to a custom 404 error message page is NOT a broken link.
| 7:06 pm on Dec 11, 2012 (gmt 0)|
What tedster said, with one exception: when the person who built the custom 404 error page built it to guard against an .htaccess 'oops' (EG ErrorDocument http://example.com/404.php) and serves a 404 Not Found via server-side scripting, overriding the 200 OK status the server would otherwise send. In that case a direct link to the custom error page would still be considered a broken link.
The status code makes the determination in most cases, so if following the link results in the user-agent receiving a 404 Not Found status code, the link is considered broken.
An exception to the status code being the determining factor is when a number of URLs all resolve to exactly the same content (or no content) that is not a direct replacement for the original via redirect. This type of situation can cause the pages to be treated as 'soft 404s' by search engines, which would then likely treat the links to those pages as broken.
There is really 'no such thing' as a totally blank page, because a properly functioning server will send a status code for every URL requested, so whether content is served to the browser or not doesn't really matter as much as the status code received (or derived as in the case of 'soft 404s') by the user-agent.
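The rule described above (status code decides, with a 'soft 404' carve-out) can be sketched roughly like this. To be clear, this is my own illustration, not anything published by a search engine; the function name and the soft-404 heuristic (empty body, or a body identical to a known generic error page reached without a redirect) are assumptions:

```python
def classify_link(status_code, body, known_error_bodies=frozenset()):
    """Rough sketch: decide whether a link should be treated as broken,
    based on the status code and body the user-agent receives."""
    if status_code == 404:
        return "broken"  # hard 404: explicitly Not Found
    if status_code == 200:
        # 'soft 404' heuristic: a 200 whose body is empty, or identical
        # to a body served for many unrelated URLs (e.g. a generic
        # error page reached without a redirect)
        if not body.strip() or body in known_error_bodies:
            return "probably broken (soft 404)"
        return "ok"
    return f"other ({status_code})"

print(classify_link(404, "Custom error page"))   # broken
print(classify_link(200, "Real content here"))   # ok
print(classify_link(200, ""))                    # probably broken (soft 404)
```

A direct link to the custom error page would land in the last bucket only if the script behind it fails to send the 404 status.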
| 12:17 am on Dec 12, 2012 (gmt 0)|
|There is really 'no such thing' as a totally blank page |
There speaks someone who has never tested out php on a live site.
A totally blank page = there is an error within the page itself. It will return a 200 but there will be no content.
| 12:25 am on Dec 12, 2012 (gmt 0)|
|It will return a 200 but there will be no content. |
So it's not totally blank, as the response still carries HTTP headers
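Both posts are right in their own terms, and it's easy to see with a quick sketch. Here Python's stdlib stands in for the PHP/Apache setup under discussion (the handler name and loopback port are my own illustration): the body is completely empty, yet the 200 status line and headers still arrive at the user-agent.

```python
# A sketch of the 'blank page' case: like a PHP script that errors out
# before printing anything, this handler sends 200 OK and headers but
# never writes a body.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class BlankPage(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)                  # status line still goes out
        self.send_header("Content-Length", "0")
        self.end_headers()                       # headers still go out
        # ...but no body is ever written

    def log_message(self, *args):                # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), BlankPage)
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    status, body, headers = resp.status, resp.read(), dict(resp.headers)

server.shutdown()
print(status, len(body))   # 200 0 -- 'blank' to the eye, yet not nothing
```

The browser window is blank, but the user-agent still received a status code and headers for the request.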
| 12:35 am on Dec 12, 2012 (gmt 0)|
|A totally blank page = there is an error within the page itself. It will return a 200 but there will be no content. |
There speaks someone who doesn't read an entire paragraph...
"There is really 'no such thing' as a totally blank page, because a properly functioning server will send a status code for every URL requested, so whether content is served to the browser or not doesn't really matter as much as the status code received (or derived as in the case of 'soft 404s') by the user-agent."
I've coded PHP for 7 years and still have two tutorials posted in the Apache Forum from a different user name. HBU?
| 3:29 am on Dec 12, 2012 (gmt 0)|
I tend to think that a broken link is any that doesn't take you where the author intended.
Often the link will take you to a parked domain (which returns 200 status and "content"), but that is still broken.
| 4:01 am on Dec 12, 2012 (gmt 0)|
Okay, I think I've got the concept now.
1. Broken Link:
Page A has a link to Page B, but Page B does not exist, so the server sends the user a 404. The link on Page A is a 'broken' link.
2. Not a Broken Link:
Page A links directly to 404.php (or whatever page is meant to serve as the 404 page). That is not a broken link.
Thanks to everyone who replied!
| 5:46 am on Dec 12, 2012 (gmt 0)|
I think you're focusing too much on the physical page that a human user gets taken to. From a search engine's point of view, what matters is the response header. Very few robots take the trouble to read your 404 page. They just note the 404 and go on their way. Same for almost any response other than 200.
| 2:11 pm on Dec 12, 2012 (gmt 0)|
|Very few robots take the trouble to read your 404 page. They just note the 404 and go on their way. |
Where do you get that?
The only way they don't 'take the trouble to read the page' is if they request it with a HEAD request, but we already know Google has explicitly stated they don't use HEAD requests, because it doesn't save much on resources or request speed. They want to know what's on the pages they request, so they use GET.
I also don't see a large number of HEAD requests for 404 (or any) pages from bots, so where exactly are you getting your info? If they use GET, they DO take the trouble to get the whole page. And unless you have access to every engine's handling algo, you cannot know exactly what any given engine does or does not do with any given page based on the status code, unless you have an explicit statement from someone who actually works at each of the engines you're calling 'very few' (with all the rest you say 'note it and go on their way') ... and I haven't found those statements anywhere yet.
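The HEAD-vs-GET distinction being argued here is easy to demonstrate. A minimal sketch (again Python's stdlib standing in for a real server; the handler, page text, and port are my own assumptions): a HEAD request returns only the status line and headers, while a GET returns the full custom 404 page.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Sorry, that page was not found.</body></html>"

class NotFound(BaseHTTPRequestHandler):
    def _send_head(self):
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_HEAD(self):
        self._send_head()            # status + headers only, no body

    def do_GET(self):
        self._send_head()
        self.wfile.write(PAGE)       # the full custom 404 page

    def log_message(self, *args):    # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), NotFound)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/missing"

def fetch(method):
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()  # urllib raises on 4xx, but the
                                     # error object still carries the reply

head_status, head_body = fetch("HEAD")   # 404, empty body
get_status, get_body = fetch("GET")      # 404, full page
server.shutdown()
```

Either way the user-agent gets the 404 status; only a GET actually transfers the page content.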
So, please, if you have that type of info from search engines, cite your source(s). Thanks.
| 5:20 pm on Dec 12, 2012 (gmt 0)|
If a robot can't find something, how can it read it?
| 5:42 pm on Dec 12, 2012 (gmt 0)|
HUH? The robot isn't what can't find something; it's the server that can't find the specific resource associated with a particular URL, so it serves the robot a default page (or other information) with a status code indicating the status of the request the robot made.
Even if the page is completely blank in a browser (because browsers don't show you the status code and other server headers), there is still information served by a properly functioning server for Any request made.
IOW: A bot Always finds something for Every request made to a properly functioning server.
[edited by: tedster at 8:14 pm (utc) on Dec 12, 2012]
| 11:25 pm on Dec 12, 2012 (gmt 0)|
Just did a test of this on a #*$!ed website. When you serve a blank page with a 200 OK header, Google drops the page from the keyword index entirely, at least in my test.
| 11:40 pm on Dec 12, 2012 (gmt 0)|
I'm not sure what your test is supposed to prove.
You served GoogleBot a page with only server headers and it didn't 'count' it for any keywords ... What else would they do with it, and what did you expect to happen? I guess I just don't understand what you were testing.
| 11:52 pm on Dec 12, 2012 (gmt 0)|
I was just experimenting; I thought the keyword would still rank on a blank page due to inbound links, but I was wrong.
| 11:59 pm on Dec 12, 2012 (gmt 0)|
Ah, that is interesting then ... Thanks for sharing and explaining ... It seems to show, with a degree of accuracy, that the content of a page (or lack thereof) can override inbound link anchor text.
ADDED: It would be interesting to see whether the page shows up as a 'soft 404' in WMT if the 'blank' page is left in place for a period of time.
| 2:00 am on Dec 13, 2012 (gmt 0)|
|It seems to show, with a degree of accuracy, that the content of a page (or lack thereof) can override inbound link anchor text. |
Yes, it does. Maybe part of Google-bombing protection?
| 2:11 am on Dec 13, 2012 (gmt 0)|
It does seem as if it could be that way to me ... One interesting thought I had and posted in the Sandbox Length thread [webmasterworld.com...] is that visitor behavior seems to have more of an influence than it used to.
Something I started wondering about a while ago, and am thinking about even more seriously now, is whether links are actually on the decline and visitor behavior is moving up in the 'scheme of things' ... It could be that what we're seeing reported as 'ranking just because of spammy links missed by Google' is actually people not moving beyond 'past ideas': simply looking at the links to a page to determine why it ranks, rather than looking with an 'open mind' at 'other influences' (EG search user behavior) that could be affecting rankings more than they're thought to ... If that's the case, then the 'influence' and 'importance' of these areas may definitely be changing...
| 4:14 pm on Dec 13, 2012 (gmt 0)|
Guess I should have put a ;-) on my last post.
| 4:20 pm on Dec 13, 2012 (gmt 0)|
LOL ... Yeah, you had me totally shaking my head, thinkin' 'How on earth does someone who's been here for a decade... What's happened to this place?!' lol