Forum Moderators: open
1) custom error pages: Google wants you to deliver error pages as error pages (404s). If you are trying to deliver targeted content to the user when a page can't be found, he asked that it still be done with a 404.
2) as previously posted by GoogleGuy, hide session ids in URLs from Googlebot.
3) expired domains: Google will "soon" be filtering expired domains from its index and link calculations [no further elaboration]
4) when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google. [he did not expand on this. the question was about ODP specifically but he did not refer directly to ODP in his answer.]
5) when an audience member suggested that webmasters would be willing to pay to find out whether a site had been banned, Daniel replied that Google would love to be able to respond to these kinds of inquiries and that they are "working very hard" to do so. "When we find a fair way to do it, we will."
6) use of applications that send automated queries (e.g., WebPosition) is against their terms of service. Using one may result in Google blocking your queries; it won't usually affect your rank.
7) he mentioned in passing that they crawl dynamic sites more slowly than static pages so that they don't overwhelm the databases behind them [for what that's worth]
As I said, his comments were crisp and carefully worded. He did not elaborate beyond what I have noted. I have done my best to capture the gist of these comments. If you heard something different, or heard it differently, add it here.
Have at it!
The 404 header tells Google that the page can't be found, so it can drop the page from the index on the next crawl.
Why would it matter if that 404 page had some content on it? Google still knows that the page doesn't exist...
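That's exactly the point being made above: the body and the status code are independent, so you can send a helpful page while still answering 404. A minimal stdlib sketch (the page content and paths here are made-up placeholders, not anything Google specified):

```python
# Serve a friendly "not found" page while still sending a real 404
# status, so crawlers learn the URL is gone. Python stdlib only.
from http.server import BaseHTTPRequestHandler, HTTPServer

CUSTOM_404 = (b"<html><body><h1>Page not found</h1>"
              b"<p>Try our <a href='/'>homepage</a>.</p></body></html>")

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":
            body = b"<html><body>Home</body></html>"
            self.send_response(200)
        else:
            body = CUSTOM_404
            self.send_response(404)  # real 404 status line, custom body
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

A browser user sees the helpful page; a crawler reads the status line and knows the URL doesn't resolve.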
Nick
[webmasterworld.com...]
There really is no such thing as a standard error page and a custom one.
"Except when responding to a HEAD request, the server SHOULD include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition."
RFC 2616 - Hypertext Transfer Protocol -- HTTP/1.1 [faqs.org] - 10.4 Client Error 4xx
All that RFC 2616 recommends is that you include in your server's response a document explaining the error that occurred. This does not mean you need to limit yourself to just that. In fact, providing links to the homepage, sitemap, and search page, plus a little company information, makes error pages a mini homepage and adds to the usability of your site.
It is a good, though not new, idea to always return the appropriate status code: if the requested resource is gone for good, return a 410 Gone status code; if it moved permanently, return a 301 Moved Permanently status code; and if there is no content for a given request, return a 204 No Content status code to let the UA know that everything went well but that there is no content for this request.
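That mapping can be sketched with the stdlib's `http.server`. The paths and the GONE/MOVED/EMPTY tables below are purely illustrative; the point is only which status code goes with which situation:

```python
# Illustrative status-code mapping: 410 for permanently deleted
# resources, 301 for permanent moves, 204 for "OK but no content".
from http.server import BaseHTTPRequestHandler, HTTPServer

GONE = {"/old-promo"}          # deleted for good       -> 410 Gone
MOVED = {"/old-home": "/"}     # moved permanently      -> 301 Moved Permanently
EMPTY = {"/ping"}              # request OK, no content -> 204 No Content

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in GONE:
            self.send_response(410)
            self.end_headers()
        elif self.path in MOVED:
            self.send_response(301)
            self.send_header("Location", MOVED[self.path])
            self.end_headers()
        elif self.path in EMPTY:
            self.send_response(204)  # 204 responses carry no body
            self.end_headers()
        else:
            body = b"<html><body>OK</body></html>"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
```

Each status line tells the UA something different, which is exactly why substituting one for another (a 302 where a 410 belongs, say) muddies what the crawler learns.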
Andreas
Sounds like some kind of popularity variable input?
Doesn't PR handle this anyhow? dmoz/directory.google have a lot of incoming PR to give out.. does that make them responsible authorities..
Camster, did you get the impression they were going to tinker with ODP clones by hand, or that this is just going to get taken care of by ongoing PR/duplicate content work?
Thanks for the summary, very interesting :)
>>Sounds like some kind of popularity variable input?
>>Doesn't PR handle this anyhow? dmoz/directory.google have a lot of incoming PR to give out.. does that make them responsible authorities..
PageRank up until now has had little to do with popularity in the sense of number of users, that is, "directories that people still use".
But I'm probably reading too much out of that one remark.
I kinda missed that one. Yeah, I think I meant popularity with webmasters rather than hits, though they're both a bit chicken-and-egg ;)
Do you think they have any way of getting traffic information apart from their toolbar? Am I right in thinking they don't route clicks through something like /redirect?my.domain.com in the results? Moreover, do you think Google would use traffic information if they had it?
(I vaguely remember something in one of the original papers about the concern using traffic info would lead to a feedback loop.. with which I'd agree.)
>>when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google. [he did not expand on this. the question was about ODP specifically but he did not refer directly to ODP in his answer.]
Another one for my "Clues the ODP is an Endangered Species" list.
Thanks for the post, it's so nice that those who could make it there share with us who can't! (My dogsled is broken...)
All of the crawlers with paid inclusion programs took great pains to describe the "chinese walls" between their paid programs and crawlers. Daniel made it pretty clear that they aren't planning such a program. Although he also said, "even if we were planning one I'd still say we weren't" (loose quote).
>>3) expired domains: Google will "soon" be filtering expired domains from its index and link calculations [no further elaboration]
I don't know about the "no further elaboration" part ...
IMHO Google can remove expired domains from its index, but as soon as the domain is re-registered the PR will come back, since the links to the domain from other sites will (probably) still exist ...
Which would change the meaning of what you are telling the UA. If, instead of sending a 404 when a resource is not found, you send a 301 or 302 to redirect to another page, you are telling the UA that the originally requested resource is available at a new location. The UA will then request that new URL and get a 404, which says the new URL could not be found on the server. So the first URL moved (temporarily/permanently) and the second URL could not be found. That is different from saying the first URL could not be found.
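The two response sequences can be made concrete with a tiny sketch. The paths /missing, /redirected, and /error-page are hypothetical, chosen only to contrast a direct 404 with a redirect that eventually lands on one:

```python
# Contrast: /missing answers 404 immediately, while /redirected first
# answers 302 and only the follow-up request to /error-page gets a 404.
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/redirected":
            # First response claims the URL moved to the error page
            self.send_response(302)
            self.send_header("Location", "/error-page")
            self.end_headers()
        else:
            # /missing and /error-page both answer 404 directly
            self.send_response(404)
            self.end_headers()
```

A client that does not immediately follow redirects (as the Googlebot behaviour described below suggests) sees only the 302 on the first pass and needs a second request to learn the page is gone, whereas the direct 404 says so at once.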
Andreas
I got the impression that giving GB 301s instead of 404s was what they were trying to avoid. AFAIK, there's no reason why you can't serve a dynamic error page with a 404 in the headers..
From watching GB spider us, it appears to queue 301/302s rather than follow them, coming back to them later on, whereas if it had got the 404 straight away it wouldn't need to re-visit?
No, there isn't. I was only referring to this: you could still redirect 404s to another page, but that page should send the 404 header to the user-agent. I do not really see why one would do a redirect first, i.e. return a 301/302 status code and then show an error page returning a 404 status code.
>>wouldn't need to re-visit
That's why I was wondering about the benefits of doing a redirect before showing an error message.
Andreas
btw, I still do not understand how Google would determine a true 404 - does it read the title and heading? Is that what Nick means by the 404 header?