More from Boston

session ids, 404s, expired domains, ODP

         

Camster

6:21 am on Mar 6, 2003 (gmt 0)

10+ Year Member



Here are a few more comments from Daniel Dulitz from the crawler session at Search Engine Strategies in Boston. (See Brett's post for a full roster and photo. [webmasterworld.com...] ) The reps of all the crawlers always come to these sessions with a few canned comments and if asked questions or pressed beyond that, they retreat to cryptic comments or terse answers. So here are summaries of Daniel's comments:

1) custom error pages: Google wants you to deliver error pages as error pages (404s). If you are trying to deliver targeted content to the user when a page can't be found, he asked that it still be done with a 404.

2) as previously posted by GoogleGuy, hide session ids in URLs from Googlebot.

3) expired domains: Google will "soon" be filtering expired domains from its index and link calculations [no further elaboration]

4) when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google. [he did not expand on this. the question was about ODP specifically but he did not refer directly to ODP in his answer.]

5) an audience member suggested that webmasters would be willing to pay to find out whether a site had been banned. Daniel replied that Google would love to be able to respond to these kinds of inquiries and that they are "working very hard" to do so. "When we find a fair way to do it, we will."

6) use of applications that send automated queries (e.g., WebPosition) is against their terms of service. Using them may result in Google blocking your searches. It won't usually affect your rank.

7) he mentioned in passing that they crawl dynamic sites more slowly than static pages so that they don't overwhelm the databases behind them [for what that's worth]
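As a rough illustration of point 2, here is a minimal Python sketch of serving session-free URLs to a crawler. The parameter names in `SESSION_PARAMS` and the User-Agent substring check are assumptions made for the example; nothing in Daniel's comments specified how this should be done.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; adjust to whatever your site uses.
SESSION_PARAMS = {"phpsessid", "sid", "sessionid"}


def strip_session_id(url):
    """Return url with any session-id query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def url_for_client(url, user_agent):
    """Drop the session id only when the client looks like Googlebot.

    The substring test is an illustrative assumption, not a robust
    way to identify a crawler.
    """
    if "googlebot" in user_agent.lower():
        return strip_session_id(url)
    return url
```

The page content stays the same for every visitor; only the session parameter in the link changes.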

As I said, his comments were crisp and carefully worded. He did not elaborate beyond what I have noted. I have done my best to capture the gist of these comments. If you heard something different or differently, add it here.

Have at it!

Nick_W

8:00 am on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks!

Your link doesn't work though?

as previously posted by GoogleGuy, hide session ids in URLs from Googlebot.

Doesn't that mean cloaking?

<runs for cover ;)>

Nick

diddlydazz

8:03 am on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



nice one camster,

not giving a lot away was he :O) but some interesting points.

Dazz

PFOnline

8:04 am on Mar 6, 2003 (gmt 0)

10+ Year Member



Thanks Camster, now I know how Google feels about custom error pages.

Nick_W

8:07 am on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes,

You could still redirect 404's to another page, but that page should give the 404 header to the user-agent. In PHP you'd do it like this:

header("HTTP/1.0 404 Not Found");
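The same idea outside PHP, as a minimal sketch: a small Python WSGI application that shows friendly content to the visitor but still puts 404 in the status line. The `KNOWN_PAGES` table and the page text are hypothetical.

```python
# Hypothetical set of pages that exist on the site.
KNOWN_PAGES = {"/": b"<h1>Home</h1>"}

NOT_FOUND_BODY = (
    b"<h1>Page not found</h1>"
    b'<p>Try the <a href="/">home page</a> or the '
    b'<a href="/sitemap">site map</a>.</p>'
)


def app(environ, start_response):
    """Serve known pages with 200; anything else gets a custom page with 404."""
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PAGES:
        body, status = KNOWN_PAGES[path], "200 OK"
    else:
        # Helpful content for people, but the status line still says 404.
        body, status = NOT_FOUND_BODY, "404 Not Found"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

To try it locally, the app can be passed to `wsgiref.simple_server.make_server` from the standard library.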

Nick

kovacs

8:10 am on Mar 6, 2003 (gmt 0)

10+ Year Member



Edit: never mind, posted at the same time as Nick_W :)

PFOnline

8:13 am on Mar 6, 2003 (gmt 0)

10+ Year Member


If I'm not mistaken, that means Google prefers the default error page browsers show, rather than a custom one like this:

http://www.webmasterworld.com/sdfsdfweweref.html

Nick_W

8:15 am on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, I think you're mistaken. I can't see why Google would care as long as they get the 404 header.

The 404 header tells Google that the page can't be found. This way it can discount it from the index next crawl.

Why would it matter if that 404 page had some content on it? - It still knows that the page doesn't exist...

Nick

kovacs

8:15 am on Mar 6, 2003 (gmt 0)

10+ Year Member



Surely a custom error page, with links to an internal SE or sitemap, is more user friendly than just a standard 404 page?

fathom

8:16 am on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Brett's thread -- the bracket is causing the error.

[webmasterworld.com...]

msr986

8:22 am on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thank you for the Boston update!

The information seems to be very useful. :)

PFOnline

8:24 am on Mar 6, 2003 (gmt 0)

10+ Year Member



Oops, I must have misunderstood... Thanks for the clarification, Nick.

andreasfriedrich

9:15 am on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just to add to Nick's point.

There really is no such thing as a standard error page and a custom one.

Except when responding to a HEAD request, the server SHOULD include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition.

RFC2616 - Hypertext Transfer Protocol -- HTTP/1.1 [faqs.org] - 10.4 Client Error 4xx

All that RFC2616 recommends is that you include in your server response a document (the entity) explaining the error that occurred. This does not imply that you need to limit yourself to just that. In fact, providing links to the homepage, sitemap, and search page, plus a little company information, to make error pages a mini homepage will add to the usability of your site.

It is a good yet not new idea to always return the appropriate status code: If the requested resource is gone, return a 410 Gone status code, if it moved permanently return a 301 Moved Permanently status code and if there is no content for a given request just return a 204 No Content status code to let the UA know that everything went well but that there is no content to this request.
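A minimal Python sketch of that decision, assuming the site keeps simple lookup tables for moved, removed, and existing resources. The tables and paths are hypothetical; only the status codes come from the RFC.

```python
from http import HTTPStatus

# Hypothetical site state, for illustration only.
MOVED = {"/old-page.html": "/new-page.html"}  # permanently moved URLs
GONE = {"/discontinued.html"}                 # deliberately removed URLs
PAGES = {"/new-page.html": "<p>Hello</p>", "/ping": ""}


def choose_status(path):
    """Return (status, location) for a requested path."""
    if path in MOVED:
        return HTTPStatus.MOVED_PERMANENTLY, MOVED[path]
    if path in GONE:
        return HTTPStatus.GONE, None
    if path in PAGES:
        # The request succeeded; an empty body is reported as 204.
        status = HTTPStatus.OK if PAGES[path] else HTTPStatus.NO_CONTENT
        return status, None
    return HTTPStatus.NOT_FOUND, None
```

For example, `choose_status("/discontinued.html")` yields `HTTPStatus.GONE`, while a path that appears in none of the tables falls through to `HTTPStatus.NOT_FOUND`.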

Andreas

vitaplease

10:29 am on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



4) when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google

Sounds like some kind of popularity variable input?

WindSun

10:40 am on Mar 6, 2003 (gmt 0)

10+ Year Member



directories that people still use

Maybe all those ODP clones that get very little traffic, many of which are just parked domains?

yetanotheruser

10:46 am on Mar 6, 2003 (gmt 0)

10+ Year Member



Sounds like some kind of popularity variable input?

Doesn't PR handle this anyhow? dmoz/directory.google have a lot of incoming PR to give out.. does that make them responsible authorities..

Camster, did you get the impression they were going to tinker with ODP clones by hand, or that this is just going to get taken care of by ongoing PR/duplicate content work?

Thanks for the summary, very interesting :)

Brett_Tabke

10:53 am on Mar 6, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



When asked whether they supported meta keyword tags, all the crawlers but Inktomi said NO.

vitaplease

10:55 am on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sounds like some kind of popularity variable input?

Doesn't PR handle this anyhow? dmoz/directory.google have a lot of incoming PR to give out.. does that make them responsible authorities..

PageRank up until now has had little to do with popularity in the sense of number of users, that is, "directories that people still use".

But I'm probably reading too much out of that one remark.

yetanotheruser

2:04 pm on Mar 6, 2003 (gmt 0)

10+ Year Member



vitaplease,

I kinda missed that one, yeah I think I meant popularity among webmasters rather than hits, though they're both a bit chicken-and-egg ;)

Do you think they have any way of getting traffic information apart from their toolbar? Am I right in thinking they don't /redirect?my.domain.com from the results.. Moreover do you think Google would use traffic information if they had it?

(I vaguely remember something in one of the original papers about the concern using traffic info would lead to a feedback loop.. with which I'd agree.)

OntheEdge

2:34 pm on Mar 6, 2003 (gmt 0)

10+ Year Member



when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google. [he did not expand on this. the question was about ODP specifically but he did not refer directly to ODP in his answer.]

Another one for my "Clues the ODP is an Endangered Species" list.
Thanks for the post, it's so nice that those who could make it there share with those of us who can't! (My dogsled is broken...)

Camster

2:35 pm on Mar 6, 2003 (gmt 0)

10+ Year Member



yetanotheruser, I wouldn't read too much into the ODP comments. His comments were in response to a question and not a part of his initial comments, so it wasn't like they were trying to announce anything. So I would say it's more like something that is already reflected in link pop rather than something they are manually tweaking.

Camster

2:40 pm on Mar 6, 2003 (gmt 0)

10+ Year Member



You could still redirect 404's to another page, but that page should give the 404 header to the user-agent.

Yes, Nick_W, I'm sure that's what he meant. Thanks for elaborating.

Camster

2:46 pm on Mar 6, 2003 (gmt 0)

10+ Year Member



Daniel also confirmed yet again with a resounding no that Google's paid programs (premium and adwords) DO NOT help you get indexed or ranked.

All of the crawlers with paid inclusion programs took great pains to describe the "Chinese walls" between their paid programs and crawlers. Daniel made it pretty clear that they aren't planning such a program. Although he also said, "even if we were planning one I'd still say we weren't" (loose quote).

misja

2:53 pm on Mar 6, 2003 (gmt 0)

10+ Year Member



3) expired domains: Google will "soon" be filtering expired domains from its index and link calculations [no further elaboration]

I don't know about the "no further elaboration" part ...

IMHO Google can remove expired domains from its index, but as soon as the domain is re-registered the PR will come back, since the links to the domain from other sites will (probably) still exist ...

msgraph

2:59 pm on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



IMHO Google can remove expired domains from its index, but as soon as the domain is re-registered the PR will come back, since the links to the domain from other sites will (probably) still exist ...

(PR > X) + Expired + New Whois Info = Red Flag

andreasfriedrich

2:59 pm on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>You could still redirect 404's to another page, but
>>that page should give the 404 header to the user-agent.

Which would change the meaning of what you are telling the UA. If instead of sending a 404 when a resource is not found you send a 301 or 302 to redirect to another page you are telling the UA that the originally requested resource is available at a new location. The UA will then request that new URL and will get a 404. This tells him that the new URL could not be found on the server. So the first URL moved (temporarily/permanently) and the second URL could not be found. This is different from saying the first URL could not be found.
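To make the difference concrete, here is a small Python sketch that follows redirects the way a simple user agent might and records what it learns about each URL. The response tables are hypothetical, and real crawlers may schedule redirects differently.

```python
def follow(url, responses, max_hops=5):
    """Follow 301/302 redirects and record (url, status) for each hop.

    responses maps a URL to (status code, Location value or None);
    URLs missing from the table are treated as a plain 404.
    """
    hops = []
    for _ in range(max_hops):
        status, location = responses.get(url, (404, None))
        hops.append((url, status))
        if status in (301, 302) and location:
            url = location
        else:
            break
    return hops


# Redirect first, then show an error page that returns 404.
redirecting = {
    "/missing.html": (302, "/error.html"),
    "/error.html": (404, None),
}

# Answer the original request with 404 directly.
direct = {}
```

With `redirecting`, the user agent sees a 302 for `/missing.html` and a 404 only for `/error.html`. With `direct`, the 404 is attached to `/missing.html` itself.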

Andreas

yetanotheruser

3:48 pm on Mar 6, 2003 (gmt 0)

10+ Year Member



Andreas,

I got the impression that giving GB 301's instead of 404's was what they were trying to avoid? AFAIK, there's no reason why you can't give a dynamic error page with 404 in the headers..

From watching GB spidering us, it appears to queue 301/302's rather than follow them, coming back to them later on, whereas if they'd got the 404 straight away they wouldn't need to re-visit?

ramitheweb

3:57 pm on Mar 6, 2003 (gmt 0)

10+ Year Member



(PR > X) + Expired + New Whois Info = Red Flag

Interesting. Where will they get the Whois info? All the registrars ban you after nearly 1K queries.

andreasfriedrich

4:01 pm on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>AFAIK, there's no reason why you can't give a dynamic
>>error page with 404 in the headers

No, there isn't. I was only referring to: "You could still redirect 404's to another page, but that page should give the 404 header to the user-agent." I do not really see why one would do a redirect first, i.e. return a 301/302 status code and then show an error page returning a 404 status code.

>>wouldn't need to re-visit

That's why I was wondering about the benefits of doing a redirect before showing an error message.

Andreas

Total Paranoia

5:43 pm on Mar 6, 2003 (gmt 0)

10+ Year Member



I have used htaccess to redirect any 404's to a homepage on a client's site. Is this the type of thing we shouldn't do? I take it from reading this thread that I should just have a 404 page with a hyperlink to the home page (or whichever page I wish within the site).

btw, I still do not understand how Google would determine a true 404. Does it read the title and heading? Is that what Nick means by 404 header?
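On the last question: the "404 header" in Nick's PHP line is the HTTP status line, which the server sends before the page and separately from the HTML title or headings. A minimal Python sketch with a made-up raw response shows the two side by side. How Google evaluates the page beyond the status code is not stated in this thread.

```python
import re

# A made-up raw response: the status says 404, the title says otherwise.
RAW_RESPONSE = (
    "HTTP/1.0 404 Not Found\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><head><title>Welcome to our site</title></head>"
    "<body>Sorry, try the home page.</body></html>"
)


def status_code(raw):
    """Return the numeric status code from the first line of a raw response."""
    status_line = raw.split("\r\n", 1)[0]  # "HTTP/1.0 404 Not Found"
    return int(status_line.split(" ", 2)[1])


def html_title(raw):
    """Return the <title> text from the body, or None if there is none."""
    _, _, body = raw.partition("\r\n\r\n")
    match = re.search(r"<title>(.*?)</title>", body, re.I | re.S)
    return match.group(1) if match else None
```

Here `status_code(RAW_RESPONSE)` reads the 404 from the status line no matter what the title says.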
