More from Boston

session ids, 404s, expired domains, ODP

     
6:21 am on Mar 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 26, 2002
posts:83
votes: 0


Here are a few more comments from Daniel Dulitz from the crawler session at Search Engine Strategies in Boston. (See Brett's post for a full roster and photo. [webmasterworld.com...] ) The reps of all the crawlers always come to these sessions with a few canned comments and if asked questions or pressed beyond that, they retreat to cryptic comments or terse answers. So here are summaries of Daniel's comments:

1) custom error pages: Google wants you to deliver error pages as error pages (404s). If you are trying to deliver targeted content to the user when a page can't be found, he asked that it still be done with a 404.

2) as previously posted by GoogleGuy, hide session ids in URLs from Googlebot.

3) expired domains: Google will "soon" be filtering expired domains from its index and link calculations [no further elaboration]

4) when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google. [he did not expand on this. the question was about ODP specifically but he did not refer directly to ODP in his answer.]

5) when an audience member suggested that webmasters would be willing to pay to find out whether a site had been banned, Daniel replied that Google would love to be able to respond to these kinds of inquiries and that they are "working very hard" to do so. "When we find a fair way to do it, we will."

6) use of applications that send automated queries (e.g., WebPosition) is against Google's terms of service. Using one may result in Google blocking your searches. It won't usually affect your rank.

7) he mentioned in passing that they crawl dynamic sites more slowly than static pages so that they don't overwhelm the databases behind them [for what that's worth]
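
On point 2, Daniel did not say how to hide session ids, so what follows is only one possible reading: emit links without the session parameter. A minimal Python sketch, assuming the session id travels as a query parameter; the names in SESSION_PARAMS and the function strip_session_id are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed parameter names; substitute whatever your own application emits.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

def strip_session_id(url):
    """Return the URL with any session-id query parameters removed."""
    parts = urlsplit(url)
    # Keep every query pair whose name is not a known session parameter.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Whether you apply this to every visitor or only to crawlers is a separate decision, and the sketch takes no position on it.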

As I said, his comments were crisp and carefully worded. He did not elaborate beyond what I have noted. I have done my best to capture the gist of these comments. If you heard something different or differently, add it here.

Have at it!

8:00 am on Mar 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member nick_w is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 4, 2002
posts:5044
votes: 0


Thanks!

Your link doesn't work though?

as previously posted by GoogleGuy, hide session ids in URLs from Googlebot.

Doesn't that mean cloaking?

<runs for cover ;)>

Nick

8:03 am on Mar 6, 2003 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 27, 2001
posts:1160
votes: 2


nice one camster,

not giving a lot away was he :O) but some interesting points.

Dazz

8:04 am on Mar 6, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 22, 2002
posts:519
votes: 0


Thanks Camster, now I know how Google feels about custom error pages.
8:07 am on Mar 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member nick_w is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 4, 2002
posts:5044
votes: 0


Yes,

You could still redirect 404's to another page, but that page should give the 404 header to the user-agent. In PHP you'd do it like this:

header("HTTP/1.0 404 Not Found");
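
For anyone not on PHP, here is the same idea as a rough Python sketch: a tiny WSGI app that serves a custom error page with helpful links but still sends the 404 status line. The PAGES table, the body text and the name app are all invented for the example, not anything Google or Daniel specified.

```python
# Hypothetical page table; a real site would look pages up some other way.
PAGES = {"/": b"<h1>Home</h1>"}

# Custom error body with helpful links, still served with a 404 status.
NOT_FOUND_BODY = (
    b"<h1>Page not found</h1>"
    b'<p>Try the <a href="/">home page</a> or the <a href="/sitemap">site map</a>.</p>'
)

def app(environ, start_response):
    """Minimal WSGI app: known paths get 200, everything else gets 404."""
    path = environ.get("PATH_INFO", "/")
    body = PAGES.get(path)
    if body is None:
        # The status line is what a crawler acts on, not the page text.
        status, body = "404 Not Found", NOT_FOUND_BODY
    else:
        status = "200 OK"
    headers = [("Content-Type", "text/html; charset=utf-8"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]
```

To try it locally you could pass app to wsgiref.simple_server.make_server from the standard library. The point is that the "404 header" is the HTTP status line of the response, not the title or heading of the page.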

Nick

8:10 am on Mar 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 11, 2003
posts:58
votes: 0


Edit: never mind, posted at the same time as Nick_W :)
8:13 am on Mar 6, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 22, 2002
posts:519
votes: 0

If I'm not mistaken, that means Google prefers the default error page browsers show, rather than a custom one like this:

http://www.webmasterworld.com/sdfsdfweweref.html

8:15 am on Mar 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member nick_w is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 4, 2002
posts:5044
votes: 0


No, I think you're mistaken. I can't see why Google would care as long as they get the 404 header.

The 404 header tells Google that the page can't be found. That way it can drop it from the index on the next crawl.

Why would it matter if that 404 page had some content on it? - It still knows that the page doesn't exist...

Nick

8:15 am on Mar 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 11, 2003
posts:58
votes: 0


Surely a custom error page, with links to an internal SE or sitemap, is more user friendly than just a standard 404 page?
8:16 am on Mar 6, 2003 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fathom is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 5, 2002
posts:4110
votes: 109


Brett's thread -- the bracket is causing the error.

[webmasterworld.com...]

8:22 am on Mar 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 7, 2001
posts:661
votes: 0


Thank you for the Boston update!

The information seems to be very useful. :)

8:24 am on Mar 6, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 22, 2002
posts:519
votes: 0


Oops, I must have misunderstood... Thanks for the clarification, Nick.
9:15 am on Mar 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


Just to add to Nick's point.

There really is no such thing as a standard error page and a custom one.

Except when responding to a HEAD request, the server SHOULD include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition.

RFC2616 - Hypertext Transfer Protocol -- HTTP/1.1 [faqs.org] - 10.4 Client Error 4xx

All that RFC2616 recommends is that your server response include a document explaining the error that occurred. This does not imply that you need to limit yourself to just that. In fact, providing links to the homepage, sitemap, and search page, plus a little company information, to make the error page a mini homepage will add to the usability of your site.

It is a good, if not new, idea to always return the appropriate status code: if the requested resource is gone, return a 410 Gone status code; if it moved permanently, return a 301 Moved Permanently status code; and if there is no content for a given request, return a 204 No Content status code to let the UA know that everything went well but that there is no content for this request.
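
As a rough illustration of those rules of thumb, here is a small Python sketch. The function pick_status and its parameters are invented for this example, and the order of the checks is my own assumption; only the status codes themselves come from the RFC.

```python
from http import HTTPStatus

def pick_status(exists, gone=False, moved_to=None, has_body=True):
    """Choose a status code for a request using the rules of thumb above."""
    if moved_to is not None:
        return HTTPStatus.MOVED_PERMANENTLY   # 301: resource has a new home
    if gone:
        return HTTPStatus.GONE                # 410: removed on purpose
    if not exists:
        return HTTPStatus.NOT_FOUND           # 404: nothing known at this URL
    if not has_body:
        return HTTPStatus.NO_CONTENT          # 204: success, nothing to send
    return HTTPStatus.OK                      # 200: normal page
```

For instance, pick_status(exists=False, gone=True) gives 410, while pick_status(exists=False) gives 404.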

Andreas

10:29 am on Mar 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member vitaplease is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Dec 11, 2001
posts:2725
votes: 0


4) when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google

Sounds like some kind of popularity variable input?

10:40 am on Mar 6, 2003 (gmt 0)

Full Member

10+ Year Member

joined:May 8, 2002
posts:325
votes: 0


directories that people still use

Maybe all those ODP clones that get very little traffic, many of which are just parked domains?

10:46 am on Mar 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 26, 2003
posts:177
votes: 0


Sounds like some kind of popularity variable input?

Doesn't PR handle this anyhow? dmoz/directory.google have a lot of incoming PR to give out.. does that make them responsible authorities..

Camster, did you get the impression they were going to tinker with ODP clones by hand, or that this is just going to get taken care of by ongoing PR/duplicate content work?

Thanks for the summary, very interesting :)

10:53 am on Mar 6, 2003 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38057
votes: 12


When asked whether they supported meta keyword tags, all the crawlers but Inktomi said NO.
10:55 am on Mar 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member vitaplease is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Dec 11, 2001
posts:2725
votes: 0


Sounds like some kind of popularity variable input?

Doesn't PR handle this anyhow? dmoz/directory.google have a lot of incoming PR to give out.. does that make them responsible authorities..

PageRank up until now has had little to do with popularity in the sense of number of users, that is, "directories that people still use".

But I'm probably reading too much out of that one remark.

2:04 pm on Mar 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 26, 2003
posts:177
votes: 0


vitaplease,

I kinda missed that one, yeah I think I meant popularity of webmasters rather than hits, though they're both a bit chicken-and-egg ;)

Do you think they have any way of getting traffic information apart from their toolbar? Am I right in thinking they don't /redirect?my.domain.com from the results.. Moreover do you think Google would use traffic information if they had it?

(I vaguely remember something in one of the original papers about the concern using traffic info would lead to a feedback loop.. with which I'd agree.)

2:34 pm on Mar 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 23, 2002
posts:150
votes: 0


when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google. [he did not expand on this. the question was about ODP specifically but he did not refer directly to ODP in his answer.]

Another one for my "Clues the ODP is an Endangered Species" list.
Thanks for the post, it's so nice that those who could make it there share with those of us who can't! (My dogsled is broken...)

2:35 pm on Mar 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 26, 2002
posts:83
votes: 0


yetanotheruser, I wouldn't read too much into the ODP comments. His comments were in response to a question and not a part of his initial comments, so it wasn't like they were trying to announce anything. So I would say it's more like something that is already reflected in link pop rather than something they are manually tweaking.
2:40 pm on Mar 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 26, 2002
posts:83
votes: 0


You could still redirect 404's to another page, but that page should give the 404 header to the user-agent.

Yes, Nick_W, I'm sure that's what he meant. Thanks for elaborating.

2:46 pm on Mar 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 26, 2002
posts:83
votes: 0


Daniel also confirmed yet again with a resounding no that Google's paid programs (premium and AdWords) DO NOT help you get indexed or ranked.

All of the crawlers with paid inclusion programs took great pains to describe the "Chinese walls" between their paid programs and crawlers. Daniel made it pretty clear that they aren't planning such a program. Although he also said, "even if we were planning one I'd still say we weren't" (loose quote).

2:53 pm on Mar 6, 2003 (gmt 0)

New User

10+ Year Member

joined:Feb 2, 2003
posts:27
votes: 0


3) expired domains: Google will "soon" be filtering expired domains from its index and link calculations [no further elaboration]

I don't know about the "no further elaboration" part ...

IMHO Google can remove expired domains from its index, but as soon as the domain is re-registered the PR will come back, since the links to the domain from other sites will (probably) still exist ...

2:59 pm on Mar 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 29, 2000
posts:1425
votes: 0


IMHO Google can remove expired domains from its index, but as soon as the domain is re-registered the PR will come back, since the links to the domain from other sites will (probably) still exist ...

(PR > X) + Expired + New Whois Info = Red Flag

2:59 pm on Mar 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


>>You could still redirect 404's to another page, but
>>that page should give the 404 header to the user-agent.

Which would change the meaning of what you are telling the UA. If, instead of sending a 404 when a resource is not found, you send a 301 or 302 to redirect to another page, you are telling the UA that the originally requested resource is available at a new location. The UA will then request that new URL and will get a 404. This tells it that the new URL could not be found on the server. So the first URL moved (temporarily/permanently) and the second URL could not be found. This is different from saying the first URL could not be found.
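
To make the difference concrete, here is a toy Python simulation of what a simple user agent would see in each case. The URLs, the lookup tables and the fetch function are all made up for illustration; this is not how Googlebot is known to work.

```python
# Hypothetical server behaviour: URL -> (status, Location target or None).
REDIRECT_THEN_404 = {
    "/old-page": (302, "/error.html"),   # redirect first...
    "/error.html": (404, None),          # ...then the error page says 404
}
DIRECT_404 = {}                           # unknown URLs answer 404 directly

def fetch(url, responses, max_hops=5):
    """Follow redirects like a simple user agent; return the hops seen."""
    hops = []
    for _ in range(max_hops):
        status, location = responses.get(url, (404, None))
        hops.append((url, status))
        if status in (301, 302) and location:
            url = location   # the UA now asks for the new URL
            continue
        break
    return hops
```

With the redirect setup, fetch("/old-page", REDIRECT_THEN_404) records a 302 for /old-page and a 404 only for /error.html, so the 404 is attached to the wrong URL. With DIRECT_404 the single hop is a 404 for /old-page itself.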

Andreas

3:48 pm on Mar 6, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 26, 2003
posts:177
votes: 0


Andreas,

I got the impression that giving GB 301's instead of 404's was what they were trying to avoid? AFAIK, there's no reason why you can't give a dynamic error page with 404 in the headers..

From watching GB spidering us, it appears to queue 301/302's rather than follow them, coming back to them later on, whereas if they'd got the 404 straight away they wouldn't need to re-visit?

3:57 pm on Mar 6, 2003 (gmt 0)

New User

10+ Year Member

joined:Feb 21, 2003
posts:29
votes: 0


(PR > X) + Expired + New Whois Info = Red Flag

Interesting, where will they get Whois info? All of the registrars ban you after about 1K queries.

4:01 pm on Mar 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


>>AFAIK, there's no reason why you can't give a dynamic
>>error page with 404 in the headers

No, there isn't. I was only referring to: "You could still redirect 404's to another page, but that page should give the 404 header to the user-agent." I do not really see why one would do a redirect first, i.e. return a 301/302 status code and then show an error page returning a 404 status code.

>>wouldn't need to re-visit

That's why I was wondering about the benefits of doing a redirect before showing an error message.

Andreas

5:43 pm on Mar 6, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Oct 30, 2002
posts:236
votes: 0


I have used .htaccess to redirect any 404's to a homepage on a client's site. Is this the type of thing we shouldn't do? I take it from reading this thread that I should just have a 404 page with a hyperlink to the home page (or whichever page I wish within the site).

btw, I still do not understand how Google would determine a true 404 - does it read the title and heading? Is that what Nick means by 404 header?
