
Google News Archive Forum

This 42 message thread spans 2 pages.
More from Boston
session ids, 404s, expired domains, ODP
Camster




msg:39049
 6:21 am on Mar 6, 2003 (gmt 0)

Here are a few more comments from Daniel Dulitz from the crawler session at Search Engine Strategies in Boston. (See Brett's post for a full roster and photo. [webmasterworld.com...] ) The reps of all the crawlers always come to these sessions with a few canned comments and if asked questions or pressed beyond that, they retreat to cryptic comments or terse answers. So here are summaries of Daniel's comments:

1) custom error pages: Google wants you to deliver error pages as error pages (404s). If you are trying to deliver targeted content to the user when a page can't be found, he asked that it still be done with a 404.

2) as previously posted by GoogleGuy, hide session ids in URLs from Googlebot.

3) expired domains: Google will "soon" be filtering expired domains from its index and link calculations [no further elaboration]

4) when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google. [he did not expand on this. the question was about ODP specifically but he did not refer directly to ODP in his answer.]

5) an audience member suggested that webmasters would be willing to pay to find out whether a site had been banned. Daniel replied that Google would love to be able to respond to these kinds of inquiries and that they are "working very hard" to do so. "When we find a fair way to do it, we will."

6) use of applications that send automated queries (eg, WebPosition) is against their terms of service. Using it may result in them blocking your searching. It won't usually affect your rank.

7) he mentioned in passing that they crawl dynamic sites more slowly than static pages so that they don't overwhelm the databases behind them [for what that's worth]
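For point 2, here is a minimal sketch of what removing a session id from a URL could look like. It is written in Python, and the parameter names in SESSION_PARAMS are assumptions, not anything Daniel or GoogleGuy specified. It only shows the URL clean-up itself; whether you apply it for every visitor or only for crawlers is a separate decision.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed session parameter names; adjust to whatever your own site uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

def strip_session_id(url):
    """Return the URL with any session-id query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, strip_session_id("http://example.com/page.php?id=7&PHPSESSID=abc123") keeps the id parameter and drops the session one.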

As I said, his comments were crisp and carefully worded. He did not elaborate beyond what I have noted. I have done my best to capture the gist of these comments. If you heard something different or differently, add it here.

Have at it!

 

Nick_W




msg:39050
 8:00 am on Mar 6, 2003 (gmt 0)

Thanks!

Your link doesn't work though?

as previously posted by GoogleGuy, hide session ids in URLs from Googlebot.

Doesn't that mean cloaking?

<runs for cover ;)>

Nick

diddlydazz




msg:39051
 8:03 am on Mar 6, 2003 (gmt 0)

nice one camster,

not giving a lot away was he :O) but some interesting points.

Dazz

PFOnline




msg:39052
 8:04 am on Mar 6, 2003 (gmt 0)

Thanks Camster, now I know how Google feels about custom error pages.

Nick_W




msg:39053
 8:07 am on Mar 6, 2003 (gmt 0)

Yes,

You could still redirect 404's to another page, but that page should give the 404 header to the user-agent. In PHP you'd do it like this:

header("HTTP/1.0 404 Not Found");
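The same idea outside PHP, as a rough sketch: a small Python WSGI app that serves a custom error page but still sends the 404 status. KNOWN_PAGES, ERROR_PAGE and the fetch helper are made up for illustration, and fetch calls the app in-process so nothing here needs a real server.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical set of paths this site actually serves.
KNOWN_PAGES = {"/": "<h1>Home</h1>"}

# Custom error page with helpful links (contents are illustrative).
ERROR_PAGE = (
    "<h1>Page not found</h1>"
    '<p><a href="/">Home</a> | <a href="/sitemap">Sitemap</a></p>'
)

def app(environ, start_response):
    """WSGI app: known pages get 200, everything else a custom page with 404."""
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PAGES:
        status, body = "200 OK", KNOWN_PAGES[path]
    else:
        # The body is custom, but the status line still says 404.
        status, body = "404 Not Found", ERROR_PAGE
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]

def fetch(path):
    """Call the app in-process and return (status, body) for inspection."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    body = b"".join(app(environ, start_response)).decode("utf-8")
    return seen["status"], body
```

Calling fetch on a path that is not in KNOWN_PAGES returns the custom page together with the "404 Not Found" status, which is the combination being asked for.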

Nick

kovacs




msg:39054
 8:10 am on Mar 6, 2003 (gmt 0)

Edit: never mind, posted at the same time as Nick_W :)

PFOnline




msg:39055
 8:13 am on Mar 6, 2003 (gmt 0)
If I'm not mistaken, that means Google prefers the default error page browsers show, rather than a custom one like this:

http://www.webmasterworld.com/sdfsdfweweref.html

Nick_W




msg:39056
 8:15 am on Mar 6, 2003 (gmt 0)

No, I think you're mistaken. I can't see why google would care as long as they get the 404 header.

The 404 header tells Google that the page can't be found. This way it can discount it from the index next crawl.

Why would it matter if that 404 page had some content on it? - It still knows that the page doesn't exist...

Nick

kovacs




msg:39057
 8:15 am on Mar 6, 2003 (gmt 0)

Surely a custom error page, with links to an internal SE or sitemap, is more user friendly than just a standard 404 page?

fathom




msg:39058
 8:16 am on Mar 6, 2003 (gmt 0)

Brett's thread -- the bracket is causing the error.

[webmasterworld.com...]

msr986




msg:39059
 8:22 am on Mar 6, 2003 (gmt 0)

Thank you for the Boston update!

The information seems to be very useful. :)

PFOnline




msg:39060
 8:24 am on Mar 6, 2003 (gmt 0)

Oops, I must have misunderstood... Thanks for the clarification, Nick.

andreasfriedrich




msg:39061
 9:15 am on Mar 6, 2003 (gmt 0)

Just to add to Nick's point.

There really is no such thing as a standard error page and a custom one.

Except when responding to a HEAD request, the server SHOULD include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition.

RFC2616 - Hypertext Transfer Protocol -- HTTP/1.1 [faqs.org] - 10.4 Client Error 4xx

All that RFC2616 recommends is that your server response include a document (the entity body) explaining the error that occurred. This does not imply that you need to limit yourself to just that. In fact, providing links to the homepage, sitemap, and search page, plus a little company information, turns the error page into a mini homepage and adds to the usability of your site.

It is a good yet not new idea to always return the appropriate status code: If the requested resource is gone, return a 410 Gone status code, if it moved permanently return a 301 Moved Permanently status code and if there is no content for a given request just return a 204 No Content status code to let the UA know that everything went well but that there is no content to this request.
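As a rough sketch of that mapping in Python: the resource record and its keys ("gone", "moved_to", "body") are hypothetical, and only the status codes themselves come from the standard library's HTTPStatus.

```python
from http import HTTPStatus

def status_for(resource):
    """Pick a status code from a hypothetical resource record (a dict or None)."""
    if resource is None:
        return HTTPStatus.NOT_FOUND            # 404: nothing known at this URL
    if resource.get("gone"):
        return HTTPStatus.GONE                 # 410: deliberately removed
    if resource.get("moved_to"):
        return HTTPStatus.MOVED_PERMANENTLY    # 301: new permanent location
    if not resource.get("body"):
        return HTTPStatus.NO_CONTENT           # 204: success, nothing to send
    return HTTPStatus.OK                       # 200: normal response
```

With a 301 the new location from "moved_to" would also go out in a Location header, which this sketch leaves out.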

Andreas

vitaplease




msg:39062
 10:29 am on Mar 6, 2003 (gmt 0)

4) when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google

Sounds like some kind of popularity variable input?

WindSun




msg:39063
 10:40 am on Mar 6, 2003 (gmt 0)

directories that people still use

Maybe all those ODP clones that get very little traffic, many of which are just parked domains?

yetanotheruser




msg:39064
 10:46 am on Mar 6, 2003 (gmt 0)

Sounds like some kind of popularity variable input?

Doesn't PR handle this anyhow? dmoz/directory.google have a lot of incoming PR to give out.. does that make them responsible authorities..

Camster, did you get the impression they were going to tinker with ODP clones by hand, or that this is just going to get taken care of by ongoing PR/duplicate content work?

Thanks for the summary, very interesting :)

Brett_Tabke




msg:39065
 10:53 am on Mar 6, 2003 (gmt 0)

When asked whether they supported meta keyword tags, all the crawlers but Inktomi said NO.

vitaplease




msg:39066
 10:55 am on Mar 6, 2003 (gmt 0)

Sounds like some kind of popularity variable input?

Doesn't PR handle this anyhow? dmoz/directory.google have a lot of incoming PR to give out.. does that make them responsible authorities..

PageRank up until now had little to do with popularity in the sense of number of users, that is, "directories that people still use".

But I'm probably reading too much out of that one remark.

yetanotheruser




msg:39067
 2:04 pm on Mar 6, 2003 (gmt 0)

vitaplease,

I kinda missed that one, yeah I think I meant popularity of webmasters rather than hits, though they're both a bit chicken-and-egg ;)

Do you think they have any way of getting traffic information apart from their toolbar? Am I right in thinking they don't /redirect?my.domain.com from the results.. Moreover do you think google would use traffic information if they had it?

(I vaguely remember something in one of the original papers about the concern using traffic info would lead to a feedback loop.. with which I'd agree.)

OntheEdge




msg:39068
 2:34 pm on Mar 6, 2003 (gmt 0)

when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google. [he did not expand on this. the question was about ODP specifically but he did not refer directly to ODP in his answer.]

Another one for my "Clues the ODP is an Endangered Species" list.
Thanks for the post, it's so nice that those who could make it there share with those of us who can't! (My dogsled is broken...)

Camster




msg:39069
 2:35 pm on Mar 6, 2003 (gmt 0)

yetanotheruser, I wouldn't read too much into the ODP comments. His comments were in response to a question and not a part of his initial comments, so it wasn't like they were trying to announce anything. So I would say it's more like something that is already reflected in link pop rather than something they are manually tweaking.

Camster




msg:39070
 2:40 pm on Mar 6, 2003 (gmt 0)

You could still redirect 404's to another page, but that page should give the 404 header to the user-agent.

Yes, Nick_W, I'm sure that's what he meant. Thanks for elaborating.

Camster




msg:39071
 2:46 pm on Mar 6, 2003 (gmt 0)

Daniel also confirmed yet again with a resounding no that Google's paid programs (premium and adwords) DO NOT help you get indexed or ranked.

All of the crawlers with paid inclusion programs took great pains to describe the "chinese walls" between their paid programs and crawlers. Daniel made it pretty clear that they aren't planning such a program. Although he also said, "even if we were planning one I'd still say we weren't" (loose quote).

misja




msg:39072
 2:53 pm on Mar 6, 2003 (gmt 0)

3) expired domains: Google will "soon" be filtering expired domains from its index and link calculations [no further elaboration]

I don't know about the "no further elaboration" part ...

IMHO Google can remove expired domains from its index, but as soon as the domain is re-registered the PR will come back, since the links to the domain from other sites will (probably) still exist ...

msgraph




msg:39073
 2:59 pm on Mar 6, 2003 (gmt 0)

IMHO Google can remove expired domains from its index, but as soon as the domain is re-registered the PR will come back, since the links to the domain from other sites will (probably) still exist ...

(PR > X) + Expired + New Whois Info = Red Flag

andreasfriedrich




msg:39074
 2:59 pm on Mar 6, 2003 (gmt 0)

>>You could still redirect 404's to another page, but
>>that page should give the 404 header to the user-agent.

Which would change the meaning of what you are telling the UA. If instead of sending a 404 when a resource is not found you send a 301 or 302 to redirect to another page you are telling the UA that the originally requested resource is available at a new location. The UA will then request that new URL and will get a 404. This tells him that the new URL could not be found on the server. So the first URL moved (temporarily/permanently) and the second URL could not be found. This is different from saying the first URL could not be found.
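A small Python simulation of the two cases, to make the difference concrete. The URL tables are invented, and trace is a stand-in for a simple user agent rather than a model of how any particular crawler behaves.

```python
# Hypothetical server behaviour: URL -> (status, redirect target or None).
REDIRECT_THEN_404 = {
    "/missing.html": (302, "/error.html"),
    "/error.html": (404, None),
}
DIRECT_404 = {"/missing.html": (404, None)}

def trace(responses, url, max_hops=5):
    """Follow redirects like a simple user agent; return every (url, status) seen."""
    hops = []
    for _ in range(max_hops):
        status, location = responses.get(url, (404, None))
        hops.append((url, status))
        if status in (301, 302) and location:
            url = location  # the UA now asks for the new location
        else:
            break
    return hops
```

In the first table the 404 ends up attached to /error.html and the original URL is only ever reported as moved; in the second the 404 is attached to /missing.html itself.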

Andreas

yetanotheruser




msg:39075
 3:48 pm on Mar 6, 2003 (gmt 0)

Andreas,

I got the impression that giving GB 301's instead of 404's was what they were trying to avoid? AFAIK, there's no reason why you can't give a dynamic error page with 404 in the headers..

From watching GB spidering us, it appears to queue 301/302's rather than follow them, coming back to them later on, whereas if they'd got the 404 straight away they wouldn't need to re-visit?

ramitheweb




msg:39076
 3:57 pm on Mar 6, 2003 (gmt 0)

(PR > X) + Expired + New Whois Info = Red Flag

Interesting. Where will they get the Whois info? All of the registrars ban you after nearly 1K queries.

andreasfriedrich




msg:39077
 4:01 pm on Mar 6, 2003 (gmt 0)

>>AFAIK, there's no reason why you can't give a dynamic
>>error page with 404 in the headers

No, there isn't. I was only referring to: You could still redirect 404's to another page, but that page should give the 404 header to the user-agent. I do not really see a need why one would do a redirect first, i.e. return a 301/302 status code and then show an error page returning a 404 status code.

>>wouldn't need to re-visit

That's why I was wondering about the benefits of doing a redirect before showing an error message.

Andreas

Total Paranoia




msg:39078
 5:43 pm on Mar 6, 2003 (gmt 0)

I have used htaccess to redirect any 404's to a homepage on a client's site. Is this the type of thing we shouldn't do? I take it from reading this thread that I should just have a 404 page with a hyperlink to the home page (or whichever page I wish within the site).

btw, I still do not understand how google would determine a true 404 - does it read the title and heading? Is that what Nick means by 404 header?
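On the "404 header" question: in HTTP the status code lives in the first line of the server's response, ahead of the headers and the HTML, so it is separate from whatever the page's title or heading says. A minimal Python sketch, with two made-up raw responses that carry the identical page:

```python
import re

def status_code(raw_response):
    """Extract the numeric status code from the first line of a raw HTTP response."""
    status_line = raw_response.split("\r\n", 1)[0]
    match = re.match(r"HTTP/\d\.\d (\d{3})", status_line)
    if match is None:
        raise ValueError("not an HTTP response: %r" % status_line)
    return int(match.group(1))

# Same visible page, different status lines (both responses are made up).
PAGE = "<html><title>Page not found</title><h1>Sorry!</h1></html>"
SOFT_404 = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n" + PAGE
TRUE_404 = "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n" + PAGE
```

Both responses look the same in a browser, but only TRUE_404 reports the page as missing to a client that reads the status code.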
