More from Boston
session ids, 404s, expired domains, ODP
Here are a few more comments from Daniel Dulitz from the crawler session at Search Engine Strategies in Boston. (See Brett's post for a full roster and photo. [webmasterworld.com...] ) The reps of all the crawlers always come to these sessions with a few canned comments and if asked questions or pressed beyond that, they retreat to cryptic comments or terse answers. So here are summaries of Daniel's comments:
1) custom error pages: Google wants you to deliver error pages as error pages, with a 404 status code. If you are trying to deliver targeted content to the user when a page can't be found, he asked that it still be served with a 404.
2) as previously posted by GoogleGuy, hide session ids in URLs from Googlebot.
3) expired domains: Google will "soon" be filtering expired domains from its index and link calculations [no further elaboration]
4) when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google. [he did not expand on this. the question was about ODP specifically but he did not refer directly to ODP in his answer.]
5) when an audience member suggested that webmasters would be willing to pay to find out whether a site had been banned, Daniel replied that Google would love to be able to respond to these kinds of inquiries and that they are "working very hard" to do so. "When we find a fair way to do it, we will."
6) use of applications that send automated queries (eg, WebPosition) is against their terms of service. Using one may result in Google blocking your searches; it won't usually affect your rankings.
7) he mentioned in passing that they crawl dynamic sites more slowly than static pages so that they don't overwhelm the databases behind them [for what that's worth]
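Item 1 above — helpful content for the user, but a real 404 status for the crawler — can be sketched with a minimal WSGI app. This is purely my own illustration of the principle, not anything Google published; the page content is invented:

```python
# Minimal WSGI sketch: serve a friendly "not found" page, but with a
# real 404 status so crawlers know the URL is dead.
KNOWN_PAGES = {"/": "<h1>Home</h1>"}  # hypothetical site content

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PAGES:
        start_response("200 OK", [("Content-Type", "text/html")])
        return [KNOWN_PAGES[path].encode()]
    # Helpful content for the user, correct status code for the crawler.
    body = b"<h1>Page not found</h1><p>Try our <a href='/'>home page</a>.</p>"
    start_response("404 Not Found", [("Content-Type", "text/html")])
    return [body]
```

The common mistake Daniel was warning against is redirecting missing URLs to a custom page that returns 200 — the crawler then treats every dead URL as a live page.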
As I said, his comments were crisp and carefully worded. He did not elaborate beyond what I have noted. I have done my best to capture the gist of these comments. If you heard something different, or heard it differently, add it here.
Have at it!
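For item 2 in the summary above (hiding session ids from Googlebot), one common approach is to strip the session-id parameter from URLs before exposing them in links, so crawlers see one stable URL per page. A minimal sketch — the parameter name "sid" is my own example, yours may differ:

```python
# Strip a session-id query parameter from a URL so every visitor
# (and every crawl) sees the same canonical address for a page.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_session_id(url, param="sid"):
    parts = urlsplit(url)
    # Keep every query parameter except the session id.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `strip_session_id("http://example.com/page?sid=abc123&cat=2")` yields `http://example.com/page?cat=2`.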
You mentioned automated queries (eg, WebPosition). What exactly is that? Sounds like I want to stay away from it, but kinda hard to do if you don't know what it is!
These are pieces of software that check your positions for your keywords on the major search engines.
Google likes to keep resources free for its users and does not like bandwidth being used by WebPosition or similar programs.
So, what I'm getting is that it's ok to check your ranking by hand but not by software. Is that right? And what about all the sites like the Google Dance Tools that search www1, www2, and www3 of Google at the same time? Or the sites that will automatically look up all of your backlinks with multiple search engines? Do these come under automated queries?
>3) expired domains: Google will "soon" be filtering expired domains from its index and link calculations [no further elaboration]
I don't quite grok what the above means. An expired domain will naturally not resolve, so these are already removed in the next index because Googlebot won't be able to spider them. Or do they perhaps mean they will automatically remove them between index updates as soon as they know a domain has expired? I'll presume the link calculations part means that if some page that hasn't been updated in years has 10 links on it, and 9 are now long dead but one still works, all the PR will be transferred to the one that does work.
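That back-of-the-envelope reasoning can be made concrete with a toy version of the usual "a page divides the PR it passes among its counted links" model. This is purely illustrative arithmetic, not Google's actual formula:

```python
# Toy model: a page passes a fixed amount of PR, split evenly among
# the links that are actually counted. Filtering out dead (expired)
# links increases each surviving link's share.
def pr_per_link(passed_pr, counted_links):
    """PR each counted link receives when the passed PR is split evenly."""
    if counted_links == 0:
        return 0.0
    return passed_pr / counted_links

# All 10 links alive and counted: each gets a tenth of the passed PR.
share_all = pr_per_link(1.0, 10)
# 9 of the 10 links are dead and filtered: the lone survivor gets it all.
share_survivor = pr_per_link(1.0, 1)
```

Under this toy model the surviving link's share jumps from 0.1 to 1.0, which is the effect the post above is guessing at.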
>4) when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google. [he did not expand on this. the question was about ODP specifically but he did not refer directly to ODP in his answer.]
Obviously Google will continue to count the ODP. Otherwise, they'd be disregarding links from their *own* directory, because that is just an ODP mirror. I wonder if perhaps the above means Google is planning to ignore a lot of small directories in the future that they don't consider important?
There are certainly enough editors of ODP who use it to consider it used.
Do you think they could be referring to directories that are never updated, or have a large percentage of dead links?
Google must mean something like the latter BigDave. If Google thought the ODP wasn't used they wouldn't have a copy on their site.
re expired domains:
This is pure speculation, but I took it to mean that they are addressing the problem of expired domains displaying or being redirected to irrelevant content? This wouldn't necessarily get detected in the crawl/update cycle since there is still a site resolving at the domain. And there are still links to it.
They would have to tackle this algorithmically (I like the sound of that!). Perhaps they could compare link text/context and the actual content of the page, using domain expiration lists as a seed for this comparison. Or maybe there will be an anti-freshbot / death-bot, crawling expired-then-purchased domains looking for irrelevance?
As I said, Google didn't elaborate. This is just my own speculation.
>when asked about the significance of ODP/dmoz listings to Google, Daniel replied that "links from directories that people still use" have significance to Google. [he did not expand on this. the question was about ODP specifically but he did not refer directly to ODP in his answer.]
That was actually my question (is it OK if I take credit?), and I am still kind of disappointed with the answer. Does that mean that if nobody is using DMOZ, we shouldn't really care about it?
RE: Blocking ranking programs.
What I got from the conference and talking with the search engine people, including the Google reps, was that you won't be blocked for using ranking software unless you "abuse" it. I think what they meant was excessive searching on the index. The index only updates once a month, so checking your site's rankings more than once or twice per update could be considered abuse.
When I spoke with Daniel, he mentioned that users could sign up for the API program if they were worried about being blocked from the search engine. I didn't think many people would be interested in doing that. Plus, that still doesn't guarantee you won't get blocked.
If you're afraid of getting banned or blocked, you can use a dial-up account to rank your web site and never search for your url. Then you really don't have anything to worry about.
Fast uses the Macromedia SDK to index Flash sites. Since that basically just turns Flash into a crappy HTML page, I asked Tim Mayer about the potential for duplicate content problems for those that currently provide an HTML and a Flash version of their site. In a nutshell, he said it could be a problem, so you should robots.txt one version of the site.
I thought this was pretty important since the duplicate content in this case is actually being generated by Fast, not the publisher.
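A minimal sketch of what "robots.txt one version" might look like, assuming the Flash version lives under a /flash/ path (the path is my own invention — adjust it to however your site is laid out):

```
User-agent: *
Disallow: /flash/
```

This leaves the HTML version crawlable and tells crawlers to skip the Flash version entirely, so the engine never sees the two near-duplicate renderings side by side.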
Kinda an aside here, but why was Boston chosen? Las Vegas would be a nice, convenient location. Great weather, too. Just my .02