|but the most reliable way is to make sure that no one has copied your content without your permission. |
GG, even with permission, Google has no way of determining which is the original and which is the copy...
The best policy I'm afraid is only to give permission to a trusted source who agrees to block google in robots.txt.
Or just deny permission.
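For anyone taking the robots.txt route, a minimal file on the syndication partner's site would look like the sketch below. The `Disallow: /` directive keeps Googlebot out of the entire site; if only the copied section needs blocking, narrow the path (the `/syndicated/` path here is purely hypothetical):

```text
User-agent: Googlebot
Disallow: /syndicated/
```

Other crawlers are unaffected unless you add rules for `User-agent: *` as well.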
Can someone please help answer this question.
My website shows up #1 for my optimized keyword (h1, title, anchor text) in the -fi DIRECTORY (and all others), but is buried at #50 somewhere in the web results.
Can anyone imagine why? It's been like this since Dominic. There has to be something in the web algo that is kicking it back to #50 when it's #1 in the directory. But what could the web algo be doing the directory algo is not?
Thanks for your opinions,
<added> I recently went through all my links and removed 4 links to sites that have become PR0 since I linked to them 6 months ago, when they were at least a 4. Other than that, I can't figure out why.
[edited by: Zapatista at 9:50 pm (utc) on June 16, 2003]
|Maybe a good night's sleep will help clarify a few things: my head is spinning with links and anchor text. |
Same here ;)
To be honest:
Concerning some of my sites, bringing up sub pages makes sense (especially for specific key phrases: pages with links, anchors, and so on).
But there are those contact, imprint, and similar irrelevant-content pages. As you said: low density of keywords, blah-blah (sorry, not addressing you).
On the other hand:
I have a strong impression G is improving ... if I could just catch it!
Maybe still too optimistic?
I expect it'll shift around for at least 2-3 more days, and probably closer to 3.
If I had to bet: you'll be looking at Thursday or Friday before it ends. I'm hoping they'll calculate more links from high-PR pages. It seems like there is no method to the madness of which links they're including...
I think the order by PR in the directory has also been updated. Can someone confirm my observation?
Ugh, I'm seeing a six month old directory on directory.google.com
<edit... make that five months old>
>>>One other piece of data - when the results for Dominic were on the data centers, sometimes the site would appear briefly on 1 or more datacenters at #1 for the keyword blue widgets. However, after a few hours, it would disappear from the front page and would appear again on page 8.
I have noticed on some sites that when they have a fresh date tag, they have much worse rankings than when there is no fresh date tag. I looked at the cached pages and couldn't see any differences. Any ideas?
|Just a comment - don't know whether this is new or old, but the fact that Google drops out "it" as a common word makes it difficult to search for a phrase such as "jobs in IT" (as in 'information technology'). |
No, that behavior is not new for this update or any recent one... and making the query you want is not difficult at all. You hit on one way to do it in your post: enclose the phrase in quotes. Another would be to use the plus sign to make "IT" a required part of the query: jobs in +IT, or even jobs +in +IT if you wanted to be sure that "in" was in.
Stop words are used by every search engine, and it's clear that always including such common words would return less relevant results for far more queries than it would help -- especially since there are easy ways to work around it.
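The behavior described above can be sketched in a few lines. This is purely an illustrative toy, not Google's actual implementation; the stop-word list and the handling of the "+" prefix are assumptions made for the sake of the example:

```python
# Toy sketch of stop-word filtering in query parsing (assumed behavior,
# not Google's real pipeline): common words are dropped unless the user
# marks them as required with a leading "+".

STOP_WORDS = {"a", "an", "the", "in", "of", "it", "to"}  # assumed list

def parse_query(query):
    """Return the terms that would actually be matched against the index."""
    kept = []
    for token in query.split():
        if token.startswith("+"):            # +term: required, bypasses the filter
            kept.append(token[1:].lower())
        elif token.lower() not in STOP_WORDS:
            kept.append(token.lower())
    return kept

print(parse_query("jobs in IT"))    # both "in" and "IT" get dropped
print(parse_query("jobs +in +IT"))  # both survive as required terms
```

Quoting the whole phrase ("jobs in IT") achieves the same end through a different mechanism: a quoted query is matched as an exact phrase rather than a bag of filtered terms.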
I am seeing a fair amount of shifting of the SERPs in my neighborhood. I am also noticing some sites that did well last update were knocked down a page or so yesterday, but today are nowhere to be found. I wonder if those will pop up later or if they just got filtered out by some recently applied filter.
It's really hard to tell what Google favors now (maybe it's still too early), but if I had to guess at this point, it might have something to do with less weight on anchor text and incoming links. Though I'm still seeing sites without much relevant content occupying top 5 positions on what seems to be off page factors alone. Maybe Google prefers one without the other. I can tell you, I'm not seeing any more perfectly optimized sites on top right now. Still probably futile to ponder while things are shifting.
I'm wondering how strict those hidden text filters are. I have 3 sites that use #CCCCCC text on white. It's not easy to read when it's small but I use it in a fairly large H1. Those sites have vanished today. Googleguy?
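For what it's worth, #CCCCCC on white is objectively very faint. A quick sketch using the standard relative-luminance and contrast-ratio formulas (a modern accessibility yardstick, used here purely for illustration) puts it around 1.6:1, versus the roughly 4.5:1 usually considered readable:

```python
# Quantify how faint #CCCCCC text on a white background is, using the
# relative-luminance / contrast-ratio formulas from the web accessibility
# guidelines (illustrative yardstick only).

def channel(c8):
    """Linearize one 8-bit sRGB channel."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(r, g, b):
    return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b)

def contrast(rgb1, rgb2):
    """Contrast ratio between two colors, always >= 1."""
    l1, l2 = sorted((luminance(*rgb1), luminance(*rgb2)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

ratio = contrast((0xCC, 0xCC, 0xCC), (0xFF, 0xFF, 0xFF))
print(round(ratio, 2))  # roughly 1.6 -- far below the ~4.5:1 readability guideline
```

Whether a filter treats that as "hidden text" is a separate question, but by any objective measure it is close to invisible at small sizes.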
I recently built 10 sites that only use the index page - they are only 1 page deep.
Some of them are linked to each other, but not every site is linked to every other site.
They are now showing 5-6 backlinks each, and the new PR is 3-4 on each site, yet they are NOWHERE to be found in the SERPs -- not even in the top 1000.
Is this a penalty, or is it an index page problem?
The only unusual thing is that the index page is the only page used on each domain.
Just because those other search engines show 80 backlinks doesn't mean Google will, because Google doesn't show anything less than a PR4. Google is saying you only have 7 backlinks that are PR4 or higher.
"Just because those other search engines show 80 backlinks doesn't mean Google will, because Google doesn't show anything less than a PR4. Google is saying you only have 7 backlinks that are PR4 or higher."
Thank you for that info, I did not know that. However, I have several backlinks that are PR6 or higher, and they are not showing at all...
Hi vbjaeger, welcome to WebmasterWorld. Google probably saw all those backlinks to your site--it's just that we don't report all backlinks that we see. But don't worry; we find and process many more links than we report.
If you jumped 27 notches and there's only 100 people ahead of you, then you sound like a natural SEO--you got 1/5th of the way there in one try. :)
I'd recommend reading around here more. Start with the FAQ and Brett's guide to building traffic in 26 steps. Feel free to keep an eye on what I post via my profile page. Think about terms that users would actually type to find your site, and make sure you've got them on your pages. Look for other sites that your users might like and link to them, and look for sites that would be a good match for your domain and nicely ask if they'd link to you. Definitely read through our section at www.google.com/webmasters/ for more advice about how to make your site more crawlable.
One cautionary word of advice: take everything with a grain of salt, and make choices that are common sense to you and work well for your users. For example, there was recently a thread that suggested Google was running out of "address space" to label our documents. I was talking to another engineer here and he said he almost fell out of his chair laughing when he read that. So there are always a lot of theories floating around about why something is this way or that. My advice is to assume that Google wants the most useful, relevant pages to come up first for searchers. Try to build those useful, relevant pages as well as you can, and we'll do our best to find them and rank them accurately for searches.
Welcome to WebmasterWorld!
[edited by: GoogleGuy at 11:56 pm (utc) on June 16, 2003]
vbjaeger, are you checking your backlinks on www.google.com, or www-fi.google.com? The latter will have the most recent index.
I checked on -fi and found 3 more, but it seems we actually lost a couple and gained more from within my own site.
"My advice is to assume that Google wants the most useful, relevant pages to come up first for searchers. Try to build those useful, relevant pages as well as you can..."
I think that for the vast majority of site owners, this is what we are trying to do, and we do try our best to stay within Google's guidelines. I can say for me that building a user-friendly site has always helped with Google. I may have had a few bumps with the last update (and maybe this one as well, but I am hoping things will change). Generally, good content and a relevant site will be rewarded by Google and by users.
When I do an allinurl search for my site, I'm coming up with the indexed pages in mydomain.org, but I'm also coming up with duplicate content on URLs like:
I've never seen this before.
Googleguy, is this likely to cause me problems? Anyone any ideas about how to stop this happening if it is a problem?
I still think there's a lot of merit to the "address space" theory... for one thing, if you set up a sim in RAM it actually terminates in a flow error... something the boys at Microsoft have been working with, and showed off here in DC a few weeks back.
|Start with the FAQ and Brett's guide to building traffic in 26 steps. |
It would be interesting to see how many who have missing index pages have followed most of those steps. I did... whoosh..gone
|My advice is to assume that Google wants the most useful, relevant pages to come up first for searchers. Try to build those useful, relevant pages as well as you can, and we'll do our best to find them and rank them accurately for searches. |
That's why I'm not about to jump out the first floor window.
I'm still confused as to why I see 45 backlinks on all datacenters except -fi, where I only see 8.
Why would I lose 37 backlinks?
The links are still there. I checked.
Will this kill my PR, and slam me?
Oh, and another site of mine is showing a grey bar now. I'm really confused by all this because the site shows up in numerous searches on -fi. While I understand that there will be fluctuation of toolbar PR right now, should that include fluctuating into the grey-bar range? That generally means the site has been penalized, doesn't it?
The address space theory is bunk, and this is why:
Google is going to identify pages by the result garnered through some hash algorithm or message digest, the inputs of which may be domain name, path, title, who knows? This allows them to separate index bits by prefixes of this message digest/hash result.
Given a hash "signature" that is 20 characters long, and using both upper and lower case letters as legal characters, we have a total address space of 20^^52, or 4.503599627370496e+67 (4.5 with 67 zeros after it).
If we think that Google engineers are using a four-byte unsigned integer value to identify pages we may be taking an infantile view of their structure. :)
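Taken at face value, the scheme described above might look like the sketch below. Every concrete choice here is an assumption for illustration: SHA-1 as the digest, 50 shards, and the first byte of the digest as the routing prefix.

```python
# Sketch of a hash-based docID scheme (assumed design, not Google's actual
# one): identify a page by a message digest of its URL, and route it to an
# index shard using a prefix of that digest.

import hashlib

NUM_SHARDS = 50  # assumed shard count

def doc_id(url):
    """20-byte identifier: the SHA-1 digest of the URL."""
    return hashlib.sha1(url.encode("utf-8")).digest()  # always 20 bytes

def shard_for(url):
    """Pick a shard from the first byte of the digest (the 'prefix')."""
    return doc_id(url)[0] % NUM_SHARDS

print(len(doc_id("http://www.example.com/")))  # 20
print(shard_for("http://www.example.com/"))    # some shard in 0..49
```

The appeal of such a scheme is that the identifier is computed from the URL itself, so any machine can locate a document's shard without consulting a central table.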
Hey, I'm going to have to turn off my stickymail (I'm slowly collapsing under the weight of stickies to read), but I'll still be reading this thread off and on. I'm checking on subpages showing. Some searches I've collected seem like the right level of detail to me. In another case I don't think the final pageranks have settled down yet. But if you have specific feedback on good or bad searches, you can always send it to webmaster at google.com.
I hope this is the place for my posting.
I have a site which appears in the number one spot on www-fi.google.com for my chosen keywords. It had also appeared as number one in the last index used. Now, with this new update, I can't find it anywhere. The site is about a month old, perhaps a little more.
Am I right in thinking that www-fi.google.com is the latest index, and that this is a snapshot of what is going to hit the streets?
Can I sit back and wait for my site to hit the high street, as it appears in the -fi index?
Is it possible that an old index is being used while the update completes?
|Google is going to identify pages by the result garnered through some hash algorithm or message digest, the inputs of which may be domain name, path, title, who knows? This allows them to separate index bits by prefixes of this message digest/hash result. |
You sound pretty sure of that.
|Given a hash "signature" that is 20 characters long, and using both upper and lower case letters as legal characters, we have a total address space of 20^^52, or 4.503599627370496e+67 (4.5 with 67 zeros after it). |
That would be 52^20, 2.0896e+34.
Of course that would also require significantly more storage space than the proposed 4-byte system.
I'm not one to guess at the mechanics of the system, but I can see how just a little forethought would have avoided any sort of address space issues that have been speculated about. When you realize that just an extra byte can really save your butt in these situations, you aren't likely to be cheap about it.
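For anyone checking the arithmetic in the correction above, the figures work out as follows (the 5-byte case is added for comparison; it is the "extra byte" mentioned in the previous post):

```python
# Verify the address-space arithmetic: a 20-character ID drawn from 52
# case-sensitive letters gives 52**20 identifiers, versus 2**32 for a
# 4-byte unsigned integer and 2**40 for a 5-byte one.

case_sensitive_ids = 52 ** 20
four_byte_ids = 2 ** 32
five_byte_ids = 2 ** 40

print(f"{case_sensitive_ids:.4e}")  # 2.0896e+34, matching the correction
print(four_byte_ids)                # 4294967296 -- room for ~4.3 billion pages
print(five_byte_ids)                # 1099511627776 -- ~1.1 trillion pages
```

At roughly 3 billion indexed pages, a 4-byte docID is already uncomfortably close to full, while a single extra byte pushes the ceiling out by a factor of 256.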
To the guy who has had his site copied, I hope this might help. Eight months ago a company copied one of my sites (300 pages) word for word, even to the point of leaving my name in email addresses to their domain. I was gutted and thought about evil, devilish plans to sabotage them, but then decided that if there was any justice on the net, I would just keep working away at my site and not spend all my time worrying about them. Lo and behold, six months later I was back to the same stats where I was, and up somewhat. The company in question uses every dubious technique to obtain visitors on umpteen domains and still appears in the top 3 places on loads of keywords with no substance to their site. But it is easy to forget that the surfers out there will only be conned once by search engine results, so my moral is: SERPs and PR are only king for a day; what we provide to our visitors will hopefully decide whether we succeed long term.
"I'm checking on subpages showing. Some searches I've collected seem like the right level of detail to me. In another case I don't think the final pageranks have settled down yet."
I certainly hope that pagerank will soon kick in and obvious errors will be corrected. Also, a complete lack of any pagerank calculations could conceivably explain why situations like mine are occurring (but it would have to be a complete lack; with any calculation of PR, the current ranking of the minor pages would just be silly). Therefore I will attempt to have faith for a bit longer...
My users may be goofy but I think even they might wonder why they are starting on page three of a three page article instead of page one.
Also, this straightjacket the white coats have me in makes it hard to type....
<<it is easy to forget that the surfers out there will only be conned once by search engine results, so my moral is: SERPs and PR are only king for a day; what we provide to our visitors will hopefully decide whether we succeed long term>>
Well stated. Some of my business interests are in areas especially subject to spam noise. Often quality does rise to the top. But man oh man you can really get buried in the noise until Google implements filters to stop it. But while buried in spam, cloaked sites and thousands of identical doorway pages, it's good to remember words like yours. Quality will be remembered. Trust of the public is easy to lose.
|Hey, I'm going to have to turn off my stickymail (I'm slowly collapsing under the weight of stickies to read) |
Incredible; GG actually had stickymail on. There must be some special WebmasterWorld medal of honour for that. Brett, any chance of a trophy or citation or something for GG on this one?
To stay on topic; Esmeralda was the girlfriend of the Hunchback of Notre Dame, and the update came on Father's Day.... coincidence?
Whoops! Thanks for the correction. (It's late--that's my story and I'm sticking to it).
While it's true that the 20 character identity would add to storage space, at this time (3 billion pages) it would "only" be 16*3 billion, or around 40-ish gigs of space. If you spread this out over, say, 40 machines that's a gig apiece: not a lot. And the index would be spread out over more than 50 machines.
<<My users may be goofy but I think even they might wonder why they are starting on page three of a three page article instead of page one.
Also, this straightjacket the white coats have me in makes it hard to type.... >>
That made my night :)
What benefit does landing 2 clicks away from the content the user is searching for have?
Also, I'm not quite convinced that freshbot has done the job as well as deepbot used to, as it just didn't seem to pick up all the pages it should have :(
|While it's true that the 20 character identity would add to storage space, at this time (3 billion pages) it would "only" be 16*3 billion, or around 40-ish gigs of space. If you spread this out over, say, 40 machines that's a gig apiece: not a lot. And the index would be spread out over more than 50 machines. |
Excuse me, but read the essay by Brin and Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," about the Google architecture. They use two inverted indexes, the "fancy index" and the "plain index." Between these two indexes, plus the other places in the system where the docID is used, it amounts to a total space requirement of two docIDs per word per document indexed.
Yes, that's not one docID per document, it's two docIDs per word per document.
You can use a 20-byte hash if you like, but I think a four-byte or five-byte docID would make just a little more sense.
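To put numbers on that trade-off (the page count is from the thread; the words-per-page average is an assumed figure for illustration):

```python
# Back-of-the-envelope index storage under the "two docIDs per word per
# document" figure cited from the Brin & Page paper. The docID width
# multiplies across every posting in the corpus.

PAGES = 3_000_000_000                   # ~3 billion pages, per the thread
WORDS_PER_PAGE = 500                    # assumed average
ENTRIES = 2 * WORDS_PER_PAGE * PAGES    # two docIDs per word per document

def index_bytes(docid_width):
    """Total bytes of docID storage for a given docID width in bytes."""
    return ENTRIES * docid_width

gb = 1024 ** 3
print(round(index_bytes(4) / gb))    # ~11176 GB with 4-byte docIDs
print(round(index_bytes(20) / gb))   # ~55879 GB with 20-byte docIDs
```

So the cost of a 20-byte hash is not 20 bytes per page; it is a 5x multiplier on every posting in both inverted indexes, which is why a compact sequential docID plus a lookup table is the more economical design.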