| This 200 message thread spans 7 pages |
|Some big observations on dropped pages|
| 5:01 pm on May 22, 2006 (gmt 0)|
I have been trying to figure out why my site dropped from 57,000 pages down to only 700. Today I noticed a huge pattern, and barring something major, I believe it is the reason for the dropped pages. First, I noticed that all pages three levels deep and higher are indexed. Any pages indexed lower than that are externally linked in some way.
I noticed this because we have a huge directory of content arranged alphabetically, with each letter being a separate page (a.html, for example). From my front page I have a.html linked, and then all the content links on that page. The content that starts with the letter 'a' is all indexed. The pages like b.html and c.html are also indexed, but the individual content pages aren't.
So, what this means is that Google is giving an overall site PR which tells it how many levels down it will index. In my limited research it seems that a site with a front page of PR 5 will get indexed three levels down, and a site of PR 6 will get indexed four levels down. Those below PR 5 I have looked at are barely getting spidered.
When doing this, keep in mind that your front page counts as a level. So if you are only PR 5 and have a huge directory, it seems you shouldn't split it up into sections; just have one huge page with links to it all. This of course totally hoses usability but you will get spidered.
Also, externally linked pages will get spidered, as a few of the pages listed under the other letters are indexed, as they are linked in blogs and other sites. This is across the board what is happening on my site and the others I have looked at.
Count your levels getting spidered and you will notice how deep they are going. For me, three levels and that is it except for the externally linked individual pages I have seen.
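For anyone who wants to count levels rather than eyeball them, here is a minimal sketch of the idea. It assumes you already have your internal links as a simple mapping of page to linked pages; the page names below are hypothetical and only mimic the a.html / b.html layout described above. The three-level cutoff is the observation from this post, not a documented Google rule.

```python
from collections import deque

def click_depths(links, home):
    """Breadth-first walk of the link graph.

    Returns {page: level}, counting the home page as level 1,
    so a page one click from home is level 2, and so on.
    """
    depths = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical link graph: home -> a.html -> content and other letter pages
links = {
    "index.html": ["a.html"],
    "a.html": ["apple.html", "b.html", "c.html"],
    "b.html": ["banana.html"],
    "c.html": ["cherry.html"],
}

depths = click_depths(links, "index.html")
# Pages deeper than the three levels observed for a PR 5 site
too_deep = sorted(page for page, level in depths.items() if level > 3)
print(too_deep)
```

In this made-up layout the 'a' content and the letter pages land on level 3, while the content under b.html and c.html lands on level 4, which matches the pattern of what is and isn't indexed.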
[edited by: tedster at 6:16 pm (utc) on May 22, 2006]
[edit reason] formatting [/edit]
| 10:41 am on May 29, 2006 (gmt 0)|
Making duplicate pages isn't a well laid out structure, any more than washing the same dish twice is a good dishwashing practice.
Websites are about linking, not typing some 150 character URL into the address bar. There is no reason to have two pages. Just link to one single page from two or more sections.
| 11:08 am on May 29, 2006 (gmt 0)|
Did I say anything about making duplicate pages? What I said was to offer people the information laid out so it would be easy to find, whether you are talking about a review site or a store. So what's wrong with putting information out by location or product type? Or, as I said, are we going to go backwards by having pages with massive long lists of information? This is not progress.
| 3:42 pm on May 29, 2006 (gmt 0)|
My pages are not duplicates either. That's why I went for the structure I've now got - and been penalised by having very few pages indexed.
If changing the 'directory' structure to a flat one as above will get more pages indexed I'll do it.
| 5:44 pm on May 29, 2006 (gmt 0)|
While philosophically I feel it is not that great to go to a flat structure, philosophy doesn't rule business, as much as the web 2.0 and open source community want you to believe. As of now because of my new "flat structure" I have over 98k pages indexed and an amazing amount of traffic! I could go back to the previous structure and have 700 pages indexed and almost no Google traffic. I could feel good that I have an awesome directory structure though.
The thing is that I retained my original directory; I just link to the flat list in a place where users rarely go. That way I get the best of both worlds. If you have noticed, forums have long used long flat lists of topics that are the top results for many searches. This isn't new.
| 7:20 pm on May 29, 2006 (gmt 0)|
Flat sites do not work well for very large sites.
Hierarchical folder structures with breadcrumb navigation can do very well. They have good internal linking both up and down the structure, as well as supplying lots of nice keyword-loaded anchor text on all of the internal links.
| 7:41 pm on May 29, 2006 (gmt 0)|
Having 300 pages, each with the keyword and location in it, providing almost the same information (review) or information about the hotels in the area, for each area, seems excessive to me.
Not to critique jolene, as this is about intention, but I have seen many of these sites, which usually list:
hotels this location | hotels that location
etc. at the footer. I think it's unfortunate that Google seems to penalize this for the rare site that uses it with good intentions.
| 8:02 pm on May 29, 2006 (gmt 0)|
g1smd, we have a very large site and had the breadcrumb structure with keywords, etc. Static html pages too, with generated up and down navigation with no duplicate content in a tree structure. The thing that this whole thread is about that people still seem to not read is that they are arbitrarily choosing how deep they will crawl. No magic structure would change that. While it is all anecdotal I don't know how you can say this isn't the case when I dropped from 57,000 pages indexed to 700, then did a flat directory of the articles and went back up in one day after being at 700 for a week. We are a very large site and do the flat structure just fine for seo indexes. Another guy here has been corresponding with me and his site had the same thing, just a few days ago built around six site maps and wham he was indexed again. All the good seo in the world was not gonna change the fact Google wasn't crawling deep into my site.
The funny thing is that since I put the flat navigation in site maps, Google indexed all those pages, and I guess it liked them so much that it is now crawling deep in other areas. I just think there is too much to gain in doing it this way to simply shove it off as being unhelpful. I think this method is especially helpful for those with prs of 5 and below, as it will vastly reduce the page rank dilution of deep pages. It helped everything for me. A 7 times increase in traffic and a 90k increase in pages indexed after only one week is certainly proof enough for me.
| 9:35 pm on May 29, 2006 (gmt 0)|
It's impossible to have a flat structure with 57k pages. You can't put 57k links on one page so what is there to talk about? You have to use a directory structure of some type. The problem people run into is making things only linear. Making a page impossible to find unless a bot crawls through five linear links is expecting a lot.
You have to give the bots multiple paths, you have to distribute your PR wisely, and you have to have a volume of links (good PR or not) to every page.
Google has rewarded well laid out structures for a long time, and now it is even more true. Pages with one single link to them at the end of a five-click daisy chain are going to have a hard time ever getting found by the new weakbot.
| 5:19 am on May 30, 2006 (gmt 0)|
But I did put 10k links to content on one page and Google gobbled them up in a few days. I don't know how you can explain that, but it happened.
| 6:46 am on May 30, 2006 (gmt 0)|
"You can't put 57k links on one page so what is there to talk about."
Of course you can. Why the hell can't you, other than it "might not be feasible" or "google only likes 100 links" blah?
Steve you are missing the point of this thread. And here it is.
For some of us google is only indexing down to a certain level seemingly based on PR. So a PR 5 site may only get up to 3 levels indexed (depending on certain factors only google knows about). In other words it does not matter if you link 300 pages or just one page off of level 3 it won't get indexed or stay indexed. Related articles and sections linking to each other does not matter also...they won't get indexed.
Now this being the case; HOW CAN GOOGLE SUCCESSFULLY DISTRIBUTE PR WHEN NOT ALL PAGES ARE INDEXED AND CONTRIBUTING TO THE DISTRIBUTION?
Keep in mind also that it may not be wise to keep bulk content on higher levels linked from higher pr pages, as this may not be feasible from a business/visitor perspective, structurally, and presentation wise.
Since there is a problem that those pages are not being indexed the TEST/THEORY here is to somehow raise up that deep content up to a level where it can get indexed yet not take away from user experience. One way is to flatten the structure to where the bots hit the deep content on the third level. This can be done creating a table of contents or a sitemap that is flattened so the bots access most content on a higher indexable level. I believe this is what tsm26 did.
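A rough sketch of the flattened site map idea described above, for anyone wanting to test it. The function name, the file names, and the 1,000 links per page are all made-up choices for illustration, not anything Google has published. The home page would link to each generated page, which puts the home page on level 1, the site map pages on level 2, and every content page on level 3.

```python
import html

def build_sitemap_pages(urls, links_per_page=1000):
    """Split a flat list of content URLs into plain HTML site map pages.

    Returns {filename: html_text}. links_per_page is an arbitrary
    choice for this sketch, not a known limit.
    """
    pages = {}
    for start in range(0, len(urls), links_per_page):
        chunk = urls[start:start + links_per_page]
        name = f"sitemap{start // links_per_page + 1}.html"
        items = "\n".join(
            f'<li><a href="{html.escape(u, quote=True)}">{html.escape(u)}</a></li>'
            for u in chunk
        )
        pages[name] = f"<html><body><ul>\n{items}\n</ul></body></html>"
    return pages

# Hypothetical content URLs
urls = [f"/articles/{i}.html" for i in range(2500)]
pages = build_sitemap_pages(urls, links_per_page=1000)
print(sorted(pages))
```

The original directory structure can stay in place for visitors; the generated pages only add a second, shorter path for the bots.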
Another way is to increase links to main pages and deep pages. But now they must be the "right" kind of links and all that jazz, making this a fix that will take much time and effort. (Even though we have many thousands of incoming links throughout our site we still have problems.) This is something that can and should be addressed, but what do you do in the meantime?
The idea of flattening the site a bit is to help speed up the process and get those important deep content pages indexed to where they CAN help distribute internal PR like they should. For some sites this may be all that is needed as PR distribution can be restored. For others it may require this and finding better links to completely restore the site but at least you can get some pages back in the index immediately.
This is what this thread is all about and I don't see one bit of a problem in testing a "flattened site map".
| 7:18 am on May 30, 2006 (gmt 0)|
Just to reinforce arubicus' point. I have a few product links from the home page - these aren't getting indexed either and I believe this directory structure is the key.
They are actually only one click away while my other product pages are all two clicks away (first click to a category, then click to a product). However, because of my directory structure they look as if they are buried deep on the site.
OK, my fault I suppose. But when I developed this site a couple of years ago this indexing by structure was never an issue.
| 7:21 am on May 30, 2006 (gmt 0)|
The thing is, I am getting reasonable traffic to the site.
The problem is that Google is sending most people to the home page or perhaps a category page, which is not much use to a person who wants the right info quickly.
| 7:39 am on May 30, 2006 (gmt 0)|
In the good old days when G had a busy bot, I could get most things crawled within a few days, even on lowish PR sites. Now it's just taking forever despite having links from the index, and they call this BD algo an advanced crawling structure.
I put some new pages up 4 weeks ago, linked from the index (PR4) with several one-way links from themed sites (friends). That page took 3 weeks to get crawled, and even now the links leaving the new page (3rd level) aren't getting touched.
AND to cap it all, my page count, which has slowly been going up for the last 4 weeks, has just taken another hit and dropped from 343 pages to 303. But I suppose that's better than the figure of 2 months ago: 243!
| 7:44 am on May 30, 2006 (gmt 0)|
"For some of us google is only indexing down to a certain level seemingly based on PR."
Which is what I said. You can't put 57k links on one page and expect that to "work".
Directory structure level has zero to do with anything and never did and there is no reason to bring that into a discussion. Pages seven or eight folder levels down get indexed as easily as anything else. Googlebot follows links, it doesn't make address bar judgments.
"This can be done creating a table of contents or a sitemap that is flattened so the bots access most content on a higher indexable level."
And to beat the dead horse, this is what Google Guy has been recommending for years, and it goes back to what I said. You need to give the bots different ways to a page, and you need to have sufficient PR and/or raw volume of links to pages you want crawled. One link from a PR5 page might do the trick, or 100 PR3 links.
Getting deep content to rank has always meant sacrificing some power of the top level pages. Now this becomes a higher priority since you aren't just striving to get deep content ranked, but indexed at all. If pages aren't indexed, they can't help your top pages with anchor text or send their small amount of PR back, or send bots crawling back.
| 8:39 am on May 30, 2006 (gmt 0)|
"Which is what I said. You can't put 57k links on one page and expect that to "work"."
You are thinking in terms of PR distribution. My statement was about crawl priority, in that sites under certain conditions with a pr of 5 will only get so many levels crawled. PR 4 gets 2 levels. All of my sites see this same effect. Now incoming links to deep content can change things a bit, but since google has changed that aspect, who knows. If other sites are suffering from this same occurrence then deep content linked from deep content may not have much of an effect.
"Directory structure level has zero to do with anything and never did and there is no reason to bring that into a discussion. Pages seven or eight folder levels down get indexed as easily as anything else. Googlebot follows links, it doesn't make address bar judgments."
This is NOT about hard drive folder / address bar judgments! This is about how many levels off the index page your content is getting indexed no matter what the file name or file directory it is in. I can place all files under root or 40 directories down and link them all I want. What we are seeing is that google wants to index our pr 5 site down to 3 levels. This means 2 clicks off of the home page. All links are crawled and indexed that are linked off of the home page. All links are crawled and indexed off of those pages. It won't index (or rather, won't keep indexed for very long) the pages those 3rd level pages are linking to. Do you get what I am saying?
"And to beat the dead horse, this is what Google Guy has been recommending for years, and it goes back to what I said. You need to give the bots different ways to a page, and you need to have sufficient PR and/or raw volume of links to pages you want crawled. One link from a PR5 page might do the trick, or 100 PR3 links."
This has never been the case. Why until now has there NEVER EVER been a problem indexing and keeping content on any of our sites and other people's sites? It never mattered if a page had 1 link or 5000 links. If it was linked somewhere on the site accessible to googlebot it would be crawled and indexed. But it isn't necessarily a problem of getting crawled; it is more that pages at a certain level won't get indexed or they won't stay indexed. Hence dropped pages.
Our site's pr of 5 should be sufficient, and thousands of incoming links to internal content should be more than sufficient to get 4 levels indexed.
The site map worked because the content got MOVED up 1 level in googlebot's eyes. Now those links came from brand NEW site map pages with pr 0 so this looks as if this isn't necessarily directly PR distribution related but may be related to PR of the site (again 5) and crawl priorities based on that. To add to that, with the sheer amount of links diluting the PR distribution from those site maps, they shouldn't have all that much effect on PR anyway.
"If pages aren't indexed, they can't help your top pages with anchor text or send their small amount of PR back, or send bots crawling back. "
This is exactly my point. MAKING THE SITE FLAT THROUGH THE USE OF A LINKS PAGE DIRECTLY OFF THE HOME PAGE SO WE CAN GET THOSE DEEP PAGES INDEXED SO WE CAN GET THE PR OF THOSE PAGES ACCOUNTED FOR AND DISTRIBUTED BACK THROUGH THE SITE IS WHAT WE ARE TRYING TO ACCOMPLISH!
Now if this were a PR distribution thing then please tell me how tsm26's HUGE link page(s) passed enough PR to get all of those pages indexed. The thing is the PR would get so diluted it would have hardly any effect. This makes me believe it has less to do with PR distribution and more to do with sheer levels and priorities based on unknown factors.
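To put a rough number on the dilution argument, here is a small sketch using the classic formula from the original PageRank paper, where each link passes on a share of d * PR(page) / (number of outgoing links). This assumes the published formula with the commonly quoted damping factor of 0.85; the PR value is an arbitrary raw number, not the toolbar scale, and nobody outside Google knows how the current crawler actually weighs any of this.

```python
def pr_passed_per_link(page_pr, outlinks, damping=0.85):
    """Share of a page's raw PageRank passed through one outgoing link,
    per the classic published formula (an assumption for this sketch)."""
    return damping * page_pr / outlinks

page_pr = 1.0  # arbitrary raw value, for comparison only

# A category page with 100 links versus a huge links page with 10,000
category_share = pr_passed_per_link(page_pr, 100)
sitemap_share = pr_passed_per_link(page_pr, 10_000)

# How many times less PR each link on the huge page passes
ratio = category_share / sitemap_share
print(ratio)
```

Under these assumptions each link on the 10,000 link page passes about one hundredth of what a link on the 100 link page would, which is why the pages getting indexed anyway suggests something other than passed PR is at work.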
| 9:36 am on May 30, 2006 (gmt 0)|
"You are thinking in terms of PR distribution."
Huh? No, I'm thinking of a page with 57,000 links on it. How can you say such a thing is possible?
"This is NOT about harddrive folder address bar / judgments!"
That is what webdevfv is talking about, folder depth.
"This is about how many levels off the index page your content is getting indexed no matter what the file name or file directory it is in."
That is what I've been talking about. The issue involves the extreme importance of pagerank, and the importance of multiple paths to pages, and distributing pagerank (and volume of links) to those various paths.
"Do you get what I am saying?"
You seem to be stating the obvious now. You probably should go back and read the previous posts. Websites with a best page PR of a PR4 are in extremely dire straits. There is very little (though some) you can do to get maximum crawl value. Sites where the best PR page is a PR6 have a lot more to work with, including the ability to screw things up and only get 100 pages indexed instead of 10,000 or more.
"This has never been the case. Why until now has there NEVER EVER been a problem indexing and keeping content on any of our sites and other people's sites?"
You need to read more previous threads, but obviously the problem is more acute now. However, the basic solution remains EXACTLY the same... page rank, good structure, sacrifice some value of top pages to rescue a great volume of lower pages.
"It never mattered if a page had 1 link or 5000 links."
Of course it did. It just matters more now.
"If it was linked somewhere on the site accessible to googlebot it would be crawled and indexed."
That's just false. IMDb and Amazon have always had many thousands of unindexed pages, despite many links to them.
"Now those links came from brand NEW site map pages with pr 0 so this looks as if this isn't necessarily directly PR distribution related"
No, pages created today have pagerank close to right away, whether the green bar shows it or not. Pagerank is pagerank, not the green bar. A new page linked from a PR8 page will instantly have links regularly crawled off it, because it is in reality instantly a PR7 or PR6 page, regardless of the green bar.
"MAKING THE SITE FLAT THROUGH...etc"
I've posted several times now about flattening and redistributing PR, as well as the value of many links and crawl paths. And, I've pointed out this is the advice Google guy has been giving for years. But, you can't have a flat structure for a 57,000 page site. It's impossible. You can manipulate the structure one way or another to make it flatter or less flat, but flat is an impossibility.
There isn't something mysterious going on here. Google is just crawling weaker than before, by design. Single path sites were never optimum, but they are in much more trouble now. Big sites with low PR are in serious trouble, although better off with pages not indexed at all than with a high percentage of low reputation supplementals in the index.
| 9:45 am on May 30, 2006 (gmt 0)|
Even with very well indexed and crawled sites, the current algo seems to rank the home page or high-level category page more often, rather than the dead-on match for the search. At least that's what I'm seeing on quite a few sites. Very frustrating to see so much traffic coming in a click away from what they want. Feels like the exact match pages might as well have been dropped from the index sometimes.
These are mostly pages that don't change often -- which is another factor in the crawl pattern, I realize. Different domains most likely get profiled differently and crawled by different rules.
| 9:49 am on May 30, 2006 (gmt 0)|
If you see the homepage phenomenon, do a -uncommonword search where the uncommon word is on the homepage.
Set results to 100 per page, and it's fascinating to see a homepage rank 12th for a search, with no indented result, but then do a search with the -uncommon word and see two more appropriate results ranking 13th and 14th or wherever. I see this normally where individual pages have problems of some sort.
| 4:35 pm on May 30, 2006 (gmt 0)|
Steve, you never answered my question:
IF this was a PR distribution thing, how did a links page as large as tsm26's pass enough PR to get the lower level pages indexed and ranked?
There is just more to it to be figured out.
| 4:39 pm on May 30, 2006 (gmt 0)|
I have noticed the following: most of the pages that have been dropped from my site are the target of a 301.
Url A -> dropped
Url B with 301 to Url A.
Url B is very old and Supplemental in the index, but Googlebot still visits this Url regularly.
First Url B, then Url A.
I still have 20% of my pages in the index and the bot visits these regularly. Most of these pages have no old URL with a 301.
I changed my URL system in October 2005. Can it be that Google does not cope with that? Would it be better to send a 410 instead of a 301?
However, there are also a few counterexamples (pages dropped out without a 301, pages still in with a 301).
I also did not have any problems on Bigdaddy at the end of February (fully indexed).
| 6:12 pm on May 30, 2006 (gmt 0)|
Steveb, I can't understand what you mean that a page with 57,000 links isn't possible. Of course it is possible. My page with 10k links on it is 733,506 bytes (about 733 kb), so a page with 57,000 links would be around 3.5 mb. That is a huge ridiculous page but it is certainly possible. If you can put a 3.5 mb photo on a page you can put up 3.5 mb of html. You are only really limited by the memory of the person's computer. Maybe you meant feasible?
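As a quick sanity check on the size estimate, here is the arithmetic as a sketch. It assumes the 733,506 figure is a byte count (roughly 733 kb) and that page size grows linearly with the number of links, which ignores the fixed header and footer markup.

```python
def estimated_page_bytes(known_bytes, known_links, target_links):
    """Scale a measured page size linearly to a different link count."""
    return known_bytes * target_links / known_links

# Measured: 10,000 links in 733,506 bytes (assumed unit, see above)
est = estimated_page_bytes(733_506, 10_000, 57_000)
print(f"{est / 1_000_000:.1f} MB")
```

Under those assumptions the 57,000 link page comes out a little over 4 MB, the same order of magnitude as the rough figure above, so the point about it being huge but possible stands either way.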
What has happened in my experience is this change in algorithm has raised focus from individual content pages to overall site results. This makes it so more and more results for searches I do on Google return main pages and then I have to go search for the results lower. It also means that new content is hard to find. When I am looking up problems with recent software patches, mysql etc. I am now getting more and more old forum sites that are not recent but are the only ones who got single posts indexed because of their high main page PR. This lack of deep indexing is hurting both sides. It has ruined my ability to find many topics newer than a month old on Google except rss feeds from news organizations.
Your assertion about good internal linking structure only goes so far. You can take two different sites with phpBB installed with the exact same options and linking depth, and one will get the single posts indexed while the other won't, because one site with a pr of 6 gets four levels indexed while the other only gets down to three. And since PR is just going crazy right now it is hard to figure out what to do. My blog, which only has 12 incoming links according to msn and yahoo, is the same pr as our site, which has over 900 incoming external links, about 400 of which are quality non directory links.
All of this is combining IMO into a nightmare of an algorithm. Sites that have been shutdown for years are still showing up in the top ten for important searches and the new relevant content rich sites are getting deindexed down so you can't find what is needed. I used to be able to find my answers in forums for programming problems in a few minutes and on the first page of results. For the first time ever now I have had to go into the fourth and fifth pages and go into Google groups or other sites. In this age of Digg, Myspace, blogging etc, putting so much emphasis on age of site and links runs counter to what people want. I know they have an interesting dilemma, because you have to protect against fly by night spam sites, but this certainly isn't working.
| 8:54 pm on May 30, 2006 (gmt 0)|
"IF this was a PR distribution thing; how a links page as large as tsm26's passed enough PR to get the lower level pages indexed and ranked?"
Sorry, but I don't understand what you are trying to ask. What does one have to do with the other? It's like you are asking if this was a PR distribution thing why is the text black?
"Maybe you meant feasible?"
No I meant not possible. Google reads 101k. Making all 57,000 pages on your site have 57,000 links on them is both ridiculous and irrelevant.
| 9:28 pm on May 30, 2006 (gmt 0)|
"Sorry, but I don't understand what you are trying to ask. What does one have to do with the other? It's like you are asking if this was a PR distribution thing why is the text black?"
What I am asking is based on what you have been saying about PR distribution. If this problem is because of PR distribution how can a HUGE links page such as tsm26's pass enough pr to get those pages indexed. Why were those pages dropping out to begin with but are indexed and sticking now just because he flattened the structure. IMO there wouldn't be enough PR to pass into those deep content pages to make any more difference than going through the site with the normal directory structure.
To me this says there is a tad more going on here than just PR distribution.
| 10:09 pm on May 30, 2006 (gmt 0)|
"If this problem is because of PR distribution how can a HUGE links page such as tsm26's pass enough pr to get those pages indexed."
Again, how does PR distribution relate to this? You are asking a non sequitur. Most obviously, a huge amount of links on the Yahoo main page is different than a huge amount of links on a PR0 page. Besides that, I can't figure out how you connect the two things.
"Why were those pages dropping out to begin with but are indexed and sticking now just because he flattened the structure."
Again, two different things. Pages dropping out doesn't necessarily relate to them getting back in, but again most obviously, someone put more links to a page. How is that not an obvious reason why a page might get crawled more often? As I've mentioned too many times now, flattening will often lead to crawling of more pages but probably less often.
"IMO there wouldn't be enough PR to pass into those deep content pages to make any more difference than going through the site with the normal directory structure."
PR is just one factor in crawling. I don't want to keep repeating the same thing, but in some cases one link from a PR5 page might accomplish your goal while in another 100 links from PR3 pages might.
"To me this says there is a tad more going on here than just PR distribution"
Of course. It's beating a dead horse now, but besides what I've mentioned five times, there are other factors like domain reputation, bad neighborhood factors, duplicate content on a domain, (a very huge one) staticness of the content on a domain, etc etc etc.
If you want your pages crawled, make many paths to them, get more pagerank, get a good domain reputation, update the pages every week, get rid of duplicate content like non-www/www, make sure your key pathway (usually the main page) has no problems like /index.asp being indexed separately, and create bot-friendly easy to crawl sitemap pages (as many as you need).
Little has changed fundamentally, but getting things right is more important and more difficult. Googlebot used to be like Arnold Schwarzenegger, now it is like Pee Wee Herman. You have to do a lot more to help it get your heavy work done. That's not to say it could do everything before, it just means however much it could do before, it is not as strong now.
| 10:27 pm on May 30, 2006 (gmt 0)|
This is wrong. G only SHOWS 101k in cache. Several experiments have shown that the bots read, index, and pass PR past 101k.
| 10:49 pm on May 30, 2006 (gmt 0)|
Steveb, we have no duplicate content and no canonical problem; the pages referenced were indexed and crawled regularly for over two months, then in one night disappeared. Raised them up a level and they appeared in one day. Every piece of what you said at the end was worked on over the past year. Our domain is only a little over a year old, but we can't do much about that. I just am surprised you can't seem to admit that maybe just maybe Google's algo is going a bit wacko and that *gasp* maybe all those phds made a few mistakes. In my opinion what they did is bad for everybody, and simple good old seo will not fix that. If you want to message me I can give you plenty of examples to show you what I mean.
| 10:55 pm on May 30, 2006 (gmt 0)|
"If you want your pages crawled, make many paths to them, get more pagerank, get a good domain reputation, update the pages every week, get rid of duplicate content like non-www/www, make sure your key pathway (usually the main page) has no problems like /index.asp being indexed separately, and create bot-friendly easy to crawl sitemap pages (as many as you need)."
AGAIN LISTEN UP THIS ISN'T about pages not being crawled. It is about them being crawled and added/not added/dropping.
"get more pagerank"
Getting more PR can be achieved in two ways: incoming links, and creation of new pages which can vote however you please. With that being said, the problem with pr is that a site which is NOT fully indexed gets much of the #2 pr creation ability taken away. At least it does not work to its full potential. You are left with #1, which is a time consuming process. It is something to be addressed for sure, just like internal voting, which is possibly a quicker fix. For the most part you want to take care of internal issues first before moving to external issues. This is what we are working towards.
"make many paths to them"
Not a problem plenty of incoming links into the thousands plus a well done pyramid structure. No orphans plus alternative routes.
"update the pages every week"
Makes no difference from our experience.
"get rid of duplicate content like non-www/www"
Not an issue either
"make sure your key pathway (usually the main page) has no problems like /index.asp being indexed separately"
Not an issue. Taken care of long ago. Newer sites suffering from similar problems never once allowed such duplicates.
"create bot-friendly easy to crawl sitemap pages (as many as you need)"
Depends - if the indexing ONLY goes 3 levels deep your sitemaps MUST point to all content to make it all level 3. Deep level 4-infinity must be placed in the site map to where all pages are accessed directly off of the site map page. That site map page must be directly off the home page. Now with a site with 50k pages tell me how you can cram all of that in and not break the 100 link barrier on your main page and on the site map. The thing is you can't. This is why tsm26 created huge site maps. There was little choice in the matter.
I believe you are forgetting what you wrote earlier:
"You have to give the bots multiple paths, you have to distribute your PR wisely, and you have to have a volume of links (good PR or not) to every page."
You have repeatedly mentioned distributing PR. I am saying this has less to do with distributing PR.
You said you must have volumes of links. This isn't a problem with google not "finding" links. Google crawls level 3 and it knows what links are on those pages. It crawls level 4 and knows what links are on those pages and what is linking to those pages. But the fact is that level 4 is not sticking regardless of incoming links or how many links are pointing to it. It is about what level in your site structure the page resides.
BTW - you never needed a volume of links. You just at least need one link so the page is not orphaned.
| 11:07 pm on May 30, 2006 (gmt 0)|
"AGAIN LISTEN UP THIS ISN'T about pages not being crawled. It is about them being crawled and added/not added/dropping."
Again you state the obvious. We know that. What are you not understanding? Pages have always dropped out of the index. (They have also gone URL only or supplemental... two things it appears Google may be doing less of now, dropping pages fast instead.)
"I am saying this has less to do with distributing PR."
Less, more, so what? You seem like you need only ONE answer. There isn't one. You have to do a lot of things, many of which are very important, some of which are less important, but all of which help.
"But the fact is that level 4 is not sticking regardless of incoming links or how many links are pointing to it"
Stop fixating on your own site and you'll be better off.
"You just at least need one link so the page is not orphaned."
Again, this is utter nonsense, and I don't want to go over this same ground again and again. Amazon and the imdb are examples of sites where "one link" wasn't enough. In fact, dozens of links weren't. One reason is duplicate content. All versions could be discarded because of dupes. But that is trivia at this point.
"I just am suprised you can't seem to admit that maybe just maybe Google's algo is going a bit wacko and that *gasp* maybe all those phds made a few mistakes."
Um, Google makes mistakes all the time, both screw ups and bad policies. What does that have to do with anything?
You guys just seem to want to complain instead of work on making things work. Good luck.
| 11:35 pm on May 30, 2006 (gmt 0)|
"What are you not understanding?"
What I don't understand is why you keep reverting back to crawling. Here are your own words:
"If you want you pages crawled"
You KEEP repeating this, and I have said repeatedly that this is NOT the problem. What part do you not understand?
"Stop fixating on your own site and you'll be better off."
Weak argument. Won't go there. Stop fixating on your ego for a second, ok?
"You guys just seem to want to complain instead of work on making things work. Good luck. "
Ummm, until you arrived we were doing just that - making things work. So good luck and hope to see you in another thread. Again, drop the da#n ego for a second and LISTEN to what we have said and to the temporary solution that was presented.
"Um, Google makes mistakes all the time, both screw ups and bad policies. What does that have to do with anything?"
EVERYTHING. If they messed up and there is a possible temporary way around it, I'll take it until they fix it. The policies they make are what WE have to live by if we want their traffic. It matters A LOT.
| 12:53 am on May 31, 2006 (gmt 0)|
I am amazed at that last statement -
"You guys just seem to want to complain instead of work on making things work. Good luck. "
That is exactly what we are doing: making things work. I said I made a fix by bringing everything up, and as of today I have 73k pages indexed, up from 700. That sounds like making things work to me. My traffic went from 3k page views on the days the pages were gone to 19k page views today, plus a ton in advertising revenue.
What I am saying is that it is awful I had to resort to that. They should be penalizing the too flat structure rather than rewarding it. I put a well organized pyramid structure in and what do I get? Deindexing. When I move to a flat one, they say "hey we will gobble up all 10k links on your one page no problem". How ridiculous is that?
What we are offering in this whole thread is a solution to help alleviate things in the near term by flattening out navigation. We are complaining so that in the long term, Google (if they are actually reading anymore) will reward lower PR sites (<=5) that have good navigation. Until then I will keep raking in the traffic from that "impossible" index page of 10k content links.
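For anyone who wants to count levels on their own site before and after flattening, here is a minimal sketch. It assumes you already have your internal links as a simple mapping of page to linked pages (the `links` dict, the page paths, and the function name are hypothetical), and it counts the home page as level 1, as earlier in the thread. It is a plain breadth-first search over internal links, not a model of what Googlebot does.

```python
from collections import deque


def page_levels(links: dict, home: str) -> dict:
    """Return the shallowest level of every page reachable from `home` (home = level 1)."""
    levels = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in levels:  # first visit is the shortest path in a BFS
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels


if __name__ == "__main__":
    # Hypothetical pyramid structure: home -> letter pages -> content -> detail pages.
    pyramid = {
        "/": ["/a.html", "/b.html"],
        "/a.html": ["/a/apple.html"],
        "/b.html": ["/b/banana.html"],
        "/a/apple.html": ["/a/apple/details.html"],
    }
    levels = page_levels(pyramid, "/")
    too_deep = sorted(p for p, lvl in levels.items() if lvl > 3)
    print(too_deep)  # pages sitting below level 3
```

Pages that show up below level 3 here are the ones this thread suggests linking from a flatter index page, at least as a near term workaround.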
| 12:10 pm on Jun 1, 2006 (gmt 0)|
I'm going to preface this post by saying that some of the complaints I'm reading since the Big Daddy update really don't have any merit.
I've spot checked at least 30 sites since the BD update. A few of those came from members here at WebmasterWorld who were kind enough to include either their URI or their email address, which led me to sites that have very little ground to stand on.
What have I seen? Sites with very low PR (2-3) who have primary content at levels 2, 3 and even 4+ in some instances. After reviewing the architecture of those sites, I can clearly see why those pages are no longer a priority for crawling.
Sites with every piece of metadata known to mankind. While I personally don't think this would matter, if I were a search quality engineer (SQE), I'd have to take that into consideration when looking at all the different factors involved in determining the quality of a page.
Up until BD and even previous updates, Google would index just about anything. Just because Googlebot is indexing doesn't mean those pages are going to perform. In fact, Googlebot will index for months before pages start to appear for their targeted keyword phrases. That time factor between when a new page goes up and when it starts to pull its weight is relative to the overall PageRank of the site (and other determining factors). Remember, Toolbar PR is for public consumption. There is more to that little green bar than meets the eye. ;)
What I'm seeing are sites that have unruly URI structures that Googlebot was indexing. Why and how it was able to index some of the URI structures I'm seeing remains to be questioned. I personally believe it was all part of developing the largest index of documents first and then going back and continually reindexing and purging based on the crawl criteria.
I'm seeing URI structures with multiple hyphens, spaces, tildes, you name it. There are some creative naming conventions taking place. A site with a URI structure like this can't expect to have a quality indexing, can it?
Especially when it has 1,000+ pages? Where's the hierarchy? Flat sites with a large number of pages may not perform as well as a structured hierarchical site.
Nor would one expect a PR3 site with a URI structure like this to have a quality indexing (note that the primary product pages are at the 4th level)...
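As a rough way to spot check URIs for the traits mentioned above (hyphens, spaces, tildes, and how deep the path goes), something like the following sketch could be used. The function name, the returned keys, and the example URI are all made up for illustration, and path depth here is just the number of path segments, which is not necessarily the same as click depth from the home page.

```python
from urllib.parse import urlsplit


def uri_flags(uri: str) -> dict:
    """Collect a few simple facts about the path portion of a URI."""
    path = urlsplit(uri).path
    segments = [s for s in path.split("/") if s]
    return {
        "hyphens": path.count("-"),
        "has_space": " " in path or "%20" in path,
        "has_tilde": "~" in path or "%7E" in path.upper(),
        "path_depth": len(segments),  # number of path segments, not click depth
    }


if __name__ == "__main__":
    # Hypothetical URI, not taken from any site discussed in this thread.
    print(uri_flags("http://www.example.com/shop/~cat/blue-widget-large-size/item%20one.html"))
```

What counts as "too many" hyphens or "too deep" is a judgment call; the sketch only reports the counts.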
How about this for an HTML Validation result. Can anyone explain what may happen with a page that has these errors?
Line 14, character 6:
^Error: element HEAD not allowed here; check which elements this element may be contained within
Line 18, character 7:
^Error: required attribute TYPE not specified
Line 71, character 34:
^Error: element BODY not allowed here; check which elements this element may be contained within
Line 261, character 7:
^Error: HEAD not finished but containing element ended
Line 261, character 7:
^Error: missing a required sub-element of HTML
After reading about this update and the fallout that has occurred, I'd have to say that some of us were lucky those pages were indexed and pulling results and we can be thankful that we enjoyed those positions for whatever time period.
Things have definitely changed. If what I'm seeing is an indication of things to come, we're going to be seeing a daily plethora of topics related to a continuing pattern of deindexing.
Google CEO admits, "We have a huge machine crisis"
Did you catch the above topic posted on 2006/05/03? Google is having a problem with storage. I could only assume that if storage is an issue, then they need to purge some of the data so that they can implement a solution. So, what data should they purge?
Big Daddy's crawl priority changed and this was one of the major effects of the update. It appears that part of that crawl priority entails the purging of documents (either temporarily or permanently) to allow their current systems some breathing room.