Forum Moderators: Robert Charlton & goodroi


Pages Dropping Out of Big Daddy Index

         

GoogleGuy

6:11 am on Apr 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Continued from: [webmasterworld.com...]


One thing to bear in mind is that Bigdaddy will have different crawl priorities. That can account for some of it. If you've run into any spam problems in the past, you might also want to do a reinclusion request. Otherwise, please send an email to bostonpubcon2006 at gmail.com with the subject line "crawlpages" (all one word), and I'll ask someone to see if they notice any commonalities.

old_expat

12:58 pm on May 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"I have been checking and the pages (approx 100) dropped for my site are ones which have a duplicate description which I used as a templete page for the order forms and although the content is different I didn't change the key words or the description tag. "

I found a lot of common meta tags (keywords and descriptions) that were either duplicated, incorrect, or both (trying to hurry, I suppose) on a site that went from 100+ pages to 2!

I spent a good part of the day making corrections. Interested to see what G will do with them.


old_expat

1:00 pm on May 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Also make sure that each page has only one URL that can reach it."

Can you clarify this?

Otter44

1:18 pm on May 3, 2006 (gmt 0)

10+ Year Member



"Also make sure that each page has only one URL that can reach it."

Regarding the above: we have a dynamic site, so depending on the product category you are browsing in, you might find the same page under a different URL string. What would you suggest?

Also, I believe our category pages can only be reached through 1 URL. If this is the case, would these pages be OK?

Thank you!

g1smd

1:26 pm on May 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you have both www and non-www versions resolving, then 301 redirect one to the other. The same applies if you have .com, .co.uk, and .org versions: redirect all but one of them. Additionally, make sure that every page of the site has a unique title and a unique meta description, and that they match the on-page content of the page they are on.
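
As a rough sketch of the www/non-www redirect (assuming an Apache server with mod_rewrite available; example.com is just a placeholder for your own domain), something like this in .htaccess would do it:

RewriteEngine On
# Send any request for the bare domain to the www hostname with a 301
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]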

.

Make sure that each page has only one URL that can get to it. vBulletin, a popular forum package, by default allows every post and thread to be reached by at least ten different URLs. For example, a post on a vBulletin forum could be expressed as:

/forum/showthread.php?t=54321
/forum/showthread.php?t=54321&p=22446688
/forum/showthread.php?t=54321&page=2
/forum/showthread.php?mode=hybrid&t=54321
/forum/showthread.php?p=22446688&mode=linear#post22446688
/forum/showthread.php?p=22446688&mode=threaded#post22446688
/forum/showthread.php?t=34567&goto=nextnewest
/forum/showthread.php?t=87654&goto=nextoldest
/forum/showthread.php?goto=lastpost&t=54321
/forum/showpost.php?p=22446688
/forum/showpost.php?p=22446688&postcount=45
/forum/printthread.php?t=54321

and that is without introducing the page parameter (for threads that are more than one page long) or the pp parameter (for changing the default number of posts per page), either or both of which can be added to most of the URLs above too.

Another big problem is the "next" and "previous" links that cause massive duplicate content issues because they allow a thread like
/forum/showthread.php?t=54321 to be indexed as
/forum/showthread.php?t=34567&goto=nextnewest and as
/forum/showthread.php?t=87654&goto=nextoldest too.

Additionally, if any of the three threads is bumped, the "next" and "previous" links that are indexed no longer point to the same thread, because they contain the thread number of the thread that they were ON (along with the goto parameter), not the real thread number of the thread that they actually pointed to.

This is a major programming error by the people who designed the forum software. The link should either contain the true thread number of the thread it points to, or else clicking the "next" and "previous" links should go via a 301 redirect to a URL that includes the canonical thread number of the target thread.
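
To illustrate only the redirect approach (this is not vBulletin's actual code; find_adjacent_thread_id() is a made-up lookup for the example), a goto handler could do something like this:

<?php
// Sketch: resolve a "next"/"previous" request to the real target thread,
// then issue a 301 to that thread's single canonical URL.
$current   = (int) $_GET['t'];
$direction = $_GET['goto'];   // 'nextnewest' or 'nextoldest'
$target    = find_adjacent_thread_id($current, $direction);   // hypothetical lookup

header('HTTP/1.1 301 Moved Permanently');
header('Location: /forum/showthread.php?t=' . $target);
exit;
?>

That way the URL that finally gets indexed always carries the real thread number.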

The thread index pages (showforum.php) must confuse search engines greatly, because the content of each numbered page changes every time a new thread is posted or an old one is bumped. Page 1 should be the oldest page, not the newest.

There is no need for the single "showpost" pages to be indexed. Coming into one of those from the search results shows a single post out of context from its thread, and the page has no navigation to reach any other part of the forum. Just get whole threads indexed.

The "newreply" and "newpost" pages take you straight to the "login" screen. They have no useful purpose to people arriving from a search engine. Get rid of those too. Either use the <meta name="robots" content="noindex"> tag or robots.txt file to do this. My preference is the meta tag.

The "member", "join", "login", "controlpanel" "editprofile" and "sendmessage" pages also ask for a login, and have no place in search engine listings. Get rid of those in the same way.

This stuff is specific to vBulletin, but other packages like phpBB and osCommerce, and just about any dynamic script-generated site, have the same types of issues, and they account for the majority of the indexing problems that these types of sites have.
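
As a rough example of the robots.txt route (the file names below follow default vBulletin script names, so check them against your own install and version), something like:

User-agent: *
# Keep single-post, print, reply and account-related pages out of the crawl
Disallow: /forum/showpost.php
Disallow: /forum/printthread.php
Disallow: /forum/newreply.php
Disallow: /forum/newthread.php
Disallow: /forum/login.php
Disallow: /forum/register.php
Disallow: /forum/member.php
Disallow: /forum/private.php

Note the difference between the two methods: robots.txt stops those URLs from being crawled at all, while the meta noindex tag lets them be crawled but keeps them out of the listings.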

.

Use these searches to see what you have indexed:

site:domain.com
site:domain.com -inurl:www
site:www.domain.com

F_Rose

1:29 pm on May 3, 2006 (gmt 0)

10+ Year Member



Please clarify:
"Make sure you have one url that can reach it"

A lot of our pages are linked from multiple pages within our site.
For example:

The page could be linked from the home page
The same page can be linked from a product page and so on...
Is that a problem?

Our site was redesigned and launched in January 06. Are we trapped in a redesign filter, and if so, is there anything I can do about this?

g1smd

1:41 pm on May 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



See my post above, where I give an example of how a single page of content has 12 different URLs that can reach it.

That is what I am talking about. I am not talking about how many pages link to you - you can have as many incoming links as you like.

I am talking about a page of content having many different URLs that all return "200 OK" with the same content. Google sees that as duplicate content and it can cause a lot of problems.

Clever use of the noindex tag on some versions of the page (it is a scripted page, and the script can be set up to detect the exact URL that was requested and take appropriate action), or a disallow rule in the robots.txt file, can clean these listings up quite easily.
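
A bare-bones PHP sketch of that detection idea (it assumes the canonical form is /forum/showthread.php?t=NNN with no extra parameters; adjust it to whatever canonical format you choose):

<?php
// Sketch: print a noindex tag (inside the page's <head>) whenever the
// requested URL is not the single canonical form of this thread page.
$canonical = '/forum/showthread.php?t=' . (int) $_GET['t'];
if ($_SERVER['REQUEST_URI'] !== $canonical) {
    echo '<meta name="robots" content="noindex,follow">' . "\n";
}
?>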

F_Rose

2:12 pm on May 3, 2006 (gmt 0)

10+ Year Member



Thank you g1smd,

I got it..

Our site was redesigned and launched in January 06. Are we trapped in a redesign filter, and if so, is there anything I can do about this?

texasville

3:04 pm on May 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>>Our site was redesigned and launched in January 06. Are we trapped in a redesign filter, and if so, is there anything I can do about this?<<<<

You will eventually get back to normal. If I am reading this right, Google is just cataloguing history, so to speak.
I really think that in the end the current state of your site will be what is indexed.
BUT! If I am reading this right, everyone needs to clean out their folders and their root: in a redesign, any pages you left on the server, even though they are no longer linked from any URL, should be gone, especially if they are close to being duplicates of what you are using now. All old unused pages should 404.

F_Rose

3:19 pm on May 3, 2006 (gmt 0)

10+ Year Member



"All old unused pages should 404."

All of our old pages are 404, so we are totally clean with that.

So I guess as of now I have to sit and wait in great suspense. That is all that is left for us to do now, unfortunately...

darnoc

5:33 pm on May 3, 2006 (gmt 0)

10+ Year Member



g1smd: [Make sure that each page has only one URL that can get to it]

g1smd, could you please clarify your point? Do you mean that ALL sites that legitimately use multiple URLs to reach the same content are now penalised by Google?

Example: Our site has 20,000 real estate listings in a dynamic database. Visitors can search through the listings using a number of different criteria (e.g. all apartments with at least 2 bedrooms in a certain town). They can also search for all apartments with at least 3 bedrooms in the same town. In this example the two sets of results overlap: every apartment with at least 3 bedrooms appears in both.

Does this mean that Google will not include any of these pages in the index?

tedster

5:41 pm on May 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, not penalized, but it does mean that Google must decide which of the urls to return in the SERPs for various searches - and that decision is automated and may not be the decision you would wish. It also can mean, in cases where many urls are involved, that Google gets tangled up and doesn't return any of them at all.

Most of the time, one or two urls for the same "deep" content doesn't seem to affect things badly. But if the situation is such that many urls will all resolve (or potentially an infinite number of urls) then things can get dark indeed.

What is your current experience?

g1smd

5:51 pm on May 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If one particular apartment page can be reached through a URL like this:
www.yoursite.com/search?size=2&location=florida&item=45045
and through a slightly different URL like this:
www.yoursite.com/search?size=3&location=florida&item=45045
then, yes, you have multiple URLs for the same content, and that is "duplicate content".

The fix is to decide on one canonical format for each listing, and use <meta name="robots" content="noindex"> on all other pages of the site that have a different URL format.

Check the example about vBulletin forums, posted above, for guidance.

darnoc

7:02 pm on May 3, 2006 (gmt 0)

10+ Year Member



Thanks for your responses.

Specific property searches generate a page of thumbnails for a list of appropriate properties. Each thumbnail has a link to a specific property page. This property page is a static html page.

By the way, pre-BD we had 200,000 indexed URLs generated from 20,000 pages. However, NONE of these 20,000 static property description pages are now included in the Google index.

Apart from a small percentage of properties that may have a similar description we do not have duplicate content.

hk995

7:28 pm on May 3, 2006 (gmt 0)

10+ Year Member



If I have 2 pages with 3 subjects, but on one page the contents are A, B, C and on the other page the contents are C, A, B or C, B, A, is that still "duplicate content"? Thanks.

g1smd

7:47 pm on May 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It might be. Don't risk it.

Elect one to appear in the SERPs and "noindex" all the others.

vanessa19

8:05 pm on May 3, 2006 (gmt 0)

10+ Year Member



"I have been checking and the pages (approx 100) dropped for my site are ones which have a duplicate description which I used as a templete page for the order forms and although the content is different I didn't change the key words or the description tag. "

Apart from this problem, I have also wondered about AdSense: approx 2 months ago I added AdSense to my main site (against professional advice), only on the links page, and about that time my site suddenly dropped from its usual position. It is impossible to say whether this is because of a BD problem or just a coincidence with the AdSense. I have now removed the AdSense and am going through the dropped pages for duplicate descriptions and keywords. Hopefully, by tackling both problems, my site will return to the position it has held for 5 years.

Freedom

8:11 pm on May 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think it's too early to make changes to one's site based on yet another Google SNAFU. Out of general principle, I'm not changing anything, because I've done nothing wrong - template-based pages or not. I get enough from MSN/Yahoo/Ask to survive.

Lorel

8:35 pm on May 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Re the site I mentioned earlier that had lost its main pages (they have since returned): it is now losing other, less important pages at a rate of 9 per day. This site was previously owned by someone else 2 years ago (although nothing was changed on the site for a year other than the name of the owner).

I redesigned the site in mid/late Feb of this year and uploaded the new pages as they were finished, about 10-20 per day, hoping to escape a redesign filter. It has been OK for the last few months.

This redesign was done mostly for aesthetic reasons and to get rid of code bloat, along with fixing titles and descriptions so they were unique for each page. I installed full URLs on all nav links and base href tags to help prevent hijacking. Very few file names were changed, and those that were are set up with 301s in .htaccess. The keyword rank for the main pages is still high. However, lots of inner pages have dropped to PR 0 and low keyword rank (all pages are on the same level and there are no subdirectories on this site, although they are not all linked from the home page).

I suspect the drop in PR and SERPs for these pages is because Google is going through the site and checking things out, and once it is done they will return like the other pages did, because the site is a lot more search-engine friendly than it was before.

vanessa19

8:38 pm on May 3, 2006 (gmt 0)

10+ Year Member



"I think it's too early to make changes to one's site based on yet another Google SNAFU. OUt of general principal, I'm not changing anything because I've done nothing wrong - template based pages or not. I get enough from MSN/Yahoo/ASK to survive. "

I would like to say the same, but as 95% of my hits are from Google and my site has dropped from no. 1 to nothing for my main search term, I am trying to pinpoint the problem. I haven't changed anything major on my site for the last 4 years, but obviously something is not quite right!

dentedmind

9:01 pm on May 3, 2006 (gmt 0)

10+ Year Member



I don't believe that "recent design changes" make a difference. An old site (4 yrs, no changes) dropped from 20 pages to 2. A different site with a recent major update went from 43 pages to just 12. All within the last couple of weeks. No recovery at all of any of the pages... yet. Most of my traffic is from G also. Or should I say... was. But it is interesting to note that the 12 pages left on that site were the "original 12" that had been indexed for years without changes.

Jane_Doe

9:39 pm on May 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It looks to me like they are doing "triage" and trying to delete pages that are the weakest and least likely to be useful, and the variations on algos we see are the different tests on how to best accomplish that.

I suspect pages deemed most useful in general are getting kept, and weaker pages are all potential candidates for getting the ax: e.g. inner pages, especially those with duplicate content, low PageRank, not a lot of content, no deep links, too many links on the page, no links to the site for the topic of the page, deep links only from supplemental pages which may now be deleted, no fresh links, limited links from trusted sites, etc.

Swanson

2:24 am on May 4, 2006 (gmt 0)

10+ Year Member



It does seem that way.

You have to ask the question, why delete? Just return the most relevant to the query.

Oh! If we are to believe the stories, they have run out of disk space, so they do need to delete!

Baffers

12:53 pm on May 4, 2006 (gmt 0)

10+ Year Member



Boy, am I glad I found this topic... I thought I was screwed and Google was not indexing my site at all...

In my case Googlebot visits often enough, but it only visits the main page, and then a search gives me pages that haven't existed for 3 years...

What I want to know since I am new to SEO stuff is the following:

If my site has a unique topic for every page, but the keywords and description remain the same, will Google see it as duplicate?

The site theme is very specific, so the same theme is carried through the whole site. Does this mean I have to become REALLY inventive with my descriptions and keywords?

Baffs

mattg3

1:18 pm on May 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Interestingly enough, my PR6 site is steadily gaining pages, while the PR5 (ex-PR6) site is bouncing up and down.

McClaw

1:40 pm on May 4, 2006 (gmt 0)

10+ Year Member



The property site I manage has recently got back two of its lost pages.

Both of these are second level dynamic pages with one querystring element and both have one "deep link" from fairly high PR pages on another site.

trinorthlighting

1:41 pm on May 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



All due to the servers running out of room more than likely.

iProgram

2:07 pm on May 4, 2006 (gmt 0)

10+ Year Member



It seems they couldn't solve this problem immediately, and today I have to make the final decision to replace the current Google API with some other company's service. The MS API looks better than Y! because Y! uses some strange XML standard which is not supported by PHP. Currently getting 6 results for an in-house forum search engine is just stupid.

g1smd

10:13 pm on May 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Having the same meta description on multiple pages is one form of "duplicate content". Fix it as soon as possible, but be aware that it might take Google a couple of months to pick up on it and rectify your listings. I am looking at a site that I helped to fix about 6 weeks ago, and Google still hasn't reindexed the new descriptions. Thankfully there are no supplemental pages.

hvacdirect

10:23 pm on May 4, 2006 (gmt 0)

10+ Year Member



Oddly enough, I just started a new domain. I threw up a few pages as a skeleton for the structure; I haven't even decided which direction I am going with the site. There are no links to it (since there is nothing much to link to), and I certainly didn't submit it, yet Googlebot has been there every day since two days after I registered it. Today I tried the site: command, and the homepage is already in the index. So they picked it up either through their spybar or by watching registrations.

On existing sites that lost their pages, the bot is crawling 5 to 10 times as many pages a day as are listed in the index, but no new pages are being added.

dcre8r

11:44 pm on May 4, 2006 (gmt 0)



Having the same problem here with a twist. All of our 1000+ pages have been dropped, and now all we have left are a handful of URLs that are all the same page:

1) www.mysite.com/
2) www.mysite.com/default.asp
3) www.mysite.com/Default.asp
4) www.mysite.com/default.asp?
5) www.mysite.com/default.htm <-- dead since 2004

Think it's duplicate content...
