Forum Moderators: Robert Charlton & goodroi
I found a lot of common meta tags (keywords and descriptions) that were either duplicated, incorrect, or both (someone trying to hurry, I suppose) on a site that went from 100+ pages to 2!
I spent a good part of the day making corrections. Interested to see what G will do with them.
[edited by: old_expat at 1:01 pm (utc) on May 3, 2006]
Regarding the above - we have a dynamic site, so I think depending on the product category you are browsing in, you might find the same page with a different URL string. What would you suggest?
Also, I believe our category pages can only be reached through 1 URL. If this is the case, would these pages be OK?
Thank you!
.
Make sure that each page has only one URL that can get to it. For a popular forum package, vBulletin, by default it allows every post and thread to have at least 10 different URLs that can access it. For example, a post on a vBulletin forum could be expressed as:
/forum/showthread.php?t=54321
/forum/showthread.php?t=54321&p=22446688
/forum/showthread.php?t=54321&page=2
/forum/showthread.php?mode=hybrid&t=54321
/forum/showthread.php?p=22446688&mode=linear#post22446688
/forum/showthread.php?p=22446688&mode=threaded#post22446688
/forum/showthread.php?t=34567&goto=nextnewest
/forum/showthread.php?t=87654&goto=nextoldest
/forum/showthread.php?goto=lastpost&t=54321
/forum/showpost.php?p=22446688
/forum/showpost.php?p=22446688&postcount=45
/forum/printthread.php?t=54321
and that is before adding the page parameter, for threads that run to more than one page, and the pp parameter, for changing the default number of posts per page; either or both of which can be appended to most of the URLs above too.
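One way to see how many of those variants collapse to a single page is to normalise them. This is a hypothetical sketch (not part of vBulletin itself): keep only the parameters that actually identify the content (the thread id and page number) and drop everything else. Resolving a post-only URL (p=) back to its thread would need a database lookup, so the sketch just flags those.

```python
from urllib.parse import urlparse, parse_qs, urlencode

# Parameters that actually identify the content of a thread page.
KEEP = ("t", "page")

def canonical_thread_url(url):
    """Collapse a vBulletin-style thread URL to one canonical form.
    Returns None when the thread id cannot be determined from the URL
    alone (e.g. showpost.php?p=... needs a p -> t database lookup)."""
    parts = urlparse(url)
    params = parse_qs(parts.query)
    kept = {k: params[k][0] for k in KEEP if k in params}
    if "t" not in kept:
        return None
    query = urlencode(sorted(kept.items()))
    return f"{parts.path}?{query}"

# Mode, goto and fragment variants all collapse to the same URL:
print(canonical_thread_url("/forum/showthread.php?mode=hybrid&t=54321"))
print(canonical_thread_url("/forum/showthread.php?t=54321&goto=lastpost"))
```

Run against the list above, every showthread variant of the same thread reduces to one string, which is exactly what the search engine never gets to see by default.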
Another big problem is the "next" and "previous" links that cause massive duplicate content issues because they allow a thread like
/forum/showthread.php?t=54321 to be indexed as
/forum/showthread.php?t=34567&goto=nextnewest and as
/forum/showthread.php?t=87654&goto=nextoldest too.
Additionally, if any of the three threads is bumped, the indexed "next" and "previous" links no longer point to the same thread, because they contain the thread number of the thread they were ON (along with the goto parameter), not the real thread number of the thread they actually pointed to.
This is a major programming error by the people that designed the forum software. The link should either contain the true thread number of the thread that it points to, or else clicking the "next" and "previous" links should go via a 301 redirect to a URL that includes the real true canonical thread number of the target thread.
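The second option described above could be sketched like this. It is purely illustrative (the function names and the ordered thread list are assumptions, not vBulletin code): resolve the "goto" target server-side, then answer with a 301 to the canonical thread URL, so the crawler only ever records the real thread number.

```python
# Hypothetical sketch of the 301 fix: resolve goto=nextnewest to the
# real target thread, then redirect to that thread's canonical URL.

def next_newest_thread(threads, current_id):
    """Return the id of the next-newer thread, or None if there is none.
    `threads` is assumed to be a list of ids ordered oldest -> newest."""
    i = threads.index(current_id)
    return threads[i + 1] if i + 1 < len(threads) else None

def goto_redirect(threads, current_id):
    """Build the Location target for a 301 Moved Permanently response."""
    target = next_newest_thread(threads, current_id)
    if target is None:
        return None  # no newer thread: the script should not redirect
    return f"/forum/showthread.php?t={target}"

# Using the thread numbers from the example above (oldest to newest):
print(goto_redirect([34567, 54321, 87654], 54321))  # -> .../showthread.php?t=87654
```

Because the redirect target always carries the true thread number, a bumped thread can no longer leave stale goto URLs in the index.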
The thread index pages showforum.php must confuse search engines greatly because the content on each numbered page changes every time a new thread is posted, or an old one is bumped. Page 1 should be the oldest page, not the newest.
There is no need for the single "showpost" pages to be indexed. Coming into one of those from the search results shows a single post out of context with its thread, and the page has no navigation to reach any other part of the forum. Just get whole threads indexed.
The "newreply" and "newpost" pages take you straight to the "login" screen. They have no useful purpose to people arriving from a search engine. Get rid of those too. Either use the <meta name="robots" content="noindex"> tag or robots.txt file to do this. My preference is the meta tag.
The "member", "join", "login", "controlpanel", "editprofile" and "sendmessage" pages also ask for a login, and have no place in search engine listings. Get rid of those in the same way.
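For the robots.txt route mentioned above, a fragment along these lines would cover the page types just listed. Treat the file names as illustrative; the exact scripts vary by vBulletin version and install path. Note the trade-off: robots.txt stops the pages being crawled, while the meta noindex tag lets them be crawled but keeps them out of the listings, which is why the meta tag is preferred here.

```
User-agent: *
Disallow: /forum/showpost.php
Disallow: /forum/printthread.php
Disallow: /forum/newreply.php
Disallow: /forum/newthread.php
Disallow: /forum/member.php
Disallow: /forum/login.php
Disallow: /forum/sendmessage.php
```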
This stuff is specific to vBulletin, but other packages like phpBB and osCommerce, and just about any dynamic script-generated site, have these same types of issues; they account for the majority of the indexing problems that such sites have.
.
Use these searches to see what you have indexed:
site:domain.com
site:domain.com -inurl:www
site:www.domain.com
A lot of our pages are linked within our site from multiple pages within our site.
For example:
The page could be linked from the home page
The same page can be linked from a product page and so on...
Is that a problem?
Our site was redesigned and launched in January 06. Are we trapped in a redesign filter, and if so, is there anything I can do about it?
That is what I am talking about. I am not talking about how many pages link to you - you can have as many incoming links as you like.
I am talking about a page of content having many different URLs that all return "200 OK" with the same content. Google sees that as duplicate content and it can cause a lot of problems.
Clever usage of the noindex tag on some versions of the page (it is a scripted page, and the script can be set up to "detect" the exact URL that was requested when the page was displayed, and take appropriate action), or a disallow statement in the robots.txt file can clean these listings up quite easily.
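The "detect the exact URL" idea can be boiled down to a few lines. This is an illustrative sketch, not code from any forum package: the page script compares the URL that was actually requested against the one canonical URL for its content, and emits the noindex tag on every non-canonical variant.

```python
NOINDEX = '<meta name="robots" content="noindex">'

def robots_meta(requested_url, canonical_url):
    """Return the robots meta tag to emit for this request.
    The canonical version is left indexable; every other URL
    that reaches the same content gets tagged noindex."""
    if requested_url == canonical_url:
        return ""  # canonical URL: let it be indexed
    return NOINDEX

# A goto variant of the thread gets the tag; the canonical URL does not.
print(robots_meta("/forum/showthread.php?t=54321&goto=lastpost",
                  "/forum/showthread.php?t=54321"))
```

Over time the non-canonical listings drop out, leaving one indexed URL per page of content.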
You will eventually get back to normal. If I am reading this right, google is just cataloging history so to speak.
I really think that, in the end, the current state of your site will be what is indexed.
BUT! If I am reading this right... everyone would need to clean out their folders, including the root. Make sure that in a redesign, any pages you left in there, even though they are no longer linked by any URL, are gone, especially if they are close to being duplicates of what you are using now. All old unused pages should return a 404.
g1smd, could you please clarify your point. Do you mean that ALL sites that legitimately use multiple URLs to reach the same content are now penalised by Google?
Example: Our site has 20,000 real estate listings in a dynamic database. Visitors can search through the listings using a number of different criteria (e.g. all apartments with at least 2 bedrooms in a certain town). They can also search for all apartments with at least 3 bedrooms in the same town. In this example both sets of results will include every apartment with at least 3 bedrooms.
Does this mean that Google will not include any of these pages in the index?
Most of the time, one or two urls for the same "deep" content doesn't seem to affect things badly. But if the situation is such that many urls will all resolve (or potentially an infinite number of urls) then things can get dark indeed.
What is your current experience?
The fix is to decide on one canonical format for each listing, and use <meta name="robots" content="noindex"> on all other pages of the site that have a different URL format.
Check the example about vBulletin forums, posted above, for guidance.
Specific property searches generate a page of thumbnails for a list of appropriate properties. Each thumbnail has a link to a specific property page. This property page is a static html page.
By the way, pre-BD we had 200,000 URLs indexed, generated from 20,000 pages. However, NONE of these 20,000 static property description pages are now included in the Google index.
Apart from a small percentage of properties that may have a similar description we do not have duplicate content.
Apart from this problem, I have also wondered about something else: approx 2 months ago I took out AdSense on my main site (against professional advice), only on the links page, but about that time my site suddenly dropped from its usual position. It is impossible to say if this is because of a BD problem or just a coincidence with the AdSense. I have now removed the AdSense and am going through the dropped pages for duplicate descriptions and keywords. Hopefully, by tackling both problems, my site will return to its 5-year-old position.
I redesigned the site in mid/late Feb of this year and uploaded the new pages as they were finished, about 10-20 per day, hoping to escape a redesign filter. It's been OK the last few months.
This site was redesigned mostly for aesthetic reasons and to get rid of code bloat, along with fixing titles and descriptions so they were unique for each page. I installed full URLs on all nav links and base href tags to help prevent hijacking. Very few file names were changed, and those that were are set up with 301s in htaccess. The keyword rank for the main pages is still high. However, lots of inner pages have all dropped to PR 0 and low keyword rank (all pages are on the same level and there are no subdirectories on this site, although they are not all listed on the home page).
I suspect the drop in PR and SERPs for these pages is because Google is going through the site and checking things out; once it's done, they will return like the other pages, because the site is a lot more search-engine friendly than it was before.
I would like to say the same, but as 95% of my hits are from Google and my site has dropped from no.1 to nothing for my main search term, I am trying to pinpoint the problem without having changed anything major on my site for the last 4 years; but obviously something is not quite right!
I suspect pages deemed most useful in general are getting kept and weaker pages are all potential candidates for getting the ax, i.e. - inner pages, especially those with duplicate content, low page rank, not a lot of content, no deep links, too many links on the page, no links to the site for the topic of the page, deep links but only from supplemental pages which may now be deleted, no fresh links, limited links from trust rank sites, etc.
In my case Googlebot visits often enough, but it only visits the main page, and then upon a search gives me pages that haven't existed for 3 years...
What I want to know since I am new to SEO stuff is the following:
If my site has a unique topic for every page, but the keywords and description remain the same, will Google see it as a duplicate?
The site theme is very specific, so the same theme is carried through the whole site. Does this mean I have to become REALLY inventive with my descriptions and keywords?
Baffs
On existing sites that lost their pages, the bot is crawling 5 to 10 times the number of pages a day that are listed in the index, but no new pages are being added.
1) www.mysite.com/
2) www.mysite.com/default.asp
3) www.mysite.com/Default.asp
4) www.mysite.com/default.asp?
5) www.mysite.com/default.htm <-- dead since 2004
Think it's duplicate content...
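It almost certainly is: five URLs returning 200 OK for one home page. If the site can be served through Apache with mod_rewrite (the thread mentions htaccess earlier; on IIS the same 301s would be configured differently), a hypothetical .htaccess sketch collapsing the default.asp variants to "/" might look like this. The dead default.htm should simply be left to return its 404.

```
RewriteEngine On
# Match default.asp in any letter case (NC covers Default.asp);
# the trailing "?" on the target drops any query string, so
# "default.asp?" redirects cleanly too.
RewriteRule ^default\.asp$ /? [NC,R=301,L]
```

After that, only www.mysite.com/ answers 200 OK, and the other variants 301 to it.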