Forum Moderators: Robert Charlton & goodroi
Working on a site with 20,000+ pages indexed in Google.
Performing a site:www.xyz.com search, only 4 pages come up; the rest are in the "omitted" section. The site has over 20,000 unique pages, many of them static, non-orphaned, original-content pages.
Trying to figure out the puzzle by looking at the source code, I noticed that a non-existent HTML tag is present on all pages: the </META> tag. Meta tags don't close. Could this be tripping up Googlebot?
>> Meta tags don't close. Could this be tripping up Googlebot? <<
Meta tags do close in XHTML, so be careful. If you are using XHTML markup, you don't want to simply strip those </meta> tags; the usual XHTML form is the self-closing <meta ... />, which a validator will accept where a bare unclosed <meta> will not.
>> Performing a site:www.xyz.com - 4 pages come up, the rest are in the "omitted section". Over 20,000 unique pages, many static, not orphaned, original content pages. <<
Penalized site maybe? How new is the site? If it's an older site and you see 19,996 supplemental listings, then there may be penalty issues. There are so many factors at play here that it's hard to really say.
Google truncates the listing if it sees duplicate content: having the same meta description on every page is enough to trigger it.
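One way to test the duplicate-description theory on your own site is to compare the meta descriptions across your pages. A minimal sketch in Python, stdlib only (the function name and the {url: source} input format are my own, not any Google tool):

```python
from collections import defaultdict
from html.parser import HTMLParser

class MetaDescriptionParser(HTMLParser):
    """Grabs the content of the page's meta description, if any."""
    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # attrs arrive as (name, value) pairs with names lowercased;
        # self-closing <meta ... /> tags are routed here as well
        d = dict(attrs)
        if tag == "meta" and d.get("name", "").lower() == "description":
            self.description = d.get("content")

def find_duplicate_descriptions(pages):
    """pages: {url: html source}. Returns {description: [urls]} for any
    description shared by more than one page."""
    seen = defaultdict(list)
    for url, source in pages.items():
        parser = MetaDescriptionParser()
        parser.feed(source)
        if parser.description:
            seen[parser.description].append(url)
    return {desc: urls for desc, urls in seen.items() if len(urls) > 1}
```

If that returns one description mapped to 20,000 URLs, you have your prime suspect.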
HTML: <META name="Author" content="Dave Raggett">
XHTML: <meta name="Author" content="Dave Raggett" />
Most likely this:
<META name="Author" content="Dave Raggett"
would only affect the next tag. I would be surprised if Google would take much notice, especially if the pages display correctly in the browser.
<meta name="Author" content="Dave Raggett" />
Last time I checked, Brett's SIM Spider [searchengineworld.com] had some problems with that closing tag.
<added> Just checked about a minute ago and the SIM Spider still does not recognize the meta description when using " />. When closed with </meta>, it sees the meta description.
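That behaviour is what you'd expect from an extractor whose pattern doesn't allow for the XHTML " />". A hypothetical illustration in Python (these regexes are my guess at the failure mode, not SIM Spider's actual code):

```python
import re

# Naive pattern: assumes the closing quote is followed immediately by '>'
NAIVE = re.compile(r'<meta name="description" content="([^"]*)">', re.I)

# Tolerant pattern: also accepts optional whitespace and an XHTML '/'
TOLERANT = re.compile(r'<meta name="description" content="([^"]*)"\s*/?>', re.I)

html_style = '<meta name="description" content="Widgets for sale">'
xhtml_style = '<meta name="description" content="Widgets for sale" />'

# The naive pattern matches html_style but misses xhtml_style;
# the tolerant pattern matches both.
```

A real crawler should of course tolerate both closing styles, since both render fine in browsers.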
1. Removed all on page css and javascript in the header and called it as a remote file.
2. Moved any bulk navigation code below the unique content using the rowspan technique.
3. Created images for any repetitive content such as addresses etc.
4. Removed sitewide "comment tags" caused by using templates or otherwise.
5. Removed calls to offsite tracking codes.
6. Randomly changed the names of "common" images within the site.
7. Changed the navigation based upon the category of the page.
8. Removed the "alts".
And so on.
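Step 4 (stripping sitewide template comments) is easy to automate. A minimal sketch in Python, assuming the site has no comments that must be kept (e.g. IE conditional comments):

```python
import re

# Matches <!-- ... --> blocks, including comments spanning multiple lines
COMMENT = re.compile(r'<!--.*?-->', re.DOTALL)

def strip_html_comments(source):
    """Remove every HTML comment block from the markup."""
    return COMMENT.sub('', source)

page = '<div><!-- BEGIN header template -->Welcome<!-- END --></div>'
```

Running this over template output removes the repeated markers without touching the visible content.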
These techniques were used on non-scraper sites that were having problems. Success rate was about 75%.
>> 1. Removed all on page css and javascript in the header and called it as a remote file. <<
Good move. Highly recommended for many reasons.
>> 2. Moved any bulk navigation code below the unique content using the rowspan technique. <<
Might be useful. Using includes for this code can make your site admin a tad easier too (no SEO effect).
>> 3. Created images for any repetitive content such as addresses etc. <<
I really can't see Google penalising for a repeated address. Anyway you need that detail in text on at least a few pages of the site otherwise you'll never rank for your own address.
>> 4. Removed sitewide "comment tags" caused by using templates or othewise. <<
Comment tags are ignored for indexing and ranking, though they might be scanned as an indicator of spam. If you only had a few per page, I see no benefit in removing them, especially if removing them makes the developer's work more difficult.
>> 5. Removed calls to offsite tracking codes. <<
I can't see what that would achieve, at all.
>> 6. Randomly changed the names of "common" images within the site. <<
Why? Extra work for no gain, and maybe some loss. If you refer to a file that has a particular size but it appears on your site under 50 different filenames, Google might think you were trying to pull some stunt using keywords as file names. Bad move. By using the same filename for the file on every page, as most people already do, the file is cached and saves you bandwidth: the file is then served once per visitor, not once per page view. This random-name change is a backwards step. I wouldn't do it that way at all. You're clutching at straws here.
>> 7. Changed the navigation based upon the category of the page. <<
Sounds fair enough. If the new system is useful to visitors and bots alike, then no complaints.
>> 8. Removed the "alts". <<
An image tag is NOT valid HTML if the alt is missing. The minimum allowed is alt="". The alt text is used for accessibility reasons. Put them back, only adding text for important content images and for navigation images. For other images use the minimal alt="" instead.
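To audit for that, you can scan your pages for img tags that have no alt attribute at all (alt="" passes; a missing alt gets flagged). A quick sketch with Python's stdlib parser, hypothetical helper names:

```python
from html.parser import HTMLParser

class MissingAltChecker(HTMLParser):
    """Flags <img> tags with no alt attribute at all.
    alt="" is fine for decorative images; a missing alt is invalid HTML."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            names = [name for name, _ in attrs]
            if "alt" not in names:
                # record the src so the offending image is easy to find
                self.missing.append(dict(attrs).get("src", "?"))

def imgs_missing_alt(source):
    checker = MissingAltChecker()
    checker.feed(source)
    return checker.missing
```

Anything the checker reports needs either descriptive alt text (content and navigation images) or the minimal alt="" (everything else).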