example - [keywords*a*b*c*.com...]
When the site was first conceived, I built a template to work from and uploaded roughly 200 pages with various names, but with the same content. This was so the menu would not point to non-existent pages and return 404 errors. Over the course of the last 6 weeks, I have been replacing each of these pages with content. My site was submitted to Google during the first few days and was deep crawled 2 1/2 to 3 weeks ago. There is still no sign of my site on Google and the Google toolbar still shows a PR that is grayed out. Is Google somehow perceiving that I was spamming them with roughly 150 similar pages? All the temporary pages were tagged to expire and set to no-cache so they would not be returned in the results of a search engine query.
<meta http-equiv="expires" content="0">
<meta http-equiv="pragma" content="no-cache">
The updated pages have the following header with the expire and no-cache tags removed.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>*****</title>
<meta name="description" content="*****">
<meta name="keywords" content="*****">
<meta name="abstract" content="*****">
<meta name="robots" content="index,follow">
<meta name="distribution" content="global">
<meta name="revisit-after" content="5 days">
<meta name="copyright" content="© 2002 *****">
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<meta http-equiv="window-target" content="_top">
<link rel="stylesheet" type="text/css" href="style.css">
</head>
Despite the revisit tag, the Googlebot has not returned since the deep crawl. These are some of the results from my logs after the deep crawl.
Host: 216.239.46.13 Url: /*****.html Http Code : 200
Date: Nov 11 07:50:28 Http Version: HTTP/1.0 Size in Bytes: 4700
Referer: - Agent: Googlebot/2.1 (+http://www.googlebot.com/bot.html)
Host: 216.239.46.27 Url: /*****.html Http Code : 200
Date: Nov 11 07:44:03 Http Version: HTTP/1.0 Size in Bytes: 4700
Referer: - Agent: Googlebot/2.1 (+http://www.googlebot.com/bot.html)
Host: 216.239.46.146 Url: /*****.html Http Code : 200
Date: Nov 11 07:35:43 Http Version: HTTP/1.0 Size in Bytes: 4700
Referer: - Agent: Googlebot/2.1 (+http://www.googlebot.com/bot.html)
Host: 216.239.46.166 Url: /*****.html Http Code : 200
Date: Nov 11 07:31:21 Http Version: HTTP/1.0 Size in Bytes: 4700
Referer: - Agent: Googlebot/2.1 (+http://www.googlebot.com/bot.html)
Host: 216.239.46.20 Url: /*****.html Http Code : 200
Date: Nov 11 07:29:13 Http Version: HTTP/1.0 Size in Bytes: 4700
Referer: - Agent: Googlebot/2.1 (+http://www.googlebot.com/bot.html)
This appears to have been the main Googlebot and not the Freshbot, as I've seen mentioned in these threads. Am I wrong?
As my site is still relatively new and incomplete, I have not yet asked for any reciprocal links; however, all pages have a link back to the homepage and certain key pages. From what I have read, links within my site should help my PR, although outside links are of greater benefit.
Another thought is that Google may perceive that I am putting white text on a white background. In IE, the blue background graphic displays correctly and the text is completely legible. But when I was checking my site on a friend's Mac, I noticed that the background graphics did not display properly; in that case the text showed white on white and appeared invisible. This invisible text comprises only about 10 words at the bottom of the page (my copyright info). What are the chances Google is penalizing me for this?
Should I be concerned that my site is still nowhere to be found after 3 weeks of being crawled?
BTW, what does the acronym serps stand for?
Thanks in advance for any suggestions or ideas.
nativenewyorker
1. Have incoming links from a page already listed on Google. Links are extremely important with Google.
2. Have genuine content on your pages.
3. Don't do anything that may be interpreted as cheating!
It seems that your not being indexed stems from not being fully aware of these facts.
One thing you mentioned was having 150 similar pages. That I would never do. If they are very similar, Google may treat them as spam. Make each page unique, with its own target keyword phrase or unique material.
I also always make it as easy as possible for Google to find things, using only the following. In particular, there is no need to ask them to index and follow with
<meta name="robots" content="index,follow">
since that is Googlebot's default behavior anyway.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>*****</title>
<meta name="description" content="*****">
<meta name="keywords" content="*****">
</head>
</html>
The 150 temporary template pages were all very basic: they showed my logo, background, a JavaScript menu link, and some empty tables where content was to go. I did not fill the temporary pages with nonsense such that they could in any way be perceived as spam. Would it be preferable to delete all these temporary pages and risk driving away visitors with 404 errors? At this point I am down to 88 temporary pages, and have 138 with varying amounts of content.
I do have a robots.txt file, but thought it was safest to have the meta robots tag in place as a backup.
I guess I'll spend some time tonight tweaking the keywords on each page so that the remaining 88 temporary pages are not all identical apart from the filename.
Thanks,
Ted
You are best off having unique text on each page and not having any dummy pages, as bobmark mentioned.
Anyway, when you revise your pages, make sure it's the text of the page you are changing to make them more unique. As you know, Google isn't interested in the meta keywords and description. I always use them just in case, but Google doesn't evaluate them because they have been used too much for spamming.
You are supposed to be safe if you use the robots.txt file to tell Google not to visit pages. But I have heard some reports that they don't always pay attention to it; they say they do, so I don't really know whom to trust. For myself, I'd probably risk the 404s. But make sure you serve a custom 404 page with a link back to your site, so you'll probably be able to keep those visitors.
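On Apache, assuming your host allows .htaccess overrides, pointing missing URLs at your own page takes a single directive (the filename here is just an example, not from Ted's site):

```
# .htaccess: serve our own page for missing URLs instead of the bare server 404
ErrorDocument 404 /notfound.html
```

Then notfound.html just needs a friendly message and a link back to the homepage, so visitors who hit a deleted temporary page aren't lost.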
Given that the pages have almost no content and are mostly tables and scripts pointing to menus, do you really think that Google would penalize me for these pages?
I had originally tagged these pages as expired and no-cache so that Google would recognize that these pages are under construction. What reason would I have to tag them as such if I was really a spammer?
I'll add those temporary pages to my robots.txt file as an extra precaution, in addition to making some text changes to highlight what is to come on each page.
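For anyone curious, the robots.txt entries would look something like this (the filenames are placeholders, not my real pages):

```
User-agent: *
Disallow: /temp-page-1.html
Disallow: /temp-page-2.html
```

One Disallow line per temporary page; the original robots exclusion standard has no wildcards, so each file has to be listed individually.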
Thanks for the input guys. It is really appreciated.
Ted
If there is a clear reason for doing what you are doing, and a human from Google looked at the pages, you would be fine. I would just wonder a bit about how the algorithm evaluates all of them. In the final analysis, you have to keep experimenting and see what happens. :-)
I'd just keep changing those pages and getting them clean and crisp, saying what you are about as a site. Then see what happens after two or three more crawls.
Good luck.
I think you're attributing too much human (or artificial) intelligence to the spidering process and its attendant evaluation of spam/mirror techniques. You have an algo which in essence says: "if these 2 pages are 95% (90? 85?) the same according to my page-comparison algo, then they are mirrors/spam."
People use no-cache for a variety of reasons, and it's not clear whether Google pays much attention to it; certainly they routinely cache "no-cache" pages.
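Under that model, a crude version of the page comparison might look like the sketch below. The shingle size and the "95% similar" threshold are made-up illustrative numbers, not Google's actual algorithm:

```python
# Rough sketch of near-duplicate detection via word shingles.
# A template page with only a couple of words changed scores very
# high; genuinely different pages score low.

def shingles(text, k=3):
    """Return the set of k-word shingles in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of two pages' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

page1 = "welcome to my site about blue widgets and more blue widgets"
page2 = "welcome to my site about red widgets and more red widgets"
print(similarity(page1, page2))  # same template, two words changed: high score
```

A filter like this has no idea the pages are "under construction"; it only sees that most of the text overlaps, which is why unique content per page matters more than any meta tag.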