There are tons of links to the site, and I have painstakingly contacted webmasters from sites who listed the old url. They have updated their links, but it doesn't seem to matter.
Anyone have any ideas? I have tried submitting the site to DMOZ, but they have not listed it.
Thanks for any help. I am at my wits' end.
Good luck and welcome to Webmaster World.
The other departments have not had any trouble being indexed...just this one. Also, other search engine bots have spidered the site like crazy. I have written to Google but have not had a response other than one pointing me to a page on what not to do.
I'm a great believer in (re)checking the obvious so here are a few obvious suggestions.
1. No hidden text (or text that Google might construe as hidden).
2. Simple HTML links rather than JavaScript links (which Google probably does not follow).
3. Regularly make small changes (or at least do so once after the bot visits). This may encourage the bots to look further.
4. I use frames without major problems but there have been many comments that Google does not like frames.
5. Check the headers. Keep it simple. Try an HTML validation tool.
6. Experiment with a few links at the bottom of the index page to a few test pages. See if you can get these pages into the Google index. Adjust such factors as file names (extensions, underscores and anything else you can think of), file sizes (some large and some small), etc.
Obviously, test pages are not desirable on a live site, but I think you will have to try it unless someone else has a better idea.
Kaled.
PS
Until the problem is sorted, I would definitely delete (or rename) the robots.txt file unless there is a good reason not to. There is a site you can use to validate this file, but I don't have the URL to hand.
Your robots.txt file is invalid - I'd remove it immediately.
Robots.txt validator: [searchengineworld.com...]
A Standard for Robots Exclusion: [robotstxt.org...]
Server Headers checker: [webmasterworld.com...]
Jim
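To see how a rule-following crawler might read a robots.txt file, Python's standard `urllib.robotparser` module can be fed the file's text directly, with no network access. This is a minimal sketch: the sample rules and the example.edu URLs are hypothetical, and the module is a lenient parser rather than a validator, so it will not flag the kind of syntax errors that make a file invalid. Googlebot's own interpretation may also differ in edge cases.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, urls: list[str]) -> dict[str, bool]:
    """Report whether Googlebot may fetch each URL under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so nothing is downloaded here.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch("Googlebot", url) for url in urls}


# Hypothetical robots.txt: blocks /cgi-bin/ for every bot, allows everything else.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /cgi-bin/
"""

if __name__ == "__main__":
    results = googlebot_allowed(SAMPLE_ROBOTS, [
        "http://www.example.edu/",
        "http://www.example.edu/research/index.asp",
        "http://www.example.edu/cgi-bin/search.asp",
    ])
    for url, ok in results.items():
        print("allowed" if ok else "BLOCKED", url)
```

Pasting in the real file's contents and a handful of content-page URLs is a quick way to confirm that nothing beyond the intended directories is shut off.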
Searching for "mechanical engineering queen's kingston" put you on the first page, for example. It didn't look like many people at the university had linked to your site, though. Also, I'd recommend more text links and static HTML and less mouseover-y javascripty stuff. Just my two cents from a quick look.
The general advice I'd give is to read Brett's 26-step program and try to add more content to your site--but that's good advice for any webmaster, too.
It's been three months since I was indexed (just the "/" and "robots.txt", eight times in all), and happily enough I have a PR 2 because of a few incoming links (still building there), but nothing more than the above-mentioned content has ever been indexed.
I have used, as the W3C standards say I should, the 'noembed' tag (yes, that again) for non-Flash browsers, and the same text on another page for those who DO have it. After writing to Google several times, I still have no answer to my "is this penalized?" question. I could also be getting a double whammy for duplicate content. I think I should yank it but have yet to hear otherwise.
I can't really afford to hire an SEO but I can't seem to get out of this jam. Eight visits, PR 2 and no spidering past the "/" - normal? I think I'm doing something wrong but can't see what.
(added) but I will read Brett's text again ...
(added added) Yes, my site does seem 'google laughable' after reading all that again... but you should see some of the mails I've received about it; we're getting interviewed for it! Talk about having my a** between two chairs...
I have a lot of angry profs blaming me for Google not listing their research.
I'm having it in a worse way even :(
I'm about to lose my job because of Google not listing all the site pages. Web site is the only way of hitting orders and it doesn't seem to work...
I'm at my wits' end as well and don't know what to think...
1) My site (look it up in the profile) is bilingual, i.e. for almost every English page there's a Russian counterpart with the same information in the other language. Sounds silly I know, but can Google really construe this as duplicate content?
2) I also have JavaScript navigation on top, with normal href links to the site pages down below. What's strange is that my Russian pages got indexed all right but the English pages didn't, and that really pisses off my management, as the US market is our major objective.
3) The first href link that Gbot sees on the home page (the page that was fed to Gbot at addurl.html) is a link to the Russian home page, which was followed, and all the Russian pages got indexed. My question is: if Google followed it and indexed the Russian part, why didn't it come back to index the rest?
4) My idea now is to create a site map page with all the hrefs and "feed" it to Gbot in order to get all pages indexed. Has anybody done that before? Will that be penalized?
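A site map page of the kind described in point 4 is just ordinary HTML with one plain `<a href>` per page. The sketch below assumes you already have a list of page URLs and link labels (the paths shown in the usage note are hypothetical) and builds such a page with the standard library; whether Google gives a page like this any special weight is not something this thread establishes.

```python
from html import escape


def build_site_map(pages: list[tuple[str, str]], title: str = "Site map") -> str:
    """Return a plain HTML page with one ordinary <a href> link per (url, label) pair."""
    # Escape URLs and labels so characters like & and < cannot break the markup.
    items = "\n".join(
        f'  <li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for url, label in pages
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n<h1>{escape(title)}</h1>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )
```

For a bilingual site, calling `build_site_map([("/en/index.html", "English home"), ("/ru/index.html", "Russian home")])` and saving the result as a static file linked from the home page gives a crawler a direct route to both language sections.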
Plus there's still loads of junk when I search for "site:www.mysite.com +www.mysite.+com" or "site:mysite.com -asdfsadf": a lot of URLs from the last site version still haven't been deleted (the site was redesigned 3 months ago).
Sorry to be such a bloody nuisance, GoogleGuy, but it would be just great if you replied to this.
GoogleGuy - I jumped for joy when I saw that you were lending your expertise and knowledge. Thanks!
Indeed, the Mech home page has finally been indexed, but the spider won't seem to explore further. Looking in the logs I see one visit, several times a month to the index page and to the robots.txt, and that's it. The home page doesn't have a whole lot of content, and does have the javascript-y stuff...but the rest of the site is packed with content. Also, the scripting on the home page hasn't seemed to have had the same detrimental effect on the faculty site, or any other department. I changed the page to straight HTML, loaded with content and left it up for more than a month, with no effect.
There are 38 pages indexed? Please note that anything beginning with "conn" is the old domain name, and is not valid. I have painstakingly contacted every site I could find with those older links and asked the webmasters to update their links, but some remain (to old pages that do not have pages on the new site). Also, sites that begin with "ferrari." or "sellens.", etc. are not part of the site, and are on different servers (though they are affiliated). Could this cause any problems?
Also...does it matter if one uses site:www.domain.com vs. site: www.domain.com (note the space). It seems to yield very different results in our case.
As for the robots file - the validator says it's okay. Thanks for the link to the checker. I do not use any session IDs, nor do I use frames or hidden text. I try to keep lots of regular html links, and I have validated the headers. Whew!
Something I forgot to note that might be important...we are using IIS 5, and the pages are ASP. Does that make a difference? When I compare the site to several others I have here that use the same template, I can't imagine why they get spidered like crazy, and this one doesn't. I am confident that it is not a code problem...but the SysAdmin and I have checked the server config, too.
Finally...how does one go about hiring a SEO? I'd be willing to invest in the help. There is one prof who has accused me of ruining his career because his research pages (which used to be listed on Google) fell off a year ago when I changed the site.
Thanks so much,
Crow
As far as the prof whose career you ruined: why not throw up a quick domain, like professordemento.com, and stick his stuff up there. Get a few links, and he'll be back in business.
Might I point out that the ODP is a valuable resource: individual professors' sites are often listable separately, if they have significant online content. The science categories are not high spam targets (except for really really stupid spammers), so the unreviewed ratio is often less than 1% (where 10% is average, and 100% is not unusual in spam magnets). This means submittals are highly appreciated, and get listed "fairly quickly" -- generally within a semester (to translate time units into a form recognizable in the Ivory Towers.)
As for ruining careers, just add a bit of hidden text on each page. But you already knew that if you've been following the BOFH technical files...
Crow_Song, javascript-y stuff isn't too harmful, but I don't remember seeing a site map. Site maps are your friend. If the domain has truly moved forever, then you should have a 301 in place from the old host to your new host; verify that it is. Someone else mentioned that individual pages can be suitable for garnering links. I noticed a few pages indexed because it looked as though they have their own links. You may want to include a static link on every page back to the root of your site.
But my main advice is still to get a few more links (e.g. from within the university; campus directory, etc.), and instead of the javascript-y mouseovers, go with static links without fragments (the '#' stuff I saw a lot while mousing over the root page).
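To see roughly which of a page's links a crawler that does not run JavaScript could follow, the page's `<a>` tags can be sorted by their `href` values. This sketch uses the standard-library `html.parser`; the assumption (not confirmed in the thread) is that such a crawler only follows literal `href` URLs, so `javascript:` links, same-page `#` fragments, and anchors that rely on an `onclick` handler give it nothing new to fetch. Links written out by `document.write` never appear as tags in the page source, so they are not counted at all.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Sort <a> tags into plain links, javascript: links, same-page links, and no-href anchors."""

    def __init__(self) -> None:
        super().__init__()
        self.plain: list[str] = []      # ordinary URLs a text-only crawler can follow
        self.script: list[str] = []     # javascript: pseudo-URLs
        self.same_page: list[str] = []  # "" or "#..." (point back at the current page)
        self.no_href = 0                # <a> tags with no href, e.g. onclick-only

    def handle_starttag(self, tag, attrs):
        if tag != "a":  # HTMLParser lowercases tag names
            return
        href = dict(attrs).get("href")
        if href is None:
            self.no_href += 1
            return
        href = href.strip()
        if href.lower().startswith("javascript:"):
            self.script.append(href)
        elif href == "" or href.startswith("#"):
            self.same_page.append(href)
        else:
            self.plain.append(href)


def audit_links(html: str) -> LinkAudit:
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser
```

Running `audit_links` over the saved source of a section home page and comparing the `plain` list with the pages you expect to be reachable shows at a glance how much of the navigation exists only in script.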
There is a sitemap, however - in a straight text link at the bottom of the screen. And there are straight text links on all pages. The javascript-y stuff occurs on only a few pages: the home pages of each major section. But the thousands of content pages are all text (and some graphics of course) with plenty of straight links, etc. Ironically, it is only the javascript-using homepages that appear at all in the Google index. All of the "meat" of the site is ignored by the Googlebot, despite the links I have snuck into the frequently spidered sitemap of another (PR 7) site (the parent Faculty site).
GoogleGuy - is the old conn server the problem? If I talk the powers that be into giving us the domain so we can bring it back to life (it's been gone for about 8 months) and put redirects in place, will that help?
Cheers,
Crow
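If the old host is brought back only to redirect, it is worth confirming that it answers with a 301 (permanent) rather than a 302 (temporary) or a normal 200 page. The sketch below sends a single HEAD request without following redirects and reads the status and `Location` header, which is the same information a server header checker shows. It uses Python's `http.client`; the hostnames in the usage note are hypothetical, and a few servers answer HEAD differently from GET, so a surprising result is worth re-checking with a GET.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def head_status(url: str, timeout: float = 10.0) -> tuple[int, Optional[str]]:
    """Send one HEAD request (redirects are NOT followed) and return (status, Location)."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    connection = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        connection.request("HEAD", path)
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


def is_permanent_redirect_to(status: int, location: Optional[str], new_host: str) -> bool:
    """True only for a 301 whose Location header names new_host."""
    if status != 301 or not location:
        return False
    # A relative Location has no hostname, so it counts as staying on the old host.
    return urlsplit(location).hostname == new_host.lower()
```

For example, `status, location = head_status("http://conn.example.edu/research/")` followed by `is_permanent_redirect_to(status, location, "www.example.edu")` should return true if the old page is permanently redirected to the new host.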
When designing a new version of an old website, would you recommend keeping the old version live while the new one gets Googled? This is not pretty, and it means that there will be some duplication of content, but it might save the odd career or two.
Perhaps Google could add some guidance regarding how best to handle transitions of this sort on the Webmaster Guidelines page.
Kaled.