Forum Moderators: open

Message Too Old, No Replies

11000 Pages Indexed as Links With No Content

         

brakkar

11:44 pm on Nov 15, 2004 (gmt 0)

10+ Year Member



Hi,
I brought back online a site that was taken offline 2 years ago.

It was a forum with about 11000 topics.

So when I brought the site back a couple of weeks ago, I set up a good site map with all the topics.

Google spidered all the links: when you do a search on my domain you find the 11000 links... but only the links. No title, no content... as if all the links were in a disallowed directory, which is of course not the case.

Now what? Should I expect a real indexing of all this content during the next full crawl? Or is this definitely a bad sign that all these links will never get updated?

Thanks in advance,
Brakkar

annej

3:26 pm on Nov 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've had that happen when the spider just happened to hit my site while it was down. It should be corrected the next time Google spiders it.

The inbound links are still there, so Google knows the site is there but has no information on the pages.

I once had a site down for about a month and the homepage still remained #1 on the keyword phrase for its main topic. Hmm, that is a good example of how much more important inbound links are compared with on-page factors.

zeus

3:34 pm on Nov 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Usually when a site is new you get many URL-only listings without descriptions, but this month is a whole new story: many well-known old sites have the same problem.

uncle_bob

4:38 pm on Nov 18, 2004 (gmt 0)

10+ Year Member



I'd give it time, if Google knows the pages are there, it will eventually get round to indexing them properly.

zeus

8:16 pm on Nov 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I also noticed that the site count can change every time you click the search; some sites have fallen big time and some show more pages than are on their site, so something just doesn't work right.

Seo1

3:30 am on Nov 19, 2004 (gmt 0)

10+ Year Member



I once had a site down for about a month and the homepage still remained #1 on the keyword phrase for its main topic. Hmm, that is a good example of how much more important inbound links are compared with on-page factors

Actually this is a result of Google only indexing and updating the web once a month. Your website is stored in Google's database. If your site goes offline, that has no effect on Google's database.

Now try taking that site down for just two weeks, and see what happens since Google moved to continual indexing.

Clint

giga

5:29 am on Nov 19, 2004 (gmt 0)

10+ Year Member



example of how much more important inbound links are compared with on-page factors
Actually this is a result of Google only indexing and updating the web once a month. Your website is stored in Google's database. If your site goes offline, that has no effect on Google's database.

Oh my god not you polluting another topic..

brakkar, please go here for the discussion on the disappearing-site problem. We are all experiencing problems similar to what you've described. Also, sitemaps and new sites seem to be the target of this new penalty(?). And thus far, all indications are that this penalty/ban is permanent.

[webmasterworld.com...]

Seo1

12:10 pm on Nov 19, 2004 (gmt 0)

10+ Year Member



Hey Giga

Still having issues, I see. You're beginning to verge on harassing me.

That statement is so wrong it's amazing you are allowed to spread misinformation as you do.

Your comment:
Also sitemaps and new sites seem to be the target of this new penalty(?)

This is from Google. Notice the second bullet explaining to build a site map.

Now if Google suggests building a site map, do you THINK they would ban you for building one?

Design and Content Guidelines:

* Make a site with a clear hierarchy and text links. Every page should be reachable from at least one static text link.
* Offer a site map to your users with links that point to the important parts of your site. If the site map is larger than 100 or so links, you may want to break the site map into separate pages.

Brakkar, don't let people feed you misinformation like giga did. Tearing apart a site because someone has not done their research is never a good thing.

Clint

fighter

4:05 pm on Nov 23, 2004 (gmt 0)

10+ Year Member



My advice is just to wait a little before you do anything. Most probably, when the Google bot crawled your website it just cached the index of your website, or the site map if you have one; it hasn't actually spidered your website's content. I believe the bot will hit your site again to deep-crawl it and everything should fall into place; it should just be a matter of time. If that doesn't happen, however, then something is preventing the bot from indexing the content of your website.

Hugene

7:57 pm on Nov 23, 2004 (gmt 0)

10+ Year Member



I have noticed that new pages or sites sometimes first appear as only a link in the G results, with no content, description or title. Eventually, though, the page is crawled and properly indexed. Seems like G might have a 2-stage process.

I have another similar question, though. When I search site:www.widget.com in G, I get tons of garbage results in the shape of

http: // www.widget.com/ www.widget.com/index . html

(without the spaces, obviously; I just don't want the parser to think this is a URL)

These links are totally wrong. It reminds me of the effect of forgetting the http in an A tag. Does anyone know what this is all about?

my3cents

12:11 am on Nov 24, 2004 (gmt 0)

10+ Year Member



I don't know why the URLs are strange like that; I have a similar problem and have had it for nearly 5 months now.

My site is vanilla HTML, static, handcoded. The index is showing all kinds of crazy URLs for the same page of my site, URLs that don't exist, like:

www.mysite.com.asp (I have no asp pages at all)
www.mysite.com/ index.shtml
www.mysite.com/?searchengine=3j4hbr2i3uh45f9234f2jb3noifub235098ufvb09ufed
www.mysite.com/?someothersearchengine=gi3h4t9gh394tghv3904uithgv0934ht
www.mysite.com/?businessdirectory=v02489vurh023u4hrbv03u4hbrv032u4hrvmysite/
mysite.com/%1F
www.mysite.com%22

not to mention several listings that include tracking URLs from different PPC engines.

My home page, a PR6, has not had a title or description since all of this started months ago, but some of the strange URLs and tracking URLs have a full title and description. The URLs with spaces seem to actually point to the correct URL; just the green line displayed in the SERPs has spaces in it.

We asked G last week in Vegas about this and were told to 301 the incorrect urls.

The problem with that is, a 301 does not forward or get rid of the tracking parameters or snippets that are being appended to the end of the URL.

To make things worse, if G has an incorrect URL for my site that leads to a 404 page, it indexes the content of the 404 page, with the title "404 Not Found" and a snippet "the requested URL was not found on this server".

I would like to add that no other search engine in the world is doing this, this is specific to Google.
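For what it's worth, a 301 done with Apache's mod_rewrite (rather than a plain Redirect directive) can strip the tracking string: RewriteCond can test the query string, and a target ending in "?" discards it. A rough .htaccess sketch, reusing the placeholder host and parameter names from the examples above (substitute whatever your PPC/directory trackers actually append):

```apache
RewriteEngine On

# Match requests arriving with a known tracking query string
# (?searchengine=..., ?businessdirectory=..., etc.)
RewriteCond %{QUERY_STRING} ^(searchengine|someothersearchengine|businessdirectory)=
# 301 to the same path on the canonical host; the trailing "?"
# on the target tells mod_rewrite to discard the query string,
# which a plain Redirect would otherwise carry along
RewriteRule ^(.*)$ http://www.mysite.com/$1? [R=301,L]
```

That at least makes all the duplicate URLs resolve to one clean one; whether and when Google collapses the old entries is another question.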

Seo1

12:46 am on Nov 24, 2004 (gmt 0)

10+ Year Member



Hi

Write your webhost.

Have your webhost turn off symbolic links.

This should eliminate your problem.

I would also do a whois on your domain and see how many other sites share your server.

The other explanation is that you bought a domain name that had previously been owned, was drawing traffic, and had been indexed by Google.

Remember those results that are displayed are pulled from their database of indexed pages.

If a site drops offline, this has no effect on Google's database, other than that the pages inside it will not be updated.

Clint

zeus

1:11 am on Nov 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wow, what's happening? I see searches for big keywords where the count of sites found dropped back to the old count, or to 1/3 of what it was 2 days ago. And take my site: there are only 95 pages left of the 2500 it had 2 weeks ago, but this time there IS a description with each URL, so this could be a change.

OK, now we are back to 24 million results, from 5 million 5 minutes ago, and the page count for the site is now 600, which it has had for 2 days, so it looks like they are making some changes again.

Hugene

1:40 am on Nov 24, 2004 (gmt 0)

10+ Year Member



Have your webhost turn off symbolic links

What are symbolic links?

Seo1

2:10 am on Nov 24, 2004 (gmt 0)

10+ Year Member



Hi

Symlinks and Hardlinks explained.

Unix/Linux files consist of two parts: the data part and the filename part

The data part is associated with something called an 'inode'. The inode carries the map of where the data is, and the permissions, etc for the data.

The filename part carries a name and an associated inode number.

More than one filename can reference the same inode number; these files are said to be 'hard linked' together.

With hard links you can remove the original file you hard-linked to and still have the data, whereas with symlinks, if you remove the original file you remove the data, and the symlink that remains points to nothing.

Hard links must be on the same partition whereas soft links can span across partitions and networks for that matter.
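The hard-link vs. symlink behaviour described above is easy to demonstrate at any Unix shell; a throwaway sketch (file and directory names are made up):

```shell
# work in a scratch directory
mkdir -p /tmp/linkdemo && cd /tmp/linkdemo

echo "hello" > original.txt
ln original.txt hard.txt      # hard link: a second name for the same inode
ln -s original.txt soft.txt   # symlink: a pointer to the *name*, not the inode

rm original.txt               # remove the original name

cat hard.txt                  # the data survives via the hard link
cat soft.txt 2>/dev/null \
  || echo "soft.txt dangles"  # the symlink now points at nothing
```

(Whether Apache's symlink handling has anything to do with Google's mangled URLs is a separate question; this is just what the terms mean.)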

For My3cents

I am not sure this will help you; if you did indeed purchase a previously owned domain name, I doubt that it will. If your domain name was never owned before, this step should help.

Clint

my3cents

2:48 am on Nov 24, 2004 (gmt 0)

10+ Year Member



Thanks for the suggestions, the domain has never been owned before, has been a top performer in google for about 3 years and is on a dedicated server and dedicated IP.

Q. If I have these symbolic links turned off, will the tracking URLs for the PPCs still work? And what about dynamic links from other directories and search engines? LookSmart and Business.com dynamic links are being shown instead of my real URLs. I had LookSmart and Business.com remove the tracking portion of the links 3 or 4 months ago, but they are still in the index with full title and description. No PR, no ranking, and the real URLs have lost all positioning. On top of that, every month I see a whole new slew of "dynamic"-type URLs pointing to my site, usually from a search engine or directory.

I would imagine I am getting a duplicate content penalty, since the same page is in the index 5-10 times, and this is happening to most of my main category pages. The pages one level below those are mostly supplemental listings now.

crobb305

3:25 am on Nov 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



my3cents,

I am having the exact same problem. My index page is indexed 10 times, each with a slightly different/incorrect URL. I emailed Google about it today. You are correct that no other search engine seems to have this problem.

C

Seo1

3:44 am on Nov 24, 2004 (gmt 0)

10+ Year Member



my3cents

Turning off symlinks will only affect your site, not any inbound links.

As far as being outranked by other directories and search engines goes, I believe this is a matter of current relevant content as opposed to a duplicate content penalty.

Duplicate content penalties would be for page content and not for head tags.

I think you have one of two options.

1. Update the content on each page and wait two weeks to be reindexed and relisted.

2. Change your head tags on each page to beat the search engines and directories. Give it two weeks.

My two-week rule and current-relevant-content thoughts come from an article I wrote based on Google moving to continual indexing.

On November 10th I released a press release for a client.

Keywords used were americas diamonds and diamond brokers.

As of November 20th, the press release ranked #3 for americas diamonds and #5 for diamond brokers on Google's front page ;->

Hope this helps

Clint

londoh

11:18 pm on Nov 25, 2004 (gmt 0)

10+ Year Member



Write your webhost

Have your webhost turn off symbolic links
This should eliminate your problem


Symlinks and Hardlinks explained.
Unix/Linux files consist of two parts: the data part and the filename part

I don't think my webhost wants to turn symlinks and hardlinks off.

And I don't understand what they have to do with the URLs that Apache dishes up, such that only Google can misinterpret them.

please explain some more

DerekH

11:26 pm on Nov 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unix/Linux files consist of two parts: the data part and the filename part

I don't understand...
What has the operating system of the server and its quirks got to do with what is served up to an Internet client?
DerekH



Continued over here:
[webmasterworld.com...]

[edited by: Brett_Tabke at 11:57 pm (utc) on Nov. 25, 2004]