Forum Moderators: Robert Charlton & goodroi
As for the order of the pages being included with title and description, there seems to be no rhyme or reason. Some of my pages that date back over a year still have no title or description, some new pages get picked up in a matter of days. I have pounded my head against the wall trying to figure this one out.
Go Figure...
I have pages that took months to show the title and description ... pages seem to get their title and description at a rate of one to three pages per week ... I have pounded my head against the wall trying to figure this one out.
The tentative upshot is that a specific URL needs to be crawled 3 times before it will change from URL-only + no cache to Title + Description. The waters of this are muddied, however, by the existence of a sleuth-bot (identified in the referer string by
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)") which operates under HTTP/1.1 and does not seem to count towards the total of hits, whereas the common police-bots (IDed by
"Googlebot/2.1; (+http://www.google.com/bot.html)and HTTP/1.0) *do* count towards the 3-hits-and-you-are-in.
If accurate, this explanation would account for the apparent G bias towards long-established sites, the so-called "Sandbox", and the URL-only issue.
I have piqued mine own interest. The original research was possible because my former hosts had, by their behaviour, driven my site into the hands of a former colleague. I had the time whilst my server was uprooted to gather the info. That server + site transfer has literally just been completed, but I shall take a couple of hours to update the previous info from the logs, and report back.
I shall take a couple of hours to update the previous info from the logs
The PRE code does not work properly on this forum, so the following is not so easy to read, but here is a synopsis of the results:
Site-Hits by the GBot, 30/Jan/2005:04:02:26 - 17/Jun/2005:07:00.
(for 1st 20 results on site:my-site.com SERPs, May 17)
..............................................................................*..........
.................................*........................*...................*..........
..................*....*.........*........................*...................*..........
........x....x....*....*....x....*........................*...................*..........
.......01...02...03...05...06...07...08...09...11...12...13...14...15...16...17...19...20
13.Jun...........................G..............M....M....G....M..............M..........
06.Jun............G....MMG.......M....M....MM...M....M....M....M..............M....M....M
30.May...........................G........................G...................G..........
23.May....................................................G...................G..........
16.May......................G.............................G..............................
09.May...........................G.............................M..............G..........
02.May..M....M....M....M..............M....M....M....M.........M....M....M....G....M....M
25.Apr......................G....G........................G..............................
18.Apr......................M....G........................G...................G..........
.
28.Feb...........................G........................G...................G..........
21.Feb......................G.........M.......................................M....M.....
14.Feb..G....M..............G....G.........M....M.........G....M....M....M....G.........M
07.Feb...........................G...................G....G.........M.........G..........
31.Jan......................G.......................................G....................
.
Notes:
[no available logs between 03/Mar/2005:09:25:10 and 17/Apr/2005:04:03:48]
.
**** = Title + desc May 17, now URL only
*** = Title + desc May 17 + Jun 17
** = Title + desc Jun 17, prev url-only
x = not in first 100 results
all others are still url-only
.
G = 1 x visit from standard GBot: HTTP/1.0 Googlebot/2.1 (+http://www.google.com/bot.html)
M = 1 x visit from Mozilla GBot: HTTP/1.1 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
.
URLs lost on hits #4, 10 + 18.
Currently, 16 of 1st 20 and 83% of 1st 100 SERPs are url-only.
On 17 May, 17 of 1st 20 and 87% of 1st 100 SERPs were url-only.
It only takes one hit from a standard GBot for the title + snippet to appear. However, that ignores the effects of the Mozilla GBot: the snippet-eater, the roast-my-site-3-times-a-second GBot.
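For anyone wanting to separate the two beasts in their own logs, here is a minimal sketch. The field layout is an assumption (any combined-style log line will do), and the rule follows the distinction above: the standard GBot requests under HTTP/1.0 with a bare Googlebot/2.1 user-agent, whilst the Mozilla GBot requests under HTTP/1.1 with a Mozilla/5.0-prefixed user-agent.

```python
def classify_gbot(line):
    """Return 'G' for the standard GBot, 'M' for the Mozilla GBot, else None.

    Assumes the raw access-log line contains both the request's HTTP
    version and the user-agent string, as in Apache's combined format.
    """
    if "Googlebot" not in line:
        return None  # not Google traffic at all
    if "Mozilla/5.0" in line and "HTTP/1.1" in line:
        return "M"   # the HTTP/1.1 sleuth-bot / snippet-eater
    if "HTTP/1.0" in line:
        return "G"   # the standard police-bot
    return None

standard = '66.249.64.4 "GET /robots.txt HTTP/1.0" "Googlebot/2.1 (+http://www.google.com/bot.html)"'
mozilla = '66.249.65.232 "GET /robots.txt HTTP/1.1" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
print(classify_gbot(standard), classify_gbot(mozilla))  # G M
```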
I now need to get back to my normal work.
As for the order of the pages being included with title and description, there seems to be no rhyme or reason. Some of my pages that date back over a year still have no title or description, some new pages get picked up in a matter of days. I have pounded my head against the wall trying to figure this one out.
I'm right there with you webdude. I think I dented the wall the other day.
The Google SiteMap so far has not shown much improvement for my site in our test. I put up a SiteMap with 1500 URLs, 500 each of: unindexed, partially indexed (URL only), and fully indexed pages. So far the main bot activity (98%) is on pages already fully indexed.
Regardless of this test, we will be rolling out a full SiteMap for our full site. Hopefully it will help, since our main problem seems to be Google not crawling deep enough to find all of our forum-style content.
The url appeared in the SERPS, but that was all, as Google had no other information than just the url.
Once the bot(s) were allowed back in, the placement reappeared, along with the corresponding description and title.
I can't say that this would be your case, but, it was ours.
Hope this helped.
I find that very interesting. I am going to give it a whack. While I have lost nothing in Bourbon, I have many pages that are URL only, about 50%. I would really like to get them listed with title and description. The bot is taking forever to include these pages in the index. It seems that Google SiteMap is taking the load off of the first googlebot that lists URLs. Could be that Google SiteMap is telling the second bot to crawl the links provided (the bot that usually adds title and desc.).
Where is this report from (the one you posted on the previous page)?
It is (1) grepping the apache access-logs and (2) manual hard work.
Have a look at msg #:59+60 [webmasterworld.com] - I give actual examples of the commands used.
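For anyone who wants to reproduce the tally without all the manual hard work, here is a rough Python sketch. The combined-log layout, the sample lines, and the file names are all assumptions, not the actual commands from the messages referenced above; adjust the regex to your own log format.

```python
import re
from collections import Counter

# Rough sketch: count Googlebot hits per URL from Apache access-log lines.
# Matches the request portion of a combined-format log line.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/1\.[01]"')

def googlebot_hits_per_url(lines):
    """Tally hits per requested URL for lines with a Googlebot user-agent."""
    hits = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue  # skip non-Google traffic
        match = REQUEST_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits

# Made-up sample lines in Apache combined format:
sample = [
    '66.249.64.4 - - [02/May/2005:08:00:00 +0100] "GET /page1.html HTTP/1.0" 200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.65.232 - - [02/May/2005:08:00:05 +0100] "GET /page1.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.1 - - [02/May/2005:08:00:09 +0100] "GET /page2.html HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
]
print(googlebot_hits_per_url(sample))  # Counter({'/page1.html': 2})
```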
The XML page cannot be displayed
Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later.
--------------------------------------------------------------------------------
A semi colon character was expected. Error processing resource 'file:///C:/Documents and Settings/Administrator/Desktop/sitemap.xml'. Line 142, Position 99
<loc>http://examplesite.com/MyState-Widget-Forum.taf?_function=detail&ForumMasterThreads_uid1=454&start=1</loc>
--------------------------------------------------------------------------------------------------^
There seems to be a problem at the = sign in "_uid1=454"
I have looked at this entry and I don't get it.
Needless to say, this is bombing the G SiteMaps. I would really like to figure it out. Any clues?
Anyway, a word of caution on some of the on-line, auto-generating programs for use in the G SiteMaps. It appears that XML cannot use a raw & in the <loc>. After much digging, I found that all & characters must be changed to &amp;, otherwise the xml file will error out. In other words, the URL must conform to RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt).
It seems some of the on-line programs are not replacing these characters.
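For what it's worth, the escaping is a one-liner in Python's standard library. This is just an illustrative sketch; the URL below is made up, not one of the pages discussed above.

```python
from xml.sax.saxutils import escape

# Illustrative only: escape() converts raw &, < and > characters into
# their XML entities, so the <loc> entry becomes well-formed XML.
url = "http://examplesite.com/Forum.taf?_function=detail&uid1=454&start=1"
loc = "<loc>%s</loc>" % escape(url)
print(loc)
# <loc>http://examplesite.com/Forum.taf?_function=detail&amp;uid1=454&amp;start=1</loc>
```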
More Info....
Note: All data values, including URLs, in your Sitemap files must be XML-encoded. The chart below lists the characters with their corresponding encoded values. You can use either the entity or the character code to XML-encode a character. Please see the FAQ for more information about XML encoding.
Character .......... Entity .. Character Code
Ampersand (&) ...... &amp; ... &#38;
Single Quote (') ... &apos; .. &#39;
Double Quote (") ... &quot; .. &#34;
Greater Than (>) ... &gt; .... &#62;
Less Than (<) ...... &lt; .... &#60;
Anyway, I learned the hard way. We'll see if the new xml file works.
Sorry to disagree just a tad with you here. I have pages that are generated from a forum. There are hundreds of these pages, some listed with full title and description, some as just URL. In both cases, at least in my case, age of the page seems to have nothing to do with whether or not the title/description gets displayed. I have pages that have been URL only for the past 6 months. Every once in a while, one of those will go to title/description in the SERPs. Some pages get picked up right away and display title/description. Some go URL only for a couple of days then go to title/description. Some it takes weeks -- some it takes months.
What I am seeing on this particular site is that there seems to be no rhyme or reason to the way/how/why some pages get picked up with title/description while some don't. Nor does it make sense to me the time frame it takes to get these listed correctly.
Now I know some of you are going to say that this is the dupe content filter/penalty/yada yada that is causing this, and that may be true, but I am still confused as to the way/how/why. These pages all have different titles, descriptions, text and links on them. Of course they are templated, but why some and not others? There is a lot of valuable info there for the bot to see.
Anyway... I have successfully had the XML file downloaded and acknowledged by G yesterday and will see what happens. There seems to be more bot activity on the site right now. I'll wait a few days and see how it goes.
I have had many pages re-indexed since submitting a sitemap, however their position in the SERPs is way down. Several remain URL only.
All pages dropped a point in Page Rank over the period as well, the site now has no "similar pages" shown in the SERP listings, backlinks are way down. Hurt, hurt! But not a simple diagnosis.
I tend to believe that this symptom is due to inadequate PR or inbound links. Googlebot may appear to crawl every URL, but it indexes only some portions with title/description. In my case of URL only, this happens when pages are too many levels deep from Googlebot's entry points OR there are too many links on the hub page.
That's an interesting thought and correct too. The pages I am referring to do get updated a lot. Some are old pages that haven't seen updates in months though. That is the "no rhyme or reason" of my previous post. Others are very recent that show the title and description and are still being updated -- go figure.
As for SiteMaps, I did get a complete crawl and now this is very interesting. It picked up almost all of the pages with the title and description -- Yippie Skippie! -- but now, the other url-only pages are still there too. So now I have 2 links to every page for about half of my forum.
mmmmm, I wonder if this is going to trip a duplicate content penalty?
Suggestions?
..SiteMaps, I did get a complete crawl ... picked up almost all of the pages with the title and description
This is an edited sample from a recent access log of the 2 different beasts:
# fgrep 'GET /robots.txt' /var/log/httpd/access_log* | less
(the first is the 'good' bot, whilst the second is the HTTP/1.1 snippet-eater.)
66.249.64.4 "GET /robots.txt HTTP/1.0" "Googlebot/2.1 (+http://www.google.com/bot.html)"
66.249.65.232 "GET /robots.txt HTTP/1.1" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
A very quick sampling shows accesses from this latter on: