Getting into Google image search

ZopeMaven

3:48 am on May 23, 2004 (gmt 0)

10+ Year Member



Hi all,

I've created a new site, basically a large photo archive, and its pages have started showing up in Google (at least for highly targeted phrases, such as the domain name).

However, none of the images on the site are showing up in Google image search for those same phrases.

A few possibilities occurred to me:

1) Google image search simply uses a different crawler on a separate schedule, and will get around to my site eventually.

2) Images with dynamic URLs (ie somedomain.com/images/image1?size=medium ) are ignored by Google image search, and will never be indexed.

3) I could be misunderstanding how Google image search associates terms with images (I'm assuming that an image is associated with the terms the indexer picks up from the page), and it actually works in some other way (such as being dependent on alt-text, which my images don't have yet).

There could be other reasons, of course, but rather than continue to guess, I figured I'd pose the question here: Why have my images not shown up on Google image search yet? Are *any* of my hypotheses correct?

TIA.

nakulgoyal

5:47 am on May 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are a couple of things that make images available via Google Image Search. PM me your URL and let me see.

ZopeMaven

5:14 pm on May 23, 2004 (gmt 0)

10+ Year Member



Ok, I've sent the information via StickyMail (twice, second time with more details). However, I'm not sure it went through, as it is not showing up in my 'Sent' folder.

Macguru

5:31 pm on May 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to the board, ZopeMaven,

>>Google image search simply uses a different crawler on a separate schedule, and will get around to my site eventually.

You are correct. Look for the Googlebot-Image user-agent in your logs. It runs on a separate schedule, which explains why there are a lot of 404 errors in the image search.
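A quick way to see whether the image crawler has visited, and which URLs it got 404s on, is to filter the access log by user-agent. This is a minimal sketch that assumes Apache's combined log format and matches the Googlebot-Image token case-insensitively; the function name is made up for illustration.

```python
import re

# Matches the image crawler's user-agent token (e.g. "Googlebot-Image/1.0").
IMAGE_BOT = re.compile(r"googlebot-image", re.IGNORECASE)

# Assumption: Apache combined log format:
# host ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def image_bot_hits(lines):
    """Yield (time, request, status) for every request made by Googlebot-Image.

    Lines that do not parse as combined-format entries are skipped.
    """
    for line in lines:
        m = LOG_LINE.match(line)
        if m and IMAGE_BOT.search(m.group("agent")):
            yield m.group("time"), m.group("request"), int(m.group("status"))
```

Pass it an open log file (or any iterable of lines); hits with status 404 are the ones that would show up as dead results in the image search.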

>>Images with dynamic URLs (ie somedomain.com/images/image1?size=medium ) are ignored by Google image search, and will never be indexed.

Those links have 2 query strings in them. This can be a barrier. Can you afford 'clean' links to images?

>>I could be misunderstanding how Google image search associates terms with images (I'm assuming that an image is associated with the terms the indexer picks up from the page), and it actually works in some other way (such as being dependent on alt-text, which my images don't have yet).

Here are the sure bets:

image file name
page title
alt text
text proximity

ZopeMaven

6:24 pm on May 23, 2004 (gmt 0)

10+ Year Member



> You are correct. Look for the Googlebot-Image user-agent in your logs. It runs on a separate schedule, which explains why there are a lot of 404 errors in the image search.

Any idea what that schedule is?

> Those links have 2 query strings in it.

Two? I only count one name/value pair.

> This can be a barrier. Can you afford 'clean' links to images?

Not very easily. Is this a deal-breaker? The site uses a searchable image database, and URLs are of the form:

- 'somesite.com/archive/187/view' for the HTML page

- 'somesite.com/archive/187?display=small' for the image actually included on the page

- 'somesite.com/archive/187' for the high resolution original file

Links to the small, medium, and large displays and to the original file are all on the page for the image.

If you can describe how the URLs should ideally look to minimize problems, I can try to modify the next version of the software accordingly.

> Here are the sure bets:
>
> image file name

Unless this is a deal-breaker, I can't satisfy this requirement. Images will likely always have an identifier based on a number generated automatically by the system when they are added.

> page title

Yes, the pages for individual images have titles. I'll likely be adding titles to various search result pages as well (as these are already being indexed by Google).

> alt text

Not yet. I have two options here. I can include the title as alt text, or I can use the description, both of which are on the page.

The page title has the form 'Photo detail page for Joe Schmoe' generated from the <H1>.

The <H1> on the page says 'Joe Schmoe'.

The page has a longer description on it like 'Joe Schmoe at the May 1995 whozit meeting'.

So, should the alt text read something like 'Photo of Joe Schmoe at the May 1995 whozit meeting', or 'Photo of Joe Schmoe'? What do you think?

> text proximity

I'll have to think about that. Right now, the image is on the page inside a div, followed by another div containing (in order) a link to a preview pop-up, a list of available versions (size and format) of the image, and a table of metadata about the image: copyright, associated event, person names, and a description.

Not all of these metadata fields will be there for every image, so I have to balance site usability with indexing relevance.
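One way to act on both the alt-text and the text-proximity points is to keep the alt text short and put the longer description in a caption immediately after the image. This is a sketch under that assumption only; the function name and the "Photo of ..." wording are hypothetical choices, not a known requirement of Google image search.

```python
from html import escape


def image_markup(src, name, description=None):
    """Return an <img> tag with short alt text, optionally followed by a caption.

    Assumption: alt text is kept short ("Photo of <name>") and the longer
    description goes in a caption placed directly after the image, so the
    descriptive words sit as close to the image as the layout allows.
    """
    alt = f"Photo of {name}"
    # escape() also escapes quote characters by default, as attribute values need.
    parts = [f'<img src="{escape(src)}" alt="{escape(alt)}">']
    if description:
        parts.append(f"<p>{escape(description)}</p>")
    return "\n".join(parts)
```

For the example above, image_markup("/archive/187?display=small", "Joe Schmoe", "Joe Schmoe at the May 1995 whozit meeting") puts "Photo of Joe Schmoe" in the alt attribute and the full description in the paragraph right under the image.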

Thank you for the information. Is there anything else I should be doing?

Macguru

8:47 pm on May 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi ZopeMaven,

First, let me confess that I have never optimised for an image database; I have just noticed a few things in my spare time, so I don't have all the answers.

>>Any idea what that schedule is?

Typically 2 or 3 months behind. I have had some sites crawled only once by the image bot, even though the regular bots picked up new content. Some static sites (no new content added) get visited by the image bot on a regular basis. I can't see any reason why.

>>Two? I only count one name/value pair.

My bad, sorry.

I remember seeing two kinds of pages in Google image search: pages that call images by source, as in <img src="gizmo.jpg">, and pages with plain HTML links to images, as in <a href="gizmo-large.jpg">.

I can't remember seeing a page with a 'dynamic' link to an image listed, but I could be wrong. I would try to keep image filenames and alt attributes short. Filenames are very important for good ranking.
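To audit a page for the two kinds of image references described above, a small parser can collect <img src> values and <a href> links that point at image files. This is a sketch only: it assumes an <a href> counts as an image link when its path ends in a common image extension, which is a guess made for illustration and not documented crawler behaviour. Under that assumption a dynamic link such as /archive/187?display=large is not recognised as an image link.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: extensions treated as "image files" for linked images.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png")


class ImageRefCollector(HTMLParser):
    """Collect <img src="..."> URLs and <a href="..."> links to image files."""

    def __init__(self):
        super().__init__()
        self.embedded = []  # URLs taken from <img src>
        self.linked = []    # URLs taken from <a href> whose path ends in an image extension

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.embedded.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            # Only the path is checked, so a query string like "?display=large"
            # does not make a link look like an image file.
            path = urlparse(attrs["href"]).path.lower()
            if path.endswith(IMAGE_EXTENSIONS):
                self.linked.append(attrs["href"])
```

Feed it a page's HTML with collector.feed(html) and compare collector.embedded with collector.linked to see which images are referenced only through dynamic URLs.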

>>Thank you for the information. Is there anything else I should be doing?

If your pop-ups are in JS, I would wrap them in an <a> tag.

<a href="page.htm"
onClick="window.open('page.htm','','scrollbars=XX,resizable=XX,menubar=XX,toolbar=XX');return false;"
target="_blank">Enlarge</a>

You can also browse through this site for more.

ZopeMaven

5:29 am on May 24, 2004 (gmt 0)

10+ Year Member



>>Any idea what that schedule is?

> Typically 2 or 3 months behind. I have had some sites crawled only once by the image bot, even though the regular bots picked up new content. Some static sites (no new content added) get visited by the image bot on a regular basis. I can't see any reason why.

Well, *that's* annoying. Can GoogleGuy comment?

> I can't remember seeing a page with a 'dynamic' link to an image listed, but I could be wrong. I would try to keep image filenames and alt attributes short. Filenames are very important for good ranking.

I think you're right. I did a three-term search in Google that has a single result on another site of mine. This page has been in existence for almost two years, includes an image with a URL in a similar format, and Google image search returns 0 results for the same query.

Rather than blindly experimenting with URLs etc., does anyone know with any confidence what the specific criteria are for getting into the Google Image Search index?

> If your pop-ups are in JS, I would wrap them in an <a> tag.

I'm already doing this, but it isn't really necessary, as the images also have 'normal' links to them (for download purposes).

Thanks for the info!

newsphinx

9:43 am on May 24, 2004 (gmt 0)

10+ Year Member



> Any idea what that schedule is?

In my experience, the Google Image Search update cycle is much longer, perhaps 6 months.

ZopeMaven

5:09 pm on May 24, 2004 (gmt 0)

10+ Year Member



> In my experience, the Google Image Search update cycle is much longer, perhaps 6 months.

Wow, that's a pretty slow update cycle. Do you think it is because Google has little incentive to crawl more often than that, since image searches don't show AdWords, or is there some other reason?

Still, the two-year-old page I referenced above *should* have shown up in the image search by now, so it is likely the 'dynamic image URL' problem.

mark aardsma

7:29 pm on May 24, 2004 (gmt 0)

10+ Year Member



Hello,

I often set up a combination of an Apache RewriteRule directive and some modifications to the script so the images (or pages, etc.) can be accessed with static-type URLs.

instead of:

somedomain.com/images/image1?size=medium

perhaps:

somedomain.com/images/image1medium.jpg

or even better

somedomain.com/name-of-this-image-1size.jpg

as in

somedomain.com/red-Ford-focus-1m.jpg

Makes search engines very happy.
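The RewriteRule's job is to map a static-looking name back onto the existing script. The Python below is not Apache configuration; it only sketches that mapping in both directions, assuming one-letter size codes (s, m, l) as in the "-1m" example and an internal URL of the form /images/image1?size=medium. The helper names and the lower-casing of the slug are hypothetical choices.

```python
import re

# Hypothetical one-letter size codes, following the "red-Ford-focus-1m.jpg" example.
SIZE_CODES = {"s": "small", "m": "medium", "l": "large"}
SIZE_LETTERS = {size: code for code, size in SIZE_CODES.items()}

# <slug>-<id><size code>.jpg; the greedy slug backtracks to the last hyphen,
# so slugs that themselves contain digits and hyphens still parse.
STATIC_NAME = re.compile(r"^/(?P<slug>.+)-(?P<id>\d+)(?P<code>[sml])\.jpg$")


def slugify(title):
    """Lower-case the title and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def static_url(title, image_id, size):
    """Build a static-looking URL such as /red-ford-focus-1m.jpg."""
    return f"/{slugify(title)}-{image_id}{SIZE_LETTERS[size]}.jpg"


def internal_url(path):
    """Map a static-looking path to the dynamic URL the script actually serves.

    This is the rewrite a RewriteRule would perform; returns None if the path
    does not follow the pattern.
    """
    m = STATIC_NAME.match(path)
    if not m:
        return None
    return f"/images/image{m.group('id')}?size={SIZE_CODES[m.group('code')]}"
```

For example, static_url("red Ford focus", 1, "medium") gives /red-ford-focus-1m.jpg, and internal_url maps that back to /images/image1?size=medium. Only the number and size code are used on the way back, so a page can change its title without breaking older links.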

Mark

ZopeMaven

7:53 pm on May 24, 2004 (gmt 0)

10+ Year Member



Well, here is a URL scheme that I think I can create without too much trouble:

- HTML page about the image (with title & description):
somesite.com/images/187/view

- Original (full size) file:
somesite.com/images/187/original

- Version included on the page:
somesite.com/images/187/small

- Other versions (medium, large, etc.) available at:
somesite.com/images/187/size

Would this work? Or are the suffixes (.jpg, etc.) really necessary?
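The proposed scheme puts the image number and the variant in the path instead of the query string. A minimal sketch of how the application might dispatch such paths, assuming the variant names listed above ("size" taken to mean small, medium, large); the function and set names are hypothetical.

```python
import re

# Proposed scheme: /images/<id>/<variant>, with no query string.
ROUTE = re.compile(r"^/images/(?P<id>\d+)/(?P<variant>[a-z]+)$")

# Assumption: the variants mentioned in the proposal; extend as needed.
VARIANTS = {"view", "original", "small", "medium", "large"}


def route(path):
    """Return (image_id, variant) for a path in the proposed scheme, or None."""
    m = ROUTE.match(path)
    if not m or m.group("variant") not in VARIANTS:
        return None
    return int(m.group("id")), m.group("variant")
```

route("/images/187/small") returns (187, "small"), while the old-style /images/187?display=small no longer matches and could be redirected to its new address.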

mark aardsma

8:07 pm on May 24, 2004 (gmt 0)

10+ Year Member



Hello,

That would essentially work but without the advantage of descriptive words in the image URLs, and it would make the images look a lot deeper in the site with all those slashes.

I'm not sure if the file extensions matter to Google, or if the Content-type in the HTTP headers is the only thing they look at.

If I were doing it I would put those file extensions on so that the URLs would be indistinguishable from true static URLs. It would look to the outside world like you actually had JPEG and GIF files by those names on your site. Basically a better safe than sorry argument, since I don't know exactly how Google will treat it otherwise.

But in any case, getting rid of the ?'s is a step in the right direction.

Mark

ZopeMaven

2:49 am on May 25, 2004 (gmt 0)

10+ Year Member



I'd rather leave off the extensions if I can get away with it, as a particular view might be a JPG, a GIF, a PNG, or something else. There is no guarantee that an image size will remain in the same format, and I'd rather not break links or serve up a content-type that didn't match the extension.

One way around this might be to do a redirect to the correct URL (ie. from .jpg to .gif), but that seems pretty ugly, and I'd rather not.
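For reference, the redirect workaround can be kept fairly small: serve the file only when the requested extension matches the view's current format, and otherwise redirect to the URL with the correct extension, so the extension and Content-Type never disagree. This is a sketch under assumptions: the FORMATS table, the respond() name, and the choice of a 301 status are all hypothetical.

```python
import posixpath

# Hypothetical table: stored format of a view -> (URL extension, Content-Type).
FORMATS = {
    "JPEG": (".jpg", "image/jpeg"),
    "GIF": (".gif", "image/gif"),
    "PNG": (".png", "image/png"),
}


def respond(requested_path, actual_format):
    """Return (status, headers) for a request such as /images/187/small.jpg.

    If the requested extension matches the view's current format, serve it with
    the matching Content-Type; otherwise redirect to the correctly suffixed URL.
    Requests with no extension at all are redirected the same way.
    """
    ext, content_type = FORMATS[actual_format]
    base, _requested_ext = posixpath.splitext(requested_path)
    correct_path = base + ext
    if requested_path == correct_path:
        return 200, {"Content-Type": content_type}
    # Assumption: a permanent redirect, since the old extension is no longer valid.
    return 301, {"Location": correct_path}
```

Whether 301 or 302 is the better choice depends on how often formats change; if a view may switch back and forth, a temporary redirect avoids telling crawlers that the old URL is gone for good.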