Forum Moderators: Robert Charlton & goodroi
The basic premise of the site is that users upload an image of their car; all images are of cars. Images are then shown one at a time on the main page. Clicking on the image shows you another random image from the pool of images.
Each page has some value from a SERPs perspective, though not much: under each car is a description of the car.
As far as I can tell, google is only hitting http://example.com./index.php
Each time an image of a car is clicked, it just reloads http://example.com./index.php and a new random image is shown, along with its description. So I can see why google does not want to crawl the entire site, thousands of pages, one for each image.
I think, though I would like confirmation, that I can make google crawl with a simple change to the url string, such as http://example.com./index.php?image=x where x is some random string. This seems a little shady to me though.
What is the best practice here? Should I assume that many thousands of pages with nothing more than an image and a short description on them are mostly irrelevant to a search user, and not worth having google index? Or should I consider them relevant enough and use the url arguments as a way to get google to crawl?
Also, I would think it is valuable to get the images into google image search. I am not sure how google pulls these. In looking over the sitemap.xml spec a bit, I did not see a provision for telling it about images, can someone point me to it? I would think many thousands of images would in fact be valuable to google image search.
Finally, as the site moves forward, I am learning that many people do not want to take the time to enter in a description of their car. This leaves nothing more than an image on the page. In those cases, what is best practice? Still try to get google to crawl those pages as well? And in regards to the images on those pages, I can give them an alt tag of something generic like "car" which while less valuable to image search, still seems valuable.
Thanks, looking for the best way to go here.
* On launch of the site, I had empty meta keywords and description. Google found the site and picked up on the copy at the top of the page, which said "Welcome to the cars site". This is a terrible line to have in the SERPs as a description. I have since added meta keywords and a description, and I see in my stats that google has crawled, but the SERPs page still lists the stale data. How long until I can expect that to change?
You're also looking at the other issues in a useful way, but they will need a robust solution in order to differentiate the pages enough to get them in the Google index. You could start by forcing the user to type in a description of their image of some decent length, possibly asking for several descriptive fields for the make, model, year, color, why I love my car, and so on. "Thin pages" just won't get included.
You could also use those various fields to piece together an "alt" tag that is not an exact duplicate of the page text. You want to create a situation where the title tag, description, on-page text and alt tag show variation within each page, and are unique from one page to another.
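As a rough illustration of piecing the fields together, here is a minimal sketch in Python (the site itself is PHP, but the logic ports directly). The function name, the field names and the fallback wording are all assumptions made up for the example, not anything from the site. It only uses values the user typed in, so nothing is invented about the car.

```python
from typing import Dict, Optional


def build_page_text(make: str, model: str, year: int, color: str,
                    story: Optional[str] = None) -> Dict[str, str]:
    """Build a title, meta description and alt text from the upload fields.

    The three strings are deliberately worded differently so they are not
    exact duplicates of each other on the same page.
    """
    title = f"{year} {make} {model} in {color}"
    alt = f"{color} {year} {make} {model}"
    if story and story.strip():
        # Prefer the owner's own words when they wrote something.
        description = story.strip()
    else:
        # Hypothetical fallback wording built only from the entered fields.
        description = f"Owner photo: {color} {year} {make} {model}."
    return {"title": title, "description": description, "alt": alt}


if __name__ == "__main__":
    print(build_page_text("Ford", "Mustang", 1967, "red"))
```

Because every string is derived from that one car's fields, two pages only end up with identical text if two users enter identical details.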
Moving on from url indexing to image indexing - how to get into Image Search is not well known. Google does automated discovery of images, and their algo for inclusion is a bit mysterious. Giving enough detail on the page for Google's algo to decide on some keyword relevance is a help.
I'd say set up your Webmaster Tools account to see what's going on. Sign up for Google's "enhanced image search" (under the "Tools" section), and consider submitting an xml sitemap. Then watch for Google's feedback in the "Diagnostics" section.
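On the sitemap question: Google documents an image extension for sitemaps, using `image:image` and `image:loc` tags under its own namespace. Below is a minimal Python sketch of generating such a file; the `/images/42.jpg` path and the `?image=42` page URL are hypothetical, and whether listing images this way actually gets them into Image Search is not something the sketch can promise.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_sitemap(pages: Iterable[Tuple[str, str]]) -> str:
    """Return sitemap XML listing each page URL with the image shown on it."""
    # Register prefixes so the output uses a default namespace and "image:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, img_src in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = img_src
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical page and image URLs, for illustration only.
    print(build_sitemap([
        ("http://example.com./index.php?image=42",
         "http://example.com./images/42.jpg"),
    ]))
```

Note that this only makes sense once each image has its own page URL; a sitemap that lists index.php thousands of times would not help.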
What in the world are you talking about? I wrote the site myself, it is not a CMS in any way, and does exactly what it is supposed to. What do I need to elaborate on so I can make my post more clear as to what my questions are?
Your site architecture is horrible [for seo].
Each time an image of a car is clicked, it just reloads http://example.com./index.php and a new random image is shown, along with its description. So I can see why google does not want to crawl the entire site, thousands of pages, one for each image.
Googlebot doesn't get tired. It would love to index thousands of pages of useful/unique content on a site. However, you do not have thousands of pages; you reload one single page (index.php) thousands of times.
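To make the difference concrete, here is a minimal sketch (Python for brevity; the same few lines translate to PHP) of giving each image its own permanent URL while keeping the "click for another random car" behaviour. It assumes each upload has a stable numeric id, which is a guess about how the images are stored; the function names are made up for the example. The point is that the `image` parameter identifies one specific car, rather than being a random string.

```python
import random
from typing import Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "http://example.com./index.php"  # URL as given in the thread


def image_url(image_id: int) -> str:
    """Return the permanent URL of the page for one image."""
    return f"{BASE_URL}?{urlencode({'image': image_id})}"


def next_image_url(current_id: int, all_ids: Sequence[int],
                   rng: Optional[random.Random] = None) -> str:
    """Pick a different image at random and return its permanent URL.

    The link wrapped around the current image would point here, so a
    crawler following it lands on a distinct, repeatable page.
    """
    chooser = rng or random
    candidates = [i for i in all_ids if i != current_id]
    if not candidates:
        # Only one image exists, so link back to the same page.
        return image_url(current_id)
    return image_url(chooser.choice(candidates))


if __name__ == "__main__":
    print(image_url(42))
    print(next_image_url(42, [41, 42, 43]))
```

With this, visitors still see a random sequence, but every page a crawler reaches has one address that always shows the same car and description.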
I think, though I would like confirmation, that I can make google crawl with a simple change to the url string, such as http://example.com./index.php?image=x where x is some random string. This seems a little shady to me though.
Finally, as the site moves forward, I am learning that many people do not want to take the time to enter in a description of their car. This leaves nothing more than an image on the page. In those cases, what is best practice? Still try to get google to crawl those pages as well? And in regards to the images on those pages, I can give them an alt tag of something generic like "car" which while less valuable to image search, still seems valuable.
The reason I said to use a [better] CMS is because with a CMS like Drupal you could replicate your entire site in about 2 days and it would have a lot more functionality than it currently does. CMSs are extremely powerful out of the box nowadays.
While I have thought of using mod_rewrite to make better urls, in the end I would have to force users to add a description, which takes away from the simple in-and-out nature of the site. I could instead randomly pull from a pool of words and invent the description, and I am sure it would fool the SEs, since there is no way for them to relate text to an image to validate its accuracy. I do not want to do that. If I fall short on the SEO side, that is the nature of the site; I do not want to pollute the SEs just to get ranked. If the site is popular, that should happen naturally.
While I have used Drupal and even WordPress when applicable, in the case of this site it would be more work. The site is a few hundred lines of php, most of it an image uploader that watermarks the images. CMSs are nice; in this case, they are just not a great fit.
Thanks for the help though.