Forum Moderators: phranque

Accidental image uri leads to a significant traffic surge, but why?

         

JS_Harris

2:07 am on May 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In my haste I accidentally omitted the .jpg file extension on an image thumbnail.

i.e. www.example.com/images/some-picture instead of www.example.com/images/some-picture.jpg

The mistake was live for roughly 36 hours, and the thumbnail appeared only on the index page and a category page. Strangely, the image resolved in both Firefox and IE, but search engines went mad trying to find a page called "images/some-picture/" (note the trailing slash), making 4x more attempts to load this URL than any other page on the site in that time span.

It's a standard WordPress site; however, it has a custom canonical fix in place that adds a trailing slash to category pages... which apparently added the trailing slash to the image URL since it had no file extension.
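A minimal sketch of how a rule like that could misfire, written in Python rather than the site's actual WordPress/PHP code. The function names and the "/images/" prefix are hypothetical, and the real canonical fix may work quite differently; this only illustrates the idea that an extensionless path looks like a category page to a naive rule.

```python
import posixpath


def naive_canonical(path: str) -> str:
    """Append a trailing slash to any path whose last segment has no extension."""
    if path.endswith("/"):
        return path
    _, ext = posixpath.splitext(path)
    # No extension: the rule assumes this is a page or category and adds "/".
    return path if ext else path + "/"


def guarded_canonical(path: str, asset_prefixes=("/images/",)) -> str:
    """Same rule, but leave anything under a known asset directory alone.

    The prefix list is an assumption made for this example.
    """
    if path.startswith(asset_prefixes):
        return path
    return naive_canonical(path)


if __name__ == "__main__":
    for candidate in ("/category/news", "/images/some-picture", "/images/some-picture.jpg"):
        print(candidate, "->", naive_canonical(candidate), "|", guarded_canonical(candidate))
```

With the naive version the extensionless image path gains a trailing slash, while the guarded version leaves it untouched.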

I'm a bit puzzled as to why:
1) the image resolved without a file type (.jpg)
2) search engines tried to load the page /some-picture/ over 3800 times in 36 hours, getting a 404 each time.
3) I can't think of a way to make this knowledge beneficial to the site somehow.

Any ideas?

edit: Analytics and WMT didn't report the surge or any 404 errors; server logs and another tracking service do.

[edited by: JS_Harris at 2:13 am (utc) on May 16, 2009]
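For anyone wanting to pull the same numbers out of their own server logs, here is a rough Python sketch that counts 404 responses per URL. It assumes Apache-style access log lines; the regular expression and the sample lines are made up for illustration and are not taken from the site's real logs.

```python
import re
from collections import Counter

# Matches the request and status fields of an Apache-style access log line.
# The exact log format is an assumption; adjust the pattern to your server.
LOG_PATTERN = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')


def count_404s(lines):
    """Return a Counter mapping request path -> number of 404 responses."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = [
    '192.0.2.1 - - [15/May/2009:10:00:01 +0000] "GET /images/some-picture/ HTTP/1.1" 404 512',
    '192.0.2.1 - - [15/May/2009:10:00:05 +0000] "GET /images/some-picture HTTP/1.1" 200 20480',
    '192.0.2.2 - - [15/May/2009:10:01:09 +0000] "GET /images/some-picture/ HTTP/1.1" 404 512',
]

if __name__ == "__main__":
    for path, hits in count_404s(SAMPLE).most_common():
        print(path, hits)
    # The figures from the post: about 3800 requests over 36 hours.
    print(round(3800 / 36, 1), "requests per hour")
```

In practice you would pass an open log file to count_404s instead of the sample list.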

SEOMike

4:45 pm on May 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Firefox and some other browsers have rendering engines that will attempt to correct HTML coding errors in order to deliver content to their users. The autocorrect is a little annoying sometimes as a webmaster and even more so as an SEO. Clients wonder why they need so many HTML fixes while the site displays just fine in their browsers. Also, browsers autocorrect a little differently, which is why website designers / webmasters need to do browser testing to make sure that the site, and especially its CSS, looks the same in all browsers.

So, to try to answer your questions:
1. The browsers are probably displaying the image because they are guessing that you intended to put a .jpg on the uri. Browsers guess that when you do an <img src=""> you are not sending them to get a page, but an image.

2. Since the search engines don't do code-correction like browsers, they are seeing the trailing slash and must think that you are intending to send them to a directory to look for a default page.

3. It'd be interesting to see if the SEs tried requesting page names from your images directory in an attempt to fish out a default page. Also, it's a good reminder to do browser testing.

Very interesting that WMT and Analytics don't report the problem. Keep your eye on WMT and it may show up soon.

phranque

3:57 am on May 17, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



here's my wild guess.
your image tags probably didn't have the trailing slash added, so you were able to view the image because the url was correct.
it would be interesting to know what Content-Type header was returned for those typeless files.
maybe the browser guessed.
the SE however is more interested in the url linked to by the thumbnail, which url probably had the trailing slash.
if the trailing slash finally comes down to a request for a directory index, you start a whole new brand of magic depending on the server and how it's configured.
and ultimately that directory doesn't exist.

100+ requests per hour indicates a lot of inbound links to those pages.
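on the Content-Type question above, a quick way to check is a HEAD request against the extensionless url. this is only a sketch using Python's standard library; the function name and the injectable opener parameter are my own additions for illustration, and some servers answer HEAD differently from GET.

```python
import urllib.error
import urllib.request


def fetch_content_type(url, opener=urllib.request.urlopen):
    """Send a HEAD request and return (status code, Content-Type header or None).

    `opener` is a hypothetical hook so the function can be exercised
    without touching the network; by default it is urllib's urlopen.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request) as response:
            return response.status, response.headers.get("Content-Type")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions but still carry headers.
        return err.code, err.headers.get("Content-Type")


# Example call (performs a real network request, so it is left commented out):
# print(fetch_content_type("http://www.example.com/images/some-picture"))
```

comparing the result for the url with and without the trailing slash should show whether the server returns an image type for one and a 404 for the other.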