Forum Moderators: Robert Charlton & goodroi

Googlebot getting images with html ?

         

stricknine

9:26 am on Apr 9, 2011 (gmt 0)

10+ Year Member



Hi,

I noticed something a few days ago. Until now, Googlebot only crawled HTML pages (and occasionally JavaScript and CSS files), and Googlebot-Image crawled my images.

Now, sometimes, Googlebot (not Googlebot-Image) crawls a page, then crawls 1 or 2 images that are present on that page. For the image requests, the logs show the HTML page it just crawled as the referer, exactly as if it were rendering the page and fetching the images it doesn't already have in cache.

(I'm in France, so this would be for google.fr, which hasn't yet rolled out the Panda update as far as I know).

Did anybody see this behavior?

Thanks

asabbia

4:09 pm on Apr 9, 2011 (gmt 0)

10+ Year Member



Have you checked whether the IP of the Googlebot requesting your images is one of the official ones?

Leosghost

4:30 pm on Apr 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Been seeing Panda from France for a while now..on both the dot com ( our dot com is not the same as the dot com other countries get ..IME each country gets a different dot com Google and within that it varies by DC ) and the dot fr.

And don't forget, whether you are signed in or signed out you are getting personalised serps on any dot ..in spite of Google's official line.

Btw Welcome / bienvenue to WebmasterWorld stricknine :)

stricknine

4:25 pm on Apr 10, 2011 (gmt 0)

10+ Year Member



Leosghost: are you sure about that? Didn't see much change here.

But my question was: did you ever see Googlebot crawling images too? (Not Googlebot-Image or Googlebot preview, but Googlebot/2.1.)

levo

4:32 pm on Apr 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Could it be for instant previews?

tedster

4:35 pm on Apr 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



stricknine - what you are describing does not sound like typical Googlebot behavior, and the first thing I'd check is whether some other crawler is spoofing Googlebot. Heck, what you described might even be me switching user agents on one of my browsers.

Here's the thread with the full explanation - it's a bit more complex than a simple IP check, although just that much will weed out 99% of the spoofing (including mine).

How to Verify Googlebot and avoid rogue spiders [webmasterworld.com]
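For readers who cannot follow the link, here is a minimal sketch of the commonly documented check: a reverse DNS lookup on the requesting IP, a test that the hostname ends in googlebot.com or google.com, then a forward lookup to confirm the hostname resolves back to the same IP. The linked thread may describe more steps than this. The function name and the two injectable lookup parameters are assumptions made for illustration, so the logic can be tried without touching the network.

```python
import socket

# Hostname suffixes used for Googlebot reverse DNS (leading dot matters,
# so that "evilgooglebot.com" does not match).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip passes a reverse-then-forward DNS check.

    reverse_lookup(ip) -> hostname and forward_lookup(hostname) -> list of IPs
    are hypothetical hooks; by default they use the standard socket module.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse DNS on the IP taken from the access log.
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False

    # Step 2: the hostname must belong to a Google crawl domain.
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    # Step 3: forward DNS must map the hostname back to the same IP,
    # otherwise the reverse record could simply be faked.
    try:
        addresses = forward_lookup(host)
    except OSError:
        return False
    return ip in addresses
```

Calling is_verified_googlebot("66.249.66.1") with the defaults performs real DNS queries, so the result depends on the live records at the time you run it.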

stricknine

4:46 pm on Apr 10, 2011 (gmt 0)

10+ Year Member



levo> no, instant preview uses a specific user-agent.

tedster > well it was the right IP, and it was in the middle of a normal crawling session from googlebot, so I think it really is googlebot. But this behavior is rare, 99.9% of the time it will just crawl the html.
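To put a number on how rare this is, one option is to scan the access log for image requests whose user agent contains Googlebot/2.1 and print the referer next to each hit. The sketch below assumes the Apache/nginx "combined" log format and a short list of image extensions; both are assumptions, as are the function and pattern names, so adjust them to your own server.

```python
import re

# Assumed "combined" log format: ip, identity, user, [time], "request",
# status, size, "referer", "user agent".
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed set of image extensions; extend as needed.
IMAGE_EXT = (".jpg", ".jpeg", ".png", ".gif")


def googlebot_image_hits(lines):
    """Return (ip, path, referer) for image requests made as Googlebot/2.1."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        # Ignore any query string before testing the extension.
        path = m.group("path").split("?", 1)[0].lower()
        # Googlebot-Image identifies itself differently, so it is excluded
        # by requiring the "Googlebot/2.1" token.
        if "Googlebot/2.1" in m.group("agent") and path.endswith(IMAGE_EXT):
            hits.append((m.group("ip"), m.group("path"), m.group("referer")))
    return hits
```

Typical use would be to open the log file, pass the file object to googlebot_image_hits, and then run a Googlebot DNS verification on each IP that turns up.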

Sgt_Kickaxe

4:46 pm on Apr 10, 2011 (gmt 0)



I've noticed this when I use thumbnails that link to the full version of an image: Googlebot crawls through the link.

indyank

5:16 pm on Apr 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Around Wednesday, I saw a Wikipedia image ranking high for a good keyword I track. It wasn't there before, but it is still there at the top.

It does seem to be true when I map what I saw to what you say.

Leosghost

6:00 pm on Apr 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Leosghost: are you sure about that? Didn't see much change here."

Yep ..But change hasn't had quite such dramatic consequences here ..many reasons IMO ..the effect of panda has been limited in French language serps because a great deal less "scraping" goes on in the French language..( there are dissuasive legal consequences for scrapers that are relatively easy to put into action by the "originating sites" ) ..so scraping happens less

In the English language but aimed at the French market, or hosted here but aimed at the English speaking resident market, I watch a group of 100 or so sites ( some big, some small, some my own ), in almost all areas there has been some movement.

Each time if ehow had a page on the search term, it has gone into the top 3 from where ever it had been before ( in some cases not previously on the first page ) and some of the previous top 3 pages / sites have dropped out of the first 50 results.

I have one site which exists in both English and French ..I haven't touched it in 5 years ..I use it ( amongst others, all of which are set up differently ) to "gauge" what is happening at any one time with Google ..I don't change them ..they are like finely tuned bells ..each reacts, vis a vis its serp position, to specific things being dialed up or down by G ..this site, for a very specific KW1KW2 query, moved down from position 3 to position 5, run from either Google.com or Google.fr ..ehow and 2 youtube videos on the subject moved past it within 10 days of the panda rollout in the USA ..the two sites above me ( both with far far more links than I ) slid down ..pushing me down ..

ehow uses a rehash of my text ..complete with two errors ( but not typos ) that I have left in as "controls".

query KW1KW2 returns 2,000,000 results.

I also launched two single page sites recently ..one ( site A ) 30 days prior to the roll out of panda in the USA ..one ( site B ) 5 days after ..both are SEO'd on page ..site A has one single inbound ..from the footer of one of the sites mentioned above ..site B has one single inbound from the same site, same page footer, but has 10 others from other sites..neither has links from anywhere that is not under my control ..

Site A ..is English language only ( for now ) is at #1 on Google.com and #5 on Google.fr the sites above it on .fr all have French text and French inbounds.
Query is KW1 ( name of site ..6 letter made up word easy to say ..sounds like a real word..I have the dot com and the dot fr ..and the trademark registered in English and French ..plus companies in both countries registered in that name ).Query KW1 is 60K results.

Site B is at #5 on Google.com ..query 400,000,000 and #3 on Google.fr ..query ..name is an actual word 7 letters..( in English ).. ( can also be written/split as 2 words in English ).but the French use the English words..( web, IT related ) .if searched for as the two words separated the KW1 KW2 query is around 800 million.

Site stays around the same position #3 to #5 when you search KW1 KW2.

The sites above it on the Google.com have hundreds of thousands of inbounds ..the sites above it on Google.fr have likewise from French language sites that use the English term .

On both the Google.com and the Google.fr the sites that rank above it ( site B ) started shuffling about 15 days after panda rolled in the USA ..they are now more or less stable ..those that moved up have more links than those that were there ..but their links are less "relevant" and more social ..less pro sites linking to them ..more John and Jane or Jacques et Jean saying how they like it.

btw ..depending on the time of day the number of results for siteB KW ("name") etc changes from 400 million to 240 million ..daytime 240 million, night time 400 million ..this is regular ..every 24 hours it cycles..like the tides ;-) ( we live next to the sea ;-)

btw2..I have other sites that back up this observation ..this is merely one example, with some detail for you.

btw3.re gbot taking images ? can't help you on that for the moment ..I'm replacing the "home and business LAN" and am not looking at my stats anywhere much ..nor emails, all is more or less on auto pilot, 'til I'm finished pulling cat 5 and fibre.