Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Crawl rate since mid-December
member22

5+ Year Member



 
Msg#: 4637094 posted 3:11 pm on Jan 13, 2014 (gmt 0)

The number of pages crawled per day on my website has more than doubled since mid-December (according to the crawl stats in Webmaster Tools).

Is it the same for everyone or just me, and what does it mean?

 

bumpski

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4637094 posted 9:38 pm on Jan 13, 2014 (gmt 0)

Using Webmaster Tools:
Go to "Google Index", "Index Status"
Click the Advanced Tab, and check "Ever Crawled"

If this number has jumped, Google has found a new way to access your content.
One very common recent occurrence is that Google discovers your site's pages can be crawled over both https (SSL) and the conventional http protocol.
To Google this can apparently double the number of pages you have, and perhaps (though I doubt it) lead to a duplicate content penalty.

The best fix is using the "rel=canonical" mechanism.
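
A minimal sketch of what that can look like, assuming you can generate the tag in your page templates. The function name, the preferred scheme and host, and the choice to drop the query string are all assumptions made for illustration, not something Google or any particular CMS prescribes:

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(url, scheme="http", host="www.example.com"):
    """Build a <link rel="canonical"> tag for the preferred version of a URL.

    Assumptions: one preferred scheme and host for the whole site, and
    query strings are not part of the canonical URL.
    """
    parts = urlsplit(url)
    # Force the preferred scheme and host; drop query string and fragment.
    canonical = urlunsplit((scheme, host, parts.path or "/", "", ""))
    return '<link rel="canonical" href="%s">' % escape(canonical, quote=True)


# Both the https and the http version of a page get the same tag.
print(canonical_link_tag("https://www.example.com/topic.html"))
print(canonical_link_tag("http://www.example.com/topic.html"))
```

The tag goes in the head section of every page, so that the http and https copies both point at a single preferred URL.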

member22

5+ Year Member



 
Msg#: 4637094 posted 6:47 am on Jan 14, 2014 (gmt 0)

I just checked, and the number has been steady and hasn't moved since the 7th of April 2013.

The only time I got a spike was on the 10th of March 2013, when I went from 1500 to 2700 pages crawled in less than 2 weeks.

However, that was due to a bug in my CMS when upgrading to a newer version.

By the way, my website has only 38 pages, yet it says over 1500 ever crawled. Where does it find that number, and why is it so high compared with the number of pages on my website?

EditorialGuy

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4637094 posted 3:25 pm on Jan 14, 2014 (gmt 0)

Google has been crawling our site a lot more in the past few months. It used to be that there would be a deep crawl, followed by an update (such as a Panda update) a few days later, then nothing much until the next month. Now we see deep crawls over a one- or two-day period every week or so. The monthly graph has gone from zzzzzzzzzzz!zzzzzzzzz to a roller coaster, with higher minimum and average crawl figures than before.

I suppose it's possible that our sitemap has something to do with this. After a long period of having given up on sitemaps, I found a sitemap generator that I liked and began using it late last year. (I've got "change frequency" set to "monthly," for whatever that might be worth.) Could the sitemap be encouraging Google to crawl the site more often? Beats me. (The site should be easy for Googlebot to crawl with or without the sitemap--it's a static site of only 5,000+ pages with mostly evergreen content and plain-vanilla internal links.)

bumpski

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4637094 posted 4:13 pm on Jan 14, 2014 (gmt 0)

"By the way my website is only 38 pages and it says over 1500 ever crawled when does it find that number and why is it so high in comparison to my website number of pages ?"

If your webpages do not filter out bad "parameters" or "http query strings", the ever crawled number can get very large.
If the server responds to a URL like this: http://www.example.com/topic.htmlxxx
with the page: http://www.example.com/topic.html
your ever crawled count is likely to climb and climb.
Google is very resourceful in actually inventing new BAD URLs to access your content.
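
As a rough illustration of the filtering idea, here is a small sketch that answers 404 for anything that is not an exact, known page with known parameters. The page list, the parameter whitelist and the function name are hypothetical; a real site would do this in its server or CMS configuration:

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical whitelist: each real page and the query parameters it accepts.
ALLOWED_PARAMS = {
    "/topic.html": set(),        # static page, no parameters accepted
    "/search.html": {"q"},       # assumed example of a page with one parameter
}


def status_for(url):
    """Return 200 only for an exact known path with known parameters, else 404."""
    parts = urlsplit(url)
    if parts.path not in ALLOWED_PARAMS:
        return 404               # e.g. /topic.htmlxxx must not serve /topic.html
    names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
    if not names <= ALLOWED_PARAMS[parts.path]:
        return 404               # unknown query parameter: treat as a bad URL
    return 200


print(status_for("http://www.example.com/topic.html"))
print(status_for("http://www.example.com/topic.htmlxxx"))
print(status_for("http://www.example.com/topic.html?sessionid=123"))
```

The point is only that a made-up URL should get an error status instead of a copy of a real page.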

As I mentioned before, https is another source of these "ever crawled" stats. Also, many webhosts create a secret subdomain to access your site, something like "YourDomainAcronym.YourWebHostDomain.com". Google finds these "secret" domains, crawls them, and adds them to its index, sometimes before a webmaster has even published the actual site on the desired domain!
Then, to make some money, the webhost makes sure that all the error pages that result from misspellings etc. have advertisements on them.
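
One possible way to deal with the extra hostnames, sketched below, is a permanent (301) redirect to a single preferred host. This is a different technique from rel=canonical, it assumes your host lets you run code or rewrite rules on each request, and the constants and function name are made up for the example:

```python
# Assumed preferred scheme and host for the site (illustrative values).
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "www.example.com"


def redirect_target(scheme, host, path):
    """Return the URL to 301-redirect to, or None if the request is already canonical."""
    host = host.lower().split(":")[0]   # ignore case and any :port suffix
    if scheme == CANONICAL_SCHEME and host == CANONICAL_HOST:
        return None
    return "%s://%s%s" % (CANONICAL_SCHEME, CANONICAL_HOST, path)


# A request arriving on the webhost's "secret" subdomain is sent to the real domain.
print(redirect_target("http", "YourDomainAcronym.YourWebHostDomain.com", "/topic.html"))
# A request that is already on the preferred host needs no redirect.
print(redirect_target("http", "www.example.com", "/topic.html"))
```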

member22

5+ Year Member



 
Msg#: 4637094 posted 4:20 pm on Jan 14, 2014 (gmt 0)

How do I make sure my website doesn't have a secret subdomain?

bumpski

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4637094 posted 5:10 pm on Jan 14, 2014 (gmt 0)

Usually the webhost does tell you the secret subdomain name, on their domain.

One method that works for my specific host:

site:YourWebHostDomain.com -site:www.YourWebHostDomain.com

The -site: operator partially removes the webhost's own content from the results.

This search produces 1,470,000 results, mostly excluding my webhost. If I then click on some of these sites I can find their real domain name, usually in links or perhaps in the source code itself.
Perhaps many of these websites don't know about this, or they don't care about the duplication in Google's index. My host in the past claimed to host 700,000 domains; it has since been bought by a conglomerate that owns many other webhosts, which also use secret domains as an aid to new webmasters creating sites. These secret domains can be assets, but webmasters have to take steps to secure them.

I often take a sentence from one of my pages and search for it in quotes (perhaps adding the site: command above). This can pop up the secret pages in Google's index. And great: I just found another idiot blatantly copying my entire article, which I chose at random! You try to put up free content to help people, and you just end up fighting Google (to get the content to the people who want the help) and fighting idiots trying to take advantage of you!
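
For anyone who wants to repeat the two searches described in this post for several hosts or sentences, they can be built as plain strings. This is only a sketch: the helper names are invented, and the output is meant to be pasted into the Google search box by hand.

```python
def host_subdomain_query(webhost_domain):
    """Search the webhost's domain while excluding its own www site."""
    return "site:%s -site:www.%s" % (webhost_domain, webhost_domain)


def quoted_sentence_query(sentence, site=None):
    """Exact-phrase search for a sentence, optionally limited to one domain."""
    phrase = sentence.strip().replace('"', "")   # embedded quotes would break the phrase
    query = '"%s"' % phrase
    if site:
        query += " site:%s" % site
    return query


print(host_subdomain_query("YourWebHostDomain.com"))
print(quoted_sentence_query("a sentence copied from one of my pages",
                            site="YourWebHostDomain.com"))
```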

member22

5+ Year Member



 
Msg#: 4637094 posted 5:33 pm on Jan 14, 2014 (gmt 0)

Excellent, thank you. Is there any command I can type that will show me all the pages that Google has indexed (even the ones blocked by robots.txt or in the supplemental index)?


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved