We all know, to the dismay of some if not many, that Google controls the fate of our websites, whatever type of site it is. But if we want a chance of being seen by people searching for our services and/or products, we’re going to have to bite the bullet and get ourselves ranked on search engines such as Google.
No matter how great your site is, though, you need to optimize it correctly for search engines. An important aspect of SEO is making sure search engine crawlers can access the pages on your website. If for any reason they can’t do their job, your site is going to have a hard time ranking in the search results.
Google has quite a number of ways to decide whether to stop crawling your website, and that’s not even counting the obvious ones we’ve talked about before, like disavow, robots.txt and nofollow tags.
Google’s Webmaster Trends Analyst, Gary Illyes, shared with the SMX East audience two technical signals Google uses to decide when Googlebot should slow down or stop crawling your website altogether. What are these signals?
1. Connect Time
This seems like a no-brainer. Even your average human searcher is going to avoid your site like the plague if it takes too long to connect to your site or to an individual page. Google feels the same way. If the connect time is too long, or keeps getting longer, that is bad news: Google will begin pulling back from crawling your site or page. Google watches connect time to avoid taking down your server; when connections slow, Googlebot crawls less so it doesn’t add load to a machine that is already struggling.
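Google hasn’t published exactly how Googlebot measures connect time or what counts as “too long,” so the following is only a minimal sketch for keeping an eye on your own server, written in Python (our choice, since the post has no code of its own). It times how long it takes to open a TCP connection, including the DNS lookup, using the standard library’s `socket.create_connection`. The function names and the default port 443 are assumptions for illustration.

```python
import socket
import statistics
import time


def measure_connect_time(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Return the seconds taken to resolve host and open a TCP connection to it.

    Raises OSError (for example a timeout or ConnectionRefusedError) if the
    connection cannot be established within `timeout` seconds.
    """
    start = time.perf_counter()
    # create_connection resolves the host name and completes the TCP handshake.
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed


def median_connect_time(host: str, port: int = 443, samples: int = 5,
                        timeout: float = 5.0) -> float:
    """Take several samples and return the median, smoothing out one-off blips."""
    times = [measure_connect_time(host, port, timeout) for _ in range(samples)]
    return statistics.median(times)
```

For example, `median_connect_time("example.com")` returns the median of five samples in seconds. Using the median rather than a single measurement keeps one slow handshake from skewing the result; a median that creeps upward over days is the kind of trend worth investigating.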
2. HTTP Status
In my experience, or in anybody else’s experience of web surfing for that matter, we don’t like seeing HTTP error statuses when we try to access a website. They suck, plain and simple. Well, Google doesn’t really like them either. If Google encounters status codes in the 500 range, it will slow down or stop crawling your site. If you are a webmaster whose site is known for throwing these errors, you may want to hurry and get them fixed. Not only is it bad because Google will stop crawling your site, which can lower your ranking in the search results, but I have a feeling people would stop visiting your site regardless of how well it ranked to begin with.
For those who don’t know, or don’t remember, a 5XX Server Error means that something is awry with your server and it couldn’t fulfill the request. Google will stop crawling your site not to be a total jerk, but to avoid causing any further problems for your site (which obviously has its own set of problems already).
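If you’d rather find pages that answer with a 5XX before Googlebot does, you could request them yourself and flag anything in the 500 range. Here is a minimal sketch using only Python’s standard library (`urllib.request`). `fetch_status`, `is_server_error` and `find_server_errors` are hypothetical helper names, and this is not how Google itself crawls.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-2xx responses that aren't redirects;
        # the exception carries the status code the server sent.
        err.close()
        return err.code
    # Connection failures (URLError, timeouts) are not caught and propagate.


def is_server_error(status: int) -> bool:
    """True for any status in the 5XX Server Error range."""
    return 500 <= status <= 599


def find_server_errors(urls, timeout: float = 10.0):
    """Return (url, status) pairs for every URL that answered with a 5XX."""
    errors = []
    for url in urls:
        status = fetch_status(url, timeout)
        if is_server_error(status):
            errors.append((url, status))
    return errors
```

You might feed `find_server_errors` the URLs from your sitemap on a schedule. A 503 that shows up only occasionally often points to an overloaded server, which ties back to the connect-time problem above.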
For more information on server errors, check out the full list of HTTP status codes on Wikipedia.