Have you ever wondered how a Google search actually works? You ask a question and, seemingly by magic, get an instant answer. But what’s really happening in that split second? Today, Google released How Search Works, an interactive site that attempts to illustrate this process. The site takes you through the entire life of a search query: typing it in, crawling and indexing, algorithmic ranking and serving, and finally fighting off webspam. What’s amazing is how well this interactive infographic illuminates everything behind that near-instant journey, from the moment you search, through the algorithmic rankings, to getting your answers.
Crawling and Indexing
Search starts with the web, which is made up of over 30 trillion individual pages and is growing rapidly. Google navigates the web by crawling, which means following links from page to page. Keep in mind that webmasters can choose whether their sites, or portions of them, are crawled. The site gives you a one-of-a-kind interactive, graphical explanation of what happens behind a Google search.
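Google does not publish its crawler, but the idea of following links from page to page can be illustrated with a breadth-first traversal. The minimal sketch below assumes a tiny hypothetical web (`WEB`, a dict mapping each page to its outgoing links) and uses a robots.txt file, parsed with Python's standard `urllib.robotparser` module, as the webmaster's opt-out. The URLs and the `ExampleBot` agent name are made up for illustration.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical, in-memory stand-in for the web: each URL maps to the links on that page.
WEB = {
    "https://example.com/": ["https://example.com/a", "https://example.com/private/x"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/"],
    "https://example.com/b": [],
    "https://example.com/private/x": ["https://example.com/secret"],
}

# Hypothetical robots.txt: the webmaster opts everything under /private/ out of crawling.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawl(seed, web, robots_lines, agent="ExampleBot"):
    """Breadth-first crawl from `seed`, skipping URLs that robots.txt disallows."""
    rules = RobotFileParser()
    rules.parse(robots_lines)
    seen = {seed}             # URLs already queued, so each is visited at most once
    frontier = deque([seed])  # FIFO queue gives breadth-first order
    visited = []
    while frontier:
        url = frontier.popleft()
        if not rules.can_fetch(agent, url):
            continue  # respect the opt-out; links on this page are never discovered
        visited.append(url)
        for link in web.get(url, []):  # "fetch" the page and read its links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

Calling `crawl("https://example.com/", WEB, ROBOTS_TXT.splitlines())` visits the home page, `/a`, and `/b`. Because `/private/` is disallowed, the crawler never fetches `/private/x` and so never discovers `/secret`. A production crawler would also fetch pages over the network, throttle its requests, and revisit pages over time.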
Google Search Algorithm
Google then sorts the pages by their content quality and other factors (site quality, freshness) and keeps track of it all in an index of over 100 million gigabytes. Google writes programs and formulas intended to deliver the best possible results as you search. These algorithms look for clues to better understand what you mean, and based on these clues, Google pulls relevant documents from its index. Google’s algorithms look at over 200 factors and are constantly changing. The changes come from engineers who run experiments, analyze the results, tweak the algorithms as necessary, and repeat the cycle over and over. Google provides a view into all the major features of its search algorithm, along with a 43-page document explaining how it evaluates the quality of search results.
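The flow described above, pulling relevant documents from an index and then ordering them by several signals, can be sketched as an inverted index plus a weighted score. This is a toy illustration, not Google’s algorithm: the documents, their `quality` and `freshness` values, and the `WEIGHTS` are all hypothetical, and three signals stand in for the 200+ factors Google mentions.

```python
from collections import defaultdict

# Hypothetical documents; "quality" and "freshness" are made-up signals in [0, 1].
DOCS = {
    "d1": {"text": "how search works crawling and indexing", "quality": 0.9, "freshness": 0.5},
    "d2": {"text": "search search search cheap pills", "quality": 0.1, "freshness": 0.9},
    "d3": {"text": "how google ranks search results", "quality": 0.8, "freshness": 0.8},
}

# Hypothetical weights for combining the signals into one score.
WEIGHTS = {"relevance": 0.6, "quality": 0.3, "freshness": 0.1}


def build_index(docs):
    """Inverted index: term -> set of ids of documents containing that term."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in doc["text"].split():
            index[term].add(doc_id)
    return index


def rank(query, docs, index, weights=WEIGHTS):
    """Return ids of documents matching any query term, best score first."""
    terms = query.lower().split()
    # Candidate retrieval: any document containing at least one query term.
    candidates = set().union(*(index.get(t, set()) for t in terms))
    scored = []
    for doc_id in candidates:
        doc = docs[doc_id]
        doc_terms = set(doc["text"].split())
        # Relevance: fraction of query terms present in the document (repeats don't count).
        relevance = sum(t in doc_terms for t in terms) / len(terms)
        score = (weights["relevance"] * relevance
                 + weights["quality"] * doc["quality"]
                 + weights["freshness"] * doc["freshness"])
        scored.append((score, doc_id))
    # Highest score first; ties broken by id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]
```

With these made-up numbers, `rank("how search works", DOCS, build_index(DOCS))` returns `["d1", "d3", "d2"]`. Because relevance checks each term against a set of the document’s words, repeating "search" three times in `d2` does not raise its score.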
How Google Fights Spam
Google also fights spam 24/7 to keep your results as relevant as possible. Types of spam include hidden text or keyword stuffing, unnatural links from or to a site, cloaking or other sneaky redirects, spammy free hosts and dynamic DNS providers, parked domains, and thin content with little or no added value. Most spam removal is automatic; however, Google examines some questionable and less obvious documents by hand. When Google takes action, it attempts to notify the website owners, who can then fix their sites and regain their listings. You can see some interesting graphs illustrating the spam problem and how Google thwarts it, plus a list of policies explaining how and when Google removes content.
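Keyword stuffing, one of the spam types listed above, can be approximated with a simple word-density check. The sketch below is a hypothetical heuristic, not Google’s method; `DENSITY_THRESHOLD` and `MIN_WORDS` are arbitrary values chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical cut-offs chosen for illustration only.
DENSITY_THRESHOLD = 0.25  # flag if one word makes up more than 25% of the text
MIN_WORDS = 8             # texts shorter than this are not judged


def stuffing_check(text, threshold=DENSITY_THRESHOLD, min_words=MIN_WORDS):
    """Return (flagged, most frequent word, that word's share of all words)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < min_words:
        return False, None, 0.0
    term, count = Counter(words).most_common(1)[0]
    density = count / len(words)
    return density > threshold, term, density
```

For example, `stuffing_check("cheap pills cheap pills buy cheap pills cheap cheap")` flags the text because "cheap" makes up 5 of its 9 words. Since most removal is automatic but some cases are reviewed by hand, one plausible use (assumed here) is to route borderline scores to manual review instead of acting on them outright.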
So that’s how search works in a nutshell. Behind the split second it takes to return your results lies an ever more complex system, carefully crafted and tested, that supports more than 100 billion searches each month!