Search Engine Spider / Web crawler

Simplified view of spiders crawling the web.

A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Web crawlers are also called ants, automatic indexers, bots, worms, Web spiders, Web robots, or, especially in the FOAF community, Web scutters.

This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering to keep their data up to date. Web crawlers are mainly used to create a copy of every visited page for later processing by a search engine, which indexes the downloaded pages to provide fast searches. Crawlers can also automate maintenance tasks on a Web site, such as checking links or validating HTML code, and can gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).

A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
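The seed-and-frontier loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular search engine works: the `FAKE_WEB` dictionary stands in for real HTTP fetches, and the names `LinkExtractor`, `crawl`, and `max_pages` are invented for the example. The only policy modeled here is breadth-first order with a page limit, and the frontier is processed with a loop rather than literal recursion.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.com/b": '<a href="/">home</a>',
    "http://example.com/c": "",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, max_pages=100):
    """Visit pages breadth-first from the seeds and return the visit order."""
    frontier = deque(seeds)   # the crawl frontier: URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, so none is visited twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = FAKE_WEB.get(url)  # a real crawler would fetch over HTTP here
        if html is None:
            continue              # unknown page: skip it
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited
```

Calling `crawl(["http://example.com/"])` visits the home page first, then the pages it links to, then the pages those link to. A production crawler would add further policies on top of this loop, such as politeness delays and revisit schedules.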

As in real estate, search engine ranking is all about location, location, location. The more prominently you place the keywords that matter to your website, the more relevant your website appears to be for those keywords. Frequency also plays a role: repeat a keyword too few times and you lose relevancy; repeat it too many times and you can be seen as spamming the search engines. A delicate balance must be struck between location and frequency.

The weight placed on each location also varies between search engines, which is why the same search often returns different results on different search engines.
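To make the interplay of location, frequency, and per-engine weights concrete, here is a small hypothetical scoring sketch. Nothing in it reflects a real search engine's formula: the location names, the weights in `LOCATION_WEIGHTS`, and the `MAX_DENSITY` spam cutoff are all assumptions chosen for illustration.

```python
import re

# Hypothetical weights: how much one keyword match counts in each location.
LOCATION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}
# Assumed cutoff: if the keyword makes up more than this share of all words,
# the page is treated as keyword stuffing.
MAX_DENSITY = 0.3


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(page, keyword, weights=LOCATION_WEIGHTS):
    """Weighted count of keyword matches; 0.0 if the page looks stuffed."""
    keyword = keyword.lower()
    total_words = 0
    total_hits = 0
    score = 0.0
    for location, text in page.items():
        words = tokenize(text)
        hits = words.count(keyword)
        total_words += len(words)
        total_hits += hits
        score += weights.get(location, 0.0) * hits
    if total_words and total_hits / total_words > MAX_DENSITY:
        return 0.0  # too many repetitions: treated as spam
    return score


page = {
    "title": "Fresh Coffee Beans",
    "heading": "Why our coffee is different",
    "body": "We roast coffee in small batches every week and ship it the same day.",
}
flat_weights = {"title": 1.0, "heading": 1.0, "body": 1.0}

print(keyword_score(page, "coffee"))                # default weights
print(keyword_score(page, "coffee", flat_weights))  # a differently tuned "engine"
```

The same page gets a different score under each set of weights, which is one simple way to picture why two search engines can rank it differently.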

However, keyword placement and frequency are not all that the search engines look at when ranking pages by relevancy. Other variables, commonly referred to as “off-page” factors, also come into play. These off-page factors include the number of incoming links to your site and click-through measurement.

The incoming links show the search engines how ‘popular’ you are among sites with similar topics or themes. Essentially, the more links or ‘votes’ you have, the better. However, the search engines use sophisticated techniques to screen out artificial links that websites build specifically to improve their placement in the search results.
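The ‘votes’ idea can be illustrated by simply counting incoming links. This sketch is a deliberate simplification: real search engines weight links and filter artificial ones in ways not described here. The function name `inbound_link_counts`, the `link_graph` structure, and the page names are assumptions for the example, and the only screening applied is ignoring self-links and duplicate links from the same page.

```python
from collections import Counter


def inbound_link_counts(link_graph):
    """Count how many distinct other pages link to each page.

    link_graph maps a page to the list of pages it links to.
    """
    counts = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):   # a source page votes at most once
            if target != source:      # a page cannot vote for itself
                counts[target] += 1
    return counts


# Hypothetical link graph between four pages.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c", "c", "d"],
}
votes = inbound_link_counts(graph)
print(votes.most_common())
```

Here page `c` collects the most votes because three different pages link to it, while `d` gets none because only `d` itself links to `d`.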

Click-through measurement is a way for the search engines to watch which results searchers click on for any given search term. Through this analysis, the search engines may ‘drop’ pages from their top rankings and ‘boost’ lower-ranked pages that are generating more clicks than the top-ranked pages.
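One simple way to picture this ‘drop’ and ‘boost’ behavior is to reorder an existing ranking by observed clicks. The sketch below is only an assumption about how such an adjustment could look; actual search engines do not publish their methods, and the function name `rerank_by_clicks` and the sample data are hypothetical.

```python
def rerank_by_clicks(ranked_urls, clicks):
    """Reorder results so pages with more clicks come first.

    Python's sort is stable, so pages with equal click counts keep
    their original relative ranking.
    """
    return sorted(ranked_urls, key=lambda url: -clicks.get(url, 0))


original_ranking = ["page-a", "page-b", "page-c"]
observed_clicks = {"page-a": 3, "page-b": 10}  # page-c has no recorded clicks

print(rerank_by_clicks(original_ranking, observed_clicks))
```

In this example `page-b` is boosted above `page-a` because it drew more clicks, and `page-c` stays last.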

To learn more about an Internet Solution that really works for your business, use the form on the “Contact Us” page to secure a free one-on-one consultation session and help your business gain the full benefit of a web presence.