2 May 2009

Exercise 23: Searching Mechanisms

Q1. What is a Spider? What does it do?

Answer:

A spider is a program that automatically fetches Web pages. Spiders are used to feed pages to search engines. It is called a spider because it crawls over the Web; another term for these programs is web crawler.
Because most Web pages contain links to other pages, a spider can start almost anywhere. As soon as it sees a link to another page, it goes off and fetches it. Large search engines, such as AltaVista, have many spiders working in parallel. (Webopedia, 2009)

A web crawler (also known as a web spider) is a program that browses the World Wide Web in a methodical, automated manner; it is one type of web bot. Web crawlers not only keep a copy of all visited pages for later processing, for example by a search engine, but also index those pages so that searches can be narrowed.
In general, a web crawler starts with a list of URLs to visit. As it visits each URL, it identifies all the links on the page and adds them to the list of URLs still to visit. The process ends either manually or after a certain number of links have been followed. A minimal sketch of this loop appears after the next paragraph.
Web crawlers typically take care to spread their visits to a particular site over a period of time, because they request many more pages than a normal (human) user and can make the site appear slow to other users if they fetch from the same site repeatedly.
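
To make the two points above concrete, here is a minimal sketch of such a crawl loop in Python, using only the standard library. The names (crawl, LinkExtractor), the page limit, and the one-second politeness delay are illustrative assumptions, not details from the cited sources:

import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    # Collects the href targets of <a> tags on a page.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls, max_pages=50, delay=1.0):
    # Breadth-first crawl: take a URL from the frontier, fetch it,
    # harvest its links, and pause between requests.
    frontier = deque(seed_urls)   # the "list of URLs to visit"
    visited = set()
    pages = {}                    # url -> raw HTML, kept for later indexing
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except Exception:
            continue              # skip pages that fail to download
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            frontier.append(urljoin(url, link))  # resolve relative links
        time.sleep(delay)         # spread requests out over time
    return pages

The deque plays the role of the "list of URLs to visit" from the description above, and time.sleep implements the politeness pause; a production crawler would track delays per host rather than globally.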
For similar reasons, web crawlers are expected to obey the robots.txt protocol, with which web site owners can indicate which pages should not be spidered. (Knowledgerush, 2003)
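
As a sketch of how a crawler might honour that protocol, the following uses Python's standard urllib.robotparser module; the user-agent string and the example usage are placeholders of my own, not part of the sources:

from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def allowed(url, user_agent="ExampleSpider"):
    # Fetch and parse the site's robots.txt, then ask whether
    # this user agent is permitted to fetch the given URL.
    parts = urlparse(url)
    rp = RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    return rp.can_fetch(user_agent, url)

# A polite crawler would call allowed(url) before each fetch and
# skip any URL the site owner has disallowed.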

References:

1. Webopedia (2009). "What is Spider?". Webopedia.com. Retrieved from http://www.webopedia.com/TERM/s/spider.html
2. Knowledgerush (2003). "Web Spider". Knowledgerush.com. Retrieved from http://www.knowledgerush.com/kr/encyclopedia/Web_spider/
