Most people use an Internet search engine almost every day. A single query can return millions of hits in a fraction of a second, and a powerful technology underlies the entire process. This sample computer science essay explores how search engines work.
The essay will have five parts.
- The history of the development of search engines
- General functioning of search engines
- Two key search metrics: relevance and popularity
- Where all the data processed by search engines is stored
- Emerging implications of how search engines work for businesses
History of Internet search engines
The rise of search engine technology parallels, of course, the rise of the Internet itself. Some conceptual precursors date back to the 1940s, in the broad sense of efforts to develop an electronic data storage and retrieval system whose structure would mimic the workings of the human mind. The search engine as it is known today, however, first emerged in the 1990s. As Wall has written:
“By December of 1993, three full fledged bot-fed search engines had surfaced on the web: JumpStation, the World Wide Web Worm, and the Repository-Based Software Engineering (RBSE) spider.
These initial search engines used fairly primitive algorithms that were based primarily on just matching the words of the search with the words of the web content; and they provided no real sorting of the retrieved information, instead simply reporting the search results in the actual order they were retrieved by the search engines themselves.”
Several other search engines also emerged (and faded) over time. During the 2000s, however, Google rose to prominence as the undisputed premier search engine on the Internet. To a large extent, this was due to the degree of integration that Google achieved in its operations: over time, Google provided not only a search service but also an advertising service, an e-mail service, a video service, a scholarly database, and a virtual library.
Google also offered the option of filtering searches by category (e.g. news, videos, and so on). By 2015, although other search engines (most notably Yahoo) continue to exist, it is safe to conclude that in the minds of most people the technology of the search engine has become inseparable from Google. This is reflected in the fact that “Google” has become an English verb (i.e. “to Google” something).
General functioning of a search engine
To retrieve any requested information from the Internet and deliver it to the user within a fraction of a second, the search engine must first build a comprehensive index of the material available on the Internet. As Moz has written:
“Imagine the World Wide Web as a network of stops in a big city subway system. Each stop is a unique document. The search engines need a way to ‘crawl’ the entire city and find all stops on the way, so they use the best path available—links” (2).
The automated programs that the search engine uses for this purpose, often called robots, crawlers, or spiders, follow these links to map out the connections among pages across the Internet, and the pages they find are recorded in the index. It is this index that the search engine consults when an Internet user submits a search query.
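The link-following step described above can be illustrated as a breadth-first traversal of a link graph. This is a minimal sketch under stated assumptions, not a description of how Google's crawlers actually work: the miniature web TOY_WEB, the function crawl, and its max_pages cap are hypothetical, and links are read from an in-memory dictionary rather than fetched and parsed from real pages.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to the URLs its page links to.
# A real crawler would obtain these links by downloading and parsing HTML.
TOY_WEB = {
    "a.example/home": ["a.example/about", "b.example/news"],
    "a.example/about": ["a.example/home"],
    "b.example/news": ["c.example/blog", "a.example/home"],
    "c.example/blog": [],
    "d.example/orphan": ["a.example/home"],  # no page links here
}

def crawl(seeds, links, max_pages=1000):
    """Breadth-first crawl: visit pages by following links out from the seeds.

    Returns the URLs in the order they were visited. max_pages is a
    hypothetical cap that stops the crawl after that many pages.
    """
    seen = set(seeds)          # URLs already discovered (visited or queued)
    frontier = deque(seeds)    # URLs waiting to be visited, oldest first
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for target in links.get(url, []):
            if target not in seen:   # queue each newly discovered page once
                seen.add(target)
                frontier.append(target)
    return order

print(crawl(["a.example/home"], TOY_WEB))
# prints ['a.example/home', 'a.example/about', 'b.example/news', 'c.example/blog']
```

In the example, d.example/orphan is never discovered because no reachable page links to it: like a subway stop with no track leading to it, a page that nothing links to stays invisible to a link-following crawler.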
The key to matching a query against the index is the algorithm. As Google itself has indicated:
“We write programs & formulas to deliver the best results possible;” the algorithms provide metrics that can match up a given web page with the Internet user’s search query; and “based on these clues, we pull relevant documents from the index.”
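The step of pulling documents from the index can be sketched with the simplest kind of index, an inverted index that maps each word to the set of pages containing it. This is an illustrative assumption, not Google's actual data structure: the names tokenize, build_index, lookup, and PAGES are hypothetical, and the lookup simply returns the pages that contain every word of the query.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of page IDs containing that word."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index

def lookup(index, query):
    """Return the IDs of pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy, so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())      # keep only pages with this word too
    return results

# Hypothetical pages standing in for crawled documents.
PAGES = {
    "p1": "How search engines crawl the web",
    "p2": "A history of web search engines",
    "p3": "Subway maps of a big city",
}

index = build_index(PAGES)
print(sorted(lookup(index, "search engines")))  # prints ['p1', 'p2']
```

On its own, this kind of word matching resembles what the early engines described by Wall did: it finds the pages containing the query words but says nothing about which of them should come first. Ordering the results is the job of the ranking algorithms discussed in the next section.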
The more sophisticated the algorithms involved, the more likely it is that the results provided for a given search query will be salient and will meaningfully address the needs of the Internet user (Mostafa). Today, most people take it almost for granted that Google will provide the information they are looking for. This was not always the case: earlier, more primitive search engines returned hits in a far more jumbled order, which often frustrated users.
Using the right keywords to determine relevance and popularity
One of the primary metrics used by a modern search engine such as Google when retrieving and sorting information is relevance. This would seem to be straightforward enough. As Moz has pointed out, though:
“To a search engine, relevance means more than finding a page with the right words…Today, hundreds of factors influence relevance” (3).
In general, relevance for search engines has come to mean a broad metric that includes the quality of the web pages retrieved. The reason is simple: if a web page is of low quality, the information it contains is unlikely to be relevant to the Internet user, even if the words on the page technically match the words of the search query.
Therefore, one of the key purposes of the algorithms used by a modern search engine is to evaluate the quality of a retrieved web page. This is often done by considering the extent to which the page is formatted according to generally accepted quality standards, the extent to which other web pages on the Internet link to it, and so on (Enge).
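The link criterion mentioned above can be illustrated with a simplified version of PageRank, the link-analysis method originally developed at Google. The essay's sources do not say which link metrics today's engines use or how they weight them, so this is only a sketch of the general idea that a page's score is built from the scores of the pages linking to it. The function pagerank, the damping value of 0.85, the fixed iteration count, and the TOY_LINKS graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank computed by power iteration.

    links maps each page to the list of pages it links to. Each round, a page
    passes a share of its score to the pages it links to, so pages that many
    (or highly scored) pages link to end up with higher scores. Scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: three pages link to "home", nothing links to "blog".
TOY_LINKS = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home"],
    "blog": ["home"],
}

scores = pagerank(TOY_LINKS)
for page, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

Because three pages link to home and nothing links to blog, home ends up with the highest score and blog with the lowest. A real engine would combine a link signal like this with many other factors, in line with Moz's remark that hundreds of factors influence relevance.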
Location of the Internet and search results
Now, to understand further how a search engine such as Google operates, it is necessary to consider a question that seems obvious but is in fact profound: namely, where is the Internet? People generally experience the Internet as a virtual cloud that is everywhere and nowhere at the same time. And yet the Internet must have some material basis in the physical world.
This material basis is part of what makes it possible, for example, for individual nations to attempt to censor the Internet. The Internet is in fact “stored” on computers housed in immense data centers around the world. When an individual Internet user types a search query into Google, the user's own computer is essentially linking up with these computers: it transmits data (i.e. the search query) to them, and they transmit data (i.e. the search results) back to the user.
This physical infrastructure is immense, consisting of enormous computers that lack screens but have additional processors to carry out their specific tasks. As Glanz has written:
“Most data centers, by design, consume vast amounts of energy in an incongruously wasteful manner, interviews and documents show. Online companies typically run their facilities at maximum capacity around the clock, whatever the demand” (paragraph 7).
For present purposes, the main point is that a company such as Google operates its own vast collection of these computers. Google's index of the Internet is stored on them, while the pages that a given Internet user actually visits are stored as data on computers owned either by Google or by whichever other party produced that content.
Internet searches and business marketing strategies
The emergence of the Internet in general, and of search engines in particular, has had an important implication for businesses around the world, which can be summarized by the term search engine optimization (SEO). Any business wants to be noticed by as many people as possible; on the Internet, this often means that the business needs to rank relatively high in the results returned for queries related to its products or services.
One way to achieve this is to design one’s web pages deliberately so that they score highly under the algorithms used by modern search engines such as Google. This “optimizes” the business’s web pages and helps the business develop a more effective online marketing strategy. One commonly recommended technique is to build citations. As Steimle has written:
“Loosely defined, anytime someone mentions your company on their website, that’s a citation” (paragraph 2).
More citations mean more web pages through which a potential customer may come across one’s business. In addition to building citations, a business may develop a more complex website containing several internal web pages, each of which prominently displays the company’s name and other information about it.
Works Cited
Enge, Eric. “Search Engine Basic Concepts.” Search Engine Watch. 16 Jun. 2014. Web. 30 Nov. 2015. http://searchenginewatch.com/sew/how-to/2350169/search-engine-basic-concepts#.
Glanz, James. “Power, Pollution and the Internet.” New York Times. 22 Sep. 2012. Web. 30 Nov. 2015. http://www.nytimes.com/2012/09/23/technology/data-centers-waste-vast-amounts-of-energy-belying-industry-image.html?_r=0.
Google. “How Search Works: From Algorithms to Answers.” Google Inside Search. 2015. Web. 30 Nov. 2015. http://www.google.com/insidesearch/howsearchworks/thestory/.
Mostafa, Javed. “How Do Internet Search Engines Work?” Scientific American. 29 Nov. 2004. Web. 30 Nov. 2015. http://www.scientificamerican.com/article/how-do-internet-search-en/.
Moz. The Beginner’s Guide to SEO. Moz, n.d. Web. 30 Nov. 2015. http://d2eeipcrcdle6.cloudfront.net/guides/Moz-The-Beginners-Guide-To-SEO.pdf.
Steimle, Josh. “Simple SEO Tip for Small Businesses: Local Citations.” Forbes. 7 Nov. 2013. Web. 30 Nov. 2015. http://www.forbes.com/sites/joshsteimle/2013/11/07/simple-seo-tip-for-small-businesses-local-citations/.
Wall, Aaron. “History of Search Engines: From 1945 to Google Today.” Search Engine History. 2015. Web. 30 Nov. 2015. http://www.searchenginehistory.com.