A working search engine written in C that crawls and caches webpages, indexes HTML attributes of each page, and answers search queries with ranked results.
Search Engine Architecture
The simple search engine architecture is based on the 2001 paper "Searching the Web" by Arasu et al., published by the Association for Computing Machinery.

Implementation of Search Engine
See my C implementation at https://github.com/srb-private-org/tiny-search-engine (email for access)
The implementation is broken up into three modules: Crawler, Indexer, and Querier.
Crawler
The Crawler module provides a standalone program, crawler, that crawls the web starting from a “seed” URL, follows the links found on each page up to a specified depth, and caches the fetched pages in a specified directory.
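
The depth-limited traversal described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes a breadth-first visiting order, and it replaces real HTTP fetching with a small in-memory link table. The names page_t, crawl, MAX_PAGES, and MAX_LINKS are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 64   /* hypothetical cap for this sketch */
#define MAX_LINKS 8    /* hypothetical cap on links per page */

/* Stand-in for a fetched page: the IDs of the pages it links to. */
typedef struct {
    int links[MAX_LINKS];
    int n_links;
} page_t;

/* Breadth-first crawl from `seed`, following links at most `max_depth` hops.
 * Writes page IDs to `order` in the order they are visited and returns
 * how many pages were visited. */
int crawl(const page_t *web, int n_pages, int seed, int max_depth, int *order)
{
    assert(web != NULL && order != NULL);
    assert(n_pages > 0 && n_pages <= MAX_PAGES);
    assert(seed >= 0 && seed < n_pages);

    int queue[MAX_PAGES], depth[MAX_PAGES];
    bool seen[MAX_PAGES] = { false };
    int head = 0, tail = 0, count = 0;

    queue[tail] = seed;
    depth[tail] = 0;
    tail++;
    seen[seed] = true;

    while (head < tail) {
        int page = queue[head];
        int d = depth[head];
        head++;

        order[count++] = page;   /* a real crawler would save the page to disk here */

        if (d >= max_depth)
            continue;            /* do not follow links beyond the depth limit */

        for (int i = 0; i < web[page].n_links; i++) {
            int next = web[page].links[i];
            if (next < 0 || next >= n_pages || seen[next])
                continue;        /* skip invalid or already-seen pages */
            seen[next] = true;
            queue[tail] = next;
            depth[tail] = d + 1;
            tail++;
        }
    }
    return count;
}
```

Marking a page as seen when it is enqueued, rather than when it is visited, keeps each page in the queue at most once, so the fixed-size queue cannot overflow.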


Indexer
The Indexer module reads the page files produced by the Crawler, builds an index, and writes that index to a specified index file.
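
As a rough sketch of what such an index might look like, the code below builds an inverted index that maps each word to a per-document occurrence count. It is an assumption-laden illustration rather than the project's code: the fixed-size arrays, the lowercase alphabetic tokenization, the names index_t, index_add, index_page, index_count, and index_save, and the one-line-per-word file format are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORDS    256   /* hypothetical limits for this sketch */
#define MAX_DOCS     16
#define MAX_WORD_LEN 32

typedef struct {
    char word[MAX_WORD_LEN];
    int counts[MAX_DOCS];      /* counts[d] = occurrences in document d */
} entry_t;

typedef struct {
    entry_t entries[MAX_WORDS];
    int n_entries;
} index_t;

/* Record one occurrence of `word` in `doc_id`. Returns 0, or -1 if the index is full. */
int index_add(index_t *idx, const char *word, int doc_id)
{
    assert(idx != NULL && word != NULL);
    assert(doc_id >= 0 && doc_id < MAX_DOCS);

    for (int i = 0; i < idx->n_entries; i++) {
        if (strcmp(idx->entries[i].word, word) == 0) {
            idx->entries[i].counts[doc_id]++;
            return 0;
        }
    }
    if (idx->n_entries == MAX_WORDS)
        return -1;

    entry_t *e = &idx->entries[idx->n_entries++];
    memset(e, 0, sizeof *e);                    /* zero counts and terminate word */
    strncpy(e->word, word, MAX_WORD_LEN - 1);
    e->counts[doc_id] = 1;
    return 0;
}

/* Split `text` into lowercase alphabetic words and add each one to the index. */
void index_page(index_t *idx, const char *text, int doc_id)
{
    char word[MAX_WORD_LEN];
    size_t len = 0;

    for (const char *p = text; ; p++) {
        if (*p != '\0' && isalpha((unsigned char)*p)) {
            if (len < MAX_WORD_LEN - 1)         /* overly long words are truncated */
                word[len++] = (char)tolower((unsigned char)*p);
        } else {
            if (len > 0) {
                word[len] = '\0';
                index_add(idx, word, doc_id);
                len = 0;
            }
            if (*p == '\0')
                break;
        }
    }
}

/* Number of times `word` occurs in `doc_id`, or 0 if it was never indexed. */
int index_count(const index_t *idx, const char *word, int doc_id)
{
    for (int i = 0; i < idx->n_entries; i++)
        if (strcmp(idx->entries[i].word, word) == 0)
            return idx->entries[i].counts[doc_id];
    return 0;
}

/* Write one line per word in the assumed format: word docID count [docID count]... */
void index_save(const index_t *idx, FILE *fp)
{
    for (int i = 0; i < idx->n_entries; i++) {
        fprintf(fp, "%s", idx->entries[i].word);
        for (int d = 0; d < MAX_DOCS; d++)
            if (idx->entries[i].counts[d] > 0)
                fprintf(fp, " %d %d", d, idx->entries[i].counts[d]);
        fputc('\n', fp);
    }
}
```

A linear scan over a fixed array keeps the sketch short. An index over a real crawl would more plausibly use a hash table keyed by word.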

Querier
The Querier module reads the index file produced by the Indexer and the page files produced by the Crawler, then answers search queries read from stdin.
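
The ranking step could look something like the sketch below. The scoring rule is an assumption, since the description above does not specify one: every query word must appear in a document, the document's score is the smallest of its per-word counts, and results are sorted by descending score. The names result_t, by_score_desc, and rank_and_query are hypothetical, and the per-word counts are passed in directly instead of being loaded from an index file.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int doc_id;
    int score;
} result_t;

/* qsort comparator: higher score first, ties broken by lower document ID. */
static int by_score_desc(const void *a, const void *b)
{
    const result_t *ra = a, *rb = b;
    if (ra->score != rb->score)
        return (rb->score > ra->score) - (rb->score < ra->score);
    return (ra->doc_id > rb->doc_id) - (ra->doc_id < rb->doc_id);
}

/* `counts` is a row-major n_words x n_docs table:
 * counts[w * n_docs + d] = occurrences of query word w in document d.
 * Fills `results` (capacity n_docs) with matching documents in ranked
 * order and returns how many documents matched. */
int rank_and_query(const int *counts, int n_words, int n_docs, result_t *results)
{
    assert(counts != NULL && results != NULL);
    assert(n_words > 0 && n_docs > 0);

    int n_results = 0;
    for (int d = 0; d < n_docs; d++) {
        int score = counts[d];                  /* count of word 0 in document d */
        for (int w = 1; w < n_words; w++) {
            int c = counts[w * n_docs + d];
            if (c < score)
                score = c;                      /* keep the minimum across words */
        }
        if (score > 0) {                        /* every word occurs at least once */
            results[n_results].doc_id = d;
            results[n_results].score = score;
            n_results++;
        }
    }
    qsort(results, (size_t)n_results, sizeof results[0], by_score_desc);
    return n_results;
}
```

Taking the minimum count means a document scores highly only if every query word occurs often in it. Summing the counts instead would be a natural rule for queries where any one word is enough.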

