Search Engine (Full Text Search)

About

Full-text search is a battle between:

  • precision: returning as few irrelevant documents as possible
  • recall: returning as many relevant documents as possible.
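Both notions can be computed for a single query once we know which documents were retrieved and which are actually relevant. The sketch below assumes hypothetical relevance judgments (documents `d1`, `d2` and `d5`) and shows the trade-off: a wide net raises recall at the cost of precision, while a narrow one does the opposite.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    An empty denominator is treated as 0.0 here (a convention chosen for this sketch).
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: d1, d2 and d5 are the relevant documents for the query
relevant = {"d1", "d2", "d5"}

broad = precision_recall({"d1", "d2", "d3", "d4"}, relevant)  # wide net
narrow = precision_recall({"d1"}, relevant)                   # exact matches only
```

Here the broad result set has precision 0.5 and recall 2/3, while the narrow one has precision 1.0 but recall only 1/3.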

Matching only the exact words that the user has queried would be precise, but it is not enough: we would miss many documents that the user would consider relevant. Instead, we need to spread the net wider and also search for words that are not exactly the same as the original but are related, such as words sharing the same stem or synonyms.
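A minimal sketch of spreading the net wider, assuming a hypothetical synonym table and a deliberately crude suffix-stripping stemmer (not the Porter algorithm or any library's stemmer). Each query word is reduced to its stem and expanded with the stems of its synonyms; a document matches if it shares at least one of those terms.

```python
import re

# Hypothetical synonym table keyed by stem; a real engine would load a thesaurus
SYNONYMS = {
    "car": {"automobile", "auto"},
    "quick": {"fast", "rapid"},
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(word: str) -> str:
    """Naive suffix stripping, for illustration only (NOT the Porter stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query: str) -> set[str]:
    """Stem each query word and add the stems of its synonyms."""
    terms = set()
    for word in tokenize(query):
        stem = crude_stem(word)
        terms.add(stem)
        terms.update(crude_stem(s) for s in SYNONYMS.get(stem, set()))
    return terms


def matches(document: str, query: str) -> bool:
    """True if the document shares at least one expanded term with the query."""
    doc_terms = {crude_stem(w) for w in tokenize(document)}
    return not doc_terms.isdisjoint(expand_query(query))
```

With this expansion, the query "car" matches a document about "used automobiles" even though the exact word "car" never appears in it: recall goes up, and so does the risk of irrelevant hits.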

Keywords / Search term

Keywords (also known as search terms) are the words a user types into the search form.

OpenSearch

OpenSearch is a specification that lets you make your site's internal search engine available publicly. Visitors can then search your site directly from the browser.
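In practice a site publishes an OpenSearch description document (XML) and advertises it with a `<link rel="search">` tag so the browser can discover it. The sketch below builds a minimal document with only the `ShortName`, `Description` and `Url` elements; the site `example.com`, its `/search?q=` endpoint and the `/opensearch.xml` path are hypothetical. The `{searchTerms}` placeholder in the URL template is replaced by the browser with what the visitor typed.

```python
import html
import xml.etree.ElementTree as ET

# Namespace of the OpenSearch 1.1 description document format
OSD_NS = "http://a9.com/-/spec/opensearch/1.1/"


def opensearch_description(short_name: str, description: str, url_template: str) -> str:
    """Build a minimal OpenSearch description document as an XML string."""
    root = ET.Element("OpenSearchDescription", xmlns=OSD_NS)
    ET.SubElement(root, "ShortName").text = short_name
    ET.SubElement(root, "Description").text = description
    # The browser substitutes {searchTerms} with the visitor's query
    ET.SubElement(root, "Url", type="text/html", template=url_template)
    return ET.tostring(root, encoding="unicode")


def discovery_link(title: str, href: str) -> str:
    """HTML <link> tag (placed in <head>) that lets browsers find the document."""
    return (
        '<link rel="search" type="application/opensearchdescription+xml" '
        f'title="{html.escape(title)}" href="{html.escape(href)}">'
    )


# Hypothetical site with a search page at https://example.com/search?q=...
xml_doc = opensearch_description(
    "Example", "Search example.com", "https://example.com/search?q={searchTerms}"
)
link_tag = discovery_link("Example", "/opensearch.xml")
```

The XML would be served at the `href` used in the link tag; browsers may then offer to add the site as a search engine.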

Type

There are three types of search intent:

  • Informational (to find information)
  • Navigational (to locate a specific website)
  • Transactional (to make a purchase)

List

The list of search engines supported by Google Analytics. You can also add other search engines in your Analytics account.

Engine Example Domain Names
360.cn http://360.cn/
Alice http://www.alice.com/, http://aliceadsl.fr
Alltheweb http://www.alltheweb.com/
Altavista http://www.altavista.com/
AOL http://www.aol.com/, http://search.aol.fr, alicesuche.aol.de, etc.
Ask http://www.ask.com/
Auone http://search.auone.jp/
Avg http://isearch.avg.com
Babylon http://search.babylon.com
Baidu http://www.baidu.com/
Biglobe http://biglobe.ne.jp
Bing http://www.bing.com/
Centrum.cz http://search.centrum.cz/
Comcast http://search.comcast.net
Conduit http://search.conduit.com
CNN http://www.cnn.com/SEARCH/
Daum http://www.daum.net/
DuckDuckGo http://duckduckgo.com
Ecosia http://www.ecosia.org
Ekolay http://www.ekolay.net/
Eniro http://www.eniro.se/
Globo http://www.globo.com/busca/
go.mail.ru http://go.mail.ru/
Google All Google Search domains (e.g., www.google.com, www.google.co.uk, etc.)
goo.ne http://goo.ne.jp
haosou.com http://www.haosou.com/s
Incredimail http://search.incredimail.com
Kvasir http://www.kvasir.no/
Live http://www.bing.com/
Lycos http://www.lycos.com/, http://search.lycos.de, other regional TLDs
Mamma http://www.mamma.com/
MSN http://www.msn.com/, http://money.msn.com, http://local.msn.com
Mynet http://www.mynet.com/
Najdi http://najdi.si
Naver http://www.naver.com/
Netscape http://search.netscape.com/
ONET http://szukaj.onet.pl
Ozu http://www.ozu.es/
Rakuten http://rakuten.co.jp
Rambler http://rambler.ru/
Search-results http://search-results.com
search.smt.docomo http://search.smt.docomo.ne.jp
Sesam http://sesam.no/
Seznam http://www.seznam.cz/
So.com http://www.so.com/s
Sogou http://www.sogou.com/
Startsiden http://www.startsiden.no/sok
Szukacz http://www.szukacz.pl/
Terra http://buscador.terra.com.br
Tut.by http://search.tut.by/
Ukr http://search.ukr.net/
Virgilio http://search.virgilio.it/
Voila http://www.voila.fr/
Wirtualna Polska http://www.wp.pl/
Yahoo http://www.yahoo.com/, http://yahoo.cn, m.yahoo.com, other regional mobile sites
Yandex http://www.yandex.com/, http://yandex.ru
Yam http://www.yam.com/
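A list like this is what lets an analytics tool count a visit as organic search traffic: the referring URL's host is compared against known search engine domains. The sketch below is a rough illustration, not how Google Analytics implements it; it uses a hypothetical subset of the table and approximates "all Google Search domains" with a hostname pattern.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical subset of the table above: registrable domain -> engine name
SEARCH_ENGINES = {
    "bing.com": "Bing",
    "duckduckgo.com": "DuckDuckGo",
    "baidu.com": "Baidu",
    "ecosia.org": "Ecosia",
    "yahoo.com": "Yahoo",
    "yandex.ru": "Yandex",
}

# Approximation of Google Search hosts: google.com, www.google.co.uk, www.google.com.br...
GOOGLE_HOST = re.compile(r"^(www\.)?google\.[a-z]{2,3}(\.[a-z]{2})?$")


def search_engine_for(referrer: str) -> Optional[str]:
    """Return the engine name if the referrer host belongs to a known engine."""
    host = (urlsplit(referrer).hostname or "").lower()
    if GOOGLE_HOST.match(host):
        return "Google"
    for domain, name in SEARCH_ENGINES.items():
        # Match the domain itself or any subdomain (search.yahoo.com), not lookalikes
        if host == domain or host.endswith("." + domain):
            return name
    return None
```

Matching on a leading dot keeps a lookalike host such as `notbing.com` from being counted as Bing.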

Database Modeling

See What are models of text in NLP? (Natural Language, Text Modeling).

Technology

Server

Lucene is the best-known full-text search library. It powers:

Web Services

Client

Client-side search: in its most basic form, a search component simply provides an index file, which is nothing more than a JSON file containing the content of all pages.
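A minimal sketch of that idea, assuming hypothetical pages with `url`, `title` and `content` fields. The index is produced once at build time; the search step would normally run in the browser (typically in JavaScript) after downloading the JSON file, and is written in Python here only to keep the examples in one language. Matching is plain substring search with no ranking, stemming or synonyms.

```python
import json

# Hypothetical pages, as a static-site generator might collect them at build time
PAGES = [
    {"url": "/lucene", "title": "Lucene", "content": "Lucene is a full-text search library."},
    {"url": "/opensearch", "title": "OpenSearch", "content": "Search a site from the browser."},
]


def build_index(pages: list[dict]) -> str:
    """Serialize the searchable fields of every page into one JSON index file."""
    return json.dumps(
        [{"url": p["url"], "title": p["title"], "content": p["content"]} for p in pages]
    )


def search(index_json: str, query: str) -> list[str]:
    """Return the URLs of pages whose title or content contains every query word."""
    entries = json.loads(index_json)
    words = query.lower().split()
    return [
        e["url"]
        for e in entries
        if all(w in f'{e["title"]} {e["content"]}'.lower() for w in words)
    ]


index_json = build_index(PAGES)  # would be shipped to the browser, e.g. as index.json
```

Because the whole index is downloaded, this approach suits small sites; larger ones usually pre-compute an inverted index or move search to a server.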

Task Runner