Web Science/Part2: Emerging Web Properties/Search Engine Ecosystem

Survival of the fittest

  • Fit for whom?
    • Search engine operator, search users, advertisers
    • Unfit for spammers
  • Key performance indicators (multi-criteria optimization problem!)
    • Value per click
      • User: usability, relevance of search results, coverage of the Web
      • Operator: advertising revenues, low-cost and scalable technical infrastructure, low personnel costs
      • Advertiser: click-through and conversion rate

part 1

what is a search engine?

  • why is it important?
  • what is keyword search?

Search engine history

  • Archie, 1990
  • Gopher, 1991
  • WebCrawler, Lycos, Yahoo search, 1994
  • AltaVista search, 1995
  • Google search, 1998
  • Sequels: Baidu, Yandex, Bing
  • Alternatives: ask.com, wolframalpha.com
  • Vertical search: for products: amazon.com; for people: peoplefinder.com; for egosearch (identity theft prevention): garlik.com; ...

Search system architecture

  • what is a web crawler
  • what is a search index (inverted index)
  • (for now) blackbox ranking
  • binary search relevance
  • interface (auto completion, search results,...)
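
As a minimal sketch of two of the components listed above, the following Java class builds an inverted index over a few in-memory documents and answers queries with binary relevance (a document either contains all query terms or it does not). The class name, method names and sample documents are assumptions made for illustration; crawling and ranking are left out.

```java
import java.util.*;

class InvertedIndexSketch {
    // Maps each term to the sorted set of ids of the documents that contain it.
    static Map<String, SortedSet<Integer>> buildIndex(List<String> documents) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < documents.size(); docId++) {
            for (String term : documents.get(docId).toLowerCase().split("\\W+")) {
                if (term.isEmpty()) continue;
                index.computeIfAbsent(term, t -> new TreeSet<Integer>()).add(docId);
            }
        }
        return index;
    }

    // Binary relevance: a document matches iff it contains every query term.
    static SortedSet<Integer> search(Map<String, SortedSet<Integer>> index, String query) {
        SortedSet<Integer> result = null;
        for (String term : query.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue;
            SortedSet<Integer> postings = index.getOrDefault(term, new TreeSet<Integer>());
            if (result == null) {
                result = new TreeSet<Integer>(postings); // first term: start with its postings
            } else {
                result.retainAll(postings);              // later terms: intersect
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Web crawlers download web pages",
                "The inverted index maps terms to pages",
                "Crawlers feed the index");
        Map<String, SortedSet<Integer>> index = buildIndex(docs);
        System.out.println(search(index, "crawlers index")); // prints [2]
    }
}
```

The index is keyed by term rather than by document, so a query only touches the posting lists of its own terms instead of scanning every document.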

ranking in search I: application of tf-idf

  • show how tf-idf can be used for ranking.
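
One possible way to rank with tf-idf is sketched below. It assumes one common variant among many: term frequency is the term count divided by the document length, and inverse document frequency is log(N / df). The class name, method names and sample documents are hypothetical.

```java
import java.util.*;

class TfIdfSketch {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Returns one score per document: the sum over all query terms of tf * idf.
    static double[] score(List<String> documents, String query) {
        int n = documents.size();
        List<List<String>> tokenized = new ArrayList<>();
        for (String d : documents) tokenized.add(tokenize(d));

        double[] scores = new double[n];
        for (String term : tokenize(query)) {
            // Document frequency: number of documents containing the term.
            int df = 0;
            for (List<String> doc : tokenized) {
                if (doc.contains(term)) df++;
            }
            if (df == 0) continue; // term occurs nowhere and contributes nothing
            double idf = Math.log((double) n / df);
            for (int i = 0; i < n; i++) {
                List<String> doc = tokenized.get(i);
                if (doc.isEmpty()) continue;
                double tf = (double) Collections.frequency(doc, term) / doc.size();
                scores[i] += tf * idf;
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "apple banana",
                "apple apple cherry",
                "banana cherry date");
        // Higher score means more relevant to the query.
        System.out.println(Arrays.toString(score(docs, "apple")));
    }
}
```

With this variant, a term that occurs in every document has idf = log(1) = 0 and therefore does not influence the ranking, which matches the intuition that such terms do not discriminate between documents.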

ranking in search II: random surfer model

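  • explaining the random surfer model, e.g. by simulating it on a small five-page graph:

public class RandomSurfer {
	public static void main(String[] args) {
		// Column-stochastic transition matrix: transitionMatrix[j][i] is the
		// probability of moving from page i to page j (each column sums to 1).
		double[][] transitionMatrix = { { 0., 1. / 3., 1., 1. / 3., 0. },
				{ 1. / 2., 0., 0., 0., 0. }, { 0., 1. / 3., 0., 1. / 3., 1. },
				{ 1. / 2., 0., 0., 0., 0. }, { 0., 1. / 3., 0., 1. / 3., 0. } };
		int numberOfNodes = 5;
		int steps = 100;

		// frequency[p] counts how often the surfer is on page p after a step.
		int[] frequency = new int[numberOfNodes];
		int page = 0; // the surfer starts on page 0
		for (int i = 0; i < steps; i++) {
			// Make one random move.
			double r = Math.random();
			double sum = 0.0;
			// Walk down the column of the current page and accumulate the
			// transition probabilities.
			for (int j = 0; j < numberOfNodes; j++) {
				sum += transitionMatrix[j][page];
				// The first page whose cumulative probability exceeds r is
				// the target of the jump.
				if (r < sum) {
					System.out.println("Go from: " + page + " to: " + j);
					page = j;
					break;
				}
			}
			frequency[page]++;
		}

		// The relative visit frequencies approximate the rank of each page.
		for (int p = 0; p < numberOfNodes; p++) {
			System.out.println("Page " + p + ": " + (double) frequency[p] / steps);
		}
	}
}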
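
The simulation above estimates how often each page is visited by sampling random moves. As a complementary sketch, the same transition matrix can be applied repeatedly to a probability vector (power iteration), which should converge to the long-run visit probabilities without any randomness. The class and method names are assumptions, and no damping factor (random teleportation) is included; that refinement would be needed for graphs with dead ends or disconnected parts.

```java
import java.util.Arrays;

class PowerIterationSketch {
    // Same column-stochastic matrix as in the simulation:
    // entry [j][i] is the probability of moving from page i to page j.
    static final double[][] TRANSITION_MATRIX = {
            { 0., 1. / 3., 1., 1. / 3., 0. },
            { 1. / 2., 0., 0., 0., 0. },
            { 0., 1. / 3., 0., 1. / 3., 1. },
            { 1. / 2., 0., 0., 0., 0. },
            { 0., 1. / 3., 0., 1. / 3., 0. } };

    static double[] stationaryDistribution(double[][] matrix, int iterations) {
        int n = matrix.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start with a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                for (int i = 0; i < n; i++) {
                    // Probability mass flowing from page i to page j.
                    next[j] += matrix[j][i] * rank[i];
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        double[] rank = stationaryDistribution(TRANSITION_MATRIX, 100);
        for (int p = 0; p < rank.length; p++) {
            System.out.println("Page " + p + ": " + rank[p]);
        }
    }
}
```

For this particular graph the exact long-run probabilities are 1/3, 1/6, 2/9, 1/6 and 1/9 for pages 0 to 4, so the frequencies printed by the simulation should approach these values as the number of steps grows.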

comparison: tf-idf vs. random surfer

  • Random surfer + tf-idf
  • showing how to combine the two models.
  • even more methods can be included
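
One simple way to combine the two models is a weighted sum of a content score (such as tf-idf) and a link score (such as the random surfer rank). The sketch below assumes this linear combination, a max-normalization of both signals, and a weight parameter alpha; all three are illustrative choices, not a description of how any real search engine works.

```java
import java.util.Arrays;

class CombinedRankingSketch {
    // Scales scores to [0, 1] by dividing by the maximum so both signals are comparable.
    static double[] normalize(double[] scores) {
        double max = 0.0;
        for (double s : scores) max = Math.max(max, s);
        double[] out = new double[scores.length];
        if (max == 0.0) return out; // all scores are zero
        for (int i = 0; i < scores.length; i++) out[i] = scores[i] / max;
        return out;
    }

    // alpha = 1 uses only the content score, alpha = 0 only the link score.
    static double[] combine(double[] contentScores, double[] linkScores, double alpha) {
        double[] content = normalize(contentScores);
        double[] link = normalize(linkScores);
        double[] combined = new double[content.length];
        for (int i = 0; i < combined.length; i++) {
            combined[i] = alpha * content[i] + (1.0 - alpha) * link[i];
        }
        return combined;
    }

    // Returns document ids ordered from highest to lowest score.
    static Integer[] ranking(double[] scores) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));
        return order;
    }

    public static void main(String[] args) {
        double[] tfIdf = { 0.0, 0.4, 0.2 };     // hypothetical content scores
        double[] surfer = { 0.5, 0.1, 0.4 };    // hypothetical link scores
        System.out.println(Arrays.toString(ranking(combine(tfIdf, surfer, 0.5))));
    }
}
```

Further signals can be added in the same way by giving each its own weight, as long as the weights are chosen (or learned) so that no single signal dominates by accident.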

relevance is a choice: Trust issues with search engines

  • understand that ranking algorithms are programmed by humans, so it is up to us to decide which search engine we trust and use
  • manipulations will be hard to detect (magic keyword: Barack Obama)
  • large search engines are among the most powerful institutions on the web (financially, but also with regard to impact)

SPAM and SEO

  • understand that search results can be manipulated
  • metadata (schema.org)

A video of the flipped classroom associated with this topic is available on Wikimedia Commons, where the file can also be downloaded directly.

part 2

multi stakeholder system

  • search engine
  • end user
  • web site owner
  • advertiser
  • (webmaster (SEO))

economics of a search engine

personalization of search results

  • key methods of personalization (using a cookie)
  • graph view of user interests
  • collaborative filtering
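
As a rough illustration of collaborative filtering, the sketch below predicts how interested a user might be in a topic from the interests of similar users. It assumes a user-based variant with cosine similarity over a small user-by-topic matrix; the matrix values, class name and method names are hypothetical.

```java
class CollaborativeFilteringSketch {
    // Cosine similarity between two interest vectors (0 if either is all zeros).
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / Math.sqrt(normA * normB);
    }

    // Predicts the interest of `user` in `item` as the similarity-weighted
    // average of the values that all other users have for that item.
    static double predict(double[][] interests, int user, int item) {
        double weighted = 0.0, totalSimilarity = 0.0;
        for (int other = 0; other < interests.length; other++) {
            if (other == user) continue;
            double similarity = cosine(interests[user], interests[other]);
            weighted += similarity * interests[other][item];
            totalSimilarity += similarity;
        }
        return totalSimilarity == 0.0 ? 0.0 : weighted / totalSimilarity;
    }

    public static void main(String[] args) {
        // Rows: users, columns: topics; 1 means the user clicked results on that topic.
        double[][] interests = {
                { 1, 1, 0, 0 },
                { 1, 1, 1, 0 },
                { 0, 0, 0, 1 } };
        System.out.println("Topic 2 for user 0: " + predict(interests, 0, 2));
        System.out.println("Topic 3 for user 0: " + predict(interests, 0, 3));
    }
}
```

In the example, user 0 shares two topics with user 1 and none with user 2, so topic 2 (clicked by user 1) receives a higher predicted interest than topic 3 (clicked only by user 2).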

filter bubble effects

Technologies for your own search engine

  • Hadoop
  • Solr
  • Nutch
  • Elasticsearch

The most successful search engines owe their success to winning two competitions: one for search users and one for advertising customers. Both competitions will be explained in the next two weeks.

Advertising

Stakeholders

  • advertiser
  • customer
  • content owner/portal
  • advertising network

Intermediaries:

  • markets (eBay)
  • advertising networks (DoubleClick, ...)

Moving the advertising service out of the portal and into an ad network benefits each stakeholder:

  • customer: more exact profile, better ad targeting
  • content owner/portal: better targeted ads lead to higher revenue
  • advertiser: higher click-through rate/conversion rate
  • ad network: valuable business model

  • Technology
  • Business model
  • Pricing, auctions
  • real-time bidding
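
To make the pricing and auction topics more concrete, the following sketch runs a sealed-bid second-price auction for a single ad slot: the highest bidder wins but pays the second-highest bid (or the reserve price if there is no competing bid). This is one pricing rule that is often discussed for ad auctions; the outline does not say which rule is meant, so the rule, the reserve price and all names here are assumptions.

```java
class SecondPriceAuctionSketch {
    static final class Outcome {
        final int winner;   // index of the winning bid, or -1 if no bid exceeds the reserve price
        final double price; // amount the winner pays

        Outcome(int winner, double price) {
            this.winner = winner;
            this.price = price;
        }
    }

    static Outcome run(double[] bids, double reservePrice) {
        int winner = -1;
        double highest = reservePrice; // best bid seen so far (starts at the reserve)
        double second = reservePrice;  // second-best bid seen so far
        for (int i = 0; i < bids.length; i++) {
            if (bids[i] > highest) {
                second = highest;
                highest = bids[i];
                winner = i;
            } else if (bids[i] > second) {
                second = bids[i];
            }
        }
        return new Outcome(winner, winner < 0 ? 0.0 : second);
    }

    public static void main(String[] args) {
        double[] bids = { 0.5, 1.2, 0.9 }; // hypothetical bids of three advertisers
        Outcome outcome = run(bids, 0.1);
        System.out.println("winner=" + outcome.winner + " price=" + outcome.price);
    }
}
```

Because the price does not depend on the winner's own bid, bidders in such an auction have little reason to bid below what a click or impression is actually worth to them.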