Web Science/Part2: Emerging Web Properties/Search Engine Ecosystem
Survival of the fittest
- Fit for whom?
- Search engine operator, search users, advertisers
- Unfit for spammers
- Key performance indicators (multi-criteria optimization problem!)
- Value per click
- User: usability, relevance of search results, coverage of the Web
- Operator: advertising revenues, low-cost and scalable technical infrastructure, low personnel costs
- Advertiser: click-through and conversion rate
part 1
what is a search engine?
- why is it important?
- what is key word search?
Search engine history
- Archie, 1990
- Gopher, 1991
- WebCrawler, Lycos, Yahoo search 1994
- AltaVista search, 1995
- Google search 1998
- Sequels: Baidu, Yandex, Bing
- Alternatives: ask.com, wolframalpha.com
- Vertical search: products (amazon.com), people (peoplefinder.com), ego search for identity-theft prevention (garlik.com), ...
Search system architecture
- what is a web crawler
- what is a search index (inverted index)
- (for now) blackbox ranking
- binary search relevance
- interface (auto completion, search results,...)
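A minimal sketch of the inverted index and the binary relevance notion listed above, assuming naive tokenization (lower-casing and splitting on anything that is not a letter or digit). The class name `InvertedIndexSketch` and the three sample documents are made up for illustration; a production index would typically also store positions and term frequencies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class InvertedIndexSketch {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Naive tokenizer (an assumption): lower-case, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Register every term of the document under the document's id.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Binary relevance: a document matches if it contains every query term.
    TreeSet<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(0, "Web crawlers download web pages");
        index.add(1, "The index maps terms to pages");
        index.add(2, "Crawlers feed the index");
        System.out.println(index.search("index pages"));
        System.out.println(index.search("crawlers"));
    }
}
```

Running `main` prints `[1]` and then `[0, 2]`: only document 1 contains both "index" and "pages", while documents 0 and 2 both mention crawlers. The result is unranked, which is why a ranking step is needed on top of it.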
ranking in search I: application of tf-idf
- show how tf-idf can be used for ranking.
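One way tf-idf could be turned into a ranking is sketched below. It assumes one common variant (term frequency as a share of the document length, idf as the natural logarithm of N divided by the document frequency); other weightings exist, and the class name `TfIdfRankingSketch` and the toy documents are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TfIdfRankingSketch {
    // Term frequency: share of the document's tokens that equal the term.
    static double tf(String term, List<String> doc) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        long count = doc.stream().filter(term::equals).count();
        return (double) count / doc.size();
    }

    // Inverse document frequency: ln(N / df), or 0 if no document has the term.
    static double idf(String term, List<List<String>> docs) {
        long df = docs.stream().filter(d -> d.contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) docs.size() / df);
    }

    // Score of one document for a query: sum of tf * idf over the query terms.
    static double score(List<String> query, List<String> doc, List<List<String>> docs) {
        double s = 0.0;
        for (String term : query) {
            s += tf(term, doc) * idf(term, docs);
        }
        return s;
    }

    // Document indices ordered from highest to lowest score.
    static List<Integer> rank(List<String> query, List<List<String>> docs) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            order.add(i);
        }
        order.sort((a, b) -> Double.compare(
                score(query, docs.get(b), docs),
                score(query, docs.get(a), docs)));
        return order;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("web", "search", "engine"),
                Arrays.asList("web", "crawler", "web", "index"),
                Arrays.asList("search", "ranking", "search", "search"));
        System.out.println(rank(Arrays.asList("search", "ranking"), docs));
    }
}
```

For the query "search ranking" this prints `[2, 0, 1]`: document 2 uses both query terms heavily, document 0 mentions "search" once, and document 1 contains neither term and scores zero.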
ranking in search II: random surfer model
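    import java.util.Arrays;

    public class RandomSurfer {
        public static void main(String[] args) {
            // Column-stochastic matrix: transitionMatrix[j][i] is the probability
            // of moving from page i to page j, so every column sums to 1.
            double[][] transitionMatrix = {
                { 0., 1. / 3., 1., 1. / 3., 0. },
                { 1. / 2., 0., 0., 0., 0. },
                { 0., 1. / 3., 0., 1. / 3., 1. },
                { 1. / 2., 0., 0., 0., 0. },
                { 0., 1. / 3., 0., 1. / 3., 0. } };
            int numberOfNodes = 5;
            int steps = 100;
            int[] frequency = new int[numberOfNodes];
            int page = 0; // the surfer starts on page 0
            for (int i = 0; i < steps; i++) {
                // Make one random move.
                double r = Math.random();
                double sum = 0.0;
                // Walk down the column of the current page and accumulate
                // the transition probabilities.
                for (int j = 0; j < numberOfNodes; j++) {
                    sum += transitionMatrix[j][page];
                    // The first page whose cumulative probability exceeds r
                    // becomes the next page.
                    if (r < sum) {
                        System.out.println("Go from: " + page + " to: " + j);
                        page = j;
                        break;
                    }
                }
                frequency[page]++;
            }
            // The visit counts estimate how important each page is.
            System.out.println("Visits per page: " + Arrays.toString(frequency));
        }
    }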
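The simulation above estimates visit frequencies by sampling. As a hedged complement, the same long-run frequencies can be approximated without randomness by repeatedly multiplying a probability vector with the transition matrix (power iteration). The sketch below reuses the matrix from the simulation; the class name `PowerIterationSketch` is hypothetical, and no damping factor or random jump is modelled, which works here only because every page in this small example can reach every other page.

```java
import java.util.Arrays;

class PowerIterationSketch {
    // One step of the surfer in expectation:
    // next[j] = sum over i of matrix[j][i] * current[i].
    static double[] step(double[][] matrix, double[] current) {
        int n = current.length;
        double[] next = new double[n];
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < n; i++) {
                next[j] += matrix[j][i] * current[i];
            }
        }
        return next;
    }

    // Start from the uniform distribution and apply the matrix repeatedly.
    static double[] stationary(double[][] matrix, int iterations) {
        int n = matrix.length;
        double[] p = new double[n];
        Arrays.fill(p, 1.0 / n);
        for (int k = 0; k < iterations; k++) {
            p = step(matrix, p);
        }
        return p;
    }

    public static void main(String[] args) {
        // Same column-stochastic matrix as in the simulation.
        double[][] transitionMatrix = {
            { 0., 1. / 3., 1., 1. / 3., 0. },
            { 1. / 2., 0., 0., 0., 0. },
            { 0., 1. / 3., 0., 1. / 3., 1. },
            { 1. / 2., 0., 0., 0., 0. },
            { 0., 1. / 3., 0., 1. / 3., 0. } };
        System.out.println(Arrays.toString(stationary(transitionMatrix, 100)));
    }
}
```

For this matrix the vector approaches (1/3, 1/6, 2/9, 1/6, 1/9), so page 0 is visited most often in the long run. With enough steps, the `frequency` counts of the simulation divided by the number of steps should come close to the same values.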
comparison tf-idf vs random surfer
- random surfer + tf-idf
- showing how to combine the two models
- even more methods can be included
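One simple way the two models could be combined is a weighted sum of normalized scores, sketched below. This is an illustrative assumption, not a description of how any real search engine ranks: the weight `alpha`, the class name `CombinedRankingSketch`, and the sample scores are all made up, and scores are assumed to be non-negative.

```java
class CombinedRankingSketch {
    // Scale non-negative scores to [0, 1] by dividing by the maximum.
    static double[] normalize(double[] scores) {
        double max = 0.0;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = max == 0.0 ? 0.0 : scores[i] / max;
        }
        return out;
    }

    // alpha weights query relevance (tf-idf); 1 - alpha weights link-based
    // importance (random surfer). Further signals could be added as extra terms.
    static double[] combine(double[] textScores, double[] linkScores, double alpha) {
        double[] t = normalize(textScores);
        double[] l = normalize(linkScores);
        double[] combined = new double[t.length];
        for (int i = 0; i < t.length; i++) {
            combined[i] = alpha * t[i] + (1.0 - alpha) * l[i];
        }
        return combined;
    }

    // Index of the highest score (the first one wins on ties).
    static int best(double[] scores) {
        int bestIndex = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[bestIndex]) {
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        double[] tfIdf = { 0.10, 0.40, 0.35 };  // per-page query relevance
        double[] surfer = { 0.50, 0.10, 0.40 }; // per-page visit probability
        System.out.println(best(combine(tfIdf, surfer, 0.5)));
    }
}
```

With equal weights the example prints `2`: page 2 is neither the best text match nor the most visited page, but it does well on both criteria. Setting `alpha` to 1 or 0 reduces the ranking to pure tf-idf or pure random surfer respectively.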
relevance is a choice: Trust issues with search engines
- understand that algorithms are programmed by humans; it is up to us whether to trust a search engine and which one to choose
- manipulations will be hard to detect (magic keyword: Barack Obama)
- large search engines are among the most powerful institutions on the web (in terms of money, but also in terms of impact)
SPAM and SEO
- understand that search results can be manipulated
- metadata (schema.org)
part 2
multi-stakeholder system
- search engine
- end user
- web site owner
- advertiser
- (web master (SEO))
economics of a search engine
- understand the concept of keyword-based advertising
- understand the auction system for keywords
- understand the model of the shared economy and man-in-the-middle business models
- taken from b:Strategy_for_Information_Markets/Search_engine_business_models and w:Vickrey_auction
- w:Generalized_second-price_auction
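The keyword auction referenced above can be illustrated with a simplified generalized second-price auction: advertisers are ranked by their bid per click, and the winner of each ad slot pays the bid of the advertiser ranked directly below. The sketch assumes no quality scores and no reserve price, so the lowest-ranked winner pays 0 when nobody bids below them. The names `GspAuctionSketch`, `Bid` and `Allocation` and the bid amounts are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GspAuctionSketch {
    static final class Bid {
        final String advertiser;
        final double amount; // bid per click
        Bid(String advertiser, double amount) {
            this.advertiser = advertiser;
            this.amount = amount;
        }
    }

    static final class Allocation {
        final int slot; // 1 is the top ad position
        final String advertiser;
        final double pricePerClick;
        Allocation(int slot, String advertiser, double pricePerClick) {
            this.slot = slot;
            this.advertiser = advertiser;
            this.pricePerClick = pricePerClick;
        }
    }

    // Highest bid gets slot 1, second highest slot 2, and so on.
    // Each winner pays the next-lower bid, not their own.
    static List<Allocation> run(List<Bid> bids, int slots) {
        List<Bid> sorted = new ArrayList<>(bids);
        sorted.sort(Comparator.comparingDouble((Bid b) -> b.amount).reversed());
        List<Allocation> result = new ArrayList<>();
        for (int i = 0; i < Math.min(slots, sorted.size()); i++) {
            double price = i + 1 < sorted.size() ? sorted.get(i + 1).amount : 0.0;
            result.add(new Allocation(i + 1, sorted.get(i).advertiser, price));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Bid> bids = Arrays.asList(
                new Bid("A", 2.00), new Bid("B", 1.50), new Bid("C", 0.80));
        for (Allocation a : run(bids, 2)) {
            System.out.println("slot " + a.slot + ": " + a.advertiser
                    + " pays " + a.pricePerClick + " per click");
        }
    }
}
```

In the example, advertiser A wins slot 1 and pays B's bid of 1.50 per click, while B wins slot 2 and pays C's bid of 0.80. With a single slot the scheme reduces to a second-price (Vickrey) auction.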
personalization of search results
- key methods of personalization (using a cookie)
- graph view of user interests
- collaborative filtering
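A minimal sketch of user-based collaborative filtering, one possible reading of the bullet above: a user's interest in an unseen topic is predicted from users with a similar click history. Cosine similarity and the similarity-weighted average are assumptions chosen for brevity; the class name `CollaborativeFilteringSketch` and the click counts are invented.

```java
class CollaborativeFilteringSketch {
    // Cosine similarity between two users' click vectors (0 if either is empty).
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Predicted interest of `user` in `item`: average of the other users'
    // click counts for the item, weighted by their similarity to `user`.
    static double predict(double[][] clicks, int user, int item) {
        double weighted = 0.0, total = 0.0;
        for (int other = 0; other < clicks.length; other++) {
            if (other == user) {
                continue;
            }
            double sim = cosine(clicks[user], clicks[other]);
            weighted += sim * clicks[other][item];
            total += sim;
        }
        return total == 0.0 ? 0.0 : weighted / total;
    }

    // Best-scoring item the user has not clicked yet, or -1 if there is none.
    static int recommend(double[][] clicks, int user) {
        int best = -1;
        double bestScore = -1.0;
        for (int item = 0; item < clicks[user].length; item++) {
            if (clicks[user][item] > 0) {
                continue; // already known to the user
            }
            double score = predict(clicks, user, item);
            if (score > bestScore) {
                bestScore = score;
                best = item;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Rows are users, columns are topics, values are click counts.
        double[][] clicks = {
            { 5, 3, 0, 0 },
            { 4, 2, 4, 0 },
            { 0, 0, 1, 5 } };
        System.out.println(recommend(clicks, 0));
    }
}
```

For user 0 the example prints `2`: user 1 has a similar click history and also clicked topic 2, whereas user 2 shares no topics with user 0 and therefore has no influence on the prediction.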
filter bubble effects
Technologies for your own search engine
- Hadoop
- Solr
- Nutch
- Elasticsearch
The most successful search engines owe their success to winning two competitions: the one for search users and the one for advertising customers. Both competitions will be explained in the next two weeks.
Advertising
Stakeholders
- advertiser
- customer
- content owner/portal
- advertising network
Intermediaries:
- markets (ebay)
- advertising networks (doubleclick,...)
Moving the advertising service out of the portal and into an ad network benefits every party:
- customer: more exact profile, better ad targeting
- content owner/portal: better targeted ads lead to higher revenue
- advertiser: higher click-through rate/conversion rate
- ad network: valuable business model
- Technology
- Business model
- Pricing, auctions
- real-time bidding