Web Science


Introduction (0th week)

The introduction answers the following questions:

  1. What is a MOOC?
  2. What are open educational resources?
  3. How do you use this platform?
  4. What will this course be about?


Web Science/Part1: Foundations of the web

Lessons

  1. understand the basic problems when communicating over a shared medium
  2. understand the origins of Ethernet
  • be able to name the Ethernet header fields
  • be able to explain the reason for the preamble
  • understand that the cable length influences the transfer rate
  • understand that the speed of light is responsible for the connection between cable length and transfer rate
  • be able to calculate the maximum cable length for a given transfer rate
  • understand that the maximum cable length is part of the Ethernet protocol
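The relationship between cable length and transfer rate can be sketched with a small calculation. This is an idealized sketch, not the course's own derivation: it assumes that a sender must still be transmitting when a collision signal returns, so the time to send the smallest frame must cover one round trip on the cable. The signal speed of 2·10^8 m/s (roughly two thirds of the speed of light) and the function name are assumptions made for illustration.

```python
def max_cable_length_m(min_frame_bits, bit_rate_bps, signal_speed_mps=2.0e8):
    """Idealized maximum cable length for collision detection.

    Assumption: sending the smallest frame must take at least one
    round trip (2 * length / signal_speed) on the cable.
    """
    frame_time_s = min_frame_bits / bit_rate_bps  # time to send the smallest frame
    return signal_speed_mps * frame_time_s / 2    # half of the distance covered in that time


# Classic Ethernet: 64-byte (512-bit) minimum frame at 10 Mbit/s.
length_10mbit = max_cable_length_m(512, 10_000_000)
# Ten times the transfer rate with the same frame size allows a tenth of the length.
length_100mbit = max_cable_length_m(512, 100_000_000)
print(round(length_10mbit), round(length_100mbit))
```

With these assumed values the result is about 5 km at 10 Mbit/s. Real standards specify shorter limits because repeaters and interfaces add delay.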
  • Understand that Ethernet is a non-deterministic protocol
  • Be able to reconstruct a collision detection and resolution algorithm
  • Understand what happens if two computers send data at the same time
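One common way to resolve collisions is truncated binary exponential backoff, as used by classic CSMA/CD Ethernet. The sketch below is an illustration under that assumption and may differ from the algorithm reconstructed in the lesson; the function and constant names are hypothetical.

```python
import random

MAX_ATTEMPTS = 16   # the frame is dropped after this many collisions
BACKOFF_LIMIT = 10  # the random range stops growing after this many collisions


def backoff_slots(collisions, rng=random):
    """Return how many slot times to wait after the given number of collisions.

    Returns None when the sender should give up on the frame.
    """
    if collisions < 1:
        raise ValueError("backoff only applies after at least one collision")
    if collisions >= MAX_ATTEMPTS:
        return None
    exponent = min(collisions, BACKOFF_LIMIT)
    # Pick uniformly from 0 .. 2**exponent - 1, so the range doubles per collision.
    return rng.randrange(2 ** exponent)


rng = random.Random(1)
waits = [backoff_slots(n, rng) for n in range(1, 6)]
print(waits)
```

Because each sender draws its waiting time at random, two colliding computers are unlikely to pick the same slot again, which is also why the behaviour is non-deterministic.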
  • get introduced to the concept of an IP-network
  • understand that networks can be interconnected
  • learn about the importance of decentralization as a design principle
  • realize that local area networks can be fragmented via IP networks
  • understand that an IP network, as an overlay network, is an abstraction that does not directly reflect the hardware setup
  • understand the notion of an IPv4 address and its components, such as the network part and the host part
  • understand why MAC addresses do not fulfill the requirements of IP addresses.
  • get introduced to the notion of an IP router / gateway
  • review the definition and concept of an IP network
  • understand that IP routing works on the level of IP networks
  • understand the concept of subnetting
  • review network classes and understand Classless Inter-Domain Routing (CIDR)
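Subnetting and CIDR notation can be explored with Python's standard `ipaddress` module. The addresses below are arbitrary private-range examples chosen for illustration; they do not come from the course material.

```python
import ipaddress

# A /24 network: 24 bits for the network part, 8 bits for the host part.
network = ipaddress.ip_network("192.168.0.0/24")

# Borrow two host bits for the network part: four /26 subnets.
subnets = list(network.subnets(prefixlen_diff=2))

# Find the subnet that a given host address belongs to.
host = ipaddress.ip_address("192.168.0.70")
home = next(subnet for subnet in subnets if host in subnet)

print([str(subnet) for subnet in subnets])
print(home, home.netmask, home.num_addresses)
```

A router only needs the network part (the prefix) to decide where to forward a packet, which is why routing works on the level of IP networks rather than single hosts.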
  • get a feeling for the IP header
  • get a better understanding of how the protocol works
  • understand which header fields are changed while routing
  • understand which problems of IP are solved by the Transmission Control Protocol (TCP)
  • be aware of the limitations of the Internet Protocol and the internet architecture
  • get to know the end-to-end principle, in which only the sender and the receiver ensure that communication works properly
  • understand the concept of a logical connection (virtual communication channel) between two computers on the internet
  • understand the importance of acknowledging received messages
  • be able to understand the process of establishing a TCP connection
  • understand the concept of a socket in a TCP/IP package
  • understand that ports are part of the TCP header
  • be able to explain the difference between solicited and unsolicited TCP/IP traffic
  • understand how ports can be used for multiplexing internet connections
  • understand the concepts of window size and the sliding window
  • understand how flow control can prevent TCP connections from overloading link-layer protocols and slow networks
  1. In this lesson you will learn some basics about the question of why web content needs structure and proper markup.
  • Understand the Document Object Model (DOM) and the DOM tree
  • Understand that HTML can be viewed as a special dialect of XML (strictly speaking, only XHTML is well-formed XML)
  • Understand the relationship between HTML and XML
  • Be able to write simple HTML code, having learned a few example elements of HTML (headings, paragraphs, lists, tables, links, anchors, emphasis, input fields; but also a few dirty ones like italics, color, ...)
  • See that HTML really is just another simple markup language and has nothing to do with programming
  • Be able to structure web content using HTML and create pages following a specified structure
  • Know about the style attribute and how to use it within HTML elements
  • Realize that there are some limits to using the style attribute
  • be able to create websites that follow a certain style guide
  • See the problems with inline styles
  • Understand that a style sheet gives you freedom
  • be able to explain to people why they should use style sheets
  • be able to name at least two important reasons for using style sheets
  • know how the cascading process works
  • know the basic syntax of Cascading Style Sheets (CSS)
  • know how to include a media file, such as a graphic, in your web page
  • understand that images in formats like JPG, GIF, and bitmap are hard for machines to understand
  • know how to use an XML-based format to create images that are easy to understand for machines and humans and can even make use of style sheets
  • Understand that metadata is necessary to communicate the semantics of content
  • See that using metadata for ranking in search results is a bad idea
  • get introduced to modern ways of publishing metadata, such as RDFa
  • Understand the separation between content, structure, layout, and metadata
  • Review HTML, CSS, XML, SVG, and RDFa
  • Understand what distinguishes clean HTML markup ("separation of concerns") from unclean markup ("mixing responsibilities"), and the implications (better or worse maintenance, personalization, and accessibility)
  • become aware of the possibilities for creating dynamic content within a web server
  • see that you don't have to implement a web server to be able to serve dynamic content
  • understand some main issues, like blocking I/O, that one should keep in mind when doing server-side programming
  • see how the web server is the entry point for web applications
  • understand whitelisting versus blacklisting of input as a method of preventing XSS
  • understand the basics of HTTP POST requests
  • become aware of security issues while transferring data to a web server
  • be able to create a simple web form in HTML
  • See how a POST request is handled in a Java Servlet
  • get to know the Request object
  • see how a database query and more advanced technology can be included in a servlet
  • understand how JavaScript was supposed to help people fill out web forms
  • understand the issues and disadvantages that arise with JavaScript
  • be aware of JavaScript APIs
  • know some of the standard JavaScript libraries
  • be able to understand the concept of Ajax requests.
  • Exercises: Web Science MOOC Exercises Week 4.pdf (see Web Science/Part1: Foundations of the web/Dynamic Web Content/Summary, further reading, homework)

Discussion


Web Science/Part2: Emerging Web Properties

Lessons

  1. The question will remain unanswered during the lesson and the entire course.
  2. The question of size is underspecified because a measure is needed.
  3. The measure depends heavily on the choice of how we model the web.
  4. We have not yet defined what we mean when we say World Wide Web.
  • The web as a software system.
  • The web as a collection of text documents.
  • The web as a graph of interlinked documents.
  • Even after choosing one point of view, there are fundamentally different ways of modelling.
  • understand that only the model is described.
  • description of the model can be used for interpretation.
  • within the descriptive model one chooses measures to describe the object of study.
  • understand the notion of a modelling choice
  • be able to criticise a descriptive model and the modelling choices
  • understand that generative models can be used to try to give a reason why something works
  • understand that generative models need to be run more than once
  • understand the notion of a modeling parameter
  • understand that their results will be compared to the descriptive model of our object of study
  • Understand why we selected Simple English Wikipedia as a toy example for modeling the web
  • Understand that even a task as simple as counting words involves modeling choices
  • Be familiar with the term “unique word token”
  • Know some basic tools to count words and documents
  • Be familiar with some basic statistical objects like the median, the mean, and histograms
  • Be able to relate a histogram to its cumulative distribution function
  • Understand the ongoing, cyclic process of research
  • Know what falsifiable means and why every research hypothesis needs to be falsifiable
  • Be able to formulate your own research hypothesis
  • Understand what a log-log plot is
  • Improve your skills in reading and interpreting diagrams
  • Know about the word rank / frequency plot
  • Be able to transform a histogram or curve into a cumulative distribution function
  • Get a feeling for interdisciplinary research
  • Know the Automated Readability Index
  • Have a strong sense of support for our research hypothesis
  • Be able to critically discuss the limits of our models
  • Be able to name some fundamental properties about how frequencies of words in texts are distributed
  • Be a little bit more cautious about visual impressions when looking at log-log plots
  • Know both formulations of Zipf’s law
  • Be able to do a coordinate transformation to change the scales of your plots
  • Understand in which scenario power functions appear as straight lines
  • Know in which scenarios exponential functions appear as straight lines
  • Be even more cautious about your visual impressions
  • Know the axioms for a distance measure and how they relate to norms
  • Know at least two distance measures on function spaces
  • Understand why changing to the CDF makes sense when looking at the distance between functions
  • Understand the principle of the Kolmogorov-Smirnov test for fitting curves
  • Know how to transform a rank frequency diagram into a power law plot
  • Understand how power law and Pareto plots relate to each other
  • Be able to explain why a Pareto plot is just an inverted rank frequency diagram
  • Be able to transform the Zipf coefficient into the power law and Pareto coefficients and vice versa
  • Understand that building the CDF is basically like building the integral
  • Know the properties of a similarity measure
  • Be able to relate similarity and distance measures
  • Know of two applications for modelling similarity
  • Understand how text documents can be modeled as sets
  • Know the Jaccard coefficient as a similarity measure on sets
  • Know a trick for remembering the formula
  • Be aware of the possible outcomes of the Jaccard index
  • As always, be able to criticize your model
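The Jaccard coefficient on documents modeled as sets can be sketched as follows. Splitting on whitespace and lowercasing are modeling choices assumed here for illustration, as is the convention of returning 1.0 for two empty documents.

```python
def jaccard(text_a, text_b):
    """Jaccard coefficient of two texts modeled as sets of lowercase words."""
    set_a = set(text_a.lower().split())
    set_b = set(text_b.lower().split())
    union = set_a | set_b
    if not union:
        return 1.0  # assumed convention: two empty documents count as identical
    # Size of the intersection divided by the size of the union.
    return len(set_a & set_b) / len(union)


print(jaccard("the web is big", "the web is a graph"))
print(jaccard("the web", "the web"))
print(jaccard("ethernet", "zipf"))
```

The outcome always lies between 0 (no shared words) and 1 (identical word sets). Note what the set model ignores: word order and how often each word occurs.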
  • Be familiar with the vector space model for text documents
  • Be aware of term frequency and (inverse) document frequency
  • Have reviewed the definitions of basis and dimension
  • Realize that the angle between two vectors can be seen as a similarity measure
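The idea of using the angle between two document vectors as a similarity measure can be sketched with plain term frequencies as vector components. This is a simplified illustration: it assumes whitespace tokenization and leaves out the inverse document frequency weighting.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term frequency vectors of two texts."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over shared terms; a Counter yields 0 for missing terms.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for empty documents
    return dot / (norm_a * norm_b)


print(cosine_similarity("web web graph", "web"))
print(cosine_similarity("web graph", "zipf law"))
```

Each distinct term spans one dimension of the vector space, so documents without shared terms are orthogonal and get a similarity of 0.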
  • Be aware of a unigram Language Model
  • Know Laplacian (aka +1) smoothing
  • Know the query likelihood model
  • Know the Kullback-Leibler divergence
  • See how a similarity measure can be derived from Kullback Leibler Divergence
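A unigram language model with Laplacian (+1) smoothing and the Kullback-Leibler divergence between two such models can be sketched as below. The function names and the choice of the shared vocabulary are assumptions made for illustration. Smoothing matters here because the divergence is undefined when the second model assigns probability zero to a word.

```python
import math
from collections import Counter


def unigram_model(text, vocabulary):
    """Word probabilities with +1 smoothing over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[word] for word in vocabulary) + len(vocabulary)
    return {word: (counts[word] + 1) / total for word in vocabulary}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats over a shared vocabulary."""
    return sum(p[word] * math.log(p[word] / q[word]) for word in p)


doc_a = "the web is a graph"
doc_b = "the web is big"
vocabulary = set(doc_a.split()) | set(doc_b.split())
model_a = unigram_model(doc_a, vocabulary)
model_b = unigram_model(doc_b, vocabulary)

print(kl_divergence(model_a, model_b), kl_divergence(model_b, model_a))
```

The two printed values differ because the divergence is not symmetric, so a similarity measure derived from it needs an extra step such as averaging both directions.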
  • Understand that different modeling choices can produce very different results
  • Have a feeling for how you could statistically compare the differences between the models
  • Know how you could extract keywords from documents with the tf-idf approach
  • Try to argue which model you like best in a certain scenario
  • Understand the principal methodology for building generative models
  • Remember why people are interested in generative models
  • Know why descriptive models are needed when evaluating a generative model
  • Be aware of one way to create a model for text generation
  • Understand how to sample values from an arbitrary probability distribution
  • Have seen yet another application of the cumulative distribution function
  • Understand that sampling from a distribution is just a coordinate transformation of the uniform distribution
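Sampling from an arbitrary discrete distribution by transforming uniform random numbers through the cumulative distribution function can be sketched like this. The function name and the example probabilities are made up for illustration.

```python
import bisect
import itertools
import random


def make_sampler(probabilities, rng=random):
    """Build a function that samples outcomes according to their probabilities."""
    outcomes = list(probabilities)
    # Cumulative distribution function as a list of running sums.
    cdf = list(itertools.accumulate(probabilities[o] for o in outcomes))

    def sample():
        # Draw a uniform number and look up where it falls in the CDF.
        u = rng.random() * cdf[-1]
        index = bisect.bisect_right(cdf, u)
        return outcomes[min(index, len(outcomes) - 1)]  # guard against rounding

    return sample


sample_word = make_sampler({"the": 0.7, "web": 0.2, "graph": 0.1}, random.Random(42))
draws = [sample_word() for _ in range(10_000)]
print(draws.count("the") / len(draws))
```

Outcomes with a large probability occupy a long stretch of the interval from 0 to 1, so a uniform number lands there more often.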
  • See that it makes sense to compare statistics
  • Understand that comparing statistics is not a well-defined task
  • Be aware of the fact that very different models could lead to the same statistics
  • See that one can always increase the model parameters
  • Know that increasing model parameters often yields a more accurate model
  • Be aware of the bigram and mixed models as examples of our generative processes
  • Be familiar with a set-theoretic way of denoting a graph
  • Know at least four different types of graphs
  • Have practiced your abilities in reading and writing mathematical formulas
  • Be able to model web pages as a graph
  • Know that the authorship graph is bipartite
  • Know what kind of graph the graph of web pages is
  • (as always) be aware of the fact that modeling is done by making choices
  • Know terms like size and (unique) volume
  • Be able to count the in-degree and out-degree of web pages
  • Have an idea what kind of law in-degree and out-degree distributions follow
  • Know that degree is not distributed in a fair way
  • Know that the Gini coefficient can be used to measure fairness
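The Gini coefficient of a degree sequence can be computed with a standard closed formula on the sorted values. This is a sketch: the lesson may define the coefficient differently, for example via the area under the Lorenz curve, and the example degree lists are made up.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        return 0.0  # assumed convention when there is nothing to distribute
    # Weight each value by its rank (1-based) in the sorted order.
    weighted = sum(rank * value for rank, value in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([3, 3, 3, 3]))   # every page has the same in-degree
print(gini([0, 0, 0, 12]))  # one page receives all links
```

For n values the maximum is (n - 1) / n, reached when a single node holds the entire degree.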
  • Understand the notion of a path in a (directed) graph
  • Know that shortest paths between nodes need not be unique
  • Understand the notion of a strongly connected component
  • Know about the diameter of a graph
  • Be aware of the bow tie structure of the Web
  • Be able to read and build an adjacency matrix of a graph
  • Know some basic matrix-vector multiplications to generate some statistics from the adjacency matrix
  • Understand what is encoded in the components of the k-th power of the adjacency matrix of a graph
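The adjacency matrix statistics can be tried out on a tiny directed graph. The three-node graph below is a made-up example, and the helper functions are written in plain Python so that the sketch has no dependencies.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mat_power(a, k):
    """k-th power of a square matrix, starting from the identity matrix."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, a)
    return result


# Entry [i][j] is 1 if there is a link from node i to node j.
# Edges: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
adjacency = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]

out_degree = [sum(row) for row in adjacency]        # row sums
in_degree = [sum(col) for col in zip(*adjacency)]   # column sums
walks_of_length_2 = mat_power(adjacency, 2)

print(out_degree, in_degree)
print(walks_of_length_2)
```

Entry [i][j] of the k-th power counts the walks of length k from node i to node j. For example, the two walks of length 2 starting at node 0 are 0 -> 1 -> 2 and 0 -> 2 -> 0.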

Discussion