Web Technologies/2021-2022/Laboratory 1

Communication architectures

Client-server (e.g. Messenger, IRC, WhatsApp, Netflix) - centralized
Peer-to-peer (e.g. BitTorrent, Blockchain, Gnutella, Kazaa) - decentralized

About client-server architectures

Most inter-process communication uses the client-server model. These terms refer to the two processes that communicate with each other. One of them, the client, connects to the other, the server, typically to request some information. A good analogy is a person who makes a phone call to another person.

NOTE: The client needs to know of the existence of and the address of the server, but the server does not need to know the address of (or even the existence of) the client prior to the connection being established.

NOTE: Once a connection is established, both sides can send and receive information.

The system calls for establishing a connection differ somewhat between the client and the server, but both involve the basic construct of a socket, i.e. one end of an inter-process communication channel. Each of the two processes must create its own socket.

There are two main kinds of sockets: TCP (Transmission Control Protocol) sockets, which are connection-oriented, and UDP (User Datagram Protocol) sockets, which are datagram-oriented.

TCP incurs an overhead for establishing a connection, but in exchange it guarantees that the arrival of our packets is acknowledged and that they are delivered in the correct order.
UDP has no connection overhead, but there is no guarantee that our packets arrive at all or that they arrive in order (such mechanisms have to be implemented by developers on top of UDP).

Most services on the Internet rely on the client-server communication model.

In what follows we present the steps required for the client and the server to create sockets and communicate through them.

Client socket steps (TCP)

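import socket

# AF_INET = IPv4 addressing, SOCK_STREAM = TCP
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.connect(('localhost', 8081))  # the client must know the server's address
    sock.sendall(b'Hello, server!')    # send the whole message
    data = sock.recv(1024)             # read at most 1024 bytes of the reply
# leaving the "with" block closes the socket

print('Received:', data.decode('utf-8'))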
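A single recv(1024) call returns at most 1024 bytes and, because TCP delivers a byte stream rather than separate messages, it may also return only part of what the peer sent. When the peer signals the end of its data by closing the connection, a loop like the following minimal sketch can collect everything. The helper name recv_all is hypothetical, not part of the socket module.

```python
import socket


def recv_all(sock: socket.socket, bufsize: int = 1024) -> bytes:
    """Read from a connected stream socket until the peer closes its side.

    recv_all is a hypothetical helper, not part of the socket module.
    """
    chunks = []
    while True:
        chunk = sock.recv(bufsize)  # returns up to bufsize bytes, possibly fewer
        if not chunk:               # b'' means the peer closed the connection
            break
        chunks.append(chunk)
    return b''.join(chunks)


if __name__ == '__main__':
    # Local demonstration with a connected pair of sockets (no server needed).
    left, right = socket.socketpair()
    left.sendall(b'x' * 3000)       # more than a single recv(1024) can return
    left.close()                    # closing signals end-of-data to the reader
    print(len(recv_all(right)))     # 3000
    right.close()
```

The design choice here is to rely on the connection being closed as the end marker; protocols that keep the connection open must instead announce the length of each message (HTTP does this with Content-Length).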

Server socket steps (TCP)

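import socket

# AF_INET = IPv4 addressing, SOCK_STREAM = TCP
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    # allow restarting the server right away on the same port
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('127.0.0.1', 8081))  # address and port to listen on
    sock.listen()                   # start accepting connection requests
    conn, addr = sock.accept()      # block until a client connects

    with conn:
        print(addr, 'connected.')
        conn.sendall(b'You are now connected.\n')
        while True:
            data = conn.recv(1024)
            if not data:            # b'' means the client closed the connection
                break
            print('Received message:', data.decode('utf-8'))
            conn.sendall(data)      # echo the message back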
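The steps above are for TCP. For comparison, here is a minimal sketch of the UDP counterpart, assuming a local test on 127.0.0.1 with a port chosen by the operating system (instead of 8081, so it does not clash with the TCP server). The function names are hypothetical. Note that the server only binds (there is no listen() or accept()) and that every datagram carries its own destination address.

```python
import socket
import threading


def echo_one_datagram(server: socket.socket) -> None:
    # recvfrom() returns the data and the sender's address,
    # which is all the server needs to reply: there is no connection.
    data, addr = server.recvfrom(1024)
    server.sendto(data, addr)


def udp_round_trip(message: bytes) -> bytes:
    # Server side: SOCK_DGRAM = UDP; bind() only, no listen()/accept().
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(('127.0.0.1', 0))   # port 0 lets the OS choose a free port
    server.settimeout(2.0)
    port = server.getsockname()[1]
    worker = threading.Thread(target=echo_one_datagram, args=(server,))
    worker.start()

    # Client side: no connect(); each sendto() names the destination.
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)          # UDP may lose datagrams, so never wait forever
    try:
        client.sendto(message, ('127.0.0.1', port))
        reply, _ = client.recvfrom(1024)
    finally:
        worker.join()
        client.close()
        server.close()
    return reply


if __name__ == '__main__':
    print('Received:', udp_round_trip(b'Hello, server!').decode('utf-8'))
```

Because there is no connection, the timeouts are the only protection against a lost datagram; this is exactly the kind of mechanism that, as noted above, developers must add on top of UDP.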

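Testing the client/server using netcat (nc) on UNIX systems:

You can use the netcat (nc) tool for testing TCP and UDP clients and servers. The commands below use the OpenBSD netcat syntax; with the traditional or GNU netcat, the listening port must be passed with -p (e.g. nc -l -p 8081).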

Starting a TCP server with netcat:

$ nc -l 8081

Connecting to a TCP server with a netcat client:

$ nc localhost 8081

Starting a UDP server with netcat:

$ nc -u -l 8081

Connecting to a UDP server with a netcat client:

$ nc -u localhost 8081


TCP/IP

A set of protocols used for communication over the Internet (and other similar networks).

Its name comes from two of its core protocols:

Transmission Control Protocol - TCP
Internet Protocol - IP

TCP is connection-oriented and stateful: the receiver acknowledges the data it gets, and the sender retransmits anything that is not acknowledged. This is in contrast with UDP (User Datagram Protocol), which gives no assurance that a message has arrived at its destination.

The TCP/IP model is composed of four layers: Link, Internet, Transport and Application. In contrast, the OSI (Open Systems Interconnection) model is made up of seven layers: Physical, Data Link, Network, Transport, Session, Presentation and Application.

TCP and UDP work at the Transport layer.

URI -- Uniform Resource Identifier

A URI is used to identify a resource. Two important subclasses of URIs are:

URL (Uniform Resource Locator) - identifies a resource by its location, encoding the access method (e.g. the protocol) and the parameters needed to reach it.
URN (Uniform Resource Name) - identifies a resource by a unique name that does not depend on its location.

URL syntax

URL: http://<host>[:<port>]/[<resource>][?<query>]
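As a minimal sketch of how this syntax maps onto the values an HTTP client needs, Python's standard urllib.parse.urlsplit can split the URL, with port 80 and resource / assumed as defaults when they are missing. The helper name split_http_url is hypothetical.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_http_url(url: str) -> Tuple[str, int, str, str]:
    """Split http://<host>[:<port>]/[<resource>][?<query>] into its parts."""
    parts = urlsplit(url)
    if parts.scheme != 'http':
        raise ValueError('this sketch only handles http:// URLs')
    host = parts.hostname or ''
    # Fall back to the default HTTP port (80) when [:<port>] is missing.
    port = parts.port if parts.port is not None else 80
    # An empty path means the root resource "/".
    resource = parts.path or '/'
    return host, port, resource, parts.query


if __name__ == '__main__':
    print(split_http_url('http://www.example.com:8080/docs/index.html?lang=en'))
    # ('www.example.com', 8080, '/docs/index.html', 'lang=en')
    print(split_http_url('http://www.example.com'))
    # ('www.example.com', 80, '/', '')
```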

Generic URI syntax

Specification: RFC 1630 (the current generic syntax, and the example below, come from RFC 3986).

<scheme>:<hierarchy>[?<query>][#<fragment>]

Example:

  foo://example.com:8042/over/there?name=ferret#nose
  \ /   \______________/\_________/ \_________/ \__/
   |           |             |           |        |
scheme     authority        path       query   fragment
   |   ______________________|_
  / \ /                        \
  urn:example:animal:ferret:nose
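The same decomposition can be reproduced, as a quick check, with Python's standard urllib.parse.urlsplit, which follows the generic scheme/authority/path/query/fragment split (it calls the authority "netloc").

```python
from urllib.parse import urlsplit

# The two example URIs from the diagram above.
url = urlsplit('foo://example.com:8042/over/there?name=ferret#nose')
urn = urlsplit('urn:example:animal:ferret:nose')

print(url.scheme, url.netloc, url.path, url.query, url.fragment)
# foo example.com:8042 /over/there name=ferret nose

# A URN has no "//authority" part: everything after the scheme is the path.
print(urn.scheme, repr(urn.netloc), urn.path)
# urn '' example:animal:ferret:nose
```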

Design criteria

(Quoted from RFC 1630)

Extensible:
- new naming schemes may be added later.

Complete:
- It is possible to encode any naming scheme.

Printable:
- It is possible to express any URI using 7-bit ASCII characters so that URIs may, if necessary, be passed using pen and ink.

HTTP

HTTP (Hypertext Transfer Protocol) is an application-level protocol for distributed, collaborative, hypermedia information systems.

It was created by Tim Berners-Lee, together with the World Wide Web (WWW), around 1990.

The HTTP/1.1 standard (RFC 2616) was published in June 1999. Its features include persistent connections, pipelining, virtual hosting and chunked transfer encoding.

Actors

User agent:
- a client application that contacts a server on behalf of the user, e.g.:
  - download client
  - web browser
  - web spider

Server:
- an application that receives requests and answers them

Proxy:
- an intermediary that receives requests and either serves them itself or forwards them to the real (origin) server, possibly through a chain of servers. It may modify the requests and responses it transfers. Common kinds:
  - caching proxy
  - anonymizing proxy
  - transparent proxy
  - reverse proxy

Protocol

Specifications: RFC 2616 (obsoleted in 2014 by RFCs 7230-7235, which were in turn replaced in 2022 by RFCs 9110-9112).

Request:

[method] [resource] [version]<CRLF>
[header]: [value]<CRLF>
<CRLF>

Example request:

GET /index.html HTTP/1.1<CRLF>
Host: www.example.com<CRLF>
<CRLF>
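Since the request is plain text, it can be assembled by hand before being sent over a TCP socket. A minimal sketch, assuming HTTP/1.1, no request body and ASCII-only header values; build_request is a hypothetical helper, not an existing library function.

```python
from typing import Dict, Optional

CRLF = '\r\n'


def build_request(method: str, resource: str, host: str,
                  headers: Optional[Dict[str, str]] = None) -> bytes:
    """Assemble a raw HTTP/1.1 request with no body."""
    # Request line: [method] [resource] [version], then the Host header,
    # which is mandatory in HTTP/1.1 requests.
    lines = [f'{method} {resource} HTTP/1.1', f'Host: {host}']
    # One "[header]: [value]" line per extra header.
    for name, value in (headers or {}).items():
        lines.append(f'{name}: {value}')
    # Every line ends with CRLF; a final empty line ends the headers.
    return (CRLF.join(lines) + CRLF + CRLF).encode('ascii')


if __name__ == '__main__':
    print(build_request('GET', '/index.html', 'www.example.com'))
    # b'GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n'
```

With a connected socket, sock.sendall(build_request('GET', '/index.html', 'www.example.com', {'Connection': 'close'})) would send it. Asking for Connection: close makes the server close the connection after the response, so the client can simply read until recv() returns b''.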

Response:

[version] [status] [message]<CRLF>
[header]: [value]<CRLF>
<CRLF>
[body]...

Example response:

HTTP/1.1 200 OK<CRLF>
Date: Mon, 23 May 2005 22:38:34 GMT<CRLF>
Server: Apache/1.3.27 (Unix)  (Red-Hat/Linux)<CRLF>
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT<CRLF>
Etag: "3f80f-1b6-3e1cb03b"<CRLF>
Accept-Ranges: bytes<CRLF>
Content-Length: 438<CRLF>
Connection: close<CRLF>
Content-Type: text/html; charset=UTF-8<CRLF>
<CRLF>
<Content ...>
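Going the other way, a client has to split the raw response into the status line, the headers and the body. The following minimal sketch assumes the whole response has already been read into memory and ignores complications such as chunked transfer encoding or repeated headers; parse_response is a hypothetical name.

```python
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    """Split a raw HTTP response into (version, status, message, headers, body)."""
    # The header section ends at the first empty line (CRLF CRLF).
    head, _, body = raw.partition(b'\r\n\r\n')
    # ISO-8859-1 maps every byte to a character, so decoding never fails.
    lines = head.decode('iso-8859-1').split('\r\n')
    # Status line: [version] [status] [message]; the message may contain spaces.
    version, status, message = (lines[0].split(' ', 2) + [''])[:3]
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), message, headers, body


if __name__ == '__main__':
    raw = (b'HTTP/1.1 404 Not Found\r\n'
           b'Content-Type: text/html; charset=UTF-8\r\n'
           b'Content-Length: 5\r\n'
           b'\r\n'
           b'hello')
    version, status, message, headers, body = parse_response(raw)
    print(status, message, headers['content-length'], body)
    # 404 Not Found 5 b'hello'
```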

Request methods

HEAD - used to retrieve only the headers of the response. Useful for requesting the meta-information without the actual content.

GET - used to retrieve both the meta-information and the content of the resource. It is the most used method. It should have no side effects (i.e. it should be a safe method).

POST - used to send data to be processed (for example, the contents of a form filled in and submitted by a user).

PUT - used to create or replace the resource at the given URI.

DELETE - used to remove a resource.

TRACE - used to debug or diagnose a request: the server echoes back the request it received.

OPTIONS - used to find out the communication options (capabilities) supported by the server or by a given resource.

CONNECT - used to ask a proxy to open a tunnel to another server (typically for HTTPS traffic).

Status codes

2xx -- Success
- 200 -- OK
- 201 -- Created
- 202 -- Accepted
3xx -- Redirection
- 301 -- Moved permanently
- 302 -- Moved temporarily
4xx -- Client error
- 400 -- Bad request
- 401 -- Unauthorised
- 403 -- Forbidden
- 404 -- Not found
- 405 -- Method not allowed
5xx -- Server error
- 500 -- Internal server error
- 501 -- Not implemented
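When interpreting the status line, the first digit of the code is enough to tell the class of the response. A minimal sketch; the dictionary and function names are hypothetical, and 1xx (informational) codes, which are not listed above, fall under "Other" here.

```python
STATUS_CLASSES = {
    2: 'Success',
    3: 'Redirection',
    4: 'Client error',
    5: 'Server error',
}


def status_class(code: int) -> str:
    # Integer division by 100 keeps only the first digit: 404 // 100 == 4.
    return STATUS_CLASSES.get(code // 100, 'Other')


if __name__ == '__main__':
    for code in (200, 301, 404, 501):
        print(code, status_class(code))
```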

Headers

Headers are central to HTTP: they describe characteristics of the connection and of the data being sent or received (e.g. its type, length, encoding and language).

Accept

Accept: text/plain

Accept-Charset

Accept-Charset: iso-8859-5

Accept-Encoding

Accept-Encoding: compress, gzip

Accept-Language

Accept-Language: da

Content-Encoding

Content-Encoding: gzip

Content-Language

Content-Language: da

Content-Length

Content-Length: 348

Content-Type

Content-Type: text/html; charset=utf-8

Host

Host: www.w3.org

If-Modified-Since

If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT

Last-Modified

Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT

Server

Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)

User-Agent

User-Agent: Mozilla/5.0 (Linux; X11; UTF-8)
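Headers such as Date, Last-Modified and If-Modified-Since all carry timestamps in the same fixed format, always expressed in GMT. As a sketch, Python's standard email.utils module happens to produce and parse this format (a convenience choice; any correct formatter would do).

```python
import calendar
from email.utils import formatdate, parsedate_to_datetime

# Seconds since the Unix epoch for 29 Oct 1994, 19:43:31 UTC.
timestamp = calendar.timegm((1994, 10, 29, 19, 43, 31))

# usegmt=True produces the HTTP form ending in "GMT".
value = formatdate(timestamp, usegmt=True)
print('If-Modified-Since:', value)
# If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT

# Parsing gives back a timezone-aware datetime that can be compared.
print(parsedate_to_datetime(value).isoformat())
# 1994-10-29T19:43:31+00:00
```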

Links:

Python's socket module

socket

Miscellaneous - connecting to remote machines using SSH keys

During the following labs you may need to connect to remote machines in order to publish your web pages and projects. Since password authentication requires you to enter your username and password every time, below is an easier way based on SSH public-key authentication. After following the instructions below you will be able to connect from any Linux machine that holds the generated private key:

ssh-keygen -t rsa
choose no passphrase when asked and accept the default filename (~/.ssh/id_rsa)
cd ~/.ssh
ssh-copy-id -i id_rsa.pub <user>@<yourhost>
provide your password when asked and that’s the last time you’ll have to do it!

(ssh-copy-id appends the key to ~/.ssh/authorized_keys on the remote machine. Copying it with scp id_rsa.pub <user>@<yourhost>:.ssh/authorized_keys also works, but replaces any keys already authorized there.)

If you wish to connect to several remote machines, you can reuse the same id_rsa.pub and copy it to each of them as shown above.

Exercises

Create a simple chat application such that:
- you have one client and one server
- the client (human user) must send messages to the server (computer), which in turn will respond to them automatically:
  - For example:
    - client: hello
    - server: hi! what's your name?
    - client: John
    - server: nice to meet you John!
    - client: ...

Implement a simple HTTP client application (bonus - for lab 3) which:
- takes a URL as a command-line argument
- parses the given URL to obtain all the needed information, or uses the default values for the missing information
- contacts the specified web server
- requests the resource
- interprets the received status line
- prints the response body
- handles the most common errors that can be encountered

IMPORTANT: It is forbidden to use an existing HTTP library or class; you should implement the HTTP protocol yourself. You may use a URL parsing class or module (e.g. Java's URL class or Python's urllib.parse) for parsing the argument. HINT: use a client socket connection to retrieve the data.

Alexandru Munteanu, 26-09-2021, alexandru.munteanu@e-uvt.ro