Peer to peer networking

So I'm just starting learning about Peer- to- Peer networks. Seems like an interesting topic, excited to see where it takes me.

Let's start out with scouring Wikipedia and what it has to say about P2P.

- The P2P network is based on the notion that every node in the network function both as a client and server to other nodes in the network.

- Hence there is no central server

Whats an overlay network?

An overlay network is a computer network that is layered on top of another network.

Nodes in the overlay network can be thought of as being connected by virtual or logical links, each of which corresponds to a path, perhaps through many physical links, in the underlying network.

How does routing and resource discovery take place?

All peer to peer networks have a virtual overlay network on top of the physical network topology, where

the nodes in the overlay form a subset of nodes in the physical network.

Data is still exchanged over TCP/IP, but in the application layer, peers can talk to each other directly through the logical overlay links.

Peer-to-peer systems become independent from physical network topology as overlays are used for indexing and peer discovery.

Depending on how the nodes are linked to each other inside the overlay network and how resources are indexed and located, we can classify P2P into 2 types.

-Unstructured

- Structured

Unstructured

Unstructured peer-to-peer networks do not impose a particular structure on the overlay network by design but are formed by nodes that randomly form connections.
Examples: Gnutella, Gossip, and Kazaa.
Because there is no structure globally imposed upon them, unstructured networks are easy to build and allow for localized optimizations to different overlay regions.
Also, because the role of all peers in the network is the same, unstructured networks are highly robust in the face of high rates of "churn"—that is, when large numbers of peers are frequently joining and leaving the network.

Gnutella

Routes files in an overlay graph
It has 5 main messages types

Query(search)
QueryHit(response to a query)
Ping(to probe the network for other peers)
Pong(reply to ping contains the address of another peer)
Push(used to initiate file transfer)

Query

This sends a search throughout a network looking for a particular file.

Descriptor ID|Payload descripter|TTL|Hops|payload length

QueryHit(0x81)

The successful result to a query

Avoid excessive traffic

To avoid duplicate transmissions. each peer maintains a list of recently received messages
The query is forwarded to all neighbours except the peer from which it was received.
Each query(identifier by DescriptorID) forwarded only once
Queryhit routed back only to peer from which Query received with the same descriptorID
For flooded messages, duplicates with the same DescriptorID and Payload descriptor are dropped
QueryHit with DescriptorID and payload descriptor are dropped.
QueryHit with descriptorID for which Query is not seen is dropped as well(if the overlay graph changes)

What happens after receiving query hit messages?

Requestor chooses "best" QueryHit and then initiates the download.

Initiates HTTP request directly to responders' ip+port

Responder replies with HTTP 200 ok, file size and the type and after this, the file packet.
What if responders are behind a firewall that disallows incoming connection?

Requestor send a 'push' to the responder asking for the file transfer
Once the push message is received, the responder can initiate a connection

What does the push message look like?

Push(0x40)
servent_id | fileindex | ip_address | port

address at which requestor can accept connections
If both the responder and requestor are behind a firewall, then Gnutella doesn't work

Ping(0x00) : this has no payload

Pong(0x01)

Port | ip_address | num.files shared | num. KB shared

Pings and pongs are used to update the nodes' neighbour information.
peers initiate pings periodically
Pings flooded out like Querys, pongs are routed reverse path like queryhits
pong replies used to update the set of neighbouring peers

this is done to keep lists fresh in spite of peers joining, leaving and failing

Problems with Gnutella

a large number of freeloaders
flooding causing excessive traffic

Gossip Protocol

Multicast sender
Uses UDP

Push gossip protocol

- When a node receives a message or gossip, it periodically passes it on to other nodes, and that node is said to be infected

- all infected nodes periodically multicast to other nodes

Pull gossip protocol

- Periodically poll a few randomly selected processes for new multicast messages that you haven't received and gets those messages

- If there are multiple such messages, it polls a few of them randomly

Hybrid variant

- Mix of both push and pull types

The push protocol is lightweight in large groups, spreads quickly and is highly fault-tolerant. Let us see why.

Analysis of the Gossip Protocol

Interestingly the analysis of the protocol is similar to the analysis of epidemic disease spread (cough cough covid)

So,

If we have a population of n+1 individuals mixing homogenously
If the contact rate between any individual pair is denoted to by B
At any time, each individual is either uninfected (numbering x) or infected (numbering y),
Then x+y is a constant equaling to n+1
Intuitively, we can say that if an infected and uninfected nodes communicate, the latter obviously turns infected.

After solving a couple of differential equations, we get

x=n(n+1)/(n+e^(B(n+1)t)) as t goes to infinity we can see this approaches 0, thus all of the nodes are infected. The next equation collaborates this.

y=n(n+1)/(n+e^(-B(n+1)t))

Taking B=b/n we end up with y=(n+1)-1/(n^(cb-2)) in log(n) time.

Wow, that's pretty fast, right?

That explains why COVID spread so quickly!

Analysis of pull protocol

Get prepared to put your graph theory to the test.
As a fact, all gossip protocols take O(log n) rounds before half of them get to gossip.
This is because its the fastest way to send a message. Construct a spanning tree with a constant degree of every node is O(log n).
Once this point is achieved, we find that the pull protocol is faster than the push protocol.
Let p(i) be the fraction of non-infected processes, after ith round, then
p(i+1)=(p(i))^(k+1)
This is super-exponential.
Thus the second half of the gossip protocol finishes in O(log(log n))

Mixing pull and push in both halves gives rise to a hybrid version.

This a brief idea of how peer -to -peer network communication happen under the hood

Hope you found this an interesting read. Thank you for reading!

Search This Blog

On a scale from one to Tan