Peer to peer networking




So I'm just starting learning about Peer- to- Peer networks. Seems like an interesting topic, excited to see where it takes me.


Let's start out with scouring Wikipedia and what it has to say about P2P.

-  The P2P network is based on the notion that every node in the network function both as a client and server to other nodes in the network.
-   Hence there is no central server

Whats an overlay network?

An overlay network is a computer network that is layered on top of another network.
Nodes in the overlay network can be thought of as being connected by virtual or logical links, each of which corresponds to a path, perhaps through many physical links, in the underlying network.

 How does routing and resource discovery take place?


All peer to peer networks have a virtual overlay network on top of the physical network topology, where
the nodes in the overlay form a subset of nodes in the physical network.

Data is still exchanged over TCP/IP, but in the application layer, peers can talk to each other directly through the logical overlay links.
Peer-to-peer systems become independent from physical network topology as overlays are used for indexing and peer discovery.

Depending on how the nodes are linked to each other inside the overlay network and how resources are indexed and located, we can classify P2P into 2 types.

-Unstructured 
- Structured

Unstructured



  • Unstructured peer-to-peer networks do not impose a particular structure on the overlay network by design but are formed by nodes that randomly form connections.
  • Examples: Gnutella, Gossip, and Kazaa.
  • Because there is no structure globally imposed upon them, unstructured networks are easy to build and allow for localized optimizations to different overlay regions. 
  • Also, because the role of all peers in the network is the same, unstructured networks are highly robust in the face of high rates of "churn"—that is, when large numbers of peers are frequently joining and leaving the network.


Gnutella


  • Routes files in an overlay graph
  • It has 5 main messages types
  1. Query(search)
  2. QueryHit(response to a query)
  3. Ping(to probe the network for other peers)
  4. Pong(reply to ping contains the address of another peer)
  5. Push(used to initiate file transfer)
Query

This sends a search throughout a network looking for a particular file.
Descriptor ID|Payload descripter|TTL|Hops|payload length

QueryHit(0x81)
The successful result to a query

num. hits    |    port    |    ip_address    |speed    |    (fileindex,filename,fsize)    |    servent_id    


Avoid excessive traffic

  • To avoid duplicate transmissions. each peer maintains a list of recently received messages
  • The query is forwarded to all neighbours except the peer from which it was received. 
  • Each query(identifier by DescriptorID) forwarded only once
  • Queryhit routed back only to peer from which Query received with the same descriptorID
  • For flooded messages, duplicates with the same DescriptorID and Payload descriptor are dropped
  • QueryHit with DescriptorID and payload descriptor are dropped.
  • QueryHit with descriptorID for which Query is not seen is dropped as well(if the overlay graph changes)


What happens after receiving query hit messages?


  • Requestor chooses "best" QueryHit and then initiates the download.
    • Initiates HTTP request directly to responders' ip+port
  • Responder replies with HTTP 200 ok, file size and the type and after this, the file packet.
  • What if responders are behind a firewall that disallows incoming connection?
    • Requestor send a 'push' to the responder asking for the file transfer 
    • Once the push message is received, the responder can initiate a connection
  • What does the push message look like?
    • Push(0x40)
    •     servent_id    |    fileindex    |    ip_address    |    port
  •                                                      address at which requestor can accept connections
  • If both the responder and requestor are behind a firewall, then Gnutella doesn't work

  • Ping(0x00) : this has no payload

  • Pong(0x01)
    •     Port    |    ip_address    |    num.files shared    |    num. KB shared    

  • Pings and pongs are used to update the nodes' neighbour information.
  • peers initiate pings periodically
  • Pings flooded out like Querys, pongs are routed reverse path like queryhits
  • pong replies used to update the set of neighbouring peers
    • this is done to keep lists fresh in spite of peers joining, leaving and failing



Problems with Gnutella

  • a large number of freeloaders
  • flooding causing excessive traffic




Gossip Protocol


  • Multicast sender
  • Uses UDP

Push gossip protocol


    - When a node receives a message or gossip, it periodically passes it on to other nodes, and that node is said to be infected
    -  all infected nodes periodically multicast to other nodes
    

Pull gossip protocol


    - Periodically poll a few randomly selected processes for new multicast messages that you haven't received and gets those messages
    - If there are multiple such messages, it polls a few of them randomly


Hybrid variant

    - Mix of both push and pull types
 

The push protocol is lightweight in large groups, spreads quickly and is highly fault-tolerant. Let us see why.


Analysis of the Gossip Protocol


Interestingly the analysis of the protocol is similar to the analysis of epidemic disease spread (cough cough covid) 


So,
  • If we have a population of n+1 individuals mixing homogenously
  • If the contact rate between any individual pair is denoted to by B
  • At any time, each individual is either uninfected (numbering x) or infected (numbering y),
  • Then x+y is a constant equaling to n+1
  • Intuitively, we can say that if an infected and uninfected nodes communicate, the latter obviously turns infected.


After solving a couple of differential equations, we get

x=n(n+1)/(n+e^(B(n+1)t))  as t goes to infinity we can see this approaches 0, thus all of the nodes are infected. The next equation collaborates this.
y=n(n+1)/(n+e^(-B(n+1)t))



Taking B=b/n  we end up with y=(n+1)-1/(n^(cb-2)) in log(n) time.

Wow, that's pretty fast, right?


That explains why COVID spread so quickly!  


Analysis of pull protocol


  •  Get prepared to put your graph theory to the test.
  • As a fact, all gossip protocols take O(log n) rounds before half of them get to gossip.
  •         This is because its the fastest way to send a message. Construct a spanning tree with a constant degree of every node is O(log n).
  •  Once this point is achieved, we find that the pull protocol is faster than the push protocol.
  • Let p(i) be the fraction of non-infected processes, after ith round, then
  •     p(i+1)=(p(i))^(k+1)
  • This is super-exponential.
  •  Thus the second half of the gossip protocol finishes in O(log(log n))


Mixing pull and push in both halves gives rise to a hybrid version.





This a brief idea of how peer -to -peer network communication happen under the hood
 Hope you found this an interesting read. Thank you for reading!














    







Comments

Popular posts from this blog

P.S. It's easy

Decathlon delivery charge scam

Internship drive season