I was recently reminded of the fact that people use the term “peer-to-peer” to mean a variety of different things. That can make conversations on the topic difficult, as with any situation where you assume you have common ground, only to discover that is not the case.
In this interlude, I want to — really quite quickly — disambiguate some things, as a kind of reference for future conversations. You don’t need to agree with me. Though if you don’t, I’d be interested to hear about it!
Protocols
The most basic source of misunderstanding I encounter regards what exactly constitutes a protocol, never mind if it’s peer-to-peer or not.
The term is derived from the common, non-computer-science usage of the word, which some dictionaries define as e.g. “the forms of ceremony and etiquette observed by diplomats and heads of state”, or “a code of proper conduct”.
Applied to the realm of computing, this describes how components in a computer, or computers in a network, discover and introduce themselves to each other, request or respond to requests for services, etc.
Taken a little further into the abstract, it implies that otherwise independent actors (processes, computers, etc.) maintain state on their understanding of a mutual exchange, to the extent that such state is necessary. They then send messages according to the rules of the protocol and their internal state, and update said state on receipt of messages.
It’s arguable that a protocol then consists of state machines and messages that correspond to state transition events. So far, most people I’ve encountered share this view more or less.
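To make the "state machines plus messages" view concrete, here is a minimal sketch in Python. The states, the message names (HELLO, WELCOME, PING, PONG, BYE) and the `Endpoint` class are all invented for illustration; they do not come from any real protocol.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    GREETED = auto()
    CLOSED = auto()


# Hypothetical rule table:
# (current state, received message) -> (next state, reply to send or None)
TRANSITIONS = {
    (State.IDLE, "HELLO"): (State.GREETED, "WELCOME"),
    (State.GREETED, "PING"): (State.GREETED, "PONG"),
    (State.GREETED, "BYE"): (State.CLOSED, None),
}


class Endpoint:
    """One actor's view of the exchange: it only tracks its own state."""

    def __init__(self):
        self.state = State.IDLE

    def receive(self, message):
        """Apply a received message and return the reply, if any."""
        key = (self.state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"{message!r} is not valid in state {self.state.name}")
        self.state, reply = TRANSITIONS[key]
        return reply


endpoint = Endpoint()
print(endpoint.receive("HELLO"))  # WELCOME
print(endpoint.state.name)        # GREETED
```

Note what the sketch leaves out: `receive` is handed a message, but nothing says how that message travelled from one actor to the other.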
Where things drift apart is the question of the specifics of how these messages get exchanged. I encounter a lot of folk for whom this is not a relevant question. And from the perspective that the validity of the protocol is independent of such issues, they are entirely correct. But in practice, computers do not know how to exchange messages unless you tell them.
I spent a lot of my formative software engineering years reading Requests for Comments [https://email.mg1.substack.com/c/eJw9UMtuhSAQ_ZrLkgAi4IJFN_0Nw2O0pAoGxhr_vt…], the documents that effectively outline how to make the Internet work. In the majority of cases, these documents either specify very precisely how to exchange messages, explicitly refer to a previously standardized way of doing so, or at minimum outline the expectations they have of such methods.
My expectation of discussions on networking protocols is that these considerations are included, because not every lower-level networking protocol is created equal.
If, for example, your assumption is that messages can be JSON [https://email.mg1.substack.com/c/eJw9UMGOhSAM_JrHkQAC4oHDXvY3CEr1satgoK7x7x…] and transported over HTTP [https://email.mg1.substack.com/c/eJw9UMtuhSAQ_ZrLkgDycsGim_6GQRi9pAoGxjb-fb…], then that’s fair in and of itself. But it’s also clear to me that, on the one hand, you’re missing a discussion of HTTP endpoints, methods and status codes for a complete specification, and on the other hand, you do not expect your network nodes to be behind NATs.
Peer
To be fair, the above is fine in principle. But when we’re discussing P2P networks, it also becomes necessary to clarify this a little. That’s especially the case today, where phrases such as “peer-to-peer lending” are flung around.
The suggestion in this phrase, as in the idea of P2P networks, is that the exchange is entered into as equals.
In practice, the phrase was coined to distinguish it from the client-server principle, in which clients make requests to servers, and servers respond to them. In peer-to-peer networks, either participant can make a request or respond to one.
The term is therefore also mired in the history of thin and fat clients. Throughout the history of networked computers, the role of the client has grown and receded, always depending on whether it was more costly to process data or to transfer it.
Without going into too many specifics here, however, it was always clear that servers exist to serve more than one client, whereas clients are effectively “user agents” and represent a single user. The upshot of this is that servers were always meant to be more powerful in one way or another.
With dial-up internet, NATs, and all that kind of thing, there is a second angle to this: servers must be reachable, so they have an address (IP or otherwise) that any eligible client can resolve and route to. Clients, by contrast, need not.
In a P2P setting where each node may or may not act as a user agent, this is a crucial consideration. Does the node have such a “public” address or not? If yes, is it perhaps more of a server in effect? If not, should it be classified more as a client?
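As a rough illustration of the "public address or not" question, the sketch below uses Python's standard `ipaddress` module to check whether an address lies in globally routable space. The function name `looks_publicly_routable` is my own. The check is only a heuristic on an address you already know: a globally routable address can still sit behind a firewall, and a node behind a NAT usually cannot learn its external address without asking someone outside.

```python
import ipaddress


def looks_publicly_routable(address: str) -> bool:
    """Return True if the address lies in globally routable space.

    This is a heuristic only: it says nothing about firewalls, and it
    cannot discover which address a NAT presents to the outside world.
    """
    return ipaddress.ip_address(address).is_global


print(looks_publicly_routable("8.8.8.8"))       # True
print(looks_publicly_routable("192.168.1.10"))  # False: private (RFC 1918) range
print(looks_publicly_routable("100.64.0.1"))    # False: carrier-grade NAT range
```

A node for which this returns False is, in the terms used above, closer to a client unless the network gives it some way to be reached regardless.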
If you want to make a peer-to-peer network truly a network of peers, such distinctions must not matter. That is, you must find ways to directly connect peers irrespective of whether they own a “public” address, for example via STUN [https://email.mg1.substack.com/c/eJw9UMtuxCAM_JrliALhlQOHXnrspe0ZkeBk0SYQgd…] or similar methods. Arguably, if you do not, then your network is not peer-to-peer.
Note: the “peer” I’m referring to in the title of this blog is explicitly a TCP “peer”; the message “connection reset by peer” corresponds to receiving a TCP RST packet which resets the connection. The “peer” here is the remote end of the TCP connection.
Distribution vs Decentralization
The above view goes right back to Paul Baran’s distinction between distributed and decentralized networks, as expressed in his 1964 memorandum On Distributed Communications [https://email.mg1.substack.com/c/eJw9kM2OrSAMgJ_msCSAgrhgMZvZ3c19AYJSlYyAgT…] for the RAND Corporation.
Baran’s diagram of centralized, decentralized and distributed networks shows clearly that a distributed network is only achievable if any peer can directly engage with any other peer. If there are peers that (by design, not happenstance) form communications hubs, at best the network is decentralized.
Distribution is a necessary requirement for peer-to-peer networks. It’s still possible for some peers to take on different roles, but if their distinction is based on their limited or enhanced ability to facilitate basic connectivity, then the network degrades to decentralization.
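One way to get a feel for the difference is to ask which nodes a network cannot afford to lose. The sketch below is a structural approximation only, with made-up example topologies: it treats a network as an undirected graph and reports the nodes whose failure cuts the remaining nodes apart. It cannot tell whether a hub exists by design or by happenstance, which is the distinction that matters above.

```python
def reachable(graph, start, removed):
    """Nodes reachable from `start` while the node `removed` is offline."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def hubs(graph):
    """Nodes whose failure leaves the remaining nodes disconnected."""
    result = set()
    for candidate in graph:
        rest = [n for n in graph if n != candidate]
        if rest and len(reachable(graph, rest[0], candidate)) < len(rest):
            result.add(candidate)
    return result


# Two hubs, each serving its own leaves (decentralized).
decentralized = {
    "h1": {"h2", "a", "b"}, "h2": {"h1", "c", "d"},
    "a": {"h1"}, "b": {"h1"}, "c": {"h2"}, "d": {"h2"},
}
# Every node has two independent neighbours (a small ring).
distributed = {
    "a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"},
}

print(sorted(hubs(decentralized)))  # ['h1', 'h2']
print(sorted(hubs(distributed)))    # []
```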
Pure P2P
In some classifications of P2P networks, there is a distinction between so-called pure and hybrid P2P networks. There may also be mention of central P2P networks.
The description above essentially describes pure P2P networks. Central P2P networks have peers engage directly with each other in some forms of communication, but initiate a conversation via a central instance. Hybrid P2P networks… well this is where the fun starts.
Hybrid P2P
As the name implies, hybrid P2P networks combine some parts of P2P with some parts of, well, other models. They’re not central, though, in that there is no single central entity which the network relies on.
A loose classification would be that there are some nodes in the network that take on a more central role, but they’re not predetermined. This definition probably works across all interpretations of hybrid networks.
But there are stricter classifications which suggest that a network topology is derived from this. One such suggestion essentially divides nodes into two roles: those that form a pure P2P network among themselves through self-organization, and everyone else. A common term for the former is “super nodes” or “super peers”, but the term has issues I’ll get into below.
In this view, super nodes provide services to any other node in the network, the essence of which is that most (leaf) nodes take on a client role, whereas the super nodes take on a server role. The peer-to-peer part here consists predominantly of the fact that super nodes self-organize to mitigate failures of some nodes and accommodate newcomers.
This model of hybrid P2P networks is strongly associated with blockchain. There are many people who treat blockchain as essentially peer-to-peer, which is true only for some interpretations of what it means to be a peer (see above). On the basis of that interpretation, a blockchain is essentially a distributed database managed by all participating super nodes (or full nodes in blockchain parlance). By contrast, any other nodes (light nodes in blockchain terms) are the client nodes.
I do not share this interpretation.
I do not share it for two reasons: a) it is historically inaccurate, and b) it is unnecessarily restrictive.
First off, it is unnecessarily restrictive because it immediately pushes the notion of a peer-to-peer network away from distribution towards decentralization, which cannot provide the same affordances to leaf nodes acting as user agents as a distributed network can. Refer to Baran for well-outlined reasons.
It is also historically inaccurate. That is in itself the lesser reason to dislike this view, but it is the more personal one, and it leads on to a much stronger argument for challenging this interpretation.
Skype as a Hybrid P2P Network
Enter Skype. Or Joost. Or KaZaA. Or Joltid [https://email.mg1.substack.com/c/eJw9kMtuhSAQhp_msCSAcnHBopu-hkEYPbQKBsYa-_…].
You probably haven’t heard of some of these.
All are ventures started by Janus Friis and Niklas Zennström, and all employ peer-to-peer technology. In particular, they employ technology managed by Joltid, a company they held to manage this technology.
I joined Joost in 2006, when it was still a startup in stealth mode. Joost provided peer-to-peer video streaming of a quality near to digital TV (just below DVD), at a time when YouTube was limited to a few minutes of low resolution cat videos. The product was based on Joltid’s P2P stack.
After its launch, we moved from on-demand video to live streaming. In 2008, we streamed March Madness live [https://email.mg1.substack.com/c/eJw9kM2OhCAMx59muGkAAfHAYS_7GgalOuwqGKiz8e…] to an audience of about a thousand users in an early beta.
To do this work, we had to understand Joltid’s P2P software rather than merely use it, and in our team of a handful, that task fell to me. I remember that I took far longer than anyone wanted to break through, but I eventually produced a detailed document on the software’s inner workings, which helped us make some crucial decisions on how to approach live streaming a little differently.
The point of this anecdote is that, at the time, the term hybrid P2P meant something entirely different from the blockchain-centric view above. What’s more, it was the same more flexible definition I gave above: merely that some nodes take on extra roles in an otherwise fully distributed network.
Of course, I signed an NDA back then. Of course, the NDA has expired by now. Of course, I don’t remember every detail from a decade and a half ago. However, there is one specific thing I recall very clearly, and there is a public record of it [https://email.mg1.substack.com/c/eJw9UMuOwyAM_JpyRECAkAOHXvY3Ih5Oi5ZABM5W-f…]:
On Thursday, 16th August 2007, the Skype peer-to-peer network became unstable and suffered a critical disruption. The disruption was triggered by a massive restart of our users' computers across the globe within a very short timeframe as they re-booted after receiving a routine set of patches through Windows Update.
The high number of restarts affected Skype's network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact.
The crucial point here is this: personal computers restarting meant Skype nodes going offline for a while.
How is that possible? It’s possible because in what is otherwise a pure, i.e. distributed P2P network, some nodes took on special roles, which turned them into super nodes — a term derived directly from the few descriptions Skype issued about their protocol at the time.
These nodes were peers in (almost) every sense: they did not necessarily have public IP addresses, and they did not run in data centers. The only difference was that the Skype network voted for them to take on a special organizing role, providing more persistence to the network’s index. My document was in large part concerned with what criteria made a node eligible, and how the voting progressed, none of which is particularly relevant here.
What the public discussion of the events did not refer to, however, is the role Joost played in any of this.
You see, there is a bootstrapping problem in this. How can a newly installed Skype instance know which super nodes to contact for indexing — or which nodes at all, period? It turns out that for this purpose, Skype was running two dedicated super nodes, as a way to bootstrap new nodes. Over time, Skype added more discovery methods for super nodes.
So when you started your Skype software, it would contact some of the last known super nodes instead of these dedicated two. Only when none on that list could be reached would the dedicated super nodes be contacted again, which in practice meant: never.
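The fallback order described here can be sketched in a few lines. To be clear, this is not Skype’s or Joltid’s actual code, which I cannot reproduce; the function `pick_entry_point`, the node names and the `can_reach` callback are all hypothetical and only illustrate the order of preference.

```python
def pick_entry_point(cached_super_nodes, bootstrap_nodes, can_reach):
    """Return the first reachable node, or None if nothing answers.

    Cached (last known) super nodes are tried first; the dedicated
    bootstrap nodes are only a last resort.
    """
    for node in list(cached_super_nodes) + list(bootstrap_nodes):
        if can_reach(node):
            return node
    return None


cached = ["cached-1", "cached-2"]
bootstrap = ["boot-1", "boot-2"]

# Normal operation: some cached super node is still online.
online = {"cached-2", "boot-1"}
print(pick_entry_point(cached, bootstrap, online.__contains__))  # cached-2

# Every cached super node is offline and the bootstrap nodes are gone too.
online = set()
print(pick_entry_point(cached, bootstrap, online.__contains__))  # None
```

The second call is the uncomfortable case: as long as any cached node answers, nobody notices whether the last resort still exists.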
In fact, this pure P2P approach worked so well that Skype simply stopped operating those two dedicated super nodes. But when all super nodes restarted due to a Windows update, suddenly they were in demand again, and couldn’t be reached.
Enter a phone call to Joost. We were running dedicated super nodes for our video network, and our video network no longer needed them full-time. We just happened to keep operating them. With a bit of a configuration jiggle and a recompile, our dedicated super nodes went live again pretending to be Skype’s, and the Skype network could rebuild.
This long story should provide some key insights into hybrid P2P networks:
In 2007, they had central components for bootstrapping the system.
They typically operated in a pure P2P mode.
In no way did “light” nodes exist that merely connected to “full” nodes, by whichever terminology.
The thing that frustrates me here isn’t that terms change their meanings. I don’t massively care what is taught nowadays in universities as “hybrid” P2P networks.
What I do care about, and passionately so, is that this rewrite of history means that modern-day students cannot even conceive of how amazingly peer-to-peer (in every sense) the Skype network was, because they lack the terminology to describe it. And if you can’t conceive of something, it becomes that much harder to build anything like it again.
Conclusion
In summary, when I engage in conversations on P2P networks, I carry with me a number of assumptions that are not commonly shared. That sometimes makes it difficult to explain things, including in connection with the Interpeer Project [https://email.mg1.substack.com/c/eJxVUEtuxCAMPc2wREDCb8Gim7lGBMTMoCYQgdMqty…].
I suppose it’s fair to say that I take the following view:
I consider the point of peer-to-peer networks to be providing resilience and fairness.
Resilience refers to distribution as Baran argues for it.
Fairness refers to a node being able to represent a user’s needs directly.
Both require nodes to be able to directly connect to each other, in a pure P2P fashion; consequently, if some nodes must mediate communications between others by inherent design, the network fails at being resilient or fair.
This does not preclude nodes from taking on special roles within the network. The generalized view of the above is that any special role a node enters must not be by prior design or otherwise create a fixed hierarchy; if it does, resilience and fairness are at risk.
A quick note on subscriptions: I’m trying to keep doing this work, and have set up subscriptions so that people can help me pay my bills. That means some articles are effectively paywalled — if you can’t pay, fear not: there is a special launch promotion open.
You’ll get free lifetime access, and I can keep the paywall up for folks coming from the outside. And if you are in the right space to be extra awesome and pay for a subscription, that’s all the more appreciated!
Hi!
This is a test message to check the new mailing list. But it's also good for a quick welcome text in the archives.
This is a public mailing list, so archives and signup and configuration will be available at
https://lists.interpeer.io/mailman/listinfo/interpeer . It is a general-purpose discussion list about the Interpeer project.
Which means it's not an announcement list (though we/I may make announcements here). But since there are no end users at the moment, it's probably something close to a development list.
If it becomes apparent that separate announcement and development lists are necessary, we will add them.
Enjoy!
Jens