---
title: Architecture
description: General Architecture of NextGraph
layout: ../../layouts/MainLayout.astro
---
Some important concepts have already been explained in the [previous chapters](/en/design) of this documentation. We encourage you to read them, if you haven't already, before exploring the architecture of NextGraph.
Data is stored in Repositories. Each Repository contains branches, and each branch organizes its commits in the form of a **DAG (Directed Acyclic Graph)**.
For the purpose of transport and storage, commits are broken down into fixed-size blocks, each one encoded with convergent encryption.
Commits are of two types: maintenance operations and transactions. Transactions mutate the data and are passed as-is to the application layer, while maintenance operations change permissions and other metadata of the repository. We will now focus on maintenance operations, as the transaction format has already been covered in previous chapters (cf. [CRDTs](/en/framework/crdts)).
Peers that want to access a Repository need to join an overlay and subscribe to the relevant pub/sub topic of the repository in order to receive updates. If the user is granted write access, their peer will also advertise itself as a publisher of that topic. Each branch of a repository maps to exactly one pub/sub topic. Topics can, however, be renewed when a member is removed from the repository (the topic_id changes, but the branch_id does not).
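
The branch-to-topic mapping and topic renewal described above can be pictured with a minimal sketch. The `Repo` class, its method names, and the random hex strings used as topic IDs are hypothetical placeholders for illustration, not the actual NextGraph data model or API.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy model (hypothetical): each branch_id maps to one current topic_id."""
    topics: dict[str, str] = field(default_factory=dict)

    def add_branch(self, branch_id: str) -> str:
        # Assumption: a topic ID is modeled here as a random hex string.
        topic_id = secrets.token_hex(16)
        self.topics[branch_id] = topic_id
        return topic_id

    def renew_topic(self, branch_id: str) -> str:
        # Called, for example, after a member is removed from the repository.
        # The branch_id stays the same; only the topic_id is replaced.
        if branch_id not in self.topics:
            raise KeyError(f"unknown branch: {branch_id}")
        self.topics[branch_id] = secrets.token_hex(16)
        return self.topics[branch_id]


repo = Repo()
old_topic = repo.add_branch("main")
new_topic = repo.renew_topic("main")
print(old_topic != new_topic, list(repo.topics))
```

After a renewal, peers that are still members would subscribe to the new topic, while the branch keeps its identity.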
All the mechanisms described here are handled for you by the libraries you use in our SDKs. You do not have to deal with all the protocols directly.
### Network
Peers that interact on the same set of Repositories participate in an Overlay network that isolates them from other hosts on the IP network.
The peers are organized in different topologies according to their role.
We distinguish 2 roles:
- Broker: maintains the Pub/Sub and forwards events (that contain commits), while being blind to the content it transmits (E2EE, zero knowledge).
- Verifier: decrypts and verifies the commits, materializes and saves the state, and passes the update to the application.
The brokers have 2 interfaces:
- they connect to each other to form the Core Network, a P2P network with the topology of a **general undirected graph**, using the Core Protocol.
- they expose an interface to Verifiers, which use the Client Protocol. This is a typical client-server communication, with a star topology.
A verifier only connects to one broker at a time, and needs to register and authenticate in order to access the broker.
Verifiers do not talk to each other (except in some rare cases when reconciliation is needed, which is not implemented yet). Instead they talk to their Broker, and it is the Broker that does all the work of maintaining the pub/sub and forwarding events.
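
The split of roles described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Broker` and `Verifier` classes, their method names, and the single-byte XOR "cipher" are hypothetical and have nothing to do with the real Core, Client or App protocols. The point is that the broker only ever handles opaque bytes.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Toy broker (hypothetical): forwards opaque events, never decrypts them."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic_id: str, deliver: Callable[[bytes], None]) -> None:
        self._subscribers[topic_id].append(deliver)

    def publish(self, topic_id: str, event: bytes) -> int:
        # The event is forwarded as-is; the broker has no key to read it.
        receivers = self._subscribers.get(topic_id, [])
        for deliver in receivers:
            deliver(event)
        return len(receivers)


class Verifier:
    """Toy verifier (hypothetical): decrypts events and materializes state."""

    def __init__(self, decrypt: Callable[[bytes], bytes]) -> None:
        self._decrypt = decrypt
        self.state: list[bytes] = []

    def on_event(self, event: bytes) -> None:
        self.state.append(self._decrypt(event))


def toy_cipher(data: bytes) -> bytes:
    # Placeholder only: XOR with a constant is NOT encryption.
    return bytes(b ^ 0x5A for b in data)


broker = Broker()
verifier = Verifier(decrypt=toy_cipher)
broker.subscribe("topic-1", verifier.on_event)
broker.publish("topic-1", toy_cipher(b"commit-1"))
print(verifier.state)
```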
Finally, an Application connects to a Verifier (locally or remotely) and communicates with it using the App protocol. This protocol exchanges plaintext data.
### Overlay
Overlays are abstract structures that have an ID and that peers join and leave.
In order to access the content of a repository, a peer must first join its overlay.
Repositories are grouped inside overlays, with one Overlay per Store (see the concept of [Store here](/en/documents)).
More details about the network and overlays can be found in the [Network chapter](/en/network).
### Repository
Each NextGraph Document is managed internally by one Repository identified by a **unique identifier**, which is a public key randomly generated at creation time. This ID never changes over time. We call it the RepoID.
Inside a Repo, several branches coexist. The root branch holds all the metadata and permissions. The main branch holds the main content of the document, which is displayed first when opening the Document.
Additional branches support the "forking a branch" and "blocks" features, as explained in the [Document chapter](/en/document).
Please do not confuse the 2 distinct concepts of "blocks" in NextGraph. One is a concept used by end-users who want to create content blocks inside a Document, as in Notion. These are the blocks detailed in the Document chapter above. It is end-user terminology.
From now on we will be talking about another type of block: chunks of data, similar to the pieces BitTorrent uses when encoding files to share. This is technical, internal terminology that is of no interest to the end-user. We explain it here for developers, contributors and those who are curious about the internals of our protocol.
### Branches and blocks
Every branch starts with a singleton commit that we call the Root commit.
Commits hold CRDT operations that can happen concurrently, hence creating forks in the DAG. For this reason, the current state of a branch, which we call the HEAD, is sometimes composed of several concurrent commits. In this case, the DAG is a semilattice. When all the commits of the HEAD are merged into a single next commit, the HEAD is composed of only one commit and the DAG is then a lattice.
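
To make the notion of HEAD concrete, here is a minimal sketch that computes it from a DAG. It assumes, for illustration only, that the DAG is given as a dictionary mapping each commit ID to the set of commits it directly builds upon (called `parents` here, a hypothetical name). The HEAD is then the set of commits that no other commit references.

```python
def heads(parents: dict[str, set[str]]) -> set[str]:
    """Return the commits that no other commit lists as a parent."""
    referenced = set().union(*parents.values())
    return set(parents) - referenced


# Two concurrent commits "a" and "b" fork from the root commit.
dag = {"root": set(), "a": {"root"}, "b": {"root"}}
print(sorted(heads(dag)))  # ['a', 'b']

# A merge commit "m" that follows both brings the HEAD back to one commit.
dag["m"] = {"a", "b"}
print(sorted(heads(dag)))  # ['m']
```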
For the purpose of transport and storage, commits are broken down into fixed-size blocks, each one encoded with convergent encryption.
For each block, we hold a secret key that was used to encrypt the block with symmetric encryption (ChaCha20), that key being a keyed hash of the plaintext. We also hold an ID of the block which is a hash of the ciphertext.
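
The scheme described above (key derived from the plaintext, ID derived from the ciphertext) can be sketched with the Python standard library alone. Several things here are assumptions made for the sake of a runnable example: BLAKE2b stands in for Blake3, a BLAKE2b-based keystream stands in for ChaCha20, and `convergence_key` is a hypothetical name for whatever secret keys the hash. This is not the NextGraph implementation and must not be used as real cryptography.

```python
import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    # Stand-in for ChaCha20: keyed BLAKE2b in counter mode (illustration only).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.blake2b(
            counter.to_bytes(8, "little"), key=key, digest_size=64
        ).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_block(plaintext: bytes, convergence_key: bytes) -> tuple[bytes, bytes, bytes]:
    """Return (block_id, key, ciphertext) for one block."""
    # The key is a keyed hash of the plaintext (BLAKE2b stands in for Blake3).
    key = hashlib.blake2b(plaintext, key=convergence_key, digest_size=32).digest()
    stream = _keystream(key, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    # The ID is a hash of the ciphertext.
    block_id = hashlib.blake2b(ciphertext, digest_size=32).digest()
    return block_id, key, ciphertext


def decrypt_block(block_id: bytes, key: bytes, ciphertext: bytes) -> bytes:
    # Integrity check first: the ID must match the ciphertext.
    if hashlib.blake2b(ciphertext, digest_size=32).digest() != block_id:
        raise ValueError("block ID does not match ciphertext")
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

Because the key depends only on the plaintext and the keying secret, encrypting the same block twice yields the same ciphertext and the same ID, which is what makes deduplication possible.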
Blocks are combined in a Merkle tree structure. The root block of this tree gives its ID and key to the Object that was chunked into blocks. This ID and key pair is called an `ObjectRef`.
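
As an illustration of how chunked content can be tied together by a single root ID, here is a minimal Merkle tree sketch. It makes several simplifying assumptions that are not taken from the NextGraph protocol: encryption, keys and padding are left out, BLAKE2b stands in for Blake3, the block size and arity are arbitrary, and the one-byte `LEAF`/`NODE` prefixes are added here only to tell the two kinds of blocks apart.

```python
import hashlib

LEAF, NODE = b"\x00", b"\x01"  # hypothetical prefixes, for illustration only


def block_id(block: bytes) -> bytes:
    return hashlib.blake2b(block, digest_size=32).digest()


def build_tree(content: bytes, block_size: int = 4096, arity: int = 4):
    """Chunk content into blocks and return (root_id, store)."""
    if block_size < 1 or arity < 2:
        raise ValueError("block_size must be >= 1 and arity >= 2")
    store: dict[bytes, bytes] = {}

    def put(block: bytes) -> bytes:
        bid = block_id(block)
        store[bid] = block  # identical blocks share one entry
        return bid

    # Leaf level: one block per chunk of the content.
    chunks = [
        content[i:i + block_size] for i in range(0, len(content), block_size)
    ] or [b""]
    level = [put(LEAF + chunk) for chunk in chunks]
    # Inner levels: each block lists the IDs of up to `arity` children.
    while len(level) > 1:
        level = [
            put(NODE + b"".join(level[i:i + arity]))
            for i in range(0, len(level), arity)
        ]
    return level[0], store
```

Changing any byte of the content changes the ID of its leaf block and of every block above it, so the root ID alone identifies the whole Object.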
There are also Objects that are not Commits and that hold binary information, like Files and other system Objects (Signature, Certificate, etc.).
IDs and keys are 33 bytes long (32 + 1 for the version). The ID is a Blake3 hash, while the key is obtained from a Blake3 keyed hash.
The convergent encryption is useful for deduplication of content and for avoiding nonce/IV reuse and key management in a decentralized system.
The ID acts as an "encrypt-then-MAC" mechanism that helps enforce integrity.
Later chapters go into more detail about repositories, branches, blocks and commits, which together compose the [NextGraph Protocol](/en/protocol), but we will first take a closer look at our [Network](/en/network).