@ -4,6 +4,80 @@ description: General Architecture of NextGraph
layout: ../../layouts/MainLayout.astro
---
Please bear with us as we are currently writing/publishing the documentation (23 of August until 28 of August 2024).
Before you explore the architecture of NextGraph, some important concepts have already been explained in the [previous chapters](/en/design) of this documentation, and we encourage you to read them if you haven't already.
One article is ready in this section, about [CRDTs](/en/framework/crdts). Check it out, and stay tuned in the coming days for the rest.
Data is stored in Repositories. Each Repository contains branches, and each branch organizes its commits in the form of a **DAG (Directed Acyclic Graph)**.
For the purpose of transport and storage, commits are broken down into fixed-size blocks, each one encoded with convergent encryption.
There are two types of commits: maintenance operations and transactions. Transactions mutate the data and are passed as-is to the application layer, while maintenance operations change permissions and other metadata of the repository. We will now focus on the maintenance operations, as the transaction format has already been covered in previous chapters (cf. [CRDTs](/en/framework/crdts)).
Peers that want to access a Repository need to join an overlay and subscribe to the relevant pub/sub topic of the repository in order to receive updates. If the user is granted write access, their peer will also advertise itself as a publisher of that topic. Each branch of a repository maps to exactly one pub/sub topic. However, a topic can be renewed when a member is removed from the repository: the topic_id changes, but the branch_id does not.
All the mechanisms described here are handled for you by the libraries you use in our SDKs. You do not have to deal with all the protocols directly.
### Network
Peers that interact on the same set of Repositories participate in an Overlay network that isolates them from other hosts on the IP network.
The peers are organized in different topologies according to their role.
We distinguish 2 roles:
- Broker: maintains the Pub/Sub and forwards events (that contain commits), while being blind to the content it transmits (E2EE, zero knowledge).
- Verifier: decrypts and verifies the commits, materializes and saves the state, and passes the updates to the application.
The brokers have 2 interfaces:
- between themselves, they participate in the Core Network, which is a P2P network with the topology of a **general undirected graph**, using the Core Protocol.
- they expose an interface to Verifiers, which use the Client Protocol. This is a typical client-server communication, with a star topology.
A verifier only connects to one broker at a time, and needs to register and authenticate in order to access the broker.
Verifiers do not talk to each other (except in some rare cases when reconciliation is needed, which is not implemented yet). Instead they talk to their Broker, and it is the Broker that does all the work of maintaining the pub/sub and forwarding events.
Finally, an Application connects to a Verifier (locally or remotely) and communicates with it using the App Protocol. This protocol exchanges plaintext data.
### Overlay
Overlays are abstract structures that have an ID and that peers join and leave.
In order to access the content of a repository, its overlay should first be joined.
Repositories are grouped inside overlays, with one Overlay per Store (see the concept of [Store](/en/documents)).
More details about the network and overlays can be found in the [Network chapter](/en/network).
### Repository
Each NextGraph Document is managed internally by one Repository identified with a **unique identifier**, that is a public key randomly generated at creation time. This ID will never change over time. We call this a RepoID.
Inside a Repo, there are several branches that coexist. The root branch holds all the meta-data and permissions. The main branch holds the main content of the document, that is displayed first when opening the Document.
Additional branches are here to support the features of "forking a branch" and the "blocks" feature as explained in the [Document chapter](/en/document).
Please do not confuse the 2 distinct concepts of "blocks" in NextGraph. One is a concept used by end-users who want to create content blocks inside a Document, as in Notion. These are the blocks detailed in the Document chapter above. It is end-user terminology.
But now we will be talking about another type of block: chunks of data, similar to the pieces that BitTorrent splits files into for sharing. This is technical, internal terminology that is of no interest to the end-user. We just explain it here for developers, contributors and those who are curious about the internals of our protocol.
### Branches and blocks
Every branch starts with a singleton commit that we call the Root commit.
Commits hold CRDT operations that can happen concurrently, hence creating forks in the DAG. For this reason, the current state of a branch, that we call the HEAD, is sometimes composed of several concurrent commits. In this case, the DAG is a Semilattice. When all the commits of the HEAD are merged into a single next commit, the HEAD is composed of only one commit and the DAG is then a lattice.
For the purpose of transport and storage, commits are broken down into fixed-size blocks, each one encoded with convergent encryption.
For each block, we hold a secret key that was used to encrypt the block with symmetric encryption (ChaCha20), that key being a keyed hash of the plaintext. We also hold an ID of the block which is a hash of the ciphertext.
Blocks are combined together in a Merkle tree structure. The root block of this tree gives its ID and key to the Object that was chunked into blocks. This pair of ID and key is called an `ObjectRef`.
There are also Objects that are not Commits, and that hold some binary information, like Files, and other system Objects (Signature, Certificate, etc...).
IDs and keys are 33 bytes long (32 + 1 for the version). The ID is a Blake3 hash, while the key is obtained from a Blake3 keyed hash.
The convergent encryption is useful for deduplication of content and for avoiding nonce/IV reuse and key management in a decentralized system.
The ID acts as an "encrypt-then-MAC" mechanism that helps enforce integrity.
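To make the relation between plaintext, key, ciphertext and ID more concrete, here is a minimal Python sketch of convergent encryption. It is not the NextGraph implementation: BLAKE2b from the standard library stands in for Blake3, a toy hash-based keystream stands in for ChaCha20 (it is not secure), and `CONVERGENCE_KEY`, `encrypt_block` and `decrypt_block` are hypothetical names chosen for this illustration.

```python
import hashlib

# Assumption: the key of the keyed hash is not specified in the text above,
# so a fixed placeholder is used here.
CONVERGENCE_KEY = b"hypothetical-convergence-key"


def derive_key(plaintext: bytes) -> bytes:
    """Block key = keyed hash of the plaintext (BLAKE2b standing in for Blake3)."""
    return hashlib.blake2b(plaintext, key=CONVERGENCE_KEY, digest_size=32).digest()


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream standing in for ChaCha20. NOT secure, illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.blake2b(counter.to_bytes(8, "little"), key=key).digest()
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_block(plaintext: bytes):
    """Return (block_id, key, ciphertext) for one block."""
    key = derive_key(plaintext)
    ciphertext = xor(plaintext, keystream(key, len(plaintext)))
    # The ID is a hash of the ciphertext, so it can be checked without the key.
    block_id = hashlib.blake2b(ciphertext, digest_size=32).digest()
    return block_id, key, ciphertext


def decrypt_block(block_id: bytes, key: bytes, ciphertext: bytes) -> bytes:
    """Check the ciphertext against its ID, then decrypt it."""
    if hashlib.blake2b(ciphertext, digest_size=32).digest() != block_id:
        raise ValueError("block ID does not match the ciphertext")
    return xor(ciphertext, keystream(key, len(ciphertext)))
```

In this sketch, encrypting the same plaintext twice yields the same ID, key and ciphertext, which is what makes deduplication possible, and a ciphertext that was modified no longer matches its ID.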
We will now dive more into details about repositories, branches, blocks and commits, that compose the [NextGraph Protocol](/en/protocol), but we will first detail more about our [Network](/en/network).
@ -4,4 +4,6 @@ description: NextGraph protocols and security claims have been audited
layout: ../../layouts/MainLayout.astro
---
Please bear with us as we are currently writing/publishing the documentation (23 of August until 28 of August 2024).
So far, NextGraph has not gone through any security audit. We are hopeful that this will happen at the beginning of 2025, thanks to the help received from NLnet Foundation.
@ -4,4 +4,12 @@ description: Reference of the CLI of NextGraph
layout: ../../layouts/MainLayout.astro
---
Please bear with us as we are currently writing/publishing the documentation (23 of August until 28 of August 2024).
The CLI can be used to administer your broker, or simply access your data from the command-line.
We have released a [preview version](https://nextgraph.org/releases/#cli-ngcli) of the CLI that lets you, for now, verify the signature and download a snapshot of your Document, by issuing the command:
```
ngcli get did:ng:o:...[full Nuri as given to you in the App]
```
You will get the exact command to run when using the tool for signatures in the `Document Menu / Tools / Signature` of our Apps.
Please bear with us as we are currently writing/publishing the documentation (23 of August until 28 of August 2024).
In the previous chapters, we already introduced the concepts of :
- [Local-First](/en/local-first) and [Encryption](/en/encryption) where we explain why we need a novel E2EE sync protocol, and what the “2-tier” topology of our network is.
- [Data-First](/en/framework/data-first) and [CRDTs](/en/framework/crdts) that bring interoperability, and how [DID URIs](/en/framework/nuri) bring portability to the [Documents](/en/documents) managed by our [Semantic Web](/en/framework/semantic) database.
We will now dive a bit deeper into some other core concepts and the rationales behind them, and then into the internals of the protocols and formats.
### History of the project
_until August 2024_
I have been carrying this project for 15 years. I carried it sometimes in my heart as a wish, sometimes in my mind, with some ideas and thoughts, and sometimes on my shoulders, with experiments and coding, especially in the last 3 years.
The first idea goes back 15 years, and it was about data-first and malleable/composable software, very much inspired by the vision of Tim Berners-Lee about Linked Data and the Giant Global Graph (GGG) that he envisioned back then, as the successor of the WorldWideWeb. At the time I was working in big-data startups.
In 2013, the Snowden revelations confirmed the suspicions we all had, about mass surveillance and the absence of privacy online. It is at that moment that I looked closer into security and encryption. I also quit my job at that time and started to dedicate more time toying with some prototypes.
In the following years, I made a first attempt at building a system based on graph databases (JanusGraph) and I ran into several issues, especially about consistency of the database in a decentralized system. It turned out that Java wasn't the best language for what I was doing, and that Property Graphs weren't the best model either. I had originally discarded SPARQL and RDF because they looked off-putting, but after playing with the WikiData model, I understood it was very powerful and I then fully embraced RDF and SPARQL in 2019.
In early 2020 I attended the first P2P festival in Paris where I expressed my desire to explore the convergence between Semantic Web and P2P technologies. At the same time, I met with the people of [Virtual Assembly](https://assemblee-virtuelle.org/) in France, which is an association that acts as a hub of many projects related to Semantic Web and Ecological Transition. As I am fond of both subjects, it was an instant match.
They presented me with the new project they were working on: the refactoring of SemApps, a toolkit for Semantic Applications. I joined them and implemented the ACL module (WAC-WebACL) inside a plugin of Jena, the triplestore that SemApps is based on for now.
In 2021, I did a quick test of IPFS and IPLD, and realized it wasn't suitable for what I envisioned, and then I got acquainted with CRDTs and understood it was the key I was missing so far. Still, I couldn't see any framework or software that was already offering such a combination of Semantic Web, P2P, CRDTs, and E2EE. At the time, SemApps was getting more and more involved with ActivityPub, under the direction of Sebastian, and I contributed a bit to that too, before focussing my energies entirely on NextGraph, that I founded at the end of 2021.
### P2Pcollab and Lofi.re
At the same time, at the end of 2021, I was redirected to a small group of developers who had gathered on similar technologies, but had recently disbanded. Some of them had received grants from NLnet, a funding organization that I didn't know until then. I proposed the project NextGraph to the former members of that group that had disbanded, but the chemistry didn't work. Eventually I stayed in touch with T.G. of [P2Pcollab](https://p2pcollab.net/), who was one of them, and who had researched P2P networks and decentralized data sharing before.
In the first half of 2022, he was working on a draft design that he later named [Lofi.re](https://lofi.re/) and I partly contributed to this design. He also transferred to me all his knowledge about P2P network and I learned about his design too. T.G. had a grant to finish and I helped him with the code in October 2022, as I was learning Rust, that I insisted should be the new language of this implementation, because I had already decided it would be the language for NextGraph. Eventually, the project Lofi.re stopped and I applied for my own grant at NLnet, which led to the present work.
As a consequence, the design of NextGraph network protocol is partly inspired by the previous work of Lofi.re by T.G. of P2PCollab, of which I took several components. I was familiar with his work, thanks to our previous cooperation, during which I contributed to the design of the immutable blocks and convergent encryption.
Several concepts that are implemented in NextGraph, come from Lofi.re: the 2-tier topology, the overlays, the data chunked in blocks, the pub/sub mechanism, the organization of repos into branches.
But unfortunately, this previous work as a whole was flawed in several ways, and I had to revise it, correct it, improve it, adapt it to the needs of NextGraph, and confront it with real use-cases. The design was too abstract, and unsound in some parts. As a result, the NextGraph protocol is not compatible with the draft protocol of Lofi.re.
What we didn't take from this draft design: the explicit ACKs, the permission system, the signature system, the core protocol (as it claimed to implement LoCaPs but wasn't effectively doing it), the sync protocol, the binary file format, and more.
More features were added, like the distinction between Inner and Outer overlays, one Overlay per Store instead of one Overlay per Repo, the ReadCap for an overlay and repo that is very different from the secret_keys of the original design, the refresh mechanism for capabilities and epoch (which wasn't specified at all in Lofi.re), the threshold signature mechanism with certificates, etc…
Overall, I want to thank T.G. for his good contribution with his draft design (which was supported by another grant from NLnet), and for sharing with me his good intuition and knowledge which made the network stack of NextGraph possible.
With NextGraph, we hope that P2Pcollab and all the people interested in the previous work of T.G., will find a suitable implementation of those ideas, that would match their own needs. Feel free to contact us and let us know if NextGraph suits you.
Now, after this introduction, let's dive into the [architecture of NextGraph](/en/architecture).
@ -34,7 +34,7 @@ If you are curious about why we implemented this dual nature of documents, read
### Repository
Each NextGraph document is stored in a separate repository, that holds all the changes (that we call "commits") and also all the [permissions](/en/framework/permissions). The commits of such repository mutate both the RDF resource and the discrete document.
Each NextGraph document is stored in a separate [repository](/en/protocol), that holds all the changes (that we call "commits") and also all the [permissions](/en/framework/permissions). The commits of such repository mutate both the RDF resource and the discrete document, and can add and remove binary files.
Each Document is identified with a **unique identifier**, that is a public key randomly generated at creation time. This ID will never change over time. Internally we call this a RepoID.
@ -154,6 +154,8 @@ And your public profile/store can be found with the icon that represents one use
We said that you can create **Groups** when you want to share, exchange and collaborate with other users. In fact, each Group is a separate Store. This is helpful because each group can have its own set of permissions, and then you can configure the store so that all the documents included in this store inherit the permissions of the store. This way, we can manage the group easily. If we add a member to the group, they immediately get access to all the documents in the group. The same happens when you remove a user from the group: they lose access to all the documents at once.
Because most of the time, collaboration on Documents happens within a group of users who will probably share more than one document with each other, the Store regroups all the Documents that such a Group wants to interact with. This Group should not be confused with the E2EE group I was referring to in the [Encryption chapter](/en/encryption). In fact, in NextGraph terminology, we never talk about any E2EE group. Instead we call that a Repo, or a Document. A Repo is basically the equivalent of an E2EE group for one and only one Document. So when we talk about a Group, we are referring instead to a Store that gathers all the users and their peers, and where they will be able to share several Documents (or Repos if you prefer).
As you will see, a Store can be organized into folders and sub-folders, making it very easy to keep tidy. It is the equivalent of the "drive" that you have been using in cloud-based sharing systems.
Write permissions are managed at the level of the Document, not at the level of the branch or block. But read permissions can be by block or branch. See more about the [permissions here](/en/framework/permissions).
@ -109,7 +109,7 @@ Now let's have a look at what those CRDTs have in common and what is different b
| **isolated transactions** | ✅ | ✅ | ✅ |
| <tdcolspan=3> A NextGraph transaction can atomically mutate both the Graph and the Discrete data in a single isolated transaction. Can be useful to enforce consistency and keep in sync between information stored in the discrete and graph parts of the same document. but: transactions cannot span multiple documents (for that matter, see **smart contracts**). When a SPARQL Update spans across Documents, then the Transaction is split into several ones (one for each target Document) and each one is applied separately, meaning, not atomically. Also, keep in mind, as explained above in the "Counter" section, that CRDTs are eventually consistent. If you need ACID guarantees, use a synchronous transaction instead. |
| <tdcolspan=3> 🔥 this is planned. will be available shortly. the store will be **writable** and will allow a bidirectional binding of the data to some javascript reactive variables in Svelte (same could be done for React/Redux) and we are considering the use of **Valtio** for a generic reactive store, that would also work on nodeJS and Deno |
| <tdcolspan=3> 🔥 this is planned. will be available shortly. the store will be **writable** and will allow a bidirectional binding of the data to some javascript reactive variables in Svelte (same could be done for React) and we are considering the use of **Valtio** for a generic reactive store, that would also work on nodeJS and Deno |
| <tdcolspan=3> (\*) support is planned at the NextGraph level, to be able to query discrete data too in SPARQL. (GraphQL support could then be added) |
@ -22,7 +22,7 @@ For the developer of an App, this means that the data is accessed and manipulate
Developers of modern apps today often use some front-end frameworks like **React** and **Svelte**.
The data they manipulate is often stored in a **reactive store** (called Redux in React, or Runes in Svelte) that needs to be configured and plugged to a backend system with some APIs that can range from WebSocket and GraphQL to HTTP/REST APIs to a MYSQL and so on...
The data they manipulate is often stored in a **reactive store** (Runes in Svelte, probably useContext/useState in React) that needs to be configured and plugged to a backend system with some APIs that can range from WebSocket and GraphQL to HTTP/REST APIs to a MYSQL and so on...
> With NextGraph, we provide the developer with a reactive store (for React, Svelte, and Deno/Node) and that's all they have to worry about. NextGraph transparently synchronizes, encrypts and deals with permissions for you.
@ -4,4 +4,98 @@ description: The P2P (peer-to-peer) Network of NextGraph, composed of Brokers, P
layout: ../../layouts/MainLayout.astro
---
Please bear with us as we are currently writing/publishing the documentation (23 of August until 28 of August 2024).
### Pub/Sub
As we already explained above, the network of NextGraph is composed of Brokers that talk to each other in the Core Network. Then Clients that are attached to one specific broker, only talk to that broker in a client-server manner. This is the 2-tier topology.
The brokers are always online and deal with events received from some client, that they forward to other brokers in the core network, that in turn dispatch them to the clients that are connected to them.
The events are end-to-end-encrypted and only the clients can decrypt them.
Events belong to a Pub-Sub topic. The broker is in charge of dealing with the maintenance of each Pub-Sub topic. Clients tell their broker that they want to subscribe to a specific topic in the pub-sub (read access), and can also tell their broker that they are going to publish in a specific topic (read-write access).
In order to subscribe to a Topic, the only thing the broker needs to know from the client, is the TopicID, which is a public key.
But when a client wants to publish in a topic, it has to prove that it is in possession of the private key of the topic, by signing the event with this private key.
The broker does not have this private key, but it will verify that the signature is valid, in regards to the public key (the topic ID).
In addition to that, the broker that forwards the events in the core network (in an overlay, to be more precise) needs to show the other brokers that it is allowed to do so. To that end, the broker obtains a signature, made with the topic private key, over its own PeerID. Only the clients can sign this proof; they do so and give the proof to the broker, so it can enter the overlay. This is called the PublisherAdvert.
There is a whole Core protocol that deals with overlay joining and leaving, with topic subscriptions and publisher adverts, based on the [LoCaPs paper](https://www.gsd.inesc-id.pt/~ler/reports/filipapedrosamsc.pdf), and we will not go into much detail about it here. This core network is modelled as a general (cyclic) undirected graph.
What is important to understand is that the brokers only see encrypted events that belong to a topic, that they forward those events to clients that have subscribed to the topic, and that's pretty much all that the broker can do.
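As a rough illustration of how little a broker needs to know, the following Python sketch models a single broker that keeps a set of subscribers per topic and forwards opaque bytes. The `Broker` and `Client` classes are hypothetical names; the signature check on publish, the core network between brokers, overlays and storage are all left out.

```python
from collections import defaultdict


class Client:
    """A connected client; the inbox collects the events forwarded to it."""

    def __init__(self, name):
        self.name = name
        self.inbox = []


class Broker:
    """Forwards encrypted events by topic without ever decrypting them."""

    def __init__(self):
        # topic ID (a public key in NextGraph, plain bytes here) -> subscribers
        self.subscribers = defaultdict(set)

    def subscribe(self, topic_id, client):
        self.subscribers[topic_id].add(client)

    def publish(self, topic_id, sender, encrypted_event):
        """Deliver the opaque event to every subscriber except the sender."""
        delivered = 0
        for client in self.subscribers.get(topic_id, ()):
            if client is not sender:
                client.inbox.append(encrypted_event)
                delivered += 1
        return delivered


broker = Broker()
alice, bob, carol = Client("alice"), Client("bob"), Client("carol")
broker.subscribe(b"topic-1", alice)
broker.subscribe(b"topic-1", bob)
broker.subscribe(b"topic-2", carol)

# The broker only handles ciphertext; it holds no key to read it.
count = broker.publish(b"topic-1", alice, b"<ciphertext>")
```

Here only `bob` receives the event: `carol` is subscribed to another topic, and the sender does not get its own event back.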
An event is sent for each commit, so it contains all the blocks of the commit itself, plus some additional blocks if needed.
The Broker stores those blocks locally if it is told to do so (by pinning), or if it is the first broker to publish this event (when the publishing client is connected to this broker). The latter is called “home pinning”.
Also, the brokers manage a list of clients that have created an account with them. This is mandatory because there is a legal contract agreement between clients and brokers (ToS) in the case of a public Broker Service Provider. In the case of a private self-hosted broker, the list of accounts is used to know who is allowed to use the broker, but there is no ToS, as registration works by invitation only.
Finally, the broker enforces the rule that a specific user, and all their devices, have to be connected to the same broker at any given time (and for a given overlay).
Apart from this rule, clients can connect to any broker that they want (if they have an account with it) and can even connect to multiple brokers at a time (if on different overlays).
Usually, a client will only talk to one broker at a time, but in case redundancy is needed, the wallet can be configured to try to connect to several brokers.
The User can also decide to migrate to another broker, and transfer all their encrypted data there.
The events/commits are encrypted and fully portable, they can move to another broker easily.
To summarise: **The brokers only deal with encrypted events, topicIDs, pub/sub subscription and publishers, overlays, other brokers in the core network, and the users/clients that have registered with them.**
Brokers know nothing about concepts like Documents, Commits, Branches, Stores, or Wallets.
### Broker
We distinguish 3 types of brokers:
- **core broker**: Core brokers form the core of the overlay; they must have a public IP interface and run 24/7. Core brokers should be shared resources. They are trust-less and enable anonymity of the users participating in an overlay, as their devices (phone, computer) are not connected directly to other users’ devices. Instead the core brokers are intermediaries that offer constant connectivity, NAT traversal solutions, mutualization of resources, and routing/caching/backup of the encrypted data. Core brokers can be self-hosted by individuals and companies, shared community hosting, or cloud SaaS services offered by businesses to customers. Core brokers communicate with each other using the Core Protocol.
- **edge broker**: Edge brokers are self-hosted by individuals or offices at the edge of the IP network (behind ADSL/VDSL/fiber subscriber lines) and are used by the local users on the LAN interface, where they expose a server (they also expose a server on the public IP, for apps that connect from the public IP network, and for other core/edge brokers to contact them). They behave like core brokers (they participate in the core network, on their public IP interface). An edge broker will most probably run a verifier too, which means that this broker is trusted by a user who stores their private keys on that machine.
- **local broker**: A broker that runs as a client and connects to another server broker (edge or core) with the Client Protocol. A local broker only connects to one server broker at a time, and runs in a local daemon or a native app (a web app is also planned). A local broker always runs a Verifier, and serves as a cache of blocks for the Verifier. The app connects to the Verifier with the App Protocol.
The verifier can decrypt the commits and verify their validity. They need the encryption keys in order to do so. For this reason they are placed at the edge of the overlay, on machines trusted by the user.
Communication between peers is implemented with WebSockets on port 1440 (HTTP) without TLS (except in the webapp), as the transport channel encryption is implemented inside the WebSocket with the Noise Protocol. This architecture is important as TLS requires centralized certificate mechanisms (and NextGraph is a totally decentralized ecosystem), and self-signed certificates are not easy to handle for end-users. We are also planning to add QUIC connections (without TLS) between brokers and for native apps, in another iteration (web apps will always stay with WebSocket and may add WebRTC support later on).
We at NextGraph, will provide a set of public brokers available in different locations. But anybody can run their own broker, self-hosted (coming soon), and without the need to buy a domain name. Brokers will always find each other and sync between each other, based on what we call an overlay.
### Overlay
An overlay is like a network subspace where some peers (replicas and brokers) can interact. If those peers have joined the same overlay, they can talk to each other. If they are in disjoint overlays, they don’t even see each other.
A peer can participate in several overlays at the same time, especially the brokers, which will be doing that all the time.
There is one Overlay for each [Store](/en/documents), but we distinguish between the Outer Overlay and the Inner Overlay for each Store.
The Inner Overlay is where all the editing and signing happens; it is protected by a secret key that only editors can have. The Outer Overlay is used only for read purposes by external users. The Outer Overlay ID never changes, but the Inner Overlay ID can change when we renew its epoch (after kicking out a member, for example).
The Outer Overlay lets anonymous or non-member users join the overlay and reach the brokers in order to submit Post messages, ExtRequests or ServiceRequests to them, and subscribe to the Pub/Sub topics. They can only perform read operations on the repositories. For privacy reasons, the non-member peers that connect to this overlay do not flood their PeerAdvert to the Overlay. Their IP address remains only known to the specific set of Brokers the non-member peer decided to connect to. The non-member peers can nevertheless ask to receive a full list of Brokers available in this overlay, with the options they offer for a specific topic, so they can decide to which broker they should send their request.
The Inner Overlay is used by all the peers and brokers of the members of the repository, meaning, the users who have a write access to the repository. Peers in this overlay know each other mutually (but their IDs are hidden).
Each store has one Inner Overlay, that regroups all the repositories of such store.
The brokers of each member of a repo, join the Inner overlay, and optionally the Outer overlay. Those who join both overlays serve as relays between the 2 overlays, exposing the data modified in the Inner overlay, into the read-only Outer overlay, the topic ids being identical across the 2 overlays.
An option in the configuration of each broker, for that repo/store and user, tells whether to join the Outer overlay or not.
NextGraph will offer in the future a unique Global Overlay (of Read type) used for indexing and broadcasting public Sites with a global audience.
Each store has a pair of overlays (Outer and Inner), except the private store that only has one Inner overlay and no Outer overlay.
We also have a pair of overlays for each Group, and only one Inner overlay for each Dialog.
The Outer overlay ID is just the Blake3 hash of the Store ID.
The Inner overlay ID, on the other hand, is a Blake3 keyed hash of the Store ID. The key used for the hash is derived from the ReadCapSecret of the Overlay branch of the Store.
We will now dive more into details about repositories, branches, blocks and commits, that compose the [NextGraph Protocol](/en/protocol).
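The two overlay ID derivations can be sketched in Python as follows. This is only an illustration: BLAKE2b stands in for Blake3 (which is not in the Python standard library), the way the hash key is derived from the ReadCapSecret is an assumption (the text only says that it is derived from it), and the function names are hypothetical.

```python
import hashlib


def outer_overlay_id(store_id: bytes) -> bytes:
    """Plain hash of the Store ID (BLAKE2b standing in for Blake3)."""
    return hashlib.blake2b(store_id, digest_size=32).digest()


def inner_overlay_id(store_id: bytes, read_cap_secret: bytes) -> bytes:
    """Keyed hash of the Store ID, with a key derived from the secret."""
    # Assumption: the real key derivation is not described in the text;
    # a personalized hash of the secret is used here as a placeholder.
    key = hashlib.blake2b(read_cap_secret, digest_size=32, person=b"overlay-key").digest()
    return hashlib.blake2b(store_id, key=key, digest_size=32).digest()


store_id = b"\x01" * 32                      # placeholder Store ID
outer = outer_overlay_id(store_id)
inner_epoch_1 = inner_overlay_id(store_id, b"secret of epoch 1")
inner_epoch_2 = inner_overlay_id(store_id, b"secret of epoch 2")
```

In this sketch, anyone who knows the Store ID can compute the Outer overlay ID, while the Inner overlay ID also requires the secret and takes a new value as soon as a different secret is used.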
Here is an illustration of our Network Architecture.
A higher resolution version of this diagram can be found here in [PNG format](https://file.nextgraph.org/download/6c13ca12530a0f931b0ca3e4ed4b7664) or in [SVG format](https://file.nextgraph.org/download/a33ec136711fb3d0ca1448fa940ac084).
@ -4,4 +4,16 @@ description: Reference of the Broker daemon (ngd) of NextGraph
layout: ../../layouts/MainLayout.astro
---
Please bear with us as we are currently writing/publishing the documentation (23 of August until 28 of August 2024).
NextGraph broker (ngd) isn't available yet as a self-hosted program.
We have released a preview version recently, and we use ngd for our server nextgraph.eu.
Documentation will be available as soon as the binary is released in alpha version for self-hosting, by the end of 2024.
The broker is a self-contained binary that will be made available to you already compiled or that you can compile yourself.
Installation is extremely easy as it just consists of placing the small binary somewhere and adding a systemd/init script for the service. There is no need for complex configuration files or endless dependencies on other services like postgresql, redis, S3, nodejs, python, and whatnot.
We will also provide a Dockerfile.
The broker is administered remotely with the help of the [CLI (ngcli)](/en/cli).
@ -4,4 +4,146 @@ description: Features of the Sync Protocol of NextGraph
layout: ../../layouts/MainLayout.astro
---
Please bear with us as we are currently writing/publishing the documentation (23 of August until 28 of August 2024).
### Blocks and Commits
Blocks are the smallest unit of data that transits in the network and that is saved in Brokers.
A block is an encrypted piece of data, that uses convergent encryption which enables content-addressing and deduplication of identical content, without revealing such content.
A block size is limited to 1MB, and when a blob of data (we call that an Object) is bigger than that, it is sliced into several blocks and a tree of blocks is created. The root block is used to identify the Object.
Blocks have IDs that are nothing more than the hash of their encrypted data, that's how we do content addressing.
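To make the idea concrete, here is a minimal sketch of convergent encryption and content addressing. It is an illustration under stated assumptions, not NextGraph's actual format: the hash function (SHA-256), the toy XOR keystream cipher, and the names `encrypt_block`, `decrypt_block` and `split_object` are all hypothetical, and the tree of blocks that links the chunks of an Object is left out.

```python
import hashlib

BLOCK_SIZE = 1024 * 1024  # the 1 MB block limit described above


def _xor_keystream(key: bytes, data: bytes) -> bytes:
    # Toy cipher for illustration only (assumption): XOR with a SHAKE-256 keystream.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_block(plaintext: bytes):
    """Return (block_id, key, ciphertext) for one chunk."""
    # Convergent encryption: the key is derived from the content itself,
    # so identical plaintexts always produce identical ciphertexts.
    key = hashlib.sha256(plaintext).digest()
    ciphertext = _xor_keystream(key, plaintext)
    # Content addressing: the ID is the hash of the encrypted data.
    block_id = hashlib.sha256(ciphertext).hexdigest()
    return block_id, key, ciphertext


def decrypt_block(key: bytes, ciphertext: bytes) -> bytes:
    return _xor_keystream(key, ciphertext)


def split_object(data: bytes, block_size: int = BLOCK_SIZE):
    """Slice an Object into chunks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


chunks = split_object(b"abcdabcdxy", block_size=4)  # two identical chunks + one more
store = {}  # block_id -> ciphertext, as a broker could keep it
for chunk in chunks:
    block_id, key, ciphertext = encrypt_block(chunk)
    store[block_id] = ciphertext
print(len(chunks), len(store))  # prints "3 2": identical chunks are stored once
```

The store only ever sees IDs and ciphertext, yet it can still deduplicate, because two identical chunks encrypt to the same bytes and therefore hash to the same ID.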
Objects can contain different types of data, namely:
- Binary files (used to store images, and other multimedia files, or any text or binary file)
- Commits, which have 3 parts, stored in separate objects: The Header, Content, and Body
- Other internal objects like Quorum definitions, Signatures, Certificates and RefreshCaps
Commits are what is sent in the events of the pub/sub, and they are the objects that constitute the core of the protocol. All the information and content of documents is in fact encoded in commits.
Commits are organized in a DAG (Directed Acyclic Graph) that has one root and, at any given time, can have several current HEADS, making the DAG a semi-lattice that temporarily becomes a lattice when there is only one HEAD that merged all the forks. This happens, for example, when the total-order transaction mechanism is used.
When new content is created, a new commit containing the modifications is added to the DAG, and all the current HEADS known by the local replica before this insertion are referenced in the new commit as ACKS. We say that the new commit acknowledges the previous current heads, together with all of their causal past.
This new commit is now the current head at the local replica, and will be sent to the other replicas via the pub-sub.
It can happen that other editors make concurrent modifications. In this case, they will also publish a commit with a causal past (ACKS) that is similar or identical to the new commit we just published.
This will lead to a temporary “fork” in the DAG, and after the replicas have finished their syncing, they will all have 2 current heads, one for each of the concurrent commits.
The next commit (by whoever makes the next modification in the document) will “merge” the fork, because it will reference the 2 heads as ACKS (direct causal past).
And so on. This way, the DAG automatically merges itself without any conflict, and the branching that might occur automatically collapses back to one single "branch".
The forking/merging is automatic, and any conflict that could emerge because of concurrent modifications is handled by the respective CRDT format that represents the data.
The documents of NextGraph have several [CRDT formats](/en/framework/crdts) that can coexist.
In order to have a well-formed DAG, only the unique root commit (a singleton) will be without ACKS. All the subsequent commits must have at least one ACK that links to a commit in the direct causal past.
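The fork and merge behaviour described above can be sketched in a few lines. This is a simplified model, not the real data structure: the class name `BranchDag`, its methods, and the string commit IDs are hypothetical, and commits carry no content or signatures here.

```python
class BranchDag:
    """Toy model of one branch: commits, their ACKS, and the current heads."""

    def __init__(self, root_id):
        self.acks = {root_id: frozenset()}  # only the root has no ACKS
        self.heads = {root_id}

    def add(self, commit_id, acks):
        """Insert a commit (local or received from the pub/sub)."""
        acks = frozenset(acks)
        if not acks:
            raise ValueError("only the root commit may be without ACKS")
        if not all(a in self.acks for a in acks):
            raise KeyError("ACKS must reference commits that are already known")
        self.acks[commit_id] = acks
        self.heads -= acks          # acknowledged commits are no longer heads
        self.heads.add(commit_id)   # the new commit becomes a head

    def commit(self, commit_id):
        """Create a local commit that acknowledges all current heads."""
        self.add(commit_id, set(self.heads))


replica = BranchDag("root")
replica.commit("a")            # our own modification, ACKS = {root}
replica.add("b", {"root"})     # a concurrent commit from another editor
print(sorted(replica.heads))   # ['a', 'b']  -> a temporary fork
replica.commit("m")            # the next commit references both heads
print(sorted(replica.heads))   # ['m']       -> the fork is merged
```

Note that no special merge operation is needed: the merge is just an ordinary commit that happens to have two ACKS.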
![DAG](/public/nextgraph-DAG.png "NextGraph DAG")
A higher resolution version of this diagram can be found here in [PNG format](https://file.nextgraph.org/download/8683183bc87051f6f5372822da7de742) or in [SVG format](https://file.nextgraph.org/download/93cf7dd8646c071bb661c1dc8ba9ab57).
### Branches and Repos
In NextGraph, the DAG of commits that we just described represents one Branch of a Repo.
A Repo (repository) is a unit that groups one or several branches of content, together with a set of users known as members. It has a unique ID (a public key) that is immutable (it will never change). At a higher level, a Repo is called a Document. But at the level that we are dealing with now, let's just call it a repo.
When a repo is created, it comes with 2 branches by default:
- The root branch, which is used to store all the members' information, their permissions, and the list of branches, and which controls the epochs. It does not hold content. Its branchID cannot change because it is, in fact, the same ID as the RepoID itself.
- The main branch, which is a transactional branch (transactional means that it holds content) and which will be the default branch if no other branch is specified when a request is made to access the content. It is possible to change the main branch and point it to another branch in the repo.
Each branch also has a unique ID, that is immutable.
When a new commit is published, it is always done inside a branch.
A branch has a topicID associated with it, and when the commit leaves the replica and goes to the broker, it is published on that topicID (and the broker doesn't even see or know the branchID).
It is possible to renew the topicID during the lifetime of a branch, even several times.
This renewal mechanism is used when the capabilities of the branch need to be refreshed (for read access, when we want to remove read access from some user).
Write access is not controlled per branch, but more generally at the repo level. It is not possible to give write permission only to one specific branch. When a member is given write permission, it applies to all the branches of the repo at once. The same holds when write permission is revoked: it is revoked for all the branches of the repo at once.
It is indeed important that permissions are common to all branches, because we will now see that branches can be merged into one another. When the merge happens, we consider that all the commits of the branches are valid and were already verified at the moment each commit was added. We do not want to have to re-verify a whole branch before it is merged. What was already verified and accepted is immutably part of the repo. If we had a permission system with different permissions for each branch, then there would be cases where some commits in one branch cannot be merged into another branch because the permissions are incompatible. In order to prevent this, and also to simplify an already very complex design, we restricted permission management to the repo level only, unlike the previous design of LoFi.Re.
The permissions, as said earlier, are stored in the root branch.
All the members of a Repo can see the list of other members and their permissions. This is important, because they will need these ACLs to verify the write permission of each commit's author.
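The relationship between repo, branches, topics and write permissions can be summarized with a small model. This is only a sketch: the class and field names are hypothetical, and random hex strings stand in for the real IDs (which are public keys).

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Branch:
    branch_id: str  # immutable for the lifetime of the branch
    topic_id: str = field(default_factory=lambda: secrets.token_hex(16))

    def renew_topic(self):
        """Renew the topic (e.g. to remove read access); the branch ID stays."""
        self.topic_id = secrets.token_hex(16)


@dataclass
class Repo:
    repo_id: str
    writers: set = field(default_factory=set)     # ACL kept at the repo level
    branches: dict = field(default_factory=dict)  # branch_id -> Branch

    def __post_init__(self):
        # The root branch has the same ID as the repo itself.
        self.branches[self.repo_id] = Branch(self.repo_id)

    def add_branch(self, branch_id):
        self.branches[branch_id] = Branch(branch_id)
        return self.branches[branch_id]

    def can_write(self, user, branch_id):
        # Write permission never depends on which branch is targeted.
        return branch_id in self.branches and user in self.writers


repo = Repo("repo-1", writers={"alice"})
main = repo.add_branch("main")
work = repo.add_branch("work")
old_topic = main.topic_id
main.renew_topic()  # topic_id changes, branch_id does not
print(repo.can_write("alice", "main"), repo.can_write("alice", "work"))  # True True
print(repo.can_write("bob", "main"))                                     # False
```

Because the ACL lives on the repo, granting or revoking a writer is a single change that takes effect on every branch at once.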
Coming back to our branches, they serve two purposes:
- Branches can be used to “fork” the DAG of, for example, the main branch into a new branch that, in Git terminology, would be called a “working branch”, where some parallel work can be done on the same document. Once this work is completed, it is possible to “merge back” this working branch into the main branch. If, on the contrary, the working branch should now become the main branch without a merge (because some concurrent modifications happened in the main branch, and we want to discard them and use the working branch as the main branch instead), then there is no need to merge: we simply point the main branch towards the working branch.
- The second use case for branches is to have 2 or more completely different contents for the same document. For example, the main branch contains some text document, and another branch contains some JSON data. Or the main branch contains some RDF triples, and another branch contains some extra RDF triples that have different read permissions.
Indeed, each branch can have different read permissions (but all the branches in a repo share the same write permissions).
This is due to the fact that a read permission is in fact a cryptographic capability (ReadCap) that contains a pointer towards a branch. It is possible to share this read capability with someone, and therefore give them read access to a specific branch of the repo, without letting them know of any other branch of that repo. This way, the reader will be confined to reading that branch, and will not even be able to access the root branch of the repo. The list of editors will not be accessible to them, nor the list of other branches. Sharing a BranchReadCap only gives access to one branch. This ReadCap also includes all the information needed to subscribe to that branch (to the corresponding Topic, to be precise). So when we share a ReadCap, we share the content of one and only one branch, together with the capability to subscribe to future updates on that branch. It is also possible to share all the branches at once, by sharing the ReadCap of the root branch, but that's something else.
That's very handy if we want to separate a Document into several parts that will have different read access.
Let's say I have a Document that is my personal profile description. It contains my pseudonym, full name, date of birth, postal address, email address, phone number, short biography, profile picture, etc.
Now let's imagine that, for some reasons related to my privacy, I do not always want to share my postal address and phone number with everyone. Sometimes I want to share everything else, but not those two fields.
I could create two different documents: one with all the info, and one with the reduced profile.
But that would be cumbersome, as every time I need to update my bio, for example, I would have to copy and paste it into both Documents.
The solution is to create only one profile Document, and to put the sensitive information (postal address and phone number) in a separate branch.
Both branches are updatable. If I modify my bio, all the users who subscribed to the main branch will receive that update. The same goes for my phone number or postal address: if I update the other branch that contains them (let's call it the privateProfile branch), then all my close friends with whom I shared that branch will see the update.
And I can even include a link to the main branch from within the privateProfile branch, so that those trusted people also have access to the main branch, without any need for me to share both branches' ReadCaps with them.
If at some point in the future I want to merge those two branches into one, I won't be able to, because in order to merge two branches, they need to share a common ancestor (one branch has to be a fork of the other).
But here, those 2 branches are completely separate from one another. The only thing they share is that they belong to the same Repo, and the root commit of each DAG has no ancestors. Those 2 DAGs are unrelated to one another, so we cannot merge them.
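The "common ancestor" condition can be checked by walking the ACKS of each head. The sketch below assumes, for illustration only, that the commits of the branches involved are available in a single mapping from commit ID to ACKS; the function names and the commit IDs are hypothetical.

```python
def causal_past(acks, commit_id):
    """Return commit_id plus every commit reachable from it through ACKS."""
    seen, stack = set(), [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(acks[current])
    return seen


def can_merge(acks, head_a, head_b):
    """Two branches can be merged only if their heads share a common ancestor."""
    return bool(causal_past(acks, head_a) & causal_past(acks, head_b))


acks = {
    # main branch
    "m0": set(), "m1": {"m0"}, "m2": {"m1"},
    # working branch, forked from m1
    "w1": {"m1"}, "w2": {"w1"},
    # standalone privateProfile branch with its own root commit
    "p0": set(), "p1": {"p0"},
}
print(can_merge(acks, "m2", "w2"))  # True: both descend from m1
print(can_merge(acks, "m2", "p1"))  # False: the two DAGs are unrelated
```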
Another example of how we can use branches to do cool stuff is commenting on or annotating someone else's content. Commenting is a kind of editing, as it adds content. But we don't want to have to invite those commentators as editors of the document they want to comment on.
Instead, the commentator will create a standalone branch somewhere in their own protected store. They are free to proceed as they want here. They can create a special document on their side that has the sole purpose of holding all the branches used for each comment on a specific target Document. Or they can use fewer Documents, and have one general-purpose Document in their protected store that is always used to create branches for commenting, regardless of the target document that is commented upon.
What matters is that they are the only editor of that Document, and that they write one comment per branch. The branch subscription mechanism will let them update or fix typos in that specific comment later on. They can also delete the branch at any time, in order to delete their own comment.
Once they have created that branch and inserted some content in it (the comment itself), they will send a link (a DID cap) to the original Document they want to comment upon (each document has an inbox, which is used in this case to drop the link). A comment can reference a previous comment, or quote some part of the document (annotation). Thanks to RDF, this is easy to do.
The owner of the Document that receives this link can moderate the comment: accept it, reject it, or remove it after accepting it. If accepted, the link (DID cap) is added to the special branch for comments that every document has by default (more on that below). Any reader of the document who subscribed to this branch will see the new comment.
So, to recap:
- A branch has specific read permissions, but shares write permissions with all other branches in the repo.
- A branch is the unit of data that can be subscribed to.
- I can put what I want in a branch.
- I can also fork a branch into another branch, and then merge that fork back into the original branch (or into any other branch that shares a common ancestor).
- Those forks can be used to store some specific revisions of the document. Then, by using the branchId, it is possible to refer to that specific revision.
- A branch can also be given a name, like "rewriting_paragraph_B".
- Any given commit has an ID, and that commit can also be used to refer to a specific revision, which in this case is just the state of the document at that very specific commit. Commits can also be given names like v0_1_0 (equivalent to tags in Git), and those names are pointers that can be updated. So one can share the name, and update the pointer later on.
- Standalone branches can be used to separate different segments of data that need different read permissions.
- ReadCaps can be refreshed in order to remove read access to some branch. The historical data the removed readers used to have access to will always remain visible to them, especially because everything is local-first, so they surely have a local copy of that historical data. What they won't see are the new updates.
- We use the terms “DID cap”, “ReadCap”, “URI”, “link” or “[Nuri](/en/framework/nuri)” interchangeably in this document. They all mean the same.
It is also possible to fork a whole repo, if ownership and permissions need to be changed (similar to the “fork me on github” feature), and then there is a mechanism for “pull requests” in order to merge that forked repo back into the original repo. But it doesn't work like the merging of branches, as each commit has to be checked again separately and added to the DAG again, using the identity of a user that has write permission in the target repo. Let's leave that for now, as it is not coded yet, and not urgent.
The root branch is a bit complex and has all kinds of system commits to handle the internals of permissions etc. We will not dive into that right now. There are also some other hidden system branches (called Store, User, Overlay, Chat, etc.) that contain internal data used by the system. Their reserved names give an idea of what each one does. But again, let's keep that for later.
What matters for now is that any transactional branch contains commits that modify the content of the branch, which is a revision of the document.
Those commits are encrypted and sent as events in the pub/sub.
When a commit arrives on a replica, the Verifier is in charge of verifying the integrity of the commit, and of the branches and repo in general. To do so, the Verifier needs to read the ACLs. It will also verify some signatures and do some checks on the DAG.
If something goes wrong, the commit is rejected and discarded. Its content is not passed to the application level.
Eventually, all the replicas have a local set of commits for a branch, and they need to read them and process them once, in order to build the materialized state of the doc. That's the job of the [verifier](/en/verifier).
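As a rough illustration of that job, the sketch below processes each commit once, in an order that respects the DAG, and skips commits that fail a check. It is heavily simplified and rests on assumptions: the function name `materialize` and the tuple layout are hypothetical, signature checks are omitted, the materialized state is just a list of payloads, and rejecting a commit whose ACKS point to a rejected commit is a choice made for this example only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def materialize(commits, writers):
    """commits: commit_id -> (author, acks, payload).

    Returns the payloads of accepted commits, causal past first.
    """
    graph = {cid: acks for cid, (_, acks, _) in commits.items()}
    accepted, state = set(), []
    # static_order() yields every commit after all the commits it ACKS.
    for cid in TopologicalSorter(graph).static_order():
        if cid not in commits:
            continue  # referenced in some ACKS but not received yet
        author, acks, payload = commits[cid]
        if author not in writers:
            continue  # the ACL check failed: commit rejected
        if not all(a in accepted for a in acks):
            continue  # assumption: a commit built on a rejected one is rejected
        accepted.add(cid)
        state.append(payload)  # hand the content over to the application level
    return state


commits = {
    "m": ("alice", {"a", "b"}, "merge"),
    "a": ("alice", {"r"}, "edit A"),
    "b": ("mallory", {"r"}, "edit B"),
    "r": ("alice", set(), "root"),
}
print(materialize(commits, writers={"alice"}))  # ['root', 'edit A']
```

Here `mallory` has no write permission, so her commit is discarded, and with it the commit that acknowledged it.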
Please bear with us as we are currently writing/publishing the documentation (23 of August until 28 of August 2024).
The **Specifications** of the Protocols and Data formats of NextGraph are still a Work in Progress.
**All our protocols and formats use the binary codec called [BARE](https://baremessages.org/)**. It has implementations in almost all languages.
As a developer, you will not have to directly manipulate BARE messages nor use the protocols directly. Instead all the APIs are made available to you as libraries that you can call from your code. We have bindings in JS and Rust for now. More languages could be added in the future.
In the following sections, you will find the specifications of:
- the [Core Protocol](/en/specs/protocol-core) which is used by Core Brokers to maintain the Pub/Sub
- the [Client Protocol](/en/specs/protocol-client) which is used by clients (the Verifiers) to connect to their Broker.
- the [App Protocol](/en/specs/protocol-app) which is used by the App (our official Apps, and all the SDKs) in order to exchange plain-text updates with the Verifier.
- the [Ext Protocol](/en/specs/protocol-ext) which is used by external users for read purpose only.
- the [Admin Protocol](/en/specs/protocol-admin) which is used by an admin of a Broker in order to add and remove users, and manage the Broker in general.
- the [Repo format](/en/specs/format-repo) which details the binary format of the commits, and blocks.
- the [Wallet format](/en/specs/format-wallet) which details the format of the wallet as it is saved locally.
- the [DID method](/en/specs/did) as registered to the W3C.
### Reference
We also provide reference documentation for the SDKs and binaries.
- the [Web JS SDK](/en/web) documentation of our Framework, for App developers in React or Svelte
- the [NodeJS SDK](/en/nodejs) documentation of our Framework, for service developers using Node/Deno
- the [Rust SDK](/en/rust) documentation of our Framework, for service developers
- the [Broker (ngd)](/en/ngd) documentation for Broker admins.
- the [CLI (ngcli)](/en/cli) documentation for users who want to access their data from the command-line
@ -4,4 +4,8 @@ description: Compare NextGraph Framework with other alternatives frameworks
layout: ../../layouts/MainLayout.astro
---
Please bear with us as we are currently writing/publishing the documentation (23 of August until 28 of August 2024).
We are in the process of conducting another survey of what is being coded in the domain of local-first and decentralized software, with a focus on frameworks, apps ready to use by end-users, platforms for social networks, collaborative tools, and end-to-end encryption.
Our previous survey from 2 years ago is available [here](https://nextgraph.org/survey).