Guide to the D.A.G.G.E.R. Litepaper, Part 2

Welcome

Welcome to the second installment of our three-part “Guide to the Litepaper” blog series, where we continue our educational journey into GenesysGo's Directed Acyclic Gossip Graph Enabling Replication Protocol (D.A.G.G.E.R.). In this series, we break down the D.A.G.G.E.R. Litepaper section by section, unpacking the details and the potential of D.A.G.G.E.R. This second post covers Section 3 of the Litepaper, with additional clarity on the ShdwDrive implementation of D.A.G.G.E.R., which forms the data-availability protocol for Web2 and Web3 applications. We explain how we securely store data using erasure coding, and provide a glimpse into a first-of-its-kind design that uses mobile networks to audit and secure the network. For a deeper dive into the technical specifics of D.A.G.G.E.R. consensus and modules, please refer to the full version of the D.A.G.G.E.R. Litepaper.

Section 3 - ShdwDrive

Section 3 of the Litepaper delves into the convergence of D.A.G.G.E.R. and ShdwDrive. It comprises Section 3.1, the Executive Summary, and Section 3.2, the Mobile Network, both written for a general audience. We encourage our non-technical readers to read this section of the Litepaper if they haven’t already, since it sets the stage for what we discuss next. To better cast the vision of ShdwDrive v2, this Part 2 guide covers the basics of ShdwDrive v2, explains the erasure coding performed by the D.A.G.G.E.R. controller module, and wraps up with a further explanation of the role mobile phones will play in establishing ShdwDrive v2 as a novel data-availability protocol. If you’re unfamiliar with GenesysGo’s ShdwDrive v2, it’s a decentralized, high-performance, and scalable object storage solution tailored for both Web2 and Web3. It is currently in the testnet development phase and publicly available for testing file uploads and sending test transactions here. ShdwDrive v2 is optimized for data availability (storage) at scale; it is implemented on the D.A.G.G.E.R. consensus engine and packaged with an SDK, CLI, and S3-compatible API to make developing on the protocol as easy as possible. Please visit the Roadmap Overview Post to learn more about the roadmap of ShdwDrive v2.

Section 3.1 - Executive Summary

The Executive Summary (Section 3.1) in the Litepaper provides an excellent non-technical introduction to ShdwDrive v2. We encourage everyone to go read it! In the final paragraph of Section 3.1, we mention the general architecture and a novel erasure coding scheme. While the majority of the technical design of ShdwDrive v2 will be covered in a future whitepaper, we will shed some light here on some of the internals referenced (such as erasure coding and auditing). Before we do that, here is a bulleted summary of Sections 3.1-3.2 for those who need a quick refresher on the key points:

  • Remote data storage has significantly changed how we access info across devices and share it with others. It allows recovery if devices fail.
  • Remote storage providers have become essential data banks. Users trust them to store and retrieve data reliably.
  • Solutions must be secure, available anytime, and protect privacy. As remote storage grows exponentially, these needs are critical.
  • By 2025, over 160 zettabytes of data (one zettabyte is a trillion gigabytes) will need storage, much of it in public clouds.
  • More stored data is sensitive - financial, customer, and medical records. In 2023, 75% of survey respondents said over 40% of their cloud data is sensitive.
  • Modern systems must guarantee privacy, access whenever needed, and durability of data.
  • GenesysGo's D.A.G.G.E.R. system provides a decentralized data-availability layer that can coordinate storage that meets these needs. Its decentralized design means no single point of failure.
  • ShdwDrive v2, with D.A.G.G.E.R. implemented, gives users full control and privacy as a data-availability layer. Encryption guarantees privacy and prevents data breaches.
  • ShdwDrive v2 uses hybrid erasure coding for efficient storage and durability.
  • ShdwDrive v2 aims to work for both cold archives and hot databases via efficient storage and fast performance.
  • It will be easy to start using and switch to ShdwDrive v2. It is compatible with Amazon's S3 and comes with software development kits (SDKs) for major languages.
  • Pay-once immutable storage enables permanent low-cost records, asset archives, and blockchain data storage.
  • Flexible for many uses - encrypted data, research datasets, high-demand CDN, archival needs, AI training data, and more.
  • Smartphones and tablets increasingly have the power and bandwidth for remote storage thanks to 5G. Their capacity remains largely untapped.

The Executive Summary provides a high-level overview of ShdwDrive's capabilities built on D.A.G.G.E.R. A key technique mentioned is the novel hybrid erasure coding scheme used to enable efficient storage and data durability. Erasure coding refers to encoding original data into shards and distributing them redundantly across nodes. This allows reconstruction even if some shards are lost or corrupted. In ShdwDrive v2, the controller module executes the erasure coding algorithms when data arrives from the consensus system. The data is sharded and distributed across the decentralized network of storage nodes, or “Wield” nodes, using this erasure coding scheme. Within distributed erasure-coded systems, verifying that data is properly distributed and auditing its quality are lightweight but critical tasks.

This is where the auditing capabilities of mobile devices come into play. By participating as auditor nodes, mobile devices can verify the integrity of erasure-coded data shards distributed across storage nodes. Auditors calculate proofs issued by Wield nodes to ensure data availability, consistency, and accuracy. While the role of mobile devices will expand over time, their initial integration for auditing the novel erasure coding scheme is an important first step. The following section expands on the erasure coding design of ShdwDrive and sets up the discussion of mobile auditing.

Erasure Coding

Reed-Solomon erasure coding is a critical component of the ShdwDrive implementation of D.A.G.G.E.R. It is used to ensure data durability and high storage efficiency. Reed-Solomon erasure codes are a type of error-correcting code that allows the original data to be reconstructed even when some parts of it are corrupted or lost. This is particularly useful in distributed systems like ShdwDrive v2, where data is stored across multiple nodes and the risk of data loss or corruption is non-trivial.
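To make the reconstruction property concrete, here is a minimal toy sketch of a Reed-Solomon-style erasure code. It is an illustration only and not the ShdwDrive implementation: it works over the small prime field GF(257) and uses plain Lagrange interpolation, both chosen for readability. The function names (`encode`, `recover`) and the parameters (6 data symbols, 3 parity symbols) are assumptions made up for this example.

```python
# Toy Reed-Solomon-style erasure code over the prime field GF(257).
# Assumption: a prime field and naive Lagrange interpolation are used purely
# for readability; this is not the scheme or the algorithm used in production.
P = 257  # prime modulus, large enough to hold any byte value 0..255


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial through `points` (Lagrange, mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Systematic encoding: symbols 0..k-1 are the data, k..n-1 are parity."""
    points = list(enumerate(data))
    return [_interpolate_at(points, x) for x in range(n)]


def recover(survivors, k):
    """Rebuild the k data symbols from any k surviving (index, value) pairs."""
    points = survivors[:k]
    return [_interpolate_at(points, x) for x in range(k)]


data = list(b"DAGGER")            # k = 6 data symbols
codeword = encode(data, n=9)      # 3 parity symbols are appended
# Lose any three symbols (here indices 0, 4 and 7); six survivors remain.
survivors = [(i, v) for i, v in enumerate(codeword) if i not in {0, 4, 7}]
restored = bytes(recover(survivors, k=6))
```

Any six of the nine symbols are enough to rebuild the original six, which is the property that lets a storage network tolerate lost or corrupted shards without keeping full copies everywhere.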

In the ShdwDrive implementation of D.A.G.G.E.R., the controller executes the Reed-Solomon erasure coding in the following way: 

  • Data Sharding: The original data is divided into smaller pieces, known as shards. The shards are then encoded using Reed-Solomon erasure codes, which add redundancy (erasure encoding is distinct from encryption). This process is crucial for the efficient storage and retrieval of data. Shard size can be tuned during our testnet tuning phases based on the size and requirements of the system.
  • Metadata and Shard Database: Each operator node (Wield node) maintains a dual database consisting of metadata and shards. The metadata includes information about the shards, such as their SHA-256 hash, which is used to identify corrupted shards. This metadata is replicated across all Wield nodes in the system.
  • Erasure Coding and Replication: The shards are then distributed across the system using a hybrid erasure code and replication scheme to ensure security while balancing efficiency and redundancy.
  • Data Recovery: In the event of data loss or corruption, the Reed-Solomon erasure codes come into play for data recovery. The erasure codes allow for the reconstruction of the original data from the remaining uncorrupted shards. The system identifies the corrupted shards by comparing each shard against its stored SHA-256 hash. Once identified, the corrupted shards are repaired or replaced using the Reed-Solomon decoding algorithm. The Reed-Solomon erasure codes provide strict repair guarantees, making them a practical choice for erasure coding in distributed systems.
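The steps above can be sketched end to end in a few lines. This is a simplified illustration under stated assumptions, not the controller's actual code: a single XOR parity shard stands in for Reed-Solomon (so it can only repair one bad shard), the metadata is just a list of SHA-256 digests, and every function name here is hypothetical.

```python
import hashlib
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_shards(data, k):
    """Split data into k equal-size shards (zero-padded) plus one XOR parity shard.

    Assumption: XOR parity is a stand-in for Reed-Solomon parity.
    """
    size = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(size * k, b"\0")
    shards = [padded[i * size:(i + 1) * size] for i in range(k)]
    shards.append(reduce(xor_bytes, shards))  # parity shard
    return shards


def find_corrupted(shards, hashes):
    """Return indices of shards whose SHA-256 digest no longer matches metadata."""
    return [i for i, s in enumerate(shards)
            if hashlib.sha256(s).hexdigest() != hashes[i]]


def repair(shards, bad_index):
    """Rebuild one shard by XOR-ing all the other shards together."""
    others = [s for i, s in enumerate(shards) if i != bad_index]
    return reduce(xor_bytes, others)


original = b"ShdwDrive v2 test object"
shards = make_shards(original, k=4)
metadata = [hashlib.sha256(s).hexdigest() for s in shards]  # stored hashes

shards[2] = b"\xff" * len(shards[2])          # simulate corruption of one shard
bad = find_corrupted(shards, metadata)        # the hash mismatch flags shard 2
shards[bad[0]] = repair(shards, bad[0])       # rebuild it from the survivors
recovered = b"".join(shards[:4])[:len(original)]
```

The hash check turns silent corruption into a known erasure, which is what allows an erasure code to repair it. A real Reed-Solomon code generalizes the repair step so that several shards can be lost at once.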

This scheme not only ensures high storage efficiency but also enhances data durability by allowing for the reconstruction of original data even when some shards are lost or corrupted. The ShdwDrive implementation of D.A.G.G.E.R. also makes use of a new approach for encoding and decoding Reed-Solomon erasure codes over characteristic-2 finite fields. This approach achieves a complexity of O(n log₂ n) in both additions and multiplications. This is a significant improvement over traditional Reed-Solomon encoding and decoding algorithms, which have a complexity of O(n²). When we release the full whitepaper, our exact implementation of erasure coding will be covered in technical detail. For now, we can explain how this works using an analogy.

Let's imagine you're trying to send a precious family photo album from New York to Los Angeles. But instead of sending it all in one package, you decide to make copies of each photo and send them in multiple packages. This is to ensure that even if some packages get lost or damaged during the journey, you can still reconstruct the entire album at the destination. This is similar to how Reed-Solomon erasure codes work. They break data into multiple 'shards' and add extra 'parity' shards, computed from the data, that help reconstruct the original if some shards are lost or corrupted.

Now, let's talk about the complexity part. Imagine you're trying to solve a jigsaw puzzle. The traditional Reed-Solomon algorithms are like solving the puzzle by trying every possible combination of pieces, which can take a lot of time (O(n²)). But the new approach we're using is like having a guide that tells you exactly where each piece goes, making the process much faster (O(n log₂ n)). This 'guide' is the result of our new encoding and decoding algorithms for Reed-Solomon erasure codes over characteristic-2 finite fields. Finite fields are just special types of algebraic structures that work well with computer (binary) computations.

In essence, we're sending your photo album (data) in multiple packages (shards) with some extra copies (parity shards) and using a faster method (new encoding and decoding algorithms) to put the album back together at the destination. This ensures that your precious memories (data) are not only efficiently stored but also safely delivered, even if some mishaps occur along the way.
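To put the two complexity classes in perspective, the short sketch below compares how n² and n·log₂(n) grow with the number of symbols n. These are idealized operation counts that ignore constant factors, so they indicate how the gap widens with scale rather than predicting measured performance. The function names and sample sizes are assumptions chosen for illustration.

```python
import math


def quadratic_ops(n):
    """Idealized operation count for an O(n^2) algorithm."""
    return n * n


def n_log_n_ops(n):
    """Idealized operation count for an O(n log2 n) algorithm."""
    return n * math.log2(n)


for n in (256, 4096, 65536):
    ratio = quadratic_ops(n) / n_log_n_ops(n)
    print(f"n = {n:>6}: n^2 = {quadratic_ops(n):>13,}  "
          f"n*log2(n) = {n_log_n_ops(n):>11,.0f}  ratio = {ratio:,.0f}x")
```

The ratio simplifies to n / log₂(n), so at n = 65,536 it is 65,536 / 16 = 4,096. The larger the encoded block, the more the faster algorithm pays off.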

This newer approach is advantageous in practical applications as it reduces the computational overhead associated with encoding and decoding Reed-Solomon erasure codes. This, in turn, enhances the performance of ShdwDrive v2 and the way in which it utilizes the D.A.G.G.E.R. consensus protocol. Additionally, when a block of transactions arrives from the D.A.G.G.E.R. system, the transactions are consensus-ordered and then stable-sorted by user within the Wield nodes. Because a stable sort preserves the consensus order of each user's transactions, this results in parallelizable per-user sub-blocks of transactions, lifting the Reed-Solomon encoding bottleneck that would otherwise be present in a sequential executor and allowing for horizontal scaling. This notion is extended across blocks, creating a stream of transactions for each user. This optimization further enhances the performance of the ShdwDrive v2 implementation of D.A.G.G.E.R. The lower computational overhead opens the door for mobile network participation in key ways - specifically acting as auditor nodes.
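The sorting step described above can be sketched as follows. This is a minimal illustration under assumptions: the transaction shape (a `(user, payload)` pair), the sample data, and the `process` placeholder are all hypothetical, and a thread pool simply stands in for whatever parallel executor a node would actually use.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter

# Hypothetical consensus-ordered block of (user, payload) transactions.
block = [
    ("alice", "put a1"),
    ("bob", "put b1"),
    ("alice", "put a2"),
    ("carol", "put c1"),
    ("bob", "delete b1"),
]

# sorted() is stable: transactions from the same user keep their consensus order.
by_user = sorted(block, key=itemgetter(0))

# Group the sorted block into one sub-block per user.
sub_blocks = {
    user: [payload for _, payload in txs]
    for user, txs in groupby(by_user, key=itemgetter(0))
}


def process(item):
    """Placeholder for per-user work such as erasure-encoding that user's data."""
    user, payloads = item
    return user, len(payloads)


# Sub-blocks touch disjoint users, so they can be handed to workers independently.
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(process, sub_blocks.items()))
```

Because each sub-block belongs to a single user, no ordering needs to be enforced between sub-blocks, which is what makes the work divisible across cores or machines.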

Section 3.2 - Mobile Network (Auditing)

The reduced computational overhead discussed above not only boosts the performance of ShdwDrive but also enables mobile devices to play a crucial role in auditing. Mobile devices, due to their widespread usage and our design to incorporate them as asynchronous auditors, are uniquely positioned to verify the integrity of data in the D.A.G.G.E.R. Wield Nodes. They can check for data corruption and replication at their own pace, making them effective in this distributed system. 

Moreover, the low-energy footprint of our D.A.G.G.E.R. protocol's auditing algorithms allows mobile devices to contribute to the network without significantly impacting the environment. This is not just about integrating mobile technology into the network but about leveraging the unique advantages of mobile devices to enhance the resilience and verifiability of the system. This approach not only democratizes participation but also contributes to a more sustainable, efficient, and reliable distributed system.

With the advent of 5G and upcoming 6G technologies, mobile devices are becoming increasingly powerful and capable. It's fascinating to note that the specifications of today's high-end smartphones are comparable to the personal computers of just a decade ago. This technological advancement, combined with the sheer volume of over 7.5 billion smartphones actively used worldwide, presents a massive untapped potential for distributed systems such as D.A.G.G.E.R. and its application, ShdwDrive v2. 

The rapid advancement in computing capabilities at the most local level - the user's mobile device - is revolutionizing the world of data management. Every day, these devices are becoming more adept at handling complex computations and high-capacity tasks. Consider this: there's a limit to how quickly we can process and interact with data. Humans can only read, write, stream, or access data at a certain pace. Yet, the power of the mobile devices in our pockets is fast approaching a point where it can outstrip our physical needs. This surplus of local computing power, if properly harnessed and integrated into a decentralized protocol, opens up a world of possibilities. It offers a unique opportunity to redefine how we manage and interact with data in the future.

However, it's important to understand that integrating mobile devices into such a network isn't as simple as flipping a switch. There are challenges to address, particularly concerning battery life and uptime. But the beauty of the D.A.G.G.E.R. architecture is that it allows for a gradual, phased integration of mobile devices, starting with their role as auditor nodes. Mobile auditor nodes are a unique and innovative concept in the realm of decentralized storage protocols. They essentially allow mobile phone users to participate in the network by auditing and verifying the quality and correctness of the storage nodes, also known as Wield nodes. In addition to many other module tasks (see the Litepaper), these Wield nodes are responsible for storing erasure-coded data, and their quality, consistency, availability, and security are of paramount importance to the overall health of the network. By computing and checking the proofs issued by Wield nodes on the ShdwDrive v2 network, these auditor nodes play a crucial role in ensuring the integrity and security of the data stored.
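The Litepaper does not specify the proof format that auditors check, so the sketch below uses a Merkle inclusion proof purely as a stand-in to show why this kind of audit can be light enough for a phone. Everything here is an assumption for illustration: the function names, the tree layout, and the idea that an auditor holds only a root hash. The point is that the auditor hashes one shard plus a handful of sibling hashes instead of downloading the whole dataset.

```python
import hashlib


def h(data):
    """SHA-256 digest of a byte string."""
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """Return all tree levels, leaf hashes first, root level last."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate the last hash on odd levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, index):
    """Storage-node side: collect the sibling hashes from leaf to root."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(root, shard, proof):
    """Auditor side: recompute the root from one shard and its proof."""
    node = h(shard)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


shards = [f"shard-{i}".encode() for i in range(5)]   # hypothetical shard contents
levels = merkle_levels(shards)
root = levels[-1][0]                 # the only value the auditor needs to hold
proof = make_proof(levels, 3)        # produced by the storage node on request
ok = verify(root, shards[3], proof)
tampered = verify(root, b"corrupted", proof)
```

In this sketch `ok` is true and `tampered` is false. The verification cost grows with the logarithm of the number of shards, which is the kind of workload that suits an asynchronous, battery-constrained auditor.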

And the best part? Mobile phone users who participate as auditor nodes are rewarded for their contributions. This not only incentivizes participation but also democratizes the network by allowing anyone with a smartphone to contribute. So, while the ultimate vision of utilizing mobile devices as active participants in the storage and archival of ShdwDrive data may be several steps down the line, their immediate integration as auditor nodes is a significant and groundbreaking first step. 

This approach allows D.A.G.G.E.R. to leverage the vastly underutilized resources of mobile devices worldwide while also giving users the opportunity to actively participate in and benefit from the network. In the grand scheme of things, this integration of mobile devices represents a paradigm shift in how we view and use our personal devices. No longer are they just tools for communication or entertainment. Instead, they become active contributors to a global, decentralized network, helping to secure and verify data storage on an unprecedented scale. Mobile devices become an even greater personal asset by contributing to the future demands of data availability - democratizing something as fundamentally important as the quality and correctness of the data we globally create and store. The potential impact of this on the world of data storage is truly revolutionary.

Why Mobile Matters

First, integrating mobile devices democratizes participation by allowing anyone with a smartphone to easily become an auditor node and earn rewards. This marks a major shift in making data availability protocols inclusive and accessible.

Second, the vast unused processing power of idle mobile devices presents an opportunity to tackle real-world energy and storage demands. D.A.G.G.E.R.'s efficient design allows mobile devices to contribute without high energy costs, setting a new standard for sustainable distributed systems.

Moreover, by leveraging such a large pool of potential auditors, businesses can reduce storage costs and access marketing opportunities through community participation. However, the true impact is how this integration empowers everyday users worldwide to take an active role in securing a global decentralized network. Their mobile devices become a personal asset that helps drive the future of data availability and privacy. This approach is set to redefine how we view and utilize our personal tech in the digital world.

In summary, integrating mobile devices is revolutionary - it democratizes participation, pioneers energy efficiency, provides business incentives, and, most importantly, allows people across the globe to directly contribute to and benefit from the future of data storage.

Conclusion

In this article, we delved into the erasure coding approach used by ShdwDrive and how data is distributed. We highlighted the roles of Wield nodes that store data and Auditor nodes that harness mobile resources. Finally, we tied these design choices into a meaningful pathway to solving challenges with cloud storage. We are excited to share these principal aspects of our design as they help convey more deeply what makes ShdwDrive different, useful, and capable of solving real-world problems.

We appreciate your time in reading this second installment of our three-part guide to the D.A.G.G.E.R. Litepaper. We will continue to shed more light on the design as we approach the eventual release of the ShdwDrive Whitepaper. Stay tuned for our upcoming post, where we discuss the general high-level architecture of ShdwDrive (including the role of nodes and the SHDW token), followed soon by an exploration of Section 4 of the Litepaper as we dig into other use cases for D.A.G.G.E.R.