Jeremy Saenz shares insights on NATS, an open-source project dedicated to improving service communication. He explains the benefits of using NATS for effective communication and device management at edge locations.

Currently a senior software engineer at Synadia Communications, Jeremy Saenz helps maintain NATS, a highly regarded open-source messaging system. His portfolio includes contributions to a variety of well-known Go community projects such as Martini, Negroni, CLI, Gin, and Inject. His previous role as Chief Product Officer at Kajabi gave him rich experience across diverse functions and fuels his drive to push the software engineering industry forward.

QCon San Francisco is a pivotal event that shapes the software development landscape by promoting knowledge sharing and innovation within the developer community. It appeals primarily to technical team leads, architects, engineering directors, and project managers who play a crucial role in driving innovation within their teams.

Saenz: I’m Jeremy. Today, we will delve into everything NATS-related.

First, let me introduce NATS and explain why at my company, which maintains the NATS open source project, we emphasize rethinking connectivity. Why question established connection methods? While communication protocols between computers may seem like a solved problem, emerging trends in multi-cloud and edge computing prompt a revisit of our established notions, particularly when constructing web architectures, microservices, or streaming platforms.

This shift is precipitating a significant evolution in technological development, prompting a reevaluation of conventional methodologies. Although these concepts are not brand new and have been a staple in the study of distributed systems for years, there seems to be a collective forgetfulness as we moved into cloud computing, losing sight of alternative approaches developed over the past decades. A typical web-based construction relies heavily on DNS, hostnames, and IP addresses for discovery and connectivity—methods that have become standard yet may no longer be sufficient.

The general practice involves identifying computers by IP address, obtained via DNS, and using pull-based interactions, like HTTP requests, as a universal solution for all problems. Moreover, the traditional perimeter-based security approach—encapsulating everything within a Virtual Private Cloud (VPC)—assumes that a boundary ensures comprehensive security. Similarly, our backend infrastructures are often centralized, relying on massive, solitary databases as repositories. Such frameworks are predominantly based on one-to-one communication methodologies, particularly in microservices via HTTP, along with numerous layered constructs on top of it, which we may need to reconsider.

Throughout this discussion, I hope to inspire a shift in perspective regarding connectivity and communication, emphasizing that alternative, perhaps more effective methods exist beyond conventional practices. This rethinking aims to broaden our understanding and application of communication strategies within distributed systems.

NATS is an innovative, open-source messaging system that functions like an interconnected network. At its essence, it facilitates publish/subscribe messaging, enhanced by additional layers that address complex distributed-systems problems. The platform is designed to minimize the complexity involved in creating sophisticated distributed systems.

The foundational principles of NATS include adaptability and scalability. One groundbreaking feature is its location-independent addressing—once connected to NATS, communication can occur with any other available entity without reliance on IP addresses, DNS, or domain names, utilizing straightforward, subject-based addressing. This simple-sounding approach is incredibly potent and demonstrates the system’s advanced capabilities. Furthermore, NATS supports M to N communications, enabling a broad spectrum of interaction patterns beyond simple one-to-one communications. For example, a user can pose a question and receive one or several responses, showcasing the platform’s capability to elegantly navigate complex communication scenarios. Additionally, NATS offers both push and pull communication options, enhancing its flexibility.
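The subject-based addressing described above follows simple token-matching rules: `*` matches exactly one token and `>` matches one or more trailing tokens. The snippet below is an illustrative reimplementation of those semantics in Python, not code from the NATS server or any client library:

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Check a subject against a pattern using NATS wildcard rules:
    '*' matches exactly one token, '>' matches one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and match at least one remaining token
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)

print(subject_matches("hello.*", "hello.world"))    # True
print(subject_matches("hello.*", "hello"))          # False
print(subject_matches("hello.>", "hello.world.x"))  # True
```

Because subscribers are addressed by subject patterns rather than IP addresses, any number of parties can match the same publication, which is what enables the M to N patterns mentioned above.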

NATS also boasts a decentralized authentication process with zero-trust security elements, ensuring robust multi-tenancy with options for both logical and physical isolation, yet maintaining a cohesive system. It includes an intelligent, persistent storage component known as JetStream, tailored for global operation. This design allows NATS to provide high performance and low latency across international boundaries, making it an exemplary system for global scale applications.

What is NATS from an architecture standpoint? Essentially, it consists of a NATS server and various NATS clients. There are around 40 different client libraries available across multiple programming languages, eight of which are officially supported and widely used. Diving deeper into the system’s architecture, there is Core NATS, which provides basic, high-performance messaging. This is the foundational level of Pub/Sub communication, where messages are asynchronously distributed across a server, resulting in a temporally coupled system: if the recipient isn’t available to receive the message, it won’t be delivered.

Some may view this as a limitation; however, the ability to operate a completely stateless Pub/Sub model presents unique advantages. Building upon Core NATS is JetStream, a subsystem designed for those requiring guaranteed message delivery. JetStream, while still using the Core NATS Pub/Sub framework, supports a request-reply pattern enabling persistent data management, replication, and versatile data storage and access methods. Core NATS is characterized by its rapid, payload-agnostic message publishing, handling up to a million messages per second without concern for data content, simply passing data to interested parties.
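The difference between Core NATS’s temporal coupling and JetStream’s persistence can be shown with a toy in-process model. This is a sketch of the semantics only; the class names and methods below are hypothetical and bear no relation to the real client API:

```python
class CoreBus:
    """Minimal in-process sketch of Core NATS semantics:
    fire-and-forget pub/sub with no persistence."""
    def __init__(self):
        self.subscribers = {}  # subject -> list of callbacks

    def subscribe(self, subject, callback):
        self.subscribers.setdefault(subject, []).append(callback)

    def publish(self, subject, msg):
        # If nobody is listening right now, the message is simply dropped.
        for cb in self.subscribers.get(subject, []):
            cb(msg)

class Stream:
    """JetStream-like layer: messages published on a subject are appended
    to an ordered log that consumers can replay later."""
    def __init__(self, bus, subject):
        self.log = []
        bus.subscribe(subject, self.log.append)

bus = CoreBus()
bus.publish("orders.new", "order-1")   # dropped: no subscriber exists yet

stream = Stream(bus, "orders.new")     # persistence begins here
bus.publish("orders.new", "order-2")
print(stream.log)                      # ['order-2'] -- order-1 was never stored
```

The sketch makes the trade-off concrete: Core NATS stays fast and stateless precisely because it never stores anything, while JetStream layers storage on top of the same Pub/Sub fabric.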

The NATS architecture supports various communication patterns beyond the basic request-reply. These include publish and subscribe, fan-in and fan-out, and scatter-gather configurations. Additionally, NATS provides automatic load balancing, which is especially beneficial in a microservices environment. This negates the need for external load balancers, as NATS intelligently manages node communication, ensuring efficient routing and load distribution. It is engineered to be globally aware, directing traffic to the nearest or most appropriate responders.
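The automatic load balancing mentioned above is built on queue groups: subscribers that join the same named group each receive only a share of the subject’s messages. The sketch below uses a strict round-robin for determinism; the real server’s member selection is its own and makes no such guarantee:

```python
import itertools

class QueueGroup:
    """Sketch of queue-group load balancing: each message on the subject is
    delivered to exactly one member of the group (round-robin here for
    illustration; the real NATS server does not guarantee round-robin)."""
    def __init__(self, members):
        self.members = members
        self._next = itertools.cycle(range(len(members)))

    def deliver(self, msg):
        member = self.members[next(self._next)]
        member.append(msg)

workers = [[], [], []]  # three hypothetical service instances
group = QueueGroup(workers)
for i in range(6):
    group.deliver(f"req-{i}")
print(workers)  # each worker handled two of the six requests
```

Scaling a service then becomes trivial: start another instance, have it join the same queue group, and it immediately takes a share of the traffic with no external load balancer involved.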

Shifting the focus to NATS JetStream, it represents an advanced, next-generation distributed persistence layer atop the Core NATS system. Retaining the underlying Pub/Sub architecture, JetStream enhances system capabilities by ensuring data persistence. It is adept at handling multi-tenancy, offering extensive configuration options and exceptional scalability. This subsystem not only supports cross-datacenter replication but also extends to global replication across continents. JetStream simplifies complex processes such as data replication and the multiplexing and demultiplexing of data streams. It supports various patterns which are crucial for modern applications, particularly in fields requiring robust edge computing, fleet management, and optimized data locality.

We utilize streaming, typically a function for which Kafka is suited, alongside work queues akin to RabbitMQ. We integrate key-value storage like Redis and object storage such as MinIO, all based on a unified foundation. This structure consists of a globally ordered data set indexed by topics or subjects, allowing the construction of various applications. JetStream facilitates these patterns within a unified stream and consumer model, enabling exploration of multiple functionalities.
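The “unified foundation” point can be made concrete: if storage is one ordered, subject-indexed log, then a key-value store is simply a view of the last value per key. That mirrors how NATS KV is layered on JetStream streams, though the snippet below is purely illustrative, not the actual implementation:

```python
class Stream:
    """One ordered log of (subject, value) entries -- this sketch's
    stand-in for JetStream's unified storage model."""
    def __init__(self):
        self.entries = []

    def append(self, subject, value):
        self.entries.append((subject, value))

def kv_view(stream, prefix):
    """A key-value store is just the last value seen per key-subject."""
    kv = {}
    for subject, value in stream.entries:
        if subject.startswith(prefix + "."):
            kv[subject[len(prefix) + 1:]] = value
    return kv

s = Stream()
s.append("kv.config.color", "red")
s.append("metrics.cpu", "0.42")        # unrelated subject, ignored by the view
s.append("kv.config.color", "blue")    # newer value wins
print(kv_view(s, "kv.config"))         # {'color': 'blue'}
```

Work queues, object storage, and streaming are, in the same spirit, different consumption patterns over that one ordered, subject-indexed log.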

Let’s delve into the NATS demo. I’ll provide insights into how NATS functions, proceed to explore fleet management, and conclude with additional demonstrations. NATS is notably straightforward to engage with, making it an ideal demo subject due to its operational simplicity and minimal setup time. All I need is the NATS server binary; containers and Kubernetes are also supported. I’ll start my NATS server locally and begin making connections.

To illustrate, I’ll switch my NATS context, since we’ll be alternating between several. I’ll execute ‘nats sub hello’ to subscribe to the ‘hello’ subject, or even ‘hello.world’. Subjects support wildcards, allowing flexible subscription, as demonstrated by subscribing to ‘hello.*’ in the terminal. Furthermore, I can publish using ‘nats pub hello.jeremy’ with a simple string payload. Messages can be transmitted and received at very high speed: using a built-in benchmark, we can distribute millions of messages per second from a local setup, proving Core NATS’s efficiency and providing a scalable foundation for JetStream. NATS supports request-reply interactions as well.

Exploring the power of NATS with microservices reveals some fascinating possibilities such as the use of the scatter-gather pattern. By switching to a cloud setup where microservices are connected, one could dynamically interact with them. For instance, selecting a specific context in NATS and listing microservices by executing commands like ‘nats micro list’, which retrieves a list of active microservices. These listed IDs represent the instances of the QCon microservice. By further extending commands like ‘nats micro stats’, it’s possible to gather statistics for each microservice and their endpoints, enhancing operational insights and allowing effortless load balancing.

In the context of load balancing, running a command such as ‘nats request’ with specific parameters can select an instance randomly, serving as a practical demonstration of NATS’s load balancing capabilities. For example, selecting a nickname through such a method showcases the practical application and interactive capabilities of NATS orchestrating service requests. Furthermore, integrating this with hardware like a Raspberry Pi Pico demonstrates NATS’s versatility. By piping the output to the microcontroller, one can directly publish the results, showcasing a seamless integration between software and hardware components.

Additionally, NATS supports various interaction patterns, including request-reply, publish-subscribe, and fan-in, fan-out methods, which enhance the way microservices can intercommunicate and function in a unified environment. This flexibility is a significant advantage in creating scalable and efficient systems.

Another intriguing feature is JetStream, which facilitates data management through streams. For instance, data collected from user interactions can be managed effectively, with NATS handling the backend operations directly. By utilizing commands like ‘nats stream list’ or ‘nats sub’, one can manage and subscribe to data streams such as survey results, making real-time data handling and analytics feasible and straightforward.

This is essentially what you are doing every time you reload the page: you’re just pulling in all that data. If people keep filling out the survey, all this data will come in. We have a really cool consumer model with ephemeral consumers, which is what you’re using today: you just create them and they work automatically. We also have durable consumers that keep your place in the stream, in case you’re doing any stream processing work.

I think we can move into the next portion which is on fleet management. NATS for fleet management. What do I mean by fleet management? I obviously don’t mean literal cars. Sometimes I mean cars. We have NATS in cars as well. We do some literal fleet management. Really, when it comes to hardware and devices, I’m talking about just a large number of distributed devices. Typically, they might have a broad variance in hardware profiles. We might have everything connecting into this. Some will be PCs, some will be web browsers, like you guys have.

Some of these will be single board computers or microcontrollers. We need to consider all of those use cases: how do we create a single layer of communication between all of these things, especially when they’re very distributed and not all living in the same place? They’re also going to have unreliable network connectivity. We need to consider that. I didn’t even get to talk about how reliable NATS is: how much it retries, how much it protects itself at its own cost, and how it does failover and fault tolerance automatically, at a global scale.

These are all things that NATS gives you for this type of fleet management use case. The last part is perimeterless security. If we try to do what we did in the cloud, it’s not going to work for the edge. You can’t just put a wall around everything, because everything’s scattered all over the place. Somebody could go take your hardware and do naughty things with it. We need to think about what trusted security models we could bake into this, and how we manage it at scale. We can’t put users in a database if we have millions of them, provisioning all the time. How do we scale something like that? AuthN and AuthZ are things that I think NATS solves really well, too.

In collaborating with various organizations operating at the forefront of technology, it’s evident that four main patterns consistently emerge in this area and are highly sought after when leveraging a technology like NATS. The first is live querying: accessing numerous devices to perform ad hoc filtering and selection, enabling applications that can manage extensive fleets. This scalability is achieved using a scatter-gather approach that allows such ad hoc querying, where NATS offers numerous filtering and performance enhancements.

The second notable pattern is configuration management. This involves managing a myriad of devices, be they in retail locations like Starbucks, vehicles on highways, production machinery, or IoT devices. Each scenario necessitates some type of configuration or updates from a distance. The aim is to enable settings adjustments that are automatically integrated, even when devices temporarily go offline and subsequently reconnect, ensuring a seamless update process. In a similar vein to configuration management, another pattern involves remote commands that may need to be sent to devices which might be offline. Solutions have been developed to address this scenario effectively. The final pattern discussed is store and forward, which pertains to maintaining data locality.

The essence of this pattern is to keep applications operational on devices, retaining and saving information as if there were no interruptions in connectivity. Once connectivity is restored, the stored data is then transmitted appropriately. Traditionally, deploying such features has been a largely manual endeavor with custom-built platforms specific to each case. However, with NATS, these functionalities are simplified significantly. This discussion has covered aspects such as live querying, device selection, filtering, configuration management, remote commands, and the store and forward technique, illustrating the expansive capabilities possible with these technological tools.
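The store-and-forward pattern described above reduces to a small state machine: buffer publishes while the uplink is down, then drain them in order on reconnect. A minimal sketch, in which the uplink callable is a hypothetical stand-in for a connection to the cloud cluster:

```python
class StoreAndForward:
    """Sketch of store-and-forward: publishes are buffered locally while
    the uplink is down and flushed in order once it comes back."""
    def __init__(self, uplink):
        self.uplink = uplink      # callable that sends one message upstream
        self.connected = False
        self.buffer = []

    def publish(self, msg):
        if self.connected:
            self.uplink(msg)
        else:
            self.buffer.append(msg)   # the app keeps writing as if nothing happened

    def reconnect(self):
        self.connected = True
        while self.buffer:            # drain in original publish order
            self.uplink(self.buffer.pop(0))

cloud = []
edge = StoreAndForward(cloud.append)
edge.publish("m1")          # offline: buffered locally
edge.publish("m2")
edge.reconnect()            # both delivered, in order
edge.publish("m3")          # online: sent directly
print(cloud)                # ['m1', 'm2', 'm3']
```

The point of the pattern is that the application code never branches on connectivity; with NATS leaf nodes and JetStream, that buffering and catch-up happens below the application layer.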

Today, we are exploring some functionalities in real-time, as I connect to the nats-0 server hosted on Digital Ocean. With an impressive round-trip time of about 10 to 25 milliseconds, our interactions are quite swift, though some cloud latency still exists. To demonstrate, let’s delve into live querying.

In our scenario, similar to querying microservices for their data, we can also request device information. By sending a command like ‘nats request device info’ and setting replies to zero, I initiate a one-time request and await responses until a set timeout. This can generate immediate feedback on device specifics from participants, such as the type of device being used. However, managing data from, say, millions of devices would require more sophisticated methods like subject mapping or data filtering.
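The replies-until-timeout behavior just described is a scatter-gather: broadcast one request, then collect whatever replies arrive before the deadline. Below is a hedged, in-process Python sketch; the responder functions are hypothetical stand-ins for remote devices, not the NATS client API:

```python
import queue
import threading
import time

def scatter_gather(responders, request, timeout=0.5):
    """Sketch of scatter-gather: fan a request out to all responders,
    then gather every reply that lands in the shared inbox before the
    timeout expires."""
    inbox = queue.Queue()
    for responder in responders:
        threading.Thread(target=lambda r=responder: inbox.put(r(request))).start()
    replies, deadline = [], time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            replies.append(inbox.get(timeout=remaining))
        except queue.Empty:
            break
    return replies

# Hypothetical device-info responders; the slow one misses the window.
fast = lambda req: {"os": "Android"}
also_fast = lambda req: {"os": "iOS"}
slow = lambda req: (time.sleep(2), {"os": "Linux"})[1]

replies = scatter_gather([fast, also_fast, slow], "device.info")
print(len(replies))  # 2 -- the slow device's reply arrived after the deadline
```

This is the same shape as the live query in the demo: the requester does not know or care how many devices exist; it simply stops listening when the window closes.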

For instance, by identifying users on Android devices and crafting a simple command to filter this data, like ‘os.name equals Android’, I can redirect my query to garner only Android device responses. Such filtration showcases a commonly employed technique for handling large-scale device management by applying specific criteria.

Moving on, let’s discuss configuration management within NATS. I prefer utilizing a persistent key-value store for maintaining configurations which remain consistently available. By assigning keys to particular subjects, subscribed clients can continuously access current settings, even if they temporarily go offline. This system not only ensures reliability but also supports historical data tracking within the key-value stores. By using commands like ‘make config change’, alterations can be implemented and reflected across connected devices.
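The key-value-with-history idea can be sketched directly. The bucket below is a hypothetical stand-in for a NATS KV bucket configured with a history depth; the method names are illustrative, not the real API:

```python
class ConfigKV:
    """Sketch of a key-value bucket that keeps per-key history, so a
    configuration change can be inspected or rolled back."""
    def __init__(self, history=5):
        self.history = history
        self.data = {}  # key -> list of revisions, newest last

    def put(self, key, value):
        revs = self.data.setdefault(key, [])
        revs.append(value)
        del revs[:-self.history]  # retain only the last N revisions

    def get(self, key):
        return self.data[key][-1]

    def rollback(self, key):
        self.data[key].pop()      # discard the newest revision
        return self.get(key)

cfg = ConfigKV()
cfg.put("chart.color", "green")
cfg.put("chart.color", "red")
print(cfg.get("chart.color"))       # red
print(cfg.rollback("chart.color"))  # green
```

Because each key is backed by an ordered subject in a stream, a device that reconnects after being offline can simply read the current revision and immediately converge on the latest configuration.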

This supports a near real-time system using color-coded charts, and changes can be reverted effortlessly. Configuration management is highlighted as an effective way to manage multiple devices remotely. A distinction is drawn between configuration and command management, with a focus on structuring commands so they are not executed repeatedly, while still ensuring devices respond once they reconnect.

No demonstration on remote commands was included, primarily due to the transient nature of the audience’s system setups. In systems with persistent consumers, such scenarios would be perfectly suited for issuing and managing commands that devices process singularly upon reconnection.

The concept of ‘store and forward’ is introduced, explaining how devices can autonomously save data when offline and synchronize it later without application-level intervention. The use of NATS technology, a lightweight server binary, is endorsed for such tasks, emphasizing its compatibility with small computers and above, excluding microcontrollers. The speaker mentions personal experience running this technology on a Raspberry Pi Zero 2 with satisfactory results and persistence.

Lastly, the potential for embedding NATS servers as a solution for efficient data management across networked devices is suggested. A live example of how they plan to adjust data pathways among audience members’ cloud-connected setups through changes labeled “NATS 0, 1, or 2” is proposed as a forthcoming demonstration.

Initially, I will shut down the existing NATS server operating on my local machine and reconfigure it as a leaf node. A leaf node acts as an ancillary extension of a NATS cluster, operating independently of the NATS Raft group. This setup allows for a distinct NATS system connected through a designated bridge that manages data transfer. By executing the nats-server command with the leaf node configuration, I enable this setup.

Once activated, this server will connect to the cloud server, permitting us to maintain local connections. Local connection management might require some adjustments such as not performing NATS traversal due to current network configurations. For demonstration, I’ll configure a tunnel, depicting the system’s operational capability in localized settings. This ensures that even if the cloud connection is disrupted, local functions like message publishing and data storage continue seamlessly until cloud connectivity is restored and data synchronization occurs.

Attention must now be turned to our monitoring dashboard. To establish the tunnel, I will use the command ‘make tunnel’, employing ngrok, a highly effective tunneling tool. After attempting the data transfer, an error arises due to the absence of certain data on the local system, necessitating a reversal of the operation with ‘make unmove’ to reconnect to our cloud server. Subsequently, I issue ‘make mirrors’ to mirror the configuration key-value store and survey data, enabling real-time replication and local persistence on my laptop. Running ‘nats context select default’ and ‘make mirrors’ confirms successful replication: ‘nats stream list’ and ‘nats kv list’ reveal the replicated survey stream and key-value store configuration, including the previously removed key with its 30 messages.

It’s stored right here on my laptop, as well as in the cloud. This allows for operations like moving data and refreshing the page seamlessly as the data processes via my laptop. When managing accounts, for instance the QCon accounts, I can alter settings in real-time. Specifically, when dealing with decentralized authentication mechanisms, revoking access to a leaf node user disrupts its cloud connection, yet everything continues to function smoothly because the relocation of data was managed efficiently.

The power of revoking and then un-revoking access showcases the flexibility in managing connections and data flow. Using commands like ‘nats micro list’, we can directly access responses from microservices, ensuring that despite the leaf node’s temporary disconnection, the system’s integrity remains intact. This demonstrates the robust nature of local server management and the ability to maintain operational continuity without reliance on cloud connectivity.

The final aspect involves implementing a store and forward system, allowing for a more structured data management approach. By subscribing to every metric being emitted using ‘nats sub’, all data generated becomes accessible in what I refer to as metrics mode. From there, capturing this data into a stream with ‘nats stream add’ allows for organized storage and retrieval. Configuring the system to store this data on a local file system, given the current setup’s scope, facilitates immediate data access and management, even with a minimal replication factor.

We’re currently gathering data locally on my laptop. To transmit this data, the process is quite straightforward. It’s akin to mirroring, but now I’ll transition to the cloud, leveraging our existing cloud connectivity. My aim is to create a data stream that integrates multiple sources. This process of combining or separating data flows is known as muxing or demuxing. This could involve data from numerous locations at the edge, which we accumulate and intend to merge into a single extensive stream in the cloud. Due to local hardware constraints, we might retain data locally for only a short duration, but opt for much longer data retention in the cloud.
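Muxing, as described above, amounts to merging several ordered edge streams into one global stream while preserving each source’s internal order. A toy sketch follows; the interleaving across sources is arbitrary (JetStream sources make no round-robin guarantee), and the stream names are hypothetical:

```python
def mux(sources):
    """Merge several ordered streams of (subject, payload) entries into one,
    preserving each source's internal order. The round-robin interleaving
    here is only for determinism in this sketch."""
    merged, cursors = [], [0] * len(sources)
    while any(c < len(s) for c, s in zip(cursors, sources)):
        for i, src in enumerate(sources):
            if cursors[i] < len(src):
                merged.append(src[cursors[i]])
                cursors[i] += 1
    return merged

laptop = [("edge.laptop.cpu", 0.4), ("edge.laptop.cpu", 0.5)]
pi = [("edge.pi.cpu", 0.9)]
global_metrics = mux([laptop, pi])
print(len(global_metrics))  # 3
```

Because each entry keeps its original subject, the cloud side can later demux by subject filter, which is exactly how one large stream can serve many per-device views.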

There are several fascinating possibilities, such as maintaining the local stream in memory, while the cloud-based stream is stored on traditional hard drives. I’m going to switch my focus back to the cloud setup. Once there, I’ll initiate a command to create a source, resulting in a new stream named global_metrics.

This replication of data happens instantaneously, which is quite remarkable. We can maintain local storage, synchronize it with cloud storage, and transmit configuration updates effortlessly. One of the standout features of NATS is its ability to handle location transparency. It supports the nomadic nature of both applications and data, simplifying how we can interact with, and manage, data and applications across different locations without the burden of extensive configuration.

Participant 1: We’ve encountered a basic challenge. Without these issues, how would connectivity function in this scenario?

Saenz: The tunnel was a little bit of a workaround, because I can’t control the network that we’re on. If I could, and I could say, port forward something to me, and I could use dynamic DNS or something. What I faked was the tunnel, but in reality, inside of an edge network deployment, you would have control over that network and you’d just be able to connect directly to the node.

Participant 1: We just don’t use [inaudible], so we’re not replacing TCP/IP.

Saenz: This is all still TCP based.

Participant 1: This is all application level, not queues based.

Saenz: It’s all layer 7. Those in tune with networking understand NATS when I explain that it closely resembles a software-defined networking stack. It elevates many low-layer networking concepts to L7, enhancing adaptability, freedom in topology, and ease of data maneuverability.

Participant 1: Focusing on that point, it enables location independence. We can all stay connected even if I leave the room or shift to another country, as long as there’s internet availability. DNS and other protocols remain in use.

Saenz: You rely on DNS for establishing the initial connection, indeed.

Participant 1: And what about subsequent processes?

Saenz: After establishing the initial connection, we utilize NATS to manage all subsequent communications. NATS maintains a persistent TCP connection. For the initial contact, standard protocols like DNS and IP are employed. Unlike conventional microservices architectures that primarily use ad hoc HTTP or gRPC and do not maintain long-lived connections, here frequent DNS lookups and IP connections are unnecessary.

To achieve global scalability, normally many layers are required. However, by integrating a globally scalable mesh in the core, it simplifies and streamlines the management of these elements significantly.

Participant 2: You brought up multi-tenancy, suggesting it appears limited like Docker, with nested hierarchies. How does NATS handle multi-tenancy, and how is the data isolated for different tenants?

Saenz: In NATS, we ensure both logical and physical isolation. In fact, all of you are connected to my general-purpose NATS cluster hosted in the cloud. Within this setup, you were all part of a QCon-specific tenant. Every subject attributed to you is prefixed with your namespace to prevent any overlap or crossover issues. Additionally, if needed, we facilitate a sharing mechanism that allows the import and export of subjects across different accounts within these tenants.
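The subject-prefixing scheme behind that logical isolation can be sketched directly. The namespace strings below are hypothetical, and the dict stands in for the server’s subject space; real NATS accounts do this mapping transparently inside the server:

```python
class Account:
    """Sketch of logical isolation: every subject a tenant uses is
    transparently prefixed with its account namespace, so tenants can
    reuse the same subject names without crossover."""
    def __init__(self, bus, namespace):
        self.bus = bus
        self.namespace = namespace

    def publish(self, subject, msg):
        self.bus.setdefault(f"{self.namespace}.{subject}", []).append(msg)

bus = {}  # stand-in for the server's global subject space
qcon = Account(bus, "A-QCON")
other = Account(bus, "A-OTHER")
qcon.publish("hello", "from qcon")
other.publish("hello", "from other")
print(sorted(bus))  # ['A-OTHER.hello', 'A-QCON.hello'] -- no overlap
```

Cross-account sharing then becomes an explicit mapping from one prefixed subject to another, which is the import/export mechanism described above.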

You can establish well-defined contracts for interaction interfaces. This is evident from the widespread adoption of NATS as a comprehensive platform within organizations, incorporating what we refer to as accounts. These accounts serve distinct teams or organizational units, allowing them to connect explicitly while maintaining their independent operational environments. This implementation is what we consider logical isolation.

In terms of physical isolation, including features like leaf nodes, individuals can manage their NATS servers and control their network traffic. We often discuss solving the ‘Coke and Pepsi dilemma’, envisioning scenarios where these competitors are part of a unified network yet require extensive separation measures to ensure comfort and security in data handling and traffic management. This example hints at the complex nuances of our multi-tenant architecture, including both Authentication and Authorization mechanisms, among other intricate system designs.

Participant 3: How do I ensure uniformity across 40 different clients?

Saenz: Achieving consistency is challenging. Over time, we have experimented with numerous approaches and ideas. For instance, we considered developing a core library in Rust that could be adapted across various platforms. Balancing the introduction of new concepts and patterns into client libraries while staying true to each language’s native idioms presents a continual trade-off.

Everyone has their unique approach to managing concurrency with various concurrency tools. Despite being a small team, we invest heavily in client libraries since there’s significant complexity managed on the client’s side, including client-side load balancing, flow control, and more. We aim to ensure that anyone familiar with programming languages like Java, .NET, Go, or Rust can easily work with a NATS client. Essentially, it requires a continuous, rigorous effort and interaction with users.

For further details, check out the presentations with transcripts

Aug 21, 2024


