
Apache Storm main components

Apache Storm is built from a small set of components that work together to provide distributed, real-time stream processing. The key components are:




1. **Nimbus:**

   - Nimbus is the master daemon in a Storm cluster. It distributes topology code around the cluster, assigns the tasks of each spout and bolt to worker nodes, and monitors the cluster for failures so that work can be reassigned.


2. **Supervisor:**

   - Supervisors run on the worker nodes of the Storm cluster. Each supervisor starts and stops worker processes on its node based on the assignments it receives from Nimbus. Supervisors also monitor the health of their worker processes and report back to Nimbus.


3. **Worker:**

   - A worker is a process running on a worker node that executes a subset of a single topology. Each worker runs one or more executor threads, and each executor runs one or more tasks. A task is one instance of a spout or bolt from the topology.


4. **Executor:**

   - An executor is a thread running within a worker process. It executes the tasks assigned to it, all of which belong to the same spout or bolt. By default each executor runs one task; when it has several, it runs them one after another on its single thread, so parallelism within a worker comes from running multiple executors.


5. **Topology:**

   - A topology is the overall data processing workflow in Apache Storm. It is a directed graph, typically acyclic (a DAG), of spouts and bolts, where spouts are sources of data and bolts are processing units. The topology defines how data flows through the system and how it is processed, and it keeps running until it is explicitly killed.


6. **Spout:**

   - A spout is a source of data in a Storm topology. It generates streams of data and emits them into the processing pipeline. Spouts can read from various data sources, such as Kafka, Twitter, or a custom data generator.


7. **Bolt:**

   - A bolt is a processing unit in a Storm topology. Bolts receive input streams from spouts or other bolts, perform computations or transformations on the data, and emit the results into one or more output streams. Bolts can be stateless or stateful, depending on their requirements.


8. **Stream:**

   - A stream is an unbounded sequence of tuples. Streams represent the flow of data between spouts and bolts within a topology, and each stream declares the field names that its tuples carry.


9. **Tuple:**

   - A tuple is the basic unit of data in Storm. It is an ordered list of values, each associated with a named field, and it represents a single record as it flows through the processing pipeline. Tuples are emitted by spouts and bolts and processed by bolts.
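
To make the spout, bolt, stream, and tuple concepts above more concrete, here is a minimal sketch in plain Python. It does not use Storm's API: the names `SentenceTuple`, `WordTuple`, `sentence_spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical, and the stream is a finite generator rather than an unbounded one. It only illustrates how tuples with named fields flow from a source through processing steps.

```python
from collections import Counter, namedtuple

# Tuples with named fields, modelled with namedtuple (hypothetical names).
SentenceTuple = namedtuple("SentenceTuple", ["sentence"])
WordTuple = namedtuple("WordTuple", ["word"])


def sentence_spout(sentences):
    """Spout: a data source that emits a stream of tuples."""
    for sentence in sentences:
        yield SentenceTuple(sentence=sentence)


def split_bolt(stream):
    """Stateless bolt: emits one WordTuple per word in each input tuple."""
    for tup in stream:
        for word in tup.sentence.lower().split():
            yield WordTuple(word=word)


def count_bolt(stream):
    """Stateful bolt: keeps a running count for every word it sees."""
    counts = Counter()
    for tup in stream:
        counts[tup.word] += 1
    return counts


def run_topology(sentences):
    # Wire spout -> split bolt -> count bolt, like edges in a topology graph.
    return count_bolt(split_bolt(sentence_spout(sentences)))


if __name__ == "__main__":
    print(run_topology(["the quick fox", "the lazy dog"]))
```

In a real Storm topology each of these stages would run as many parallel tasks across the cluster, and the stream would never end; the sketch keeps everything in one process so the data flow is easy to follow.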


These components work together to create a distributed and fault-tolerant real-time data processing system. Nimbus and supervisors manage the deployment and execution of topologies across the cluster, while spouts and bolts define the data processing logic within the topology. The overall architecture of Apache Storm allows for scalable and low-latency stream processing.
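
The relationship between workers, executors, and tasks can be sketched with a little arithmetic. The following Python model is a simplification under stated assumptions, not Storm's scheduler: the function names `assign_tasks` and `assign_executors` are hypothetical, tasks are split into contiguous groups, and executors are dealt round-robin to workers. Storm's real scheduler is pluggable and may place executors differently.

```python
def assign_tasks(num_tasks, num_executors):
    """Split task ids 0..num_tasks-1 into num_executors contiguous groups."""
    # Assumption: a component never has more executors than tasks.
    if not 1 <= num_executors <= num_tasks:
        raise ValueError("need 1 <= num_executors <= num_tasks")
    base, extra = divmod(num_tasks, num_executors)
    groups, start = [], 0
    for i in range(num_executors):
        # The first `extra` executors take one additional task.
        size = base + (1 if i < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def assign_executors(executors, num_workers):
    """Deal executors round-robin across worker processes."""
    if num_workers < 1:
        raise ValueError("need at least one worker")
    workers = [[] for _ in range(num_workers)]
    for i, executor in enumerate(executors):
        workers[i % num_workers].append(executor)
    return workers


if __name__ == "__main__":
    # A bolt with 4 tasks and 2 executors, spread over 2 workers.
    executors = assign_tasks(num_tasks=4, num_executors=2)
    print(executors)                                   # [[0, 1], [2, 3]]
    print(assign_executors(executors, num_workers=2))  # [[[0, 1]], [[2, 3]]]
```

With 4 tasks and 2 executors, each executor thread ends up responsible for 2 tasks, and with 2 workers each worker process hosts one of those executors.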
