9 releases

0.4.3	Nov 9, 2024
0.4.1	Oct 14, 2024
0.3.5	Aug 18, 2024
0.3.3	Nov 27, 2023
0.1.7	May 18, 2023

#616 in Asynchronous

MIT license

94KB
2K SLoC

capp-rs

Common things I use to build Rust CLI tools for web crawlers.

CAPP - "Comprehensive Asynchronous Parallel Processing" or just "Crawler APP"

capp is a Rust library for building web crawlers and other asynchronous, parallel processing applications. It provides a framework for managing concurrent tasks, handling network requests, and processing large amounts of data.

Features

Asynchronous Task Management: Utilize tokio-based asynchronous processing for efficient, non-blocking execution of tasks.
Flexible Task Queue: Implement various backend storage options for task queues, including in-memory and Redis-based solutions.
Round-Robin Task Distribution: Ensure fair distribution of tasks across different domains or categories.
Configurable Workers: Set up and manage multiple worker instances to process tasks concurrently.
Error Handling and Retry Mechanisms: Robust error handling with configurable retry policies for failed tasks.
Dead Letter Queue (DLQ): Automatically move problematic tasks to a separate queue for later analysis or reprocessing.
Health Checks: Built-in health check functionality to ensure the stability of your crawling or processing system.
Extensible Architecture: Easily extend the library with custom task types, processing logic, and storage backends.
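
The round-robin distribution, retry, and dead letter queue features listed above can be illustrated with a minimal, std-only sketch. This is not capp's API: `Task`, `RoundRobinQueue`, and all of their fields and methods are hypothetical names chosen for this example, and the real crate's types and backends may look quite different. The sketch assumes one FIFO queue per domain, served in rotation, with failed tasks requeued until a retry limit is exceeded.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical task type for illustration; not capp's actual task struct.
#[derive(Debug, Clone, PartialEq)]
struct Task {
    domain: String,
    url: String,
    retries: u32,
}

impl Task {
    fn new(domain: &str, url: &str) -> Self {
        Self { domain: domain.to_string(), url: url.to_string(), retries: 0 }
    }
}

/// Hypothetical queue: per-domain FIFO queues served in rotation, plus a DLQ.
struct RoundRobinQueue {
    queues: HashMap<String, VecDeque<Task>>,
    order: VecDeque<String>, // domains in rotation order
    dlq: Vec<Task>,          // tasks that exceeded the retry limit
    max_retries: u32,
}

impl RoundRobinQueue {
    fn new(max_retries: u32) -> Self {
        Self { queues: HashMap::new(), order: VecDeque::new(), dlq: Vec::new(), max_retries }
    }

    /// Enqueue a task, registering its domain in the rotation on first sight.
    fn push(&mut self, task: Task) {
        if !self.queues.contains_key(&task.domain) {
            self.order.push_back(task.domain.clone());
        }
        self.queues.entry(task.domain.clone()).or_default().push_back(task);
    }

    /// Pop the next task, visiting domains in turn so no single domain dominates.
    fn pop(&mut self) -> Option<Task> {
        for _ in 0..self.order.len() {
            let domain = self.order.pop_front()?;
            let task = self.queues.get_mut(&domain).and_then(|q| q.pop_front());
            self.order.push_back(domain); // move the domain to the back of the rotation
            if task.is_some() {
                return task;
            }
        }
        None
    }

    /// Requeue a failed task, or move it to the DLQ once retries are exhausted.
    fn fail(&mut self, mut task: Task) {
        task.retries += 1;
        if task.retries > self.max_retries {
            self.dlq.push(task);
        } else {
            self.push(task);
        }
    }
}

fn main() {
    let mut queue = RoundRobinQueue::new(1);
    queue.push(Task::new("a.com", "a1"));
    queue.push(Task::new("a.com", "a2"));
    queue.push(Task::new("b.com", "b1"));

    // Tasks come out alternating between domains rather than in insertion order.
    while let Some(task) = queue.pop() {
        println!("{} {}", task.domain, task.url);
    }

    // A task that keeps failing ends up in the dead letter queue.
    queue.fail(Task::new("c.com", "c1"));
    let retried = queue.pop().expect("task was requeued after first failure");
    queue.fail(retried);
    println!("dead-lettered: {}", queue.dlq.len());
}
```

In this sketch the rotation lives in a separate `VecDeque` of domain names, which keeps `pop` simple at the cost of scanning past domains whose queues are empty. A Redis-backed variant would need a different layout.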

Use Cases

While capp is primarily designed for building web crawlers, its architecture makes it suitable for a variety of parallel processing tasks, including:

Web scraping and data extraction
Distributed task processing
Batch job management
Asynchronous API clients
Large-scale data processing pipelines

Getting Started

To use capp in your project, add it to your Cargo.toml:

[dependencies]
capp = "0.4"

See the examples for usage.

Modules

config: Configuration management for your application.
healthcheck: Functions for performing health checks on your system.
http: Utilities for making HTTP requests and handling responses.
manager: Task and worker management structures.
queue: Task queue implementations and traits.
task: Definitions and utilities for working with tasks.

Dependencies

~10–24MB
~341K SLoC

bin+lib capp
