
fs-hdfs

It's based on version 0.0.4 of http://hyunsik.github.io/hdfs-rs and provides the libhdfs binding library together with Rust APIs that safely wrap the raw binding APIs.

Current Status

  • All libhdfs FFI APIs are ported.
  • Safe Rust wrapper APIs cover most of the libhdfs APIs, except those related to zero-copy reads.
  • Compared to hdfs-rs, the lifetime parameter on HdfsFs has been removed, which makes the crate easier for others to depend on.

Documentation

https://docs.rs/fs-hdfs

Requirements

  • The C-related files are taken from branch 2.7.3 of the Hadoop repository, with a few changes applied for Rust usage.
  • There's no need to compile the Hadoop native library yourself; however, the Hadoop jar dependencies are still required.

Usage

Add this to your Cargo.toml:

[dependencies]
fs-hdfs = "0.1.12"

Build

We need to specify $JAVA_HOME to make the Java shared library available for building.
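
For example, on Linux with OpenJDK 11 (the JDK path below is illustrative; point it at your local installation):

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
cargo build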

Run

Since our compiled libhdfs is a JNI-based implementation, it requires the Hadoop-related classes to be available on the CLASSPATH. For example:

export CLASSPATH=$CLASSPATH:`hadoop classpath --glob`
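
This assumes the hadoop command is available on your PATH, which you can verify with:

hadoop version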

Also, we need to specify the JVM dynamic library path so that the application can load the JVM shared library at runtime.

For jdk8 and macOS, it's

export DYLD_LIBRARY_PATH=$JAVA_HOME/jre/lib/server

For jdk11 (or later jdks) and macOS, it's

export DYLD_LIBRARY_PATH=$JAVA_HOME/lib/server

For jdk8 and CentOS, it's

export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server

For jdk11 (or later jdks) and CentOS, it's

export LD_LIBRARY_PATH=$JAVA_HOME/lib/server
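
If you're unsure which layout your JDK uses, one way to locate the JVM shared library (and thus the directory to export) is:

find $JAVA_HOME -name "libjvm.*"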

Testing

The tests also require CLASSPATH and DYLD_LIBRARY_PATH (or LD_LIBRARY_PATH) to be set. If the Java class org.junit.Assert can't be found, refine the $CLASSPATH as follows:

export CLASSPATH=$CLASSPATH:`hadoop classpath --glob`:$HADOOP_HOME/share/hadoop/tools/lib/*

Here, $HADOOP_HOME needs to be specified and exported.
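
For example (the installation path below is illustrative):

export HADOOP_HOME=/opt/hadoop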

Then you can run

cargo test

Example

use std::sync::Arc;
use hdfs::hdfs::{get_hdfs_by_full_path, HdfsFs};

fn main() {
    // Connect to the HDFS namenode (adjust the host and port for your cluster)
    let fs: Arc<HdfsFs> = get_hdfs_by_full_path("hdfs://localhost:8020/").unwrap();
    match fs.mkdir("/data") {
        Ok(_) => println!("/data has been created"),
        Err(_) => panic!("/data creation has failed"),
    }
}
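
As a hypothetical extension, cleaning up afterwards might look like the sketch below. The exists and delete methods are assumed here to mirror libhdfs's hdfsExists and hdfsDelete; check the crate documentation for the exact names and signatures.

// Assumed API: exists/delete wrappers over hdfsExists/hdfsDelete.
// Verify the method names and signatures against the crate docs before use.
if fs.exists("/data").unwrap_or(false) {
    fs.delete("/data", true).expect("deletion of /data failed");
}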
