Apache-2.0

rust-diagnostics

rust-diagnostics is a utility that inserts diagnostics into Rust source code as comments next to the code fragments they refer to, and checks how a warning or error in those diagnostics has been fixed in the git commit history.

The Rust compiler prints many diagnostics to the console, using file names and line numbers to indicate their exact locations. Without an IDE, a programmer has to go back and forth between the console and the editor.

This utility inserts the diagnostic messages in place, which could enable transformer-based machine learning approaches to analyse the semantics of Rust diagnostics.

Through additional arguments, this utility also checks how a warning found in revision r1 has been manually fixed by a revision r2.

Currently we integrate the utility with clippy and git2-rs.

Installation

cargo install rust-diagnostics

Usage:

The full synopsis of the command is shown below.

rust-diagnostics [--patch <commit_id> [--confirm] [--pair] [--function] [--single] [--location] [--mixed] ]

A running example

To illustrate its usage, we use a small example project, which we call abc.

rm -rf abc
cargo init --bin abc
cat > abc/src/main.rs <<EOF
fn main() {
    let s = std::fs::read_to_string("Cargo.toml").unwrap();
    println!("{s}");
}
EOF

Inserting warnings info into Rust code

Run without any arguments, rust-diagnostics inserts warning information into the Rust code. For example,

cd abc
rust-diagnostics

The command invokes clippy to report all the warnings:

    Checking abc v0.1.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 0.06s
There are 1 warnings in 1 files.

As a result, a new folder diagnostics is also created, containing a file src/main.rs:


fn main() {
    let s = /*#[Warning(clippy::unwrap_used)*/std::fs::read_to_string("Cargo.toml").unwrap()/*
#[Warning(clippy::unwrap_used)
note: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unwrap_used
if this value is an `Err`, it will panic
requested on the command line with `-W clippy::unwrap-used`*/;
    println!("{s}");
}

As one can see, the code that triggers the warning is marked by two comments, one before and one after it. The comment before gives the type of warning, here clippy::unwrap_used. The comment after adds the notes reported by cargo clippy, which give details of the warning and hints on how to address it. In this example, unwrap_used is not fixed automatically.
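The two-comment markup described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the utility's actual implementation: the function name mark_span and its byte-offset parameters are hypothetical, and the sketch assumes the note text does not itself contain a comment terminator.

```rust
/// Wrap the byte range `start..end` of `source` in two block comments:
/// a short one naming the lint before the span, and a longer one after it
/// that repeats the lint and carries the note text.
/// Returns `None` when the range is invalid for `source`.
/// Assumption: `note` does not contain "*/", which would end the comment early.
fn mark_span(source: &str, start: usize, end: usize, lint: &str, note: &str) -> Option<String> {
    // `get` rejects out-of-bounds ranges and ranges that split a character.
    let flagged = source.get(start..end)?;
    let mut out = String::new();
    out.push_str(&source[..start]);
    out.push_str(&format!("/*#[Warning({})*/", lint));
    out.push_str(flagged);
    out.push_str(&format!("/*\n#[Warning({})\n{}*/", lint, note));
    out.push_str(&source[end..]);
    Some(out)
}

fn main() {
    // Hypothetical one-line source; a real diagnostic would supply the span.
    let source = "let s = read().unwrap();";
    let flagged = "read().unwrap()";
    if let Some(start) = source.find(flagged) {
        let end = start + flagged.len();
        let note = "if this value is an `Err`, it will panic";
        if let Some(marked) = mark_span(source, start, end, "clippy::unwrap_used", note) {
            println!("{marked}");
        }
    }
}
```

Because the markup consists only of block comments, the marked-up file should still parse as Rust, with the flagged expression left untouched between the two comments.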

Analyse the manually fixed warnings from change history

A useful extension of the above checks how many warnings in the change history have been fixed, whether automatically by cargo clippy --fix or by manual patches. If the manual fixes are repetitive, they become useful material for learning the language, whether by people or by machine learning.

To do this, we restart the example, this time recording a few changes in a git repository as follows.

rm -rf abc
cargo init --vcs git --bin abc
cd abc
cat > src/main.rs <<EOF
fn main() {
    let s = std::fs::read_to_string("Cargo.toml").unwrap();
    println!("{s}");
}
EOF
git add .
git commit -am "r1"
cat > src/main.rs <<EOF
fn main() {
    if let Ok(s) = std::fs::read_to_string("Cargo.toml") {
        println!("{s}");
    }
}
EOF
git commit -am "r2"

The --vcs git option is used so that the example project has some change history with which to illustrate the git repository analysis.

To check whether revision r2 has fixed the warning in revision r1, first use git log -p to identify the commit ids of the two revisions.

commit 839164fa28d71a9c00009c9e25bc84dce6caa286             .......... (r2)
Author: ...
Date:   ...

    update

diff --git a/src/main.rs b/src/main.rs
index 36d2d89..6175ab1 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -1,5 +1,6 @@
 
 fn main() {
-    let s = std::fs::read_to_string("Cargo.toml").unwrap();
-    println!("{s}");
+    if let Ok(s) = std::fs::read_to_string("Cargo.toml") {
+        println!("{s}");
+    }
 }

commit 6fafc98041f47155bc51c5ddc55b8e8b0b7548bf           .......... (r1)
Author: ...
Date:   ...

    init

diff --git a/src/main.rs b/src/main.rs
new file mode 100644
index 0000000..36d2d89
--- /dev/null
+++ b/src/main.rs
@@ -0,0 +1,5 @@
+
+fn main() {
+    let s = std::fs::read_to_string("Cargo.toml").unwrap();
+    println!("{s}");
+}


Then run the following two commands to check whether the warning in r1 has been fixed by r2, where $r1 and $r2 stand for the commit ids found above.

git checkout $r1
rust-diagnostics --patch $r2 --confirm

The output file diagnostics.log contains the number of warnings in $r1 and the hunks between $r1..$r2 that matter for fixing them, with the relevant warnings listed in front of each hunk.

For example, the output uses the same format as git diff:

There are 1 warnings in 1 files.
##[Warning(clippy::unwrap_used)
@@ -3,2 +3,3 @@ fn main() {
-    let s = std::fs::read_to_string("Cargo.toml").unwrap();
-    println!("{s}");
+    if let Ok(s) = std::fs::read_to_string("Cargo.toml") {
+        println!("{s}");
+    }

Note that all the context lines have been removed, just as with the -U0 option of git diff, which makes it possible to obtain a more precise function context for the patch.
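To make the effect of dropping context lines concrete, here is a minimal sketch that reduces a unified-diff hunk with context to zero-context hunks and recomputes where each one starts. It is not how the utility or git implements this; the names ZeroContextHunk and zero_context are hypothetical, and the sample hunk in main is made up for illustration.

```rust
/// One zero-context hunk: where the change starts on each side, how many
/// lines it removes and adds, and the changed lines themselves.
#[derive(Debug, PartialEq)]
struct ZeroContextHunk {
    old_start: usize,
    old_len: usize,
    new_start: usize,
    new_len: usize,
    lines: Vec<String>,
}

/// Read the leading number of a range such as "3,2" or "3".
fn first_number(range: &str) -> Option<usize> {
    range.split(',').next()?.parse::<usize>().ok()
}

/// Parse "@@ -a,b +c,d @@ ..." into the two start lines (a, c).
fn parse_starts(header: &str) -> Option<(usize, usize)> {
    let mut parts = header.split_whitespace();
    if parts.next()? != "@@" {
        return None;
    }
    let old = parts.next()?.strip_prefix('-')?;
    let new = parts.next()?.strip_prefix('+')?;
    Some((first_number(old)?, first_number(new)?))
}

/// Split one hunk into runs of changed lines, dropping every context line.
/// Simplification: for a side with no lines, the start is the next line on
/// that side, whereas diff tools report the preceding line number.
fn zero_context(hunk: &str) -> Option<Vec<ZeroContextHunk>> {
    let mut lines = hunk.lines();
    let (mut old_line, mut new_line) = parse_starts(lines.next()?)?;
    let mut hunks = Vec::new();
    let mut current: Option<ZeroContextHunk> = None;
    for line in lines {
        if line.starts_with('\\') {
            continue; // e.g. "\ No newline at end of file"
        }
        let removed = line.starts_with('-');
        if removed || line.starts_with('+') {
            let (old_start, new_start) = (old_line, new_line);
            let run = current.get_or_insert_with(|| ZeroContextHunk {
                old_start,
                old_len: 0,
                new_start,
                new_len: 0,
                lines: Vec::new(),
            });
            if removed {
                run.old_len += 1;
                old_line += 1;
            } else {
                run.new_len += 1;
                new_line += 1;
            }
            run.lines.push(line.to_string());
        } else {
            // A context line closes the current run and advances both sides.
            hunks.extend(current.take());
            old_line += 1;
            new_line += 1;
        }
    }
    hunks.extend(current.take());
    Some(hunks)
}

fn main() {
    let hunk = [
        "@@ -1,4 +1,5 @@",
        " fn main() {",
        "-    let s = read().unwrap();",
        "-    println!(\"{s}\");",
        "+    if let Ok(s) = read() {",
        "+        println!(\"{s}\");",
        "+    }",
        " }",
    ]
    .join("\n");
    for h in zero_context(&hunk).unwrap_or_default() {
        println!("@@ -{},{} +{},{} @@", h.old_start, h.old_len, h.new_start, h.new_len);
        for line in &h.lines {
            println!("{line}");
        }
    }
}
```

The recomputed start lines point at the changed statements themselves rather than at the first context line, which is what makes the zero-context form convenient for locating the enclosing function.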

Generate a before/after pair using the --pair option

Using the --pair option changes the patch into a pair of code before and after the change:

git checkout $r1
rust-diagnostics --patch $r2 --confirm --pair

For example, the diagnostics.log will contain

There are 1 warnings in 1 files.
##[Warning(clippy::unwrap_used)
@@ -3,2 +3,3 @@ fn main() {
    let s = std::fs::read_to_string("Cargo.toml").unwrap();
    println!("{s}");
=== 19a3477889393ea2cdd0edcb5e6ab30c ===
    if let Ok(s) = std::fs::read_to_string("Cargo.toml") {
        println!("{s}");
    }

Note: to avoid possible clashes with existing code, the separator of the pair contains the hash key 19a3477889393ea2cdd0edcb5e6ab30c, which was created by the command

echo rust-diagnostics | md5sum 
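The conversion from a patch to a pair, and the reason for a separator that is unlikely to occur in real code, can be sketched as follows. This is an illustrative sketch rather than the utility's own code: the function names hunk_to_pair, render_pair and split_pair are hypothetical, and only the separator string is taken from the text above.

```rust
/// Separator line placed between the two halves of a pair.
const SEPARATOR: &str = "=== 19a3477889393ea2cdd0edcb5e6ab30c ===";

/// Turn the body of a zero-context hunk into its "before" and "after" text:
/// removed lines go to the first half, added lines to the second.
fn hunk_to_pair(hunk_body: &str) -> (String, String) {
    let mut before = Vec::new();
    let mut after = Vec::new();
    for line in hunk_body.lines() {
        if let Some(rest) = line.strip_prefix('-') {
            before.push(rest);
        } else if let Some(rest) = line.strip_prefix('+') {
            after.push(rest);
        }
    }
    (before.join("\n"), after.join("\n"))
}

/// Join the two halves around the separator line.
fn render_pair(before: &str, after: &str) -> String {
    format!("{}\n{}\n{}", before, SEPARATOR, after)
}

/// Recover the two halves; this only works reliably because the separator
/// is unlikely to appear inside either half.
fn split_pair(pair: &str) -> Option<(&str, &str)> {
    let separator_line = format!("\n{}\n", SEPARATOR);
    pair.split_once(separator_line.as_str())
}

fn main() {
    let body = [
        "-    let s = std::fs::read_to_string(\"Cargo.toml\").unwrap();",
        "-    println!(\"{s}\");",
        "+    if let Ok(s) = std::fs::read_to_string(\"Cargo.toml\") {",
        "+        println!(\"{s}\");",
        "+    }",
    ]
    .join("\n");
    let (before, after) = hunk_to_pair(&body);
    println!("{}", render_pair(&before, &after));
}
```

A plain marker such as a row of equals signs could collide with lines in the code being paired, which is why a hash-based key is a reasonable choice here.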

Generate function context using the --function option

The pair alone may be too terse to learn from, so the --function option prints the function surrounding the patch as its context:

git checkout $r1
rust-diagnostics --patch $r2 --confirm --pair --function

For example, it will print the following instead:

There are 1 warnings in 1 files.
##[Warning(clippy::unwrap_used)
@@ -3,2 +3,3 @@ fn main() {
fn main() {
    let s = std::fs::read_to_string("Cargo.toml").unwrap();
    println!("{s}");
}
=== 19a3477889393ea2cdd0edcb5e6ab30c ===
@@ -3,2 +3,3 @@ fn main() {
fn main() {
    if let Ok(s) = std::fs::read_to_string("Cargo.toml") {
        println!("{s}");
    }
}
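One naive way to recover the function surrounding a changed line is to walk up to the nearest fn header and then count braces downwards. The sketch below assumes exactly that; how the utility itself finds the function is not described here, the name enclosing_function is hypothetical, and the brace counting ignores strings and comments, so it is only safe when braces inside them are balanced (as in println!("{s}")).

```rust
/// Return the text of the function that encloses the 1-based `line_no`,
/// found by a naive scan: look upwards for a line starting with `fn` or
/// `pub fn`, then count braces until the body closes.
fn enclosing_function(source: &str, line_no: usize) -> Option<String> {
    let lines: Vec<&str> = source.lines().collect();
    if line_no == 0 || line_no > lines.len() {
        return None;
    }
    // Walk upwards from the requested line to the nearest function header.
    let start = (0..line_no).rev().find(|&i| {
        let trimmed = lines[i].trim_start();
        trimmed.starts_with("fn ") || trimmed.starts_with("pub fn ")
    })?;
    // Walk downwards, tracking brace depth until the function body closes.
    let mut depth = 0usize;
    let mut opened = false;
    for (i, line) in lines.iter().enumerate().skip(start) {
        for ch in line.chars() {
            match ch {
                '{' => {
                    depth += 1;
                    opened = true;
                }
                '}' => depth = depth.saturating_sub(1),
                _ => {}
            }
        }
        if opened && depth == 0 {
            // Only accept the function if it really spans the requested line.
            return if i + 1 >= line_no {
                Some(lines[start..=i].join("\n"))
            } else {
                None
            };
        }
    }
    None
}

fn main() {
    let source = [
        "fn main() {",
        "    let s = std::fs::read_to_string(\"Cargo.toml\").unwrap();",
        "    println!(\"{s}\");",
        "}",
    ]
    .join("\n");
    // Line 2 is where the hunk in the running example begins in this source.
    if let Some(function) = enclosing_function(&source, 2) {
        println!("{function}");
    }
}
```

A parser-based approach would be more robust than brace counting; the sketch only shows why a precise line number from the zero-context hunk is enough to locate the surrounding function.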

Generate marked up context using the --location option

This option inserts the location of the warning, together with clippy's hints for fixing it, into the original context.

git checkout $r1
rust-diagnostics --patch $r2 --confirm --pair --function --location

For example, it will print the following instead:

There are 1 warnings in 1 files.
##[Warning(clippy::unwrap_used)
fn main() {
    let s = /*#[Warning(clippy::unwrap_used)*/std::fs::read_to_string("Cargo.toml").unwrap()/*
#[Warning(clippy::unwrap_used)
note: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unwrap_used
if this value is an `Err`, it will panic
requested on the command line with `-W clippy::unwrap-used`*/;
    println!("{s}");
}
=== 19a3477889393ea2cdd0edcb5e6ab30c ===
fn main() {
    if let Ok(s) = std::fs::read_to_string("Cargo.toml") {
        println!("{s}");
    }
}

Generate mixed context and patch using the --mixed option

This option pairs the marked-up context with the actual patch.

git checkout $r1
rust-diagnostics --patch $r2 --confirm --pair --function --location --mixed

For example, it will print the following instead:

There are 1 warnings in 1 files.
##[Warning(clippy::unwrap_used)
fn main() {
    let s = /*#[Warning(clippy::unwrap_used)*/std::fs::read_to_string("Cargo.toml").unwrap()/*
#[Warning(clippy::unwrap_used)
note: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unwrap_used
if this value is an `Err`, it will panic
requested on the command line with `-W clippy::unwrap-used`*/;
    println!("{s}");
}
=== 19a3477889393ea2cdd0edcb5e6ab30c ===
-    let s = std::fs::read_to_string("Cargo.toml").unwrap();
-    println!("{s}");
+    if let Ok(s) = std::fs::read_to_string("Cargo.toml") {
+        println!("{s}");
+    }

Note that the hunk header is not kept: the line numbers no longer matter once the location of the warning is known, and the inserted markup has already shifted the original line numbers.

Counting warnings

An alternative, and probably quicker, way to count warnings is to use "cargo lintcheck".

Acknowledgement

  • David Wood suggested using the --message-format=json option to get diagnostic information from the Rust compiler, which saved a tremendous effort in modifying the compiler. As a result, our solution is largely independent of the Rust compiler implementation.
  • Mara Bos provided hints on how to fix unwrap() warnings using if-let statements.
  • Amanieu d'Antras explained why certain clippy rules are necessary in practice, and also improved the performance of the underlying BTreeMap.
  • Josh Triplett implemented the underlying git2-rs, which wraps the libgit2 library in Rust.
  • Dr Chunmiao Li implemented the refactoring rule unwrapped_used.txl to fix the corresponding warnings automatically.
  • Dr Nghi Bui suggested the idea of creating mixed pairs.
