1 unstable release: 0.1.0, Jul 21, 2023
Used in tree-sitter-lint
tree-sitter-grep
tree-sitter-grep is a grep-like search tool that recursively searches the current directory for a tree-sitter query pattern.
Dual-licensed under MIT or the UNLICENSE.
Installation
With a Rust toolchain installed, run:
$ cargo install tree-sitter-grep
Usage
$ tree-sitter-grep -q '(trait_bounds) @t'
src/core.rs:14:pub struct Core<'s, M: 's, S> {
src/core.rs:30:impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
src/mod.rs:622: P: AsRef<Path>,
src/mod.rs:623: M: Matcher,
src/mod.rs:624: S: Sink,
src/mod.rs:644: M: Matcher,
[...]
Specifying the query
tree-sitter-grep uses tree-sitter queries to specify "patterns" to match.
You can either specify the query "inline" with the -q/--query argument:
$ tree-sitter-grep -q '(trait_bounds) @t'
or via a path to a tree-sitter query file (typically *.scm) with the -Q/--query-file argument:
$ cat queries/trait_bounds.scm
(trait_bounds) @t
$ tree-sitter-grep -Q queries/trait_bounds.scm
tree-sitter-grep uses tree-sitter query "captures" (@whatever) to specify "matching" tree-sitter AST nodes, so your query must always include at least one capture.
If your query includes multiple captures (e.g. if you are using a "pre-composed" query or a predicate), tree-sitter-grep will by default use the first capture in the query (in lexicographical order, I believe) as its "target capture".
To override that behavior, pass the -c/--capture argument:
$ tree-sitter-grep -q '((field_declaration name: (field_identifier) @field_name (#eq? @field_name "pos")) @f)' --capture f
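As a rough illustration of the target-capture rule, here is a minimal Rust sketch. It is not the tool's actual code: `target_capture` is a hypothetical helper, and it assumes the capture names are handed over already in whatever order the tool treats as "first" (the text above is unsure whether that order is lexicographical).

```rust
/// Pick the capture whose nodes count as matches.
/// `capture_names` stands in for the captures declared in the query, in the
/// order the tool considers them (an assumption of this sketch).
/// `requested` stands in for the value of -c/--capture, if one was passed.
fn target_capture<'a>(capture_names: &[&'a str], requested: Option<&str>) -> Option<&'a str> {
    match requested {
        // An explicit --capture only makes sense if the query declares it.
        Some(name) => capture_names.iter().copied().find(|c| *c == name),
        // Otherwise fall back to the first capture.
        None => capture_names.first().copied(),
    }
}
```

With the captures from the command above, `target_capture(&["field_name", "f"], Some("f"))` selects `f`, so the whole field declaration is reported rather than just its name.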
How do I figure out what query I want?
It's worth reading the tree-sitter query docs as a starting point.
Then, to figure out the relevant tree-sitter AST structure for a query you'd like to write, a tree-sitter "playground" is invaluable, e.g. the interactive online one, or neovim's :InspectTree (which is what I use).
In my experience, while tree-sitter queries are a solid starting point, they aren't always "expressive" enough to specify exactly the set of AST nodes you'd like to match.
That's why we also support filter plugins, where you have "total programmatic control" over what constitutes a match.
Supported query "predicates"
Tree-sitter query predicates allow some "filtering" of matching tree-sitter AST nodes.
We use the Rust tree-sitter bindings, so "we support whatever predicates they do".
Specifically, that includes:
#eq?
$ tree-sitter-grep -q '((field_declaration name: (field_identifier) @field_name (#eq? @field_name "pos")) @f)' --capture f
src/core.rs:20: pos: usize,
#match?
$ tree-sitter-grep -q '((field_declaration name: (field_identifier) @field_name (#match? @field_name "^p")) @f)' --capture f
src/core.rs:20: pos: usize,
src/mod.rs:157: passthru: bool,
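To make the difference between the two predicates concrete, here is a small Rust sketch of what they do to captured text. It is only a model: the helper names are hypothetical, the field name `count` is made up for illustration, and `#match?` really takes a regular expression, which this sketch reduces to a prefix check for the single pattern `^p`.

```rust
/// Model of `#eq? @capture "literal"`: the captured text must equal the literal.
fn eq_predicate(captured_text: &str, literal: &str) -> bool {
    captured_text == literal
}

/// Model of `#match? @capture "^p"` only. A real implementation runs a regex;
/// the anchored pattern `^p` is reduced here to a prefix check.
fn match_caret_p(captured_text: &str) -> bool {
    captured_text.starts_with('p')
}

/// Keep the field names that the given predicate accepts.
fn keep<'a>(field_names: &[&'a str], predicate: impl Fn(&str) -> bool) -> Vec<&'a str> {
    field_names
        .iter()
        .copied()
        .filter(|name| predicate(*name))
        .collect()
}
```

Applied to the field names `pos`, `passthru` and `count`, the `#eq?` model keeps only `pos`, while the `^p` model keeps `pos` and `passthru`, mirroring the two outputs above.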
Filter plugins
When you need "the power of a programming language" to fully specify the matching "criteria", you can write a "filter plugin".
Using a filter plugin
If you have an existing filter plugin, you specify that you want to use it via the -f/--filter argument (with a path to the compiled filter dynamic library, i.e. a .so/.dll/.dylib file):
$ tree-sitter-grep -q '(trait_bounds) @t' -f path/to/libmy-filter.so
If the filter plugin expects to be passed a "filter argument" (e.g. for parameterizing/configuring its behavior in some way), you specify that with the -a/--filter-arg argument:
$ tree-sitter-grep -q '(trait_bounds) @t' -f path/to/libmy-filter-that-expects-argument.so -a '{ the_filter_plugin_can_parse_this: "however_it_wants" }'
It's also worth noting that you don't have to pass a tree-sitter query argument at all if you supply a filter plugin argument (in which case the filter plugin gets invoked against "every" tree-sitter AST node).
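The interplay between query and filter can be sketched in a few lines of Rust. This is a simplified model, not the real plugin interface: `Node` is a hypothetical stand-in for a tree-sitter node, the query is reduced to a node-kind check, and the filter is an ordinary closure rather than a dynamic library.

```rust
/// Hypothetical stand-in for a syntax node (not tree_sitter::Node).
struct Node {
    kind: &'static str,
    text: &'static str,
    children: Vec<Node>,
}

/// Collect every node in the tree, depth-first, parents before children.
fn all_nodes<'a>(root: &'a Node, out: &mut Vec<&'a Node>) {
    out.push(root);
    for child in &root.children {
        all_nodes(child, out);
    }
}

/// With a query, only nodes it selects are candidates (modelled as a kind check).
/// Without one, every node is a candidate. The filter then has the final say.
fn find_matches(
    root: &Node,
    query_kind: Option<&str>,
    plugin: impl Fn(&Node) -> bool,
) -> Vec<&'static str> {
    let mut candidates = Vec::new();
    all_nodes(root, &mut candidates);
    candidates
        .into_iter()
        .filter(|node| query_kind.map_or(true, |kind| node.kind == kind))
        .filter(|node| plugin(*node))
        .map(|node| node.text)
        .collect()
}
```

Passing `None` for `query_kind` corresponds to omitting the query argument: the filter closure is then called once for every node in the tree.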
Writing filter plugins
TODO: add a "guide" for this
The short version is:
While in theory you could probably write filter plugins in other languages, the "happy path" would be to write them in Rust and use the example filter plugins from examples/ as a starting point/reference.
The basic idea is that for each tree-sitter AST node that is a potential match according to the supplied query argument, the filter plugin then additionally gets invoked and indicates whether it considers that node a match or not (basically as a (&tree_sitter::Node) -> bool "predicate").
Supported target languages
Currently, tree-sitter-grep "bakes in" support for searching the following languages:
- C
- C++
- C#
- CSS
- Dockerfile
- Elisp
- Elm
- Go
- HTML
- Java
- JavaScript
- JSON
- Kotlin
- Lua
- Objective-C
- Python
- Ruby
- Rust
- Swift
- TOML
- tree-sitter queries (how meta!)
- TypeScript
In theory, any language that has a tree-sitter grammar crate published/available should be "fair game". In the future we may support dynamically specifying/loading additional languages.
Or feel free to file an issue requesting "baked-in" support for other languages.
Restricting the query to specific files/languages
By default, tree-sitter-grep will recursively search all files of the supported languages/types that are neither ignored nor hidden, and if it can parse the provided query against a file's language grammar, it will then search that file's contents for matches.
To explicitly restrict the search to a single language, use the -l/--language argument:
$ tree-sitter-grep -q '(trait_bounds) @t' -l rust
You can also restrict the search to certain files/directories by providing path arguments:
$ tree-sitter-grep -q '(trait_bounds) @t' src/main.rs src/compiler
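The per-file decision described in this section can be sketched roughly as follows. This is an illustration under assumptions, not the tool's implementation: the extension table and both function names are hypothetical, only the language name `rust` comes from the example above, and the sketch skips the ignore/hidden handling and the check that the query parses against the language's grammar.

```rust
use std::path::Path;

/// Hypothetical extension table; the real tool's mapping may differ.
fn language_for(path: &Path) -> Option<&'static str> {
    match path.extension()?.to_str()? {
        "rs" => Some("rust"),
        "py" => Some("python"),
        "go" => Some("go"),
        "ts" => Some("typescript"),
        _ => None,
    }
}

/// Decide whether a file is searched, given an optional -l/--language restriction.
fn should_search(path: &Path, only_language: Option<&str>) -> bool {
    match (language_for(path), only_language) {
        // Files of unsupported or unknown types are never searched.
        (None, _) => false,
        // No restriction: every supported language is searched.
        (Some(_), None) => true,
        // With -l, only files of that language are searched.
        (Some(language), Some(wanted)) => language == wanted,
    }
}
```

Under these assumptions, `should_search(Path::new("script.py"), Some("rust"))` is false, while the same file is searched when no language restriction is given.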
Additional flags/arguments
For documentation of additional arguments related to eg customizing the match output, run:
$ tree-sitter-grep --help
In general, we are aiming to be rather ripgrep-"compatible".
Performance
I haven't done any "real" benchmarking, but the general take seems to be that tree-sitter-grep is pleasantly, surprisingly fast (especially given that tree-sitter is not optimized for the "parse-from-scratch" use case).
For "not gigantic" code-bases I tend to see it run in < 100ms.
And for "gigantic" code-bases, where it's e.g. scanning > 300k lines of code and outputting > 7000 matches, I'm seeing it run in around 360ms, which still feels "quite fast".
Editor integrations
TODO: I believe that @peterstuart has written an initial version of an Emacs plugin, and I started tinkering with writing a neovim plugin.
The basic idea would probably be that you'd be able to interact with matches from tree-sitter-grep in your editor the way that you'd interact with matches from e.g. grep/ripgrep.
Contributions welcome; let us know if you've written a plugin for your editor of choice.
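For anyone starting on an editor plugin, the first step is usually turning output lines into jump targets. Here is a minimal Rust sketch of that step, assuming the `path:line:text` shape shown under Usage. `MatchLine` and `parse_match` are hypothetical names, and output-customizing flags may produce lines this sketch does not handle.

```rust
/// One parsed output line, in the `path:line:text` shape shown under Usage.
#[derive(Debug, PartialEq)]
struct MatchLine {
    path: String,
    line: usize,
    text: String,
}

/// Parse a single output line; returns None if the line does not fit the shape.
/// Assumes the path itself contains no ':' (not true for Windows drive letters).
fn parse_match(output_line: &str) -> Option<MatchLine> {
    // Split on the first two colons only, so colons in the source text survive.
    let mut parts = output_line.splitn(3, ':');
    let path = parts.next()?.to_string();
    let line = parts.next()?.parse::<usize>().ok()?;
    let text = parts.next()?.to_string();
    Some(MatchLine { path, line, text })
}
```

An editor integration could feed each parsed `path` and `line` into its quickfix or location list, much as it would for grep or ripgrep output.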
Non-goals
- Trying to support "everything and the kitchen sink" functionality (yes, that is some slight ast-grep shade)
  We think tree-sitter-grep certainly has the potential to be a useful grep-like tool in and of itself, and beyond that we're thinking of it as a "building-block" that could in theory be leveraged by other tooling for e.g. search-and-replace, code-mod, ...
  I've already had success using tree-sitter-grep as part of a "one-off large-scale automated refactor".
- Coming up with our own custom e.g. querying syntax (damn, it's shady over here in the shade)
  I actually think the approach taken by e.g. ast-grep of providing a query syntax that "looks like the code" is pretty intuitive and maybe the "easiest thing to reach for" in a lot of cases.
  I just personally am not drawn to it as an approach to tooling. I dislike that it's concealing the "tree-sitter-ness of it all". It feels like tree-sitter in general is very ripe for building a variety of different types of tooling on top of as an underlying technology, and so I'm more drawn to "building blocks" that let you leverage existing knowledge/expertise and by their nature lead you down a path of gaining more of that knowledge/expertise. And maybe then building your own sh** on top of it (or inspired by it).
Contributing/issues
The code-base is a rather typical cargo-based Rust project, so e.g. cargo test runs the test suite.
Feel free to open issues or pull requests.