regex_macros: delete it
The regex_macros crate hasn't been maintained in quite some time, and has
been broken. Nobody has complained. Given the fact that there are no
immediate plans to improve the situation, and the fact that it is slower
than the runtime engine, we simply remove it.
BurntSushi committed Dec 30, 2017
1 parent b8f56f1 commit 0375954
Showing 15 changed files with 73 additions and 1,081 deletions.
67 changes: 30 additions & 37 deletions HACKING.md
@@ -185,37 +185,36 @@ A regular expression program is essentially a sequence of opcodes produced by
the compiler plus various facts about the regular expression (such as whether
it is anchored, its capture names, etc.).

### The regex! macro (or why `regex::internal` exists)

The `regex!` macro is defined in the `regex_macros` crate as a compiler plugin,
which is maintained in this repository. The `regex!` macro compiles a regular
expression at compile time into specialized Rust code.

The `regex!` macro was written when this library was first conceived and
unfortunately hasn't changed much since then. In particular, it encodes the
entire Pike VM into stack allocated space (no heap allocation is done). When
`regex!` was first written, this provided a substantial speed boost over
so-called "dynamic" regexes compiled at runtime, and in particular had much
lower overhead per match. This was because the only matching engine at the
time was the Pike VM. The addition of other matching engines has inverted
the relationship; the `regex!` macro is almost never faster than the dynamic
variant. (In fact, it is typically substantially slower.)

In order to build the `regex!` macro this way, it must have access to some
internals of the regex library, which is in a distinct crate. (Compiler plugins
must be part of a distinct crate.) Namely, it must be able to compile a regular
expression and access its opcodes. The necessary internals are exported as part
of the top-level `internal` module in the regex library, but are hidden from
public documentation. In order to present a uniform API between programs built
by the `regex!` macro and their dynamic analogues, the `Regex` type is an enum
whose variants are hidden from public documentation.

In the future, the `regex!` macro should probably work more like Ragel, but
it's not clear how hard this is. In particular, the `regex!` macro should be
able to support all the features of dynamic regexes, which may be hard to do
with a Ragel-style implementation approach. (Which somewhat suggests that the
`regex!` macro may also need to grow conditional execution logic like the
dynamic variants, which seems rather grotesque.)
### The regex! macro

The `regex!` macro no longer exists. It was developed in a bygone era as a
compiler plugin during the infancy of the regex crate. Back then, the only
matching engine in the crate was the Pike VM. The `regex!` macro was, itself,
also a Pike VM. The only advantages it offered over the dynamic Pike VM that
was built at runtime were the following:

1. Syntax checking was done at compile time. Your Rust program wouldn't
compile if your regex didn't compile.
2. Reduction of overhead that was proportional to the size of the regex.
For the most part, this overhead consisted of heap allocation, which
was nearly eliminated in the compiler plugin.
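
For readers unfamiliar with the term, a Pike VM runs every NFA thread in lockstep over the input, so it never backtracks. Below is a minimal, hypothetical sketch over a toy instruction set; `Inst`, `add_thread`, and `pike_match` are illustrative names, not the regex crate's internals. It only reports whether a match starts at the beginning of the input and tracks no capture groups. Its per-search thread lists and `seen` table are sized by the program length, which is the kind of regex-size-proportional heap allocation item 2 refers to.

```rust
/// A toy instruction set (hypothetical; not the regex crate's opcodes).
#[derive(Clone, Copy)]
enum Inst {
    Char(char),          // consume one specific character
    Any,                 // consume any single character
    Split(usize, usize), // fork: follow both targets
    Jmp(usize),          // unconditional jump
    Match,               // accept
}

/// Adds thread `pc` to `list`, following Split/Jmp eagerly so that only
/// character-consuming or Match instructions end up in the list.
/// `seen` prevents adding the same pc twice in one step (and loops).
fn add_thread(prog: &[Inst], list: &mut Vec<usize>, seen: &mut [bool], pc: usize) {
    if seen[pc] {
        return;
    }
    seen[pc] = true;
    match prog[pc] {
        Inst::Jmp(target) => add_thread(prog, list, seen, target),
        Inst::Split(a, b) => {
            add_thread(prog, list, seen, a);
            add_thread(prog, list, seen, b);
        }
        _ => list.push(pc),
    }
}

/// Returns true if `prog` matches some prefix of `input`.
fn pike_match(prog: &[Inst], input: &str) -> bool {
    // Allocated once per search, sized by the program, not by the input.
    let mut clist: Vec<usize> = Vec::with_capacity(prog.len());
    let mut nlist: Vec<usize> = Vec::with_capacity(prog.len());
    let mut seen = vec![false; prog.len()];
    add_thread(prog, &mut clist, &mut seen, 0);
    for ch in input.chars() {
        seen.fill(false);
        for &pc in &clist {
            match prog[pc] {
                Inst::Match => return true,
                Inst::Char(c) if c == ch => add_thread(prog, &mut nlist, &mut seen, pc + 1),
                Inst::Any => add_thread(prog, &mut nlist, &mut seen, pc + 1),
                _ => {}
            }
        }
        // The next step's threads become the current ones.
        std::mem::swap(&mut clist, &mut nlist);
        nlist.clear();
        if clist.is_empty() {
            return false;
        }
    }
    clist.iter().any(|&pc| matches!(prog[pc], Inst::Match))
}
```

For example, `a+b` compiles in this toy scheme to `[Char('a'), Split(0, 2), Char('b'), Match]`.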

The main takeaway here is that the compiler plugin was a marginally faster
version of a slow regex engine. As the regex crate evolved, it grew other regex
engines (DFA, bounded backtracker) and sophisticated literal optimizations.
The regex macro didn't keep pace, and it therefore became (dramatically) slower
than the dynamic engines. The only reason left to use it was for the compile
time guarantee that your regex is correct. Fortunately, Clippy (the Rust lint
tool) has a lint that checks the validity of your regular expressions, which mostly
replaces that use case.

Additionally, the regex compiler plugin stopped receiving maintenance. Nobody
complained. At that point, it seemed prudent to just remove it.

Will a compiler plugin be brought back? The future is murky, but there is
definitely an opportunity there to build something that is faster than the
dynamic engines in some cases. But it will be challenging! As of now, there
are no plans to work on this.


## Testing
@@ -236,7 +235,6 @@ the AT&T test suite) and code generate tests for each matching engine. The
approach we use in this library is to create a Cargo.toml entry point for each
matching engine we want to test. The entry points are:

* `tests/test_plugin.rs` - tests the `regex!` macro
* `tests/test_default.rs` - tests `Regex::new`
* `tests/test_default_bytes.rs` - tests `bytes::Regex::new`
* `tests/test_nfa.rs` - tests `Regex::new`, forced to use the NFA
@@ -261,10 +259,6 @@ entry points, it can take a while to compile everything. To reduce compile
times slightly, try using `cargo test --test default`, which will only use the
`tests/test_default.rs` entry point.

N.B. To run tests for the `regex!` macro, use:

cargo test --manifest-path regex_macros/Cargo.toml


## Benchmarking

@@ -284,7 +278,6 @@ separately from the main regex crate.
Benchmarking follows a similarly wonky setup as tests. There are multiple entry
points:

* `bench_rust_plugin.rs` - benchmarks the `regex!` macro
* `bench_rust.rs` - benchmarks `Regex::new`
* `bench_rust_bytes.rs` - benchmarks `bytes::Regex::new`
* `bench_pcre.rs` - benchmarks PCRE
31 changes: 0 additions & 31 deletions README.md
@@ -188,37 +188,6 @@ assert!(!matches.matched(5));
assert!(matches.matched(6));
```

### Usage: `regex!` compiler plugin

**WARNING**: The `regex!` compiler plugin is orders of magnitude slower than
the normal `Regex::new(...)` usage. You should not use the compiler plugin
unless you have a very special reason for doing so. The performance difference
may be temporary, but the path forward at this point isn't clear.

The `regex!` compiler plugin will compile your regexes at compile time. **This
only works with a nightly compiler.**

Here is a small example:

```rust
#![feature(plugin)]

#![plugin(regex_macros)]
extern crate regex;

fn main() {
    let re = regex!(r"(\d{4})-(\d{2})-(\d{2})");
    let caps = re.captures("2010-03-14").unwrap();

    assert_eq!("2010", &caps[1]);
    assert_eq!("03", &caps[2]);
    assert_eq!("14", &caps[3]);
}
```

Notice that we never `unwrap` the result of `regex!`. This is because your
*program* won't compile if the regex doesn't compile. (Try `regex!("(")`.)


### Usage: a regular expression parser

1 change: 0 additions & 1 deletion bench/Cargo.toml
@@ -49,7 +49,6 @@ re-onig = ["onig"]
re-re2 = []
re-rust = []
re-rust-bytes = []
re-rust-plugin = ["regex_macros"]
re-tcl = []

[[bench]]
5 changes: 1 addition & 4 deletions bench/run
@@ -1,7 +1,7 @@
#!/bin/bash

usage() {
echo "Usage: $(basename $0) [rust | rust-bytes | rust-plugin | pcre1 | pcre2 | re2 | onig | tcl ]" >&2
echo "Usage: $(basename $0) [rust | rust-bytes | pcre1 | pcre2 | re2 | onig | tcl ]" >&2
exit 1
}

@@ -22,9 +22,6 @@ case $which in
rust-bytes)
exec cargo bench --bench bench --features re-rust-bytes "$@"
;;
rust-plugin)
exec cargo bench --bench bench --features re-rust-plugin "$@"
;;
re2)
exec cargo bench --bench bench --features re-re2 "$@"
;;
15 changes: 2 additions & 13 deletions bench/src/bench.rs
@@ -11,11 +11,6 @@
// Enable the benchmarking harness.
#![feature(test)]

// If we're benchmarking the Rust regex plugin, then pull that in.
// This will bring a `regex!` macro into scope.
#![cfg_attr(feature = "re-rust-plugin", feature(plugin))]
#![cfg_attr(feature = "re-rust-plugin", plugin(regex_macros))]

#[macro_use]
extern crate lazy_static;
#[cfg(not(any(feature = "re-rust", feature = "re-rust-bytes")))]
@@ -27,7 +22,6 @@ extern crate onig;
#[cfg(any(
feature = "re-rust",
feature = "re-rust-bytes",
feature = "re-rust-plugin",
))]
extern crate regex;
#[cfg(feature = "re-rust")]
@@ -43,7 +37,7 @@ pub use ffi::pcre1::Regex;
pub use ffi::pcre2::Regex;
#[cfg(feature = "re-re2")]
pub use ffi::re2::Regex;
#[cfg(any(feature = "re-rust", feature = "re-rust-plugin"))]
#[cfg(feature = "re-rust")]
pub use regex::Regex;
#[cfg(feature = "re-rust-bytes")]
pub use regex::bytes::Regex;
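
The `#[cfg(feature = ...)]` re-exports above pick exactly one `Regex` type at the crate root, so the rest of the harness never names an engine directly. A minimal, std-only sketch of that selection mechanism follows; the `string_engine` and `bytes_engine` modules and `engine_name` are hypothetical stand-ins, and the `not(...)` fallback branch is an assumption of this sketch so it builds with no features enabled.

```rust
// Hypothetical stand-in engines; in the harness these would be the regex
// crate and the FFI wrappers (PCRE, RE2, Oniguruma, Tcl).
mod string_engine {
    pub struct Regex;
    impl Regex {
        pub fn engine_name() -> &'static str {
            "string engine"
        }
    }
}

mod bytes_engine {
    pub struct Regex;
    impl Regex {
        pub fn engine_name() -> &'static str {
            "bytes engine"
        }
    }
}

// Exactly one `Regex` is re-exported at the crate root, chosen by a Cargo
// feature such as `re-rust-bytes` (e.g. `cargo bench --features re-rust-bytes`).
// The fallback branch below is this sketch's assumption, not the harness's.
#[cfg(feature = "re-rust-bytes")]
pub use bytes_engine::Regex;
#[cfg(not(feature = "re-rust-bytes"))]
pub use string_engine::Regex;

// Code elsewhere names only `crate::Regex`, never a specific engine.
fn describe() -> String {
    format!("benchmarking with the {}", crate::Regex::engine_name())
}
```

Built without features (for example, with plain `rustc`), `describe()` reports the string engine.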
@@ -52,14 +46,11 @@ pub use ffi::tcl::Regex;

// Usage: regex!(pattern)
//
// Builds a ::Regex from a borrowed string. This is used in every regex
// engine except for the Rust plugin, because the plugin itself defines the
// same macro.
// Builds a ::Regex from a borrowed string.
//
// Due to macro scoping rules, this definition only applies for the modules
// defined below. Effectively, it allows us to use the same tests for both
// native and dynamic regexes.
#[cfg(not(feature = "re-rust-plugin"))]
macro_rules! regex {
($re:expr) => { ::Regex::new(&$re.to_owned()).unwrap() }
}
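
The harness-local `regex!` macro is what lets one benchmark body run against whichever engine is selected: it compiles at runtime and unwraps, so an invalid pattern panics when the benchmark runs instead of failing the build. A minimal, std-only sketch of the pattern follows. The `Regex` type here is a hypothetical literal-substring stand-in, not any engine's real API, and it rejects the empty pattern only so that `new` has an error case. The sketch uses `$crate::Regex` where the harness writes `::Regex`, because in the 2018 and later editions a leading `::` names an external crate rather than the crate root.

```rust
// Hypothetical stand-in for the feature-selected engine: it treats the
// pattern as a literal substring, purely for illustration.
pub struct Regex {
    literal: String,
}

impl Regex {
    pub fn new(pattern: &str) -> Result<Regex, String> {
        // Arbitrary rule so that the Result has an error case.
        if pattern.is_empty() {
            return Err("empty pattern".to_string());
        }
        Ok(Regex { literal: pattern.to_string() })
    }

    pub fn is_match(&self, text: &str) -> bool {
        text.contains(self.literal.as_str())
    }
}

// Same shape as the harness macro: accept &str or String, build at
// runtime, and unwrap.
macro_rules! regex {
    ($re:expr) => {
        $crate::Regex::new(&$re.to_owned()).unwrap()
    };
}

// macro_rules! is textually scoped, so modules defined after the macro
// (like the harness's benchmark modules) can use it without importing it.
mod misc {
    pub fn literal() -> bool {
        let re = regex!("foo");
        re.is_match("a foo b")
    }
}
```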
@@ -99,7 +90,6 @@ macro_rules! text {
feature = "re-pcre2",
feature = "re-re2",
feature = "re-rust",
feature = "re-rust-plugin",
))]
macro_rules! text {
($text:expr) => { $text }
@@ -116,7 +106,6 @@ type Text = Vec<u8>;
feature = "re-pcre2",
feature = "re-re2",
feature = "re-rust",
feature = "re-rust-plugin",
))]
type Text = String;

1 change: 0 additions & 1 deletion bench/src/main.rs
@@ -18,7 +18,6 @@ extern crate onig;
#[cfg(any(
feature = "re-rust",
feature = "re-rust-bytes",
feature = "re-rust-plugin",
))]
extern crate regex;
#[cfg(feature = "re-rust")]
1 change: 0 additions & 1 deletion bench/src/misc.rs
@@ -19,7 +19,6 @@ use {Regex, Text};
#[cfg(not(feature = "re-onig"))]
#[cfg(not(feature = "re-pcre1"))]
#[cfg(not(feature = "re-pcre2"))]
#[cfg(not(feature = "re-rust-plugin"))]
bench_match!(no_exponential, {
format!(
"{}{}",
14 changes: 0 additions & 14 deletions ci/run-kcov
@@ -14,15 +14,10 @@ tests=(
regex
)
tmpdir=$(mktemp -d)
with_plugin=
coveralls_id=

while true; do
case "$1" in
--with-plugin)
with_plugin=yes
shift
;;
--coveralls-id)
coveralls_id="$2"
shift 2
@@ -33,15 +28,6 @@ while true; do
esac
done

if [ -n "$with_plugin" ]; then
cargo test --manifest-path regex_macros/Cargo.toml --no-run --verbose
kcov \
--verify \
--include-pattern '/regex/src/' \
"$tmpdir/plugin" \
$(ls -t ./regex_macros/target/debug/plugin-* | head -n1)
fi

cargo test --no-run --verbose --jobs 4
for t in ${tests[@]}; do
kcov \
35 changes: 0 additions & 35 deletions regex_macros/Cargo.toml

This file was deleted.
