3 releases (breaking)

0.3.0	Jun 28, 2021
0.2.0	Jun 27, 2021
0.1.0	Jun 22, 2021

#1819 in Algorithms

49 downloads per month

MIT/Apache

44KB
828 lines

Pairing Heap

From Wikipedia:

A pairing heap is a type of heap data structure with relatively simple implementation and excellent practical amortized performance. Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm.

A min-pairing heap supports the following operations:

find_min: finds the minimum element of the heap, which is the root.
merge: combines two heaps together.
insert: adds a new element into the heap.
delete_min: remove the root and reorder its children nodes.
decrease_key: decrease the priority of an element. Standard implementation of a heap data structure does not support searching for a key efficiently (which is the case in this crate). Thus, this operation can take very long time, with an upper bound of O(2^(sqrt(log log n))).

The crate also comes with an efficient implementation of Dijkstra's algorithm to solve the single source shortest path problem and Prim's algorithm for finding minimum spanning tree.

Benchmarks

To measure the performance of this implementation, I choose the following libraries that are available on crates.io to experiment:

If I miss any libraries, please let me know.

The experiments are conducted on my PC with the following spec:

OS: Fedora 34 64-bit
CPU: AMD® Ryzen 7 3800x 8-core processor
RAM: 32 GB

Experiment 1

Each implementation is tasked to execute 1000 insertions / 0 deletes, then 999 insertions / 1 deletes (remove the top element), until the number of deletes is 1000. This means each implementation has to execute 500_500 insertions and 500_500 deletions.

For this experiment, I use the crate criterion to measure the performance of each implementation.

	Pairing heap (this crate)	Addressable pairing heap	Pairing heap (Apasel422)	Priority queue	Keyed priority queue
Average time (milliseconds)	20.37	56.6	24.18	116.84	111.30

Experiment 2

Each implementation is tasked to execute 1000 insertions / 1000 priority update / 0 deletes, then 999 insertions / 999 priority updates | 1 deletes (remove the top element), until the number of deletes is 1000.

	Pairing heap (this crate)	Addressable pairing heap	Pairing heap (Apasel422)	Priority queue	Keyed priority queue
Average time (seconds)	1.399	No implementation	No implementation	0.171	0.142

For this experiment, the pairing heap fairs worse than other two libraries. This is due to the fact that pairing heap data structures must search for keys, which in worse cases takes O(n) time, while other implementations leverage the fast lookup power from hash map.

Experiment 3

Each implementation is tasked to insert 1 million elements and the memory consumption will be measured.

For this experiment, I write a simple main (in examples/stress.rs) and use valgrind with massif for the evaluation purpose.

To compile:

cargo build --examples --release

To run valgrind:

valgrind --tool=massif ./target/release/examples/stress <implementation> <number of nodes to be inserted>

The commandline argument <implementation> accepts the following options:

pairing_heap
priority_queue
keyed_priority_queue
addressable_pairing_heap
ap422_pairing_heap

	Pairing heap (this crate)	Addressable pairing heap	Pairing heap (Apasel422)	Priority queue	Keyed priority queue
Peak heap memory consumption (MB)	30.5	72.0	segfault	62	76

The image outputs of massif-visualiser are stored in the folder img.

Dijkstra's algorithm

To test the performance of Dijkstra's algorithm with pairing heap, I use the DIMACS dataset. You can download all datasets by using the python script with the following command:

python3 scripts/download.py -d dimacs-all --dest data/

On crates.io there are several libraries that have Dijkstra's algorithm but I only find the crate pathfinding performant (please let me know if I miss any crate).

For this experiment, all implementations are tasked to solve the shortest path problem on all DIMACS dataset and I take the average runtime after ten runs.

Note: the function dijkstra_all of pathfinding returns only the direct parent node for a queried node, instead of an entire path, the function sssp_dijkstra_lazy is used for my implementation of Dijkstra's algorithm. This function returns a result which is (kind of) equivalent to what pathfinding delivers. By doing so, we can compare the solving time of both implementations, while ignoring the path building time.

Time is measured in millisecond:

	Number of nodes	Number of edges	pheap	pathfinding
DIMACS-NY	264_346	733_846	88	110
DIMACS-BAY	321_270	800_172	94	127
DIMACS-COL	435_666	1_057_066	126	172
DIMACS-FLA	1_070_376	2_712_798	377	626
DIMACS-NW	1_207_945	2_840_208	456	665
DIMACS-NE	1_524_453	3_897_636	619	852
DIMACS-CAL	1_890_815	4_657_742	740	1_246
DIMACS-LKS	2_758_119	6_885_658	1_141	1_695
DIMACS-E	3_598_623	8_778_114	1_548	2_151
DIMACS-W	6_262_104	15_248_146	3_098	4_460
DIMACS-CTR	14_081_816	34_292_496	10_183	11_256
DIMACS-USA	23_947_347	58_333_344	16_678	20_896

Minimum spanning tree

In this experiment, I measure the performance of both libraries in finding the MST. However, there are several differences between two crates that are worth mentioning: firstly, while pathfinding uses Kruskal's algorithm, I implement only the Prim's algorithm using the pairing heap. Secondly, pathfinding's implementation returns only the iterators of edges and it is the task of users to collect these iterators and (re)construct the MST. On the other hand, my implementation returns the complete graph and total weight of an MST. Thus, I run two experiments for pheap, one solving without building MST, and the other for both solving and building MST.

Average time after ten runs, measured in milliesecond:

	Number of nodes	Number of edges	pheap (Solve)	pheap (Solve + Build)	pathfinding
DIMACS-NY	264_346	733_846	78	140	132
DIMACS-BAY	321_270	800_172	93	170	140
DIMACS-COL	435_666	1_057_066	132	243	191
DIMACS-FLA	1_070_376	2_712_798	358	727	598
DIMACS-NW	1_207_945	2_840_208	409	863	622
DIMACS-NE	1_524_453	3_897_636	565	1_144	845
DIMACS-CAL	1_890_815	4_657_742	715	1_553	1_148
DIMACS-LKS	2_758_119	6_885_658	1_093	2_307	1_641
DIMACS-E	3_598_623	8_778_114	1_452	3_100	2_125
DIMACS-W	6_262_104	15_248_146	2_618	5_732	4_042
DIMACS-CTR	14_081_816	34_292_496	7_371	16_470	9_712
DIMACS-USA	23_947_347	58_333_344	11_785	25_450	17_943

License

Licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Dependencies

~150KB