doc(tsort): Add benchmarking documentation

author: Tom D. <noreplygitemail@protonmail.com> 2024-10-19 12:06:08 +0200
committer: Tom D. <noreplygitemail@protonmail.com> 2024-10-19 15:08:06 +0200
commit: 66c3efa0b7aa430f5058131a6b0ca2fd373e8bea (patch)
tree: 7572b2a9c2235bd601c6f2694c9ae522c9f153d2
parent: c3b28bf6c72c25fb31a4ba6556c185b93e009655 (diff)
1 files changed, 64 insertions, 0 deletions
diff --git a/src/uu/tsort/BENCHMARKING.md b/src/uu/tsort/BENCHMARKING.md
new file mode 100644
index 0000000000..52b9dfc57e
--- /dev/null
+++ b/src/uu/tsort/BENCHMARKING.md
@@ -0,0 +1,64 @@
+# Benchmarking `tsort`
+<!-- spell-checker:ignore (words) randint tsort DAG uu_tsort GNU  -->
+Much of what makes `tsort` fast is the efficiency of its algorithm and implementation for topological sorting. 
+Our implementation of `tsort` also outputs a cycle whenever such ordering does not exist, just like GNU  `tsort`. 
+
+## Strategies
+
+To test `tsort`'s performance for its nominal use case, we need to test it with a DAG. One of the worst cases is when all nodes are just representing a succession of independent steps. 
+We should also test cycle detection for good measure. 
+
+### Random acyclic graph (DAG)
+
+This will output a DAG composed of 1 million pairs of edges between nodes numbered from 0 to 10,000, ensuring that the graph is acyclic by always assigning the edge with the smallest id to the node with the highest one.
+
+```python
+import random
+
+N = 10000
+
+for i in range(100*N):
+    a = random.randint(0, N)
+    b = random.randint(0, N)
+    print(f"{min(a, b)} {max(a, b)}")
+```
+
+### Random graph with cycles
+
+The following will output a graph with multiples edges, it also allows some degree of tuning to test different cases. 
+
+```python
+import random
+
+# Parameters for the graph
+num_nodes = 100  
+num_edges = 150  
+cycle_percentage = 0.10  
+max_cycle_size = 6  
+
+num_cycles = int(num_edges * cycle_percentage)
+
+for _ in range(num_edges - num_cycles):
+    a = random.randint(0, num_nodes)
+    b = random.randint(0, num_nodes)
+    print(f"{a} {b}")
+
+
+for _ in range(num_cycles):
+    cycle_size = random.randint(3, max_cycle_size)
+    cycle_nodes = random.sample(range(num_nodes), cycle_size)
+    for i in range(cycle_size):
+        print(f"{cycle_nodes[i]} {cycle_nodes[(i + 1) % cycle_size]}")
+```
+
+## Running Benchmarks
+The above scripts will output the generated graphs to the standard output. They can therefore be used directly as tests. In order to run a Benchmark, the output should be redirected to a file. 
+Use [`hyperfine`](https://github.com/sharkdp/hyperfine) to compare the performance of different `tsort` versions. For example, you can compare the performance of GNU `tsort` and another implementation with the following command:
+
+```sh
+hyperfine 'tsort random_graph.txt' 'uu_tsort random_graph.txt'
+```
+
+## Note
+
+Benchmark results from the above scripts are fuzzy and change from run to run unless a seed is set.
author	Tom D. <noreplygitemail@protonmail.com>	2024-10-19 12:06:08 +0200
committer	Tom D. <noreplygitemail@protonmail.com>	2024-10-19 15:08:06 +0200
commit	66c3efa0b7aa430f5058131a6b0ca2fd373e8bea (patch)
tree	7572b2a9c2235bd601c6f2694c9ae522c9f153d2
parent	c3b28bf6c72c25fb31a4ba6556c185b93e009655 (diff)