author Harel Ben-Attia <harelba@gmail.com> 2022-01-16 08:25:18 +0200
committer Harel Ben-Attia <harelba@gmail.com> 2022-01-16 08:25:18 +0200
commit a8b671ffcea5b52195485fd78e8e547d2b2d652d
tree 0a3a0f5952b47d54bc4194836fd025774d0bb0d0
parent 8addcf51ab5efc95875ffa4c37fa947c7e85fd3f
update caching information in benchmark results page
-rw-r--r-- test/BENCHMARK.md 25
1 files changed, 25 insertions, 0 deletions
diff --git a/test/BENCHMARK.md b/test/BENCHMARK.md
index 3a4d573..2162e38 100644
--- a/test/BENCHMARK.md
+++ b/test/BENCHMARK.md
@@ -2,6 +2,31 @@
NOTE: *Please don't use or publish this benchmark data yet. See below for details*
+# Update
+q now provides built-in automatic caching: it writes the contents of a CSV/TSV file into a `.qsql` file that sits next to the original file. Once the cache exists (it is created as part of an initial query on the file), q uses it behind the scenes without any change to the query itself, which makes queries significantly faster.
+
+The following table shows the impact of using caching in q:
+
+| Rows | Columns | File Size | Query time without caching | Query time with caching | Speed Improvement |
+|:---------:|:-------:|:---------:|:--------------------------:|:-----------------------:|:-----------------:|
+| 5,000,000 | 100 | 4.8GB | 4 minutes, 47 seconds | 1.92 seconds | x149 |
+| 1,000,000 | 100 | 983MB | 50.9 seconds | 0.461 seconds | x110 |
+| 1,000,000 | 50 | 477MB | 27.1 seconds | 0.272 seconds | x99 |
+| 100,000 | 100 | 99MB | 5.2 seconds | 0.141 seconds | x36 |
+| 100,000 | 50 | 48MB | 2.7 seconds | 0.105 seconds | x25 |
+
+`.qsql` files are standard sqlite3 databases with one additional metadata table, which q uses to detect changes in the original delimited file. This means that any tool that can read sqlite3 files can use these files directly.
+
+As a side effect of this addition, q can now query sqlite3 databases directly, including queries that span multiple database files. This means you can query any sqlite3 database, or a `.qsql` file, even after the original delimited file no longer exists. For example:
+
+```bash
+q "select a.*,b.* from my_file.csv.qsql a left join some-sqlite3-database:::some_table_name b on (a.id = b.id)"
+```
+
+The benchmark results below reflect performance without caching, i.e. reading the delimited files directly, parsing them and then performing the query.
+
+I'll update the benchmark results later to include cached results as well.
+
# Overview
This is just a preliminary benchmark, originally created to validate performance optimizations and suggestions from users, and to analyze q's move to python3. After writing it, I thought it might be interesting to test its speed against textql and octosql as well.
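The change above says `.qsql` caches are plain sqlite3 databases. If so, Python's standard `sqlite3` module should be able to open one with no q-specific code. Below is a minimal sketch, assuming a cache such as `my_file.csv.qsql` already exists. The helper names (`list_qsql_tables`, `preview_table`) and the script name `inspect_qsql.py` are hypothetical. The name and layout of q's metadata table are not described here, so the code lists whatever tables the file contains rather than assuming any of them.

```python
from __future__ import annotations

import sqlite3
import sys
from pathlib import Path


def _connect_read_only(db_path: str) -> sqlite3.Connection:
    # Open through a SQLite URI with mode=ro so inspecting the cache cannot modify it.
    # Path.as_uri() requires an absolute path, hence resolve().
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_qsql_tables(qsql_path: str) -> list[str]:
    """Return the names of all tables in a sqlite3 file, e.g. a q .qsql cache."""
    conn = _connect_read_only(qsql_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


def preview_table(qsql_path: str, table: str, limit: int = 5) -> list[tuple]:
    """Return up to `limit` rows from `table`, quoting the name as an SQL identifier."""
    quoted = '"' + table.replace('"', '""') + '"'
    conn = _connect_read_only(qsql_path)
    try:
        return conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (limit,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python inspect_qsql.py my_file.csv.qsql
    path = sys.argv[1]
    for name in list_qsql_tables(path):
        print(name, preview_table(path, name, limit=3))
```

Opening the file read-only keeps the sketch from touching the metadata that q relies on to detect changes in the original file. Any other sqlite3 client, such as the `sqlite3` command-line shell and its `.tables` command, should be able to inspect the same file.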