author | Harel Ben-Attia <harelba@gmail.com> | 2022-01-16 08:25:18 +0200 |
---|---|---|
committer | Harel Ben-Attia <harelba@gmail.com> | 2022-01-16 08:25:18 +0200 |
commit | a8b671ffcea5b52195485fd78e8e547d2b2d652d | |
tree | 0a3a0f5952b47d54bc4194836fd025774d0bb0d0 | |
parent | 8addcf51ab5efc95875ffa4c37fa947c7e85fd3f | |
update caching information in benchmark results page
-rw-r--r-- | test/BENCHMARK.md | 25 |
1 files changed, 25 insertions, 0 deletions
diff --git a/test/BENCHMARK.md b/test/BENCHMARK.md
index 3a4d573..2162e38 100644
--- a/test/BENCHMARK.md
+++ b/test/BENCHMARK.md
@@ -2,6 +2,31 @@
 
 NOTE: *Please don't use or publish this benchmark data yet. See below for details*
 
+# Update
+q now provides inherent automatic caching capabilities, writing the CSV/TSV file to a `.qsql` file that sits beside the original file. After the cache exists (created as part of an initial query on a file), q knows to use it behind the scenes without changing the query itself, speeding up performance significantly.
+
+The following table shows the impact of using caching in q:
+
+| Rows | Columns | File Size | Query time without caching | Query time with caching | Speed Improvement |
+|:---------:|:-------:|:---------:|:--------------------------:|:-----------------------:|:-----------------:|
+| 5,000,000 | 100 | 4.8GB | 4 minutes, 47 seconds | 1.92 seconds | x149 |
+| 1,000,000 | 100 | 983MB | 50.9 seconds | 0.461 seconds | x110 |
+| 1,000,000 | 50 | 477MB | 27.1 seconds | 0.272 seconds | x99 |
+| 100,000 | 100 | 99MB | 5.2 seconds | 0.141 seconds | x36 |
+| 100,000 | 50 | 48MB | 2.7 seconds | 0.105 seconds | x25 |
+
+Effectively, `.qsql` files are just standard sqlite3 files, with an additional metadata table that is used for detecting changes in the original delimited file. This means that any tool that can read sqlite3 files can use these files directly.
+
+As a side-effect from this addition, q knows how to directly query multi-file sqlite3 databases, which means that the user can query any sqlite3 database, or the `.qsql` file, even when the original file doesn't exist anymore. For example:
+
+```bash
+q "select a.*,b.* from my_file.csv.qsql a left join some-sqlite3-database:::some_table_name b on (a.id = b.id)"
+```
+
+The benchmark results below reflect the peformance without the caching, e.g. directly reading the delimited files, parsing them and performing the query.
+
+I'll update benchmark results later on to provide cached results as well.
+
 # Overview
 
 This just a preliminary benchmark, originally created for validating performance optimizations and suggestions from users, and analyzing q's move to python3. After writing it, I thought it might be interesting to test its speed against textql and octosql as well.
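The "Speed Improvement" column in the patch appears to be the uncached query time divided by the cached query time, truncated to a whole number. For example, 4 minutes 47 seconds is 287 seconds, and 287 / 1.92 ≈ 149.5, listed as x149. This is an inference from the published numbers; the patch does not state the formula. A minimal Python sketch that recomputes the column under that assumption (`TIMINGS` and `speed_improvement` are illustrative names, not part of q):

```python
import math

# Timings in seconds, copied from the table in the patch: (uncached, cached).
# 4 minutes, 47 seconds = 4 * 60 + 47 = 287 seconds.
TIMINGS = {
    "5,000,000 rows x 100 cols": (287.0, 1.92),
    "1,000,000 rows x 100 cols": (50.9, 0.461),
    "1,000,000 rows x 50 cols": (27.1, 0.272),
    "100,000 rows x 100 cols": (5.2, 0.141),
    "100,000 rows x 50 cols": (2.7, 0.105),
}


def speed_improvement(uncached_s: float, cached_s: float) -> int:
    """Ratio of uncached to cached query time, truncated to an integer.

    Truncation (rather than rounding) is an assumption that happens to
    match every row of the table, e.g. 27.1 / 0.272 = 99.6 is listed as x99.
    """
    return math.floor(uncached_s / cached_s)


if __name__ == "__main__":
    for label, (uncached, cached) in TIMINGS.items():
        print(f"{label}: x{speed_improvement(uncached, cached)}")
```

Rounding instead of truncating would give x100 and x37 for the third and fourth rows, which is why truncation seems the more likely convention.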
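Since the patch describes a `.qsql` file as a standard sqlite3 database, Python's built-in `sqlite3` module should be able to open one directly. The sketch below lists the tables in such a file. Because the name and layout of q's metadata table are not described here, the demo builds a throwaway database called `example.csv.qsql` as a stand-in; its table names (`example_data`, `example_metadata`) and the `list_tables` helper are hypothetical.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in an sqlite3 file, such as a q `.qsql` cache."""
    # Open read-only via a URI: inspecting the cache never modifies it, and a
    # missing file raises an error instead of silently creating an empty database.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a cache file; a real one would be written by q.
        path = os.path.join(tmp, "example.csv.qsql")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE example_data (id INTEGER, name TEXT)")
        conn.execute("CREATE TABLE example_metadata (key TEXT, value TEXT)")
        conn.commit()
        conn.close()
        print(list_tables(path))
```

Opening the file read-only is a deliberate choice: the cache is owned by q, so an external inspection tool should not write to it.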