-rw-r--r--  .gitignore                                   2
-rw-r--r--  test/BENCHMARK.md                           97
-rw-r--r--  test/results/benchmark-results-2018-12-17   48
-rw-r--r--  test/results/benchmark-results-2019-12-02   48
-rwxr-xr-x  test/test-suite                            173
-rw-r--r--  test/unit-file.csv.gz                      bin 0 -> 390046 bytes
6 files changed, 359 insertions, 9 deletions
diff --git a/.gitignore b/.gitignore
index 8f68670..27c87cc 100644
--- a/.gitignore
+++ b/.gitignore
@@ -11,3 +11,5 @@ win_build
packages
.idea/
dist/windows/
+_benchmark_data*
+*.benchmark-results
diff --git a/test/BENCHMARK.md b/test/BENCHMARK.md
new file mode 100644
index 0000000..80146b2
--- /dev/null
+++ b/test/BENCHMARK.md
@@ -0,0 +1,97 @@
+
+
+*Please don't use or publish this benchmark data yet. It's still alpha: I'm checking the validity of the results, and the Python 3 version of q has not been merged yet.*
+
+**NOTE**
+This is just a preliminary benchmark, and the results I got are somewhat surprising. I would love to validate these results by having other people run the benchmark as well and email me their results. If you're interested, follow the "Running the benchmark" section. After the benchmark finishes, send me the `all.benchmark-results` file, along with some details about your hardware, and I'll add it to the spreadsheet. <harelba@gmail.com>
+
+# Benchmark
+This is an initial version of the benchmark, along with some results. The following is compared:
+* q running on python 2.7.11
+* q running on python 3.6.4
+* textql 2.0.3
+* octosql
+
+The q version used for the benchmark is still on the python2/3 compatibility branch (hash f0b62b15b91583cd944ea2e8daf6f730198959fa)
+
+This is by no means a scientific benchmark, and it only focuses on the data loading time. Also, it does not try to provide any usability comparison between q and textql. Actually, I've created this benchmark in order to compare q over python 2 and 3, and only then decided it would be interesting to compare the results to textql and octosql.
+
+## Methodology
+The idea was to measure how sensitive the run time is to the row count and the column count.
+
+* Row counts: 1,10,100,1000,10000,100000,1000000
+* Column counts: 1,5,10,20,50,100
+* Iterations for each combination: 10
+
+The benchmark runs a simple `select count(*) from <file>` query for each combination and calculates the mean and stddev of each set of iterations. The stddev is used to gauge how consistent, and therefore how trustworthy, each mean is.
+
+The graphs below compare only the means of the results. The standard deviations are recorded in the Google sheet itself and can be viewed there if needed.
+
+## Hardware
+macOS Sierra on a mid-2015 15" MacBook Pro with 16GB of RAM and a 256GB internal flash drive.
+
+
+## Running the benchmark
+
+Please note that the initial run generates big files, so you need more than 3GB of free disk space. This also means that the first run takes much longer than subsequent runs. This is expected and does not affect the benchmark results, since the files are generated before the timed queries run. All the generated files reside in the `_benchmark_data/` folder.
+
+### Preparations
+Make sure you have pyenv and pyenv-virtualenv installed.
+
+* $ `git clone git@github.com:harelba/q.git`
+* $ `cd q && git checkout q-benchmark`
+* $ `cd test/`
+* $ `pyenv install 2.7.11`
+* $ `pyenv virtualenv 2.7.11 py2-q`
+* $ `pyenv activate py2-q`
+* $ `pip install -r ../requirements.txt`
+* $ `pyenv install 3.6.4`
+* $ `pyenv virtualenv 3.6.4 py3-q`
+* $ `pyenv activate py3-q`
+* $ `pip install -r ../requirements.txt`
+* $ `wget "https://s3.amazonaws.com/harelba-q-public/benchmark_data.tar.gz"`
+* $ `tar xvzf benchmark_data.tar.gz`
+* Install [`textql`](https://github.com/dinedal/textql#install)
+* Install [`octosql`](https://github.com/cube2222/octosql#installation)
+
+### Execution
+* $ `pyenv activate py2-q`
+* $ `./test-all BenchmarkTests.test_q_matrix`
+* $ `pyenv activate py3-q`
+* $ `./test-all BenchmarkTests.test_q_matrix`
+* $ `./test-all BenchmarkTests.test_textql_matrix`
+* $ `./test-all BenchmarkTests.test_octosql_matrix`
+
+Each q run writes its results to `<virtual-env-name>.benchmark-results` (for example, `py2-q.benchmark-results`). The textql and octosql runs write to `textql.benchmark-results` and `octosql.benchmark-results`, respectively. Combine them into a single file:
+
+* $ `paste py2-q.benchmark-results py3-q.benchmark-results textql.benchmark-results octosql.benchmark-results > all.benchmark-results`
+
+## Updating the benchmark markdown document file
+The results should reside in the following [google sheet](https://docs.google.com/spreadsheets/d/1Ljr8YIJwUQ5F4wr6ATga5Aajpu1CvQp1pe52KGrLkbY/edit?usp=sharing).
+
+* Duplicate the baseline tab inside the spreadsheet.
+* Paste the content of `all.benchmark-results` to the new tab, near "Fill raw results here".
+
+* All the graphs below will be updated automatically.
+
+## Results
+(Results are automatically updated from the baseline tab in the google spreadsheet).
+
+### 1 Column Table
+![1 column table](https://docs.google.com/spreadsheets/d/e/2PACX-1vQy9Zm4I322Tdf5uoiFFJx6Oi3Z4AMq7He3fUUtsEQVQIdTGfWgjxFD6k8PAy9wBjvFkqaG26oBgNTP/pubchart?oid=1119350798&format=image)
+
+### 5 Column Table
+![5 column table](https://docs.google.com/spreadsheets/d/e/2PACX-1vQy9Zm4I322Tdf5uoiFFJx6Oi3Z4AMq7He3fUUtsEQVQIdTGfWgjxFD6k8PAy9wBjvFkqaG26oBgNTP/pubchart?oid=599223098&format=image)
+
+### 10 Column Table
+![10 column table](https://docs.google.com/spreadsheets/d/e/2PACX-1vQy9Zm4I322Tdf5uoiFFJx6Oi3Z4AMq7He3fUUtsEQVQIdTGfWgjxFD6k8PAy9wBjvFkqaG26oBgNTP/pubchart?oid=82695414&format=image)
+
+### 20 Column Table
+![20 column table](https://docs.google.com/spreadsheets/d/e/2PACX-1vQy9Zm4I322Tdf5uoiFFJx6Oi3Z4AMq7He3fUUtsEQVQIdTGfWgjxFD6k8PAy9wBjvFkqaG26oBgNTP/pubchart?oid=1573199483&format=image)
+
+### 50 Column Table
+![50 column table](https://docs.google.com/spreadsheets/d/e/2PACX-1vQy9Zm4I322Tdf5uoiFFJx6Oi3Z4AMq7He3fUUtsEQVQIdTGfWgjxFD6k8PAy9wBjvFkqaG26oBgNTP/pubchart?oid=448568670&format=image)
+
+### 100 Column Table
+![100 column table](https://docs.google.com/spreadsheets/d/e/2PACX-1vQy9Zm4I322Tdf5uoiFFJx6Oi3Z4AMq7He3fUUtsEQVQIdTGfWgjxFD6k8PAy9wBjvFkqaG26oBgNTP/pubchart?oid=2101488258&format=image)
+
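Note on the Methodology section of BENCHMARK.md above: each row/column combination is summarized by the mean and stddev of its iterations. The following is a minimal sketch of that calculation, assuming the population form of the standard deviation (dividing by n, which is what `ddof = 0` in `_decide_result` further down in this commit implies). The function name `summarize` and the sample durations are hypothetical and are not part of the commit.

```python
import math


def summarize(durations):
    """Return (mean, population stddev) for a list of durations in seconds."""
    n = len(durations)
    if n == 0:
        raise ValueError("need at least one duration")
    mean = sum(durations) / n
    # Population variance: divide by n rather than n - 1 (ddof = 0).
    variance = sum((d - mean) ** 2 for d in durations) / n
    return mean, math.sqrt(variance)


# Hypothetical timings for one row/column combination, in seconds.
mean, stddev = summarize([0.10, 0.12, 0.11, 0.13])
print("mean={:.4f}s stddev={:.4f}s".format(mean, stddev))
```

A small stddev relative to the mean suggests the iterations agree with each other. With only 10 iterations per combination, the population form slightly underestimates the spread compared with the sample form (n - 1).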
diff --git a/test/results/benchmark-results-2018-12-17 b/test/results/benchmark-results-2018-12-17
new file mode 100644
index 0000000..8d40754
--- /dev/null
+++ b/test/results/benchmark-results-2018-12-17
@@ -0,0 +1,48 @@
+lines columns py2-q_mean py2-q_stddev lines columns py3-q_mean py3-q_stddev lines columns textql_mean textql_stddev
+1 1 0.06731581688 0.005270230559 1 1 0.09322199821 0.008088911233 1 1 0.01541593075 0.00846248027
+10 1 0.06453447342 0.003110529879 10 1 0.0952757597 0.01068078746 10 1 0.01273214817 0.001517273708
+100 1 0.06692070961 0.004081653457 100 1 0.09462814331 0.00550010348 100 1 0.01279251575 0.0007315880067
+1000 1 0.0703766346 0.002271640626 1000 1 0.09908235073 0.0085850761 1000 1 0.01575729847 0.001170010368
+10000 1 0.1229094744 0.005485221564 10000 1 0.1375562668 0.009702295105 10000 1 0.04378418922 0.001448525422
+100000 1 0.598156023 0.01721054649 100000 1 0.522838521 0.01662262184 100000 1 0.3162255287 0.01030908105
+1000000 1 5.372911286 0.0425664739 1000000 1 4.312362194 0.04878944441 1000000 1 3.042521834 0.02222183573
+lines columns py2-q_mean py2-q_stddev lines columns py3-q_mean py3-q_stddev lines columns textql_mean textql_stddev
+1 5 0.06542704105 0.001973147455 1 5 0.09278903008 0.007920553711 1 5 0.01264638901 0.0009375946825
+10 5 0.06713621616 0.003302711249 10 5 0.09266264439 0.006464956796 10 5 0.01264002323 0.0005921679139
+100 5 0.07043097019 0.003513428229 100 5 0.09614286423 0.006232406135 100 5 0.01298532486 0.001484074702
+1000 5 0.07853364944 0.002677513043 1000 5 0.1007899046 0.009419248049 1000 5 0.01899263859 0.0005582728364
+10000 5 0.1847445965 0.006918806414 10000 5 0.151746726 0.007045195955 10000 5 0.07659320831 0.00297289199
+100000 5 1.206378174 0.01569912364 100000 5 0.6551784992 0.02468335852 100000 5 0.6256412745 0.009538934388
+1000000 5 11.4774132 0.2737370571 1000000 5 5.54825387 0.06392730387 1000000 5 6.174384165 0.0396257937
+lines columns py2-q_mean py2-q_stddev lines columns py3-q_mean py3-q_stddev lines columns textql_mean textql_stddev
+1 10 0.06635277271 0.003224367089 1 10 0.09342534542 0.003372803039 1 10 0.01265852451 0.00115658081
+10 10 0.06949725151 0.004236749478 10 10 0.09139561653 0.00361962951 10 10 0.01304826736 0.0009077163448
+100 10 0.07332832813 0.003211229764 100 10 0.09613847733 0.002976111632 100 10 0.01362993717 0.0003077883843
+1000 10 0.09426920414 0.004147375078 1000 10 0.10503757 0.004323166227 1000 10 0.02448859215 0.001551123656
+10000 10 0.26318748 0.007391059562 10000 10 0.1713474512 0.004400747258 10000 10 0.1165221453 0.004626763279
+100000 10 1.939086366 0.01711379803 100000 10 0.8509856939 0.01451489164 100000 10 1.03131845 0.0154166
+1000000 10 19.16211414 0.3417997674 1000000 10 7.636127377 0.06577367856 1000000 10 10.22023973 0.0443451077
+lines columns py2-q_mean py2-q_stddev lines columns py3-q_mean py3-q_stddev lines columns textql_mean textql_stddev
+1 20 0.06688520908 0.003686408801 1 20 0.0937997818 0.00504618112 1 20 0.01299088001 0.00130498302
+10 20 0.06709973812 0.003909120415 10 20 0.09303014278 0.004256698801 10 20 0.01291837692 0.001043654863
+100 20 0.0813845396 0.005158197903 100 20 0.1016526461 0.004238640414 100 20 0.01500227451 0.001216417242
+1000 20 0.1107584953 0.006723338286 1000 20 0.1139468193 0.005867712372 1000 20 0.03420743942 0.003094073019
+10000 20 0.4188146114 0.01474904378 10000 20 0.2173264027 0.005747071741 10000 20 0.1986592293 0.006588276071
+100000 20 3.461091924 0.1043205869 100000 20 1.287664986 0.0221862172 100000 20 1.829260516 0.01414616471
+1000000 20 33.20876031 0.3190789024 1000000 20 11.84579525 0.1406809832 1000000 20 18.15644448 0.1474355796
+lines columns py2-q_mean py2-q_stddev lines columns py3-q_mean py3-q_stddev lines columns textql_mean textql_stddev
+1 50 0.06706497669 0.003487010206 1 50 0.09036362171 0.00392337182 1 50 0.0134802103 0.001043321639
+10 50 0.0721385479 0.00526657204 10 50 0.09356541634 0.003705587568 10 50 0.01397790909 0.001008071038
+100 50 0.1015130758 0.003524910234 100 50 0.1168865919 0.002810940717 100 50 0.01766057014 0.0008818513382
+1000 50 0.1666964769 0.006661858999 1000 50 0.1373265505 0.004538848823 1000 50 0.05760366917 0.003787637225
+10000 50 0.8726647139 0.04817920962 10000 50 0.3499189854 0.006489403179 10000 50 0.4113406658 0.00551681222
+100000 50 7.659929824 0.1190133198 100000 50 2.486357236 0.04149367418 100000 50 4.023236489 0.02935989293
+1000000 50 75.64912643 1.036366669 1000000 50 23.88283024 0.4251339799 1000000 50 40.02736287 0.3879349969
+lines columns py2-q_mean py2-q_stddev lines columns py3-q_mean py3-q_stddev lines columns textql_mean textql_stddev
+1 100 0.06666021347 0.001720503522 1 100 0.09272692204 0.005532725603 1 100 0.01451745033 0.0009589603269
+10 100 0.0746655941 0.004541222011 10 100 0.09874138832 0.007172096503 10 100 0.0155831337 0.001020332488
+100 100 0.1330797672 0.004335602846 100 100 0.1412571669 0.008253862291 100 100 0.02391133308 0.001714142787
+1000 100 0.2642062426 0.01022737492 1000 100 0.1779050112 0.006555498616 1000 100 0.09285030365 0.002734967858
+10000 100 1.570353174 0.01475258288 10000 100 0.5818499565 0.01616512044 10000 100 0.779653573 0.01021001276
+100000 100 14.70140581 0.3328709764 100000 100 4.601756811 0.05434568891 100000 100 7.700500083 0.06577229359
+1000000 100 148.4634018 7.316550329 1000000 100 44.62859902 0.4333388333 1000000 100 77.977897 0.7301257528
diff --git a/test/results/benchmark-results-2019-12-02 b/test/results/benchmark-results-2019-12-02
new file mode 100644
index 0000000..e9dca7f
--- /dev/null
+++ b/test/results/benchmark-results-2019-12-02
@@ -0,0 +1,48 @@
+lines columns py2-q_mean py2-q_stddev lines columns py3-q_mean py3-q_stddev lines columns textql_mean textql_stddev lines columns octosql_mean octosql_stddev
+1 1 0.0734721899033 0.00342279013601 1 1 0.10051469802856446 0.004675328349147358 1 1 0.0173349380493 0.0059959206152 1 1 0.011228728294372558 0.0010877179127881723
+10 1 0.0746278762817 0.00468414353387 10 1 0.10234739780426025 0.00510078119096311 10 1 0.014651632309 0.00217845165708 10 1 0.011713194847106933 0.0017938878071954913
+100 1 0.0754479169846 0.00367546265314 100 1 0.10537784099578858 0.0035228973459241267 100 1 0.0151803731918 0.00224971341816 100 1 0.014325213432312012 0.0017290050723256997
+1000 1 0.0827184200287 0.00332749977518 1000 1 0.10914053916931152 0.0037876126560058765 1000 1 0.0185441970825 0.00154583625692 1000 1 0.02007620334625244 0.003841671637009388
+10000 1 0.130123448372 0.00398276082559 10000 1 0.1471630811691284 0.0056107748124805115 10000 1 0.0520985126495 0.00227488114922 10000 1 0.06009321212768555 0.0018045935981669575
+100000 1 0.612298583984 0.0185709541475 100000 1 0.5399166822433472 0.02213469033463378 100000 1 0.337541723251 0.0116086194325 100000 1 0.43014986515045167 0.005839166941421165
+1000000 1 5.59862473011 0.0905480166939 1000000 1 4.39980182647705 0.0884813733818434 1000000 1 3.17139401436 0.0444820658987 1000000 1 4.267914342880249 0.11698217726499018
+lines columns py2-q_mean py2-q_stddev lines columns py3-q_mean py3-q_stddev lines columns textql_mean textql_stddev lines columns octosql_mean octosql_stddev
+1 5 0.0694166183472 0.00307281713923 1 5 0.10455994606018067 0.008311956974184905 1 5 0.0131057500839 0.00132499528147 1 5 0.010100650787353515 0.0008495662523858508
+10 5 0.070539522171 0.00196090167509 10 5 0.10231781005859375 0.0050317627429269955 10 5 0.014551782608 0.00205475359694 10 5 0.010378241539001465 0.00042382931551291064
+100 5 0.0742300033569 0.00302154129771 100 5 0.10598726272583008 0.006187299813626734 100 5 0.0150702953339 0.0019565274703 100 5 0.011428117752075195 0.001054487577015793
+1000 5 0.087014746666 0.00431789522004 1000 5 0.11044230461120605 0.00632368195279581 1000 5 0.0254506826401 0.00232872935772 1000 5 0.023230981826782227 0.0013638854413874789
+10000 5 0.187808656693 0.00512575898848 10000 5 0.16487712860107423 0.010076056131490768 10000 5 0.0847299337387 0.00413949339091 10000 5 0.103983473777771 0.002566703779142417
+100000 5 1.24647183418 0.0307551525876 100000 5 0.6653818368911744 0.017578506494383438 100000 5 0.647140431404 0.00484863670427 100000 5 0.9367039680480957 0.047583277674755294
+1000000 5 11.6488220453 0.222469120228 1000000 5 5.654011297225952 0.08764196721029975 1000000 5 6.31902601719 0.0585787838282 1000000 5 8.689867305755616 0.20061665098923728
+lines columns py2-q_mean py2-q_stddev lines columns py3-q_mean py3-q_stddev lines columns textql_mean textql_stddev lines columns octosql_mean octosql_stddev
+1 10 0.0732004165649 0.00524696166897 1 10 0.10312862396240234 0.006586403661254048 1 10 0.0135506629944 0.00154682138063 1 10 0.010560154914855957 0.0012062897475765952
+10 10 0.0719322681427 0.00337980529655 10 10 0.10103726387023926 0.005139217305955802 10 10 0.0143553495407 0.00141737842486 10 10 0.01032114028930664 0.0007034635668424652
+100 10 0.0793414115906 0.0047871454186 100 10 0.10384261608123779 0.004850772126615192 100 10 0.0148341178894 0.00143514697436 100 10 0.012691855430603027 0.0009712232784515944
+1000 10 0.098956155777 0.00314928094914 1000 10 0.11434323787689209 0.0052855049216250505 1000 10 0.0275867700577 0.00200118141767 1000 10 0.034005475044250486 0.001425221820132235
+10000 10 0.273002624512 0.00871803130738 10000 10 0.18594975471496583 0.008426937757921716 10000 10 0.126932358742 0.00620581702113 10000 10 0.19539403915405273 0.00401993825688173
+100000 10 2.03795661926 0.0744729489785 100000 10 0.8921735525131226 0.02259783771356152 100000 10 1.04192745686 0.0149334046633 100000 10 1.8566447257995606 0.08845727656371252
+1000000 10 19.7247032404 0.66605468687 1000000 10 7.7266138076782225 0.10439505885940377 1000000 10 10.2687769413 0.0682723749151 1000000 10 18.230213975906373 0.9985511456352485
+lines columns py2-q_mean py2-q_stddev lines columns py3-q_mean py3-q_stddev lines columns textql_mean textql_stddev lines columns octosql_mean octosql_stddev
+1 20 0.0691435098648 0.00232124488632 1 20 0.10138421058654785 0.0059525018871562215 1 20 0.014045715332 0.00116432652787 1 20 0.01030263900756836 0.0014017045651869215
+10 20 0.072181892395 0.00306747549853 10 20 0.1015845537185669 0.002972749458583174 10 20 0.013680934906 0.000910383697657 10 20 0.010272622108459473 0.0006771441928063938
+100 20 0.0876452922821 0.00404266708701 100 20 0.11257178783416748 0.005037816595320132 100 20 0.0164495944977 0.00154197876987 100 20 0.015861248970031737 0.0010913014445132199
+1000 20 0.116324877739 0.00424430086321 1000 20 0.12467055320739746 0.005266059173902396 1000 20 0.0401841163635 0.00349693991299 1000 20 0.05414586067199707 0.0018546178376686003
+10000 20 0.427709841728 0.0133665186407 10000 20 0.23156797885894775 0.011922511384004917 10000 20 0.204241681099 0.00279346321711 10000 20 0.4071432828903198 0.007885384401472337
+100000 20 3.53898899555 0.145285257829 100000 20 1.2966086864471436 0.020653793768142525 100000 20 1.83605823517 0.0237800648849 100000 20 3.930004286766052 0.10273588479658016
+1000000 20 34.4587288141 0.882682659759 1000000 20 12.197622799873352 0.38353366422310053 1000000 20 18.2444090366 0.13051035911 1000000 20 39.0564279794693 1.6574754268177938
+lines columns py2-q_mean py2-q_stddev lines columns py3-q_mean py3-q_stddev lines columns textql_mean textql_stddev lines columns octosql_mean octosql_stddev
+1 50 0.0733374118805 0.00664954688392 1 50 0.1022254228591919 0.003813871305051591 1 50 0.0140640974045 0.00200548518545 1 50 0.010728073120117188 0.001559272868841953
+10 50 0.0744789838791 0.00281544448238 10 50 0.10403745174407959 0.00443303536428869 10 50 0.0147454023361 0.00148454350858 10 50 0.011526155471801757 0.0009739846530934967
+100 50 0.108049821854 0.00568025365269 100 50 0.124765944480896 0.0053517076254703125 100 50 0.0197550296783 0.0027634472647 100 50 0.023745250701904298 0.0011332207108003655
+1000 50 0.176298546791 0.00606499912326 1000 50 0.144227933883667 0.006044948688936146 1000 50 0.0628623962402 0.00234005612486 1000 50 0.10975453853607178 0.0016775696306391506
+10000 50 0.891832685471 0.0473835751534 10000 50 0.3600123167037964 0.007161627385086743 10000 50 0.427824187279 0.0069294988011 10000 50 0.9561779499053955 0.009561843743429211
+100000 50 7.77155239582 0.101585372824 100000 50 2.6051604032516478 0.0878884131862843 100000 50 4.03805820942 0.0408635603507 100000 50 9.653993058204652 0.2270682921633226
+1000000 50 78.4816464186 2.09936579528 1000000 50 25.2284695148468 0.7603793472233193 1000000 50 40.3812385321 0.160958877387 1000000 50 95.60885927677154 1.751052379968784
+lines columns py2-q_mean py2-q_stddev lines columns py3-q_mean py3-q_stddev lines columns textql_mean textql_stddev lines columns octosql_mean octosql_stddev
+1 100 0.0753021240234 0.00621659499913 1 100 0.10945203304290771 0.009525392011882291 1 100 0.0166680812836 0.00209574297652 1 100 0.011182069778442383 0.0008911394102061748
+10 100 0.0799629449844 0.00431510030729 10 100 0.11035494804382324 0.0023842770363513535 10 100 0.0161526203156 0.000944392845294 10 100 0.01617884635925293 0.0019236690464235176
+100 100 0.141041827202 0.00473456838003 100 100 0.16715807914733888 0.013143750801118107 100 100 0.0285645484924 0.00499557241643 100 100 0.040108680725097656 0.003924007500439625
+1000 100 0.272631931305 0.00899845563948 1000 100 0.19988524913787842 0.004237481791359729 1000 100 0.101301050186 0.00408546610666 1000 100 0.22757408618927003 0.004142308919104949
+10000 100 1.61969444752 0.0338148564151 10000 100 0.6452171087265015 0.02634034109327908 10000 100 0.795653438568 0.0110871290583 10000 100 2.1363088846206666 0.03476917431930926
+100000 100 14.9034232616 0.177666674893 100000 100 5.090956687927246 0.17384895247428786 100000 100 7.78771924973 0.0423887501838 100000 100 21.054430747032164 0.24453457049973806
+1000000 100 147.973981094 2.41177161281 1000000 100 47.093635368347165 1.4756281250192291 1000000 100 78.1040684938 0.212101848957 1000000 100 211.90868167877198 1.6401403292528614
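Note on the results files above: each one is the `paste` output described in BENCHMARK.md, so every row holds one `lines columns mean stddev` group per tool, and the header row is repeated before each column-count block. The following is a rough sketch of how such a file could be read back for comparison, assuming tab- or space-separated fields and that a failed combination is written as the literal string `None` (which is what `str(None)` would produce in the test-suite code). The names `parse_results` and `to_float` are hypothetical helpers, not part of the commit.

```python
def to_float(value):
    """Convert a field to float, mapping the literal 'None' to None."""
    return None if value == "None" else float(value)


def parse_results(text):
    """Map (tool, lines, columns) -> (mean, stddev) from a combined results file."""
    results = {}
    tools = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "lines":
            # Header row: tool names come from the '<tool>_mean' columns.
            tools = [f[:-len("_mean")] for f in fields if f.endswith("_mean")]
            continue
        for i, tool in enumerate(tools):
            # Each tool contributes four consecutive fields per row.
            lines, columns, mean, stddev = fields[4 * i:4 * i + 4]
            results[(tool, int(lines), int(columns))] = (to_float(mean), to_float(stddev))
    return results


# First data row of benchmark-results-2019-12-02, first two tools only.
sample = (
    "lines\tcolumns\tpy2-q_mean\tpy2-q_stddev\tlines\tcolumns\tpy3-q_mean\tpy3-q_stddev\n"
    "1\t1\t0.0734721899033\t0.00342279013601\t1\t1\t0.10051469802856446\t0.004675328349147358\n"
)
parsed = parse_results(sample)
print(sorted(parsed))
```

The sketch raises a `ValueError` on rows with fewer fields than the header promises, which seems preferable to silently misaligning tools when the input files passed to `paste` have different lengths.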
diff --git a/test/test-suite b/test/test-suite
index e17afcd..836676c 100755
--- a/test/test-suite
+++ b/test/test-suite
@@ -8,24 +8,25 @@
# to be executed from the current folder
#
#
+from __future__ import print_function
-import unittest
+import codecs
+import locale
+import os
import random
-import json
-from json import JSONEncoder
-from subprocess import PIPE, Popen, STDOUT
import sys
-import os
import time
+import unittest
+from gzip import GzipFile
+from subprocess import PIPE, Popen
from tempfile import NamedTemporaryFile
-import locale
-import pprint
+
import six
from six.moves import range
-import codecs
sys.path.append(os.path.join(os.path.abspath(os.path.dirname(sys.argv[0])),'..','bin'))
-from qtextasdata import QTextAsData,QOutput,QOutputPrinter,QInputParams
+from qtextasdata import QTextAsData, QInputParams
+import itertools
# q uses this encoding as the default output encoding. Some of the tests use it in order to
# make sure that the output is correctly encoded
@@ -2343,6 +2344,160 @@ class BasicModuleTests(AbstractQTestCase):
         self.assertTrue(table_structure.materialized_files['my_data'].is_stdin)
+class BenchmarkAttemptResults(object):
+    def __init__(self, attempt, lines, columns, duration,return_code):
+        self.attempt = attempt
+        self.lines = lines
+        self.columns = columns
+        self.duration = duration
+        self.return_code = return_code
+
+    def __str__(self):
+        return "{}".format(self.__dict__)
+    __repr__ = __str__
+
+class BenchmarkResults(object):
+    def __init__(self, lines, columns, attempt_results, mean, stddev):
+        self.lines = lines
+        self.columns = columns
+        self.attempt_results = attempt_results
+        self.mean = mean
+        self.stddev = stddev
+
+    def __str__(self):
+        return "{}".format(self.__dict__)
+    __repr__ = __str__
+
+class BenchmarkTests(AbstractQTestCase):
+
+    BENCHMARK_DIR = './_benchmark_data'
+
+    def _ensure_benchmark_data_dir_exists(self):
+        try:
+            os.mkdir(BenchmarkTests.BENCHMARK_DIR)
+        except Exception as e:
+            pass
+
+    def _create_benchmark_file_if_needed(self):
+        self._ensure_benchmark_data_dir_exists()
+
+        if os.path.exists('{}/benchmark-file.csv'.format(BenchmarkTests.BENCHMARK_DIR)):
+            return
+
+        g = GzipFile('unit-file.csv.gz')
+        d = g.read().decode('utf-8')
+        f = open('{}/benchmark-file.csv'.format(BenchmarkTests.BENCHMARK_DIR), 'w')
+        for i in range(100):
+            f.write(d)
+        f.close()
+
+    def _prepare_test_file(self, lines, columns):
+
+        filename = '{}/_benchmark_data__lines_{}_columns_{}.csv'.format(BenchmarkTests.BENCHMARK_DIR,lines, columns)
+
+        if os.path.exists(filename):
+            return filename
+
+        c = ['c{}'.format(x + 1) for x in range(columns)]
+
+        # write a header line
+        ff = open(filename,'w')
+        ff.write(",".join(c))
+        ff.write('\n')
+        ff.close()
+
+        r, o, e = run_command('head -{} {}/benchmark-file.csv | ../bin/q -d , "select {} from -" >> {}'.format(lines, BenchmarkTests.BENCHMARK_DIR, ','.join(c), filename))
+        self.assertEqual(r, 0)
+        return filename
+
+    def _decide_result(self,attempt_results):
+
+        failed = list(filter(lambda a: a.return_code != 0,attempt_results))
+
+        if len(failed) == 0:
+            mean = sum([x.duration for x in attempt_results]) / len(attempt_results)
+            sum_squared = sum([(x.duration - mean)**2 for x in attempt_results])
+            ddof = 0
+            pvar = sum_squared / (len(attempt_results) - ddof)
+            stddev = pvar ** 0.5
+        else:
+            mean = None
+            stddev = None
+
+        return BenchmarkResults(
+            attempt_results[0].lines,
+            attempt_results[0].columns,
+            attempt_results,
+            mean,
+            stddev
+        )
+
+    def _perform_test_performance_matrix(self,name,generate_cmd_function):
+        results = []
+
+        self._create_benchmark_file_if_needed()
+        for columns in [1, 5, 10, 20, 50, 100]:
+            for lines in [1, 10, 100, 1000, 10000, 100000, 1000000]:
+                attempt_results = []
+                for attempt in range(10):
+                    filename = self._prepare_test_file(lines, columns)
+                    if DEBUG:
+                        print("Testing {}".format(filename))
+                    t0 = time.time()
+                    r, o, e = run_command(generate_cmd_function(filename,lines,columns))
+                    duration = time.time() - t0
+                    attempt_result = BenchmarkAttemptResults(attempt, lines, columns, duration, r)
+                    attempt_results += [attempt_result]
+                    if DEBUG:
+                        print("Results: {}".format(attempt_result.__dict__))
+                final_result = self._decide_result(attempt_results)
+                results += [final_result]
+
+        series_fields = [six.u('lines'),six.u('columns')]
+        value_fields = [six.u('mean'),six.u('stddev')]
+
+        all_fields = series_fields + value_fields
+
+        output_filename = '{}.benchmark-results'.format(name)
+        output_file = open(output_filename,'w')
+        for columns,g in itertools.groupby(sorted(results,key=lambda x:x.columns),key=lambda x:x.columns):
+            x = six.u("\t").join(series_fields + [six.u('{}_{}').format(name, f) for f in value_fields])
+            print(x,file=output_file)
+            for result in g:
+                print(six.u("\t").join(map(str,[getattr(result,f) for f in all_fields])),file=output_file)
+        output_file.close()
+
+        print("results have been written to : {}".format(output_filename))
+        if DEBUG:
+            print("RESULTS FOR {}".format(name))
+            print(open(output_filename,'r').read())
+
+    def test_q_matrix(self):
+        venv = os.path.basename(os.environ.get('VIRTUAL_ENV') or 'unknown-virtual-env')
+
+        def generate_q_cmd(data_filename,line_count,column_count):
+            if column_count == 1:
+                additional_params = '-c 1'
+            else:
+                additional_params = ''
+            return '../bin/q -H -d , {} "select count(*) from {}"'.format(additional_params, data_filename)
+        self._perform_test_performance_matrix(venv,generate_q_cmd)
+
+    def test_textql_matrix(self):
+        def generate_textql_cmd(data_filename,line_count,column_count):
+            return 'textql -dlm , -sql "select count(*)" {}'.format(data_filename)
+        self._perform_test_performance_matrix('textql',generate_textql_cmd)
+
+    def test_octosql_matrix(self):
+        config_fn = self.random_tmp_filename('octosql', 'config')
+        def generate_octosql_cmd(data_filename,line_count,column_count):
+            j = """dataSources:\n  - name: bmdata\n    type: csv\n    config:\n      path: "{}"\n""".format(data_filename)
+            f = open(config_fn,'w')
+            f.write(j)
+            f.close()
+            return 'octosql -c {} -o table "select count(*) from bmdata a"'.format(config_fn)
+        self._perform_test_performance_matrix('octosql',generate_octosql_cmd)
+
 def suite():
     tl = unittest.TestLoader()
     basic_stuff = tl.loadTestsFromTestCase(BasicTests)
diff --git a/test/unit-file.csv.gz b/test/unit-file.csv.gz
new file mode 100644
index 0000000..ade23b7
--- /dev/null
+++ b/test/unit-file.csv.gz
Binary files differ