summaryrefslogtreecommitdiffstats
path: root/doc
diff options
context:
space:
mode:
authorHarel Ben-Attia <harelba@gmail.com>2021-10-22 16:42:36 +0300
committerHarel Ben-Attia <harelba@gmail.com>2021-10-22 16:42:36 +0300
commit5be7815463740aba17e0af4fd80bfb312d924d39 (patch)
tree262c09375d76dabe5a9cff110dbc76a058893fd9 /doc
parent7c0f84dbe2e5cd92760ae3bb01b7be14aad95f63 (diff)
man page and usage changes
Diffstat (limited to 'doc')
-rw-r--r--doc/USAGE.markdown82
1 files changed, 69 insertions, 13 deletions
diff --git a/doc/USAGE.markdown b/doc/USAGE.markdown
index bc43021..a1a51d4 100644
--- a/doc/USAGE.markdown
+++ b/doc/USAGE.markdown
@@ -3,13 +3,39 @@
## SYNOPSIS
`q <flags> <query>`
- Simplest execution is `q "SELECT * FROM myfile"` which prints the entire file.
+ Example Execution for a delimited file:
+ q "select * from myfile.csv"
+
+ Example Execution for an sqlite3 database:
+
+ q "select * from mydatabase.sqlite:::my_table_name"
+
+ or
+
+ q "select * from mydatabase.sqlite"
+
+ if the database file contains only one table
+
+ Auto-caching of delimited files can be activated through `-C readwrite` (writes new caches if needed) or `-C read` (only reads existing cache files)
+
+ Setting the default caching mode (`-C`) can be done by writing a `~/.qrc` file. See docs for more info.
+
## DESCRIPTION
-q allows performing SQL-like statements on tabular text data. Its purpose is to bring SQL expressive power to the Linux command line and to provide easy access to text as actual data.
+q's purpose is to bring SQL expressive power to the Linux command line and to provide easy access to text as actual data.
+
+q allows the following:
+
+* Performing SQL-like statements directly on tabular text data, auto-caching the data in order to accelerate additional querying on the same file
+* Performing SQL statements directly on multi-file sqlite3 databases, without having to merge them or load them into memory
Query should be an SQL-like query which contains filenames instead of table names (or - for stdin). The query itself should be provided as one parameter to the tool (i.e. enclosed in quotes).
+The following filename types are supported:
+
+* Delimited-file filenames, including relative/absolute paths
+* sqlite3 database filenames, with an additional `:::<table_name>` for accessing a specific table. If a database contains only one table, then denoting the table name is not needed. Examples: `mydatabase.sqlite3:::users_table` or `my_single_table_database.sqlite`.
+
Use `-H` to signify that the input contains a header line. Column names will be detected automatically in that case, and can be used in the query. If this option is not provided, columns will be named cX, starting with 1 (e.g. q "SELECT c3,c8 from ...").
Use `-d` to specify the input delimiter.
@@ -20,7 +46,7 @@ Please note that column names that include spaces need to be used in the query w
Query/Input/Output encodings are fully supported (and q tries to provide out-of-the-box usability in that area). Please use `-e`,`-E` and `-Q` to control encoding if needed.
-All sqlite3 SQL constructs are supported, including joins across files (use an alias for each table).
+All sqlite3 SQL constructs are supported, including joins across files (use an alias for each table), with the exception of CTE (for now).
See https://github.com/harelba/q for more details.
@@ -44,6 +70,27 @@ q can also get some runtime flags. The following parameters can be used, all opt
Options:
-h, --help show this help message and exit
-v, --version Print version
+ -V, --verbose Print debug info in case of problems
+ -S SAVE_DB_TO_DISK_FILENAME, --save-db-to-disk=SAVE_DB_TO_DISK_FILENAME
+ Save database to an sqlite database file
+ -C CACHING_MODE, --caching-mode=CACHING_MODE
+ Choose the autocaching mode (none/read/readwrite).
+ Autocaches files to disk db so further queries will be
+ faster. Caching is done to a side-file with the same
+ name of the table, but with an added extension .qsql
+ --dump-defaults Dump all default values for parameters and exit. Can
+ be used in order to make sure .qrc file content is
+ being read properly.
+ --max-attached-sqlite-databases=MAX_ATTACHED_SQLITE_DATABASES
+ Set the maximum number of concurrently-attached sqlite
+ dbs. This is a compile time definition of sqlite. q's
+ performance will slow down once this limit is reached
+ for a query, since it will perform table copies in
+ order to avoid that limit.
+ --overwrite-qsql=OVERWRITE_QSQL
+ When used, qsql files (both caches and store-to-db)
+ will be overwritten if they already exist. Use with
+ care.
Input Data Options:
-H, --skip-header Skip header row. This has been changed from earlier
@@ -52,6 +99,8 @@ Options:
-d DELIMITER, --delimiter=DELIMITER
Field delimiter. If none specified, then space is used
as the delimiter.
+ -p, --pipe-delimited
+ Same as -d '|'. Added for convenience and readability
-t, --tab-delimited
Same as -d <tab>. Just a shorthand for handling
standard tab delimited file You can use $'\t' if you
@@ -87,16 +136,26 @@ Options:
escaping the double quote character. By default, you
can use \" inside double quoted fields to escape
double quotes. Mainly for backward compatibility.
+ --as-text Don't detect column types - All columns will be
+ treated as text columns
-w INPUT_QUOTING_MODE, --input-quoting-mode=INPUT_QUOTING_MODE
Input quoting mode. Possible values are all, minimal
and none. Note the slightly misleading parameter name,
and see the matching -W parameter for output quoting.
+ -M MAX_COLUMN_LENGTH_LIMIT, --max-column-length-limit=MAX_COLUMN_LENGTH_LIMIT
+ Sets the maximum column length.
+ -U, --with-universal-newlines
+ Expect universal newlines in the data. Limitation: -U
+ works only with regular files for now, stdin or .gz
+ files are not supported yet.
Output Options:
-D OUTPUT_DELIMITER, --output-delimiter=OUTPUT_DELIMITER
Field delimiter for output. If none specified, then
the -d delimiter is used if present, or space if no
delimiter is specified
+ -P, --pipe-delimited-output
+ Same as -D '|'. Added for convenience and readability.
-T, --tab-delimited-output
Same as -D <tab>. Just a shorthand for outputting tab
delimited output. You can use -D $'\t' if you want.
@@ -121,6 +180,8 @@ Options:
nonnumeric and none. Note the slightly misleading
parameter name, and see the matching -w parameter for
input quoting.
+ -L, --list-user-functions
+ List all user functions
Query Related Options:
-q QUERY_FILENAME, --query-filename=QUERY_FILENAME
@@ -130,31 +191,26 @@ Options:
-Q QUERY_ENCODING, --query-encoding=QUERY_ENCODING
query text encoding. Experimental. Please send your
feedback on this
-````
+```
### Table names
The table names are the actual file names that you want to read from. Path names are allowed. Use "-" if you want to read from stdin (e.g. `q "SELECT * FROM -"`)
-Multiple files can be concatenated by using one of both of the following ways:
-
-* Separating the filenames with a + sign: `SELECT * FROM datafile1+datafile2+datefile3`.
-* Using glob matching: `SELECT * FROM mydata*.dat`
+Wildcard matches are supported - For example: `SELECT ... FROM ... mydata*.dat`
Files with .gz extension are considered to be gzipped and decompressed on the fly.
### Parsing Modes
-q supports multiple parsing modes:
+q supports two parsing modes:
* `relaxed` - This is the default mode. It tries to lean towards simplicity of use. When a row doesn't contains enough columns, they'll be filled with nulls, and when there are too many, the extra values will be merged to the last column. Defining the number of expected columns in this mode is done using the `-c` parameter. If it is not provided, then the number of columns is detected automatically (In most use cases, there is no need to specify `-c`)
* `strict` - Strict mode is for hardcore csv/tsv parsing. Whenever a row doesn't contain the proper number of columns, processing will stop. `-c` must be provided when using this mode
-* `fluffy` - This mode should not be used, and is just some kind of "backward compatible" parsing mode which was used by q previously. It's left as a separate parsing mode on purpose, in order to accommodate existing users. If you are such a user, please open a bug for your use case, and I'll see how I can incorporate it into the other modes. It is reasonable to say that this mode will be removed in the future.
### Output formatting option
The format of F is as a list of X=f separated by commas, where X is a column number and f is a python format:
* X - column number - This is the SELECTed column (or expression) number, not the one from the original table. E.g, 1 is the first SELECTed column, 3 is the third SELECTed column.
-* f - A python formatting string - See http://docs.python.org/release/2.4.4/lib/typesseq-strings.html for details if needed.
-** Example: `-f 3=%-10s,5=%4.3f,1=%x`
+* f - A python formatting string such as {} - See https://www.w3schools.com/python/ref_string_format.asp for details if needed.
## EXAMPLES
Example 1: `ls -ltrd * | q "select c1,count(1) from - group by c1"`
@@ -181,7 +237,7 @@ Harel Ben-Attia (harelba@gmail.com)
Any feedback/suggestions/complaints regarding this tool would be much appreciated. Contributions are most welcome as well, of course.
## COPYRIGHT
-Copyright (C) 2012--2014 Harel Ben Attia
+Copyright (C) 2012--2021 Harel Ben Attia
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.