Implement simple instance concurrency

The aim is to have multiple Peekaboo instances run in parallel to achieve higher throughput while avoiding duplicate concurrent analysis of the same sample. To that end we:

- Give each Peekaboo instance a unique ID.
- Replace the special analysis "result" inProgress with a separate table where we track each sample by checksum, along with the instance processing it.
- Record in-flight samples using a (hopefully) atomic INSERT operation that either succeeds or fails, thus reliably indicating whether we now "own" this sample for processing or another instance is currently working on it.
- Extend the in-flight logic in the JobQueue to also record samples in the in-flight database table.
- Add the necessary logic to cope with another instance currently processing a sample. (Resubmission of those samples is currently tied to another analysis finishing, which isn't optimal. We need a separate thread here to regularly retry acquiring those samples.)
- Update the DB schema to version 4 so we get fresh and empty tables.
- Also clear the current instance's in-flight samples from the database on shutdown.
- Do not save the analysis info to the database upon registration of the sample and do not update the sample info upon submission to Cuckoo. The aim here seems to have been to be able to re-acquire Cuckoo jobs after a Peekaboo restart, which AFAIK never worked and which we have abandoned as desired functionality for now in favour of getting concurrency right first.
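The claim-by-INSERT idea above can be sketched as follows. This is a minimal illustration, not Peekaboo's actual code: the table `in_flight_samples`, its columns and all function names are hypothetical, and Python's built-in `sqlite3` module is used only to keep the example self-contained (for real multi-instance setups the config comment recommends MySQL or PostgreSQL instead). The point is that the PRIMARY KEY on the checksum lets the database, not the application, decide who owns a sample: the second INSERT for the same checksum fails with an integrity error.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table: one row per sample currently being analysed,
    # keyed by checksum so a second claim for the same sample is rejected.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS in_flight_samples ("
        " sha256sum TEXT PRIMARY KEY,"
        " instance_id INTEGER NOT NULL)"
    )


def mark_sample_in_flight(conn, sha256sum, instance_id):
    """Try to claim a sample; True if this instance now owns it."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO in_flight_samples (sha256sum, instance_id)"
                " VALUES (?, ?)",
                (sha256sum, instance_id),
            )
    except sqlite3.IntegrityError:
        # Checksum already present: some instance is processing it.
        return False
    return True


def clear_sample_in_flight(conn, sha256sum, instance_id):
    """Release a sample, but only if this instance owns it."""
    with conn:
        cur = conn.execute(
            "DELETE FROM in_flight_samples"
            " WHERE sha256sum = ? AND instance_id = ?",
            (sha256sum, instance_id),
        )
    return cur.rowcount == 1


def clear_instance_in_flight(conn, instance_id):
    """On shutdown, drop every claim held by this instance."""
    with conn:
        cur = conn.execute(
            "DELETE FROM in_flight_samples WHERE instance_id = ?",
            (instance_id,),
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print(mark_sample_in_flight(conn, "abc", 1))  # True: instance 1 owns it
    print(mark_sample_in_flight(conn, "abc", 2))  # False: already in flight
```

A caller that gets `False` would have to park the sample and retry later; the separate retry thread mentioned above is not part of this sketch.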
author: Michael Weiser <michael.weiser@gmx.de> 2018-12-20 17:30:27 +0000
committer: Michael Weiser <michael.weiser@gmx.de> 2019-01-22 16:41:34 +0000
commit: 235860da666c3eb274bd8055a32740dc6e19300b (patch)
tree: 5acbcc1b2a77a610af1e732c63a13b13ec48a24b /peekaboo.conf.sample
parent: 88b02cdeb1d2542b6d794daafc120e5a042ef54f (diff)
1 files changed, 7 insertions, 0 deletions
diff --git a/peekaboo.conf.sample b/peekaboo.conf.sample
index 2e7dbb1..f5b93df 100644
--- a/peekaboo.conf.sample
+++ b/peekaboo.conf.sample
@@ -48,6 +48,13 @@ url              :    sqlite:////path/to/database.db
 # PostgreSQL
 # url             :    postgresql://user:password@host:port/database
 
+# if multiple instances are to run in parallel and avoid concurrent analysis of
+# the same sample, set instance_id to a nonzero positive unique integer value
+# on each instance and use the same networked DBMS instance (MySQL or
+# PostgreSQL) for all of them. (SQLite is not a good choice for this.) Also, do
+# make really, really sure to provide unique IDs. Two instances using the same
+# ID will corrupt each other's records and there is no mechanism to detect this.
+instance_id: 0
 
 #
 # Cuckoo specific settings
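A startup check for the new `instance_id` option might look like the sketch below. It is hypothetical: the section name `db` and the function `load_db_settings` are assumptions, not taken from Peekaboo. It reads the option with a default of 0 (as shipped in the sample), rejects negative values, and warns when a nonzero ID is combined with an SQLite URL, mirroring the advice in the config comment. It cannot detect two instances sharing the same ID, which the comment explicitly warns about.

```python
import configparser
import warnings


def load_db_settings(config_text, section="db"):
    """Return (url, instance_id) from config text; section name is assumed."""
    # interpolation=None so '%' characters in DB passwords are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    url = parser.get(section, "url")
    instance_id = parser.getint(section, "instance_id", fallback=0)
    if instance_id < 0:
        raise ValueError("instance_id must be a non-negative integer")
    if instance_id > 0 and url.startswith("sqlite"):
        # The sample config advises a networked DBMS for parallel instances.
        warnings.warn("SQLite is not a good choice for concurrent instances")
    return url, instance_id


if __name__ == "__main__":
    sample = "[db]\nurl : postgresql://user:pw@host:5432/db\ninstance_id: 1\n"
    print(load_db_settings(sample))
```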