author | Matthias Beyer <matthias.beyer@atos.net> | 2021-05-18 09:42:05 +0200
---|---|---
committer | Matthias Beyer <matthias.beyer@atos.net> | 2021-05-18 12:25:46 +0200
commit | 416de5dfa882b3f71f758f2b1a3928a03d77be00 (patch) |
tree | c4d1d6a53552e3a864d34e4c63d9534a24783256 /src/endpoint |
parent | 03433b8875aec76bb2e7d5cc930b4281b26211d4 (diff) |
Preallocate a reasonably large buffer
This patch changes the LogReceiver to preallocate a reasonably large buffer for
holding logs in memory.
Because we have to write the log _in one piece_ to the database (there might
be a way to append to the log field, but that would be optimizing a problem
we shouldn't have; we should rather use another approach for storing logs),
we have to keep the log in memory until the job is finished.
For keeping the log of a job in memory, we use a `Vec`. After consulting the
documentation
https://doc.rust-lang.org/stable/std/collections/index.html#when-should-you-use-which-collection
https://doc.rust-lang.org/stable/std/collections/index.html#sequences
we found that `Vec` is appropriate here, although `VecDeque` might be an
option as well because of its O(1) insertion time (and we only ever insert).
Still, because `Vec` appends elements at the end in amortized constant time,
this should be sufficient.
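The amortized-append behaviour and the effect of reserving up front can be
sketched in plain Rust (a standalone illustration, not part of the patch):

```rust
fn main() {
    // Vec::push is amortized O(1): the backing buffer grows geometrically
    // when full, so appending n elements costs O(n) total, but individual
    // pushes occasionally pay for a reallocation plus a copy.
    let mut accu: Vec<String> = Vec::new();

    // Reserving up front performs a single allocation, so pushes up to
    // that capacity never reallocate.
    accu.reserve(4096);
    let cap = accu.capacity();
    assert!(cap >= 4096);

    for i in 0..4096 {
        accu.push(format!("log line {}", i));
    }

    // No reallocation happened: the capacity is unchanged.
    assert_eq!(accu.capacity(), cap);
    println!("len = {}, capacity = {}", accu.len(), accu.capacity());
}
```

`Vec::with_capacity(4096)` would achieve the same single allocation at
construction time; `reserve` is equivalent when called on an empty `Vec`.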
Preallocating a reasonably large number of elements could yield big benefits
at only a minor cost: if a job fails early (after only a few lines of log
output), some memory is wasted. Likewise, with a large number of jobs we
allocate a lot of memory before filling it up. But since we need that memory
anyway (assuming all jobs succeed), this is not really a disadvantage.
Now, what is "reasonably large"? This value might change later, but for now
I took it from our experience of using butido in practice:
SELECT
    AVG(length.l), MAX(length.l), MIN(length.l)
FROM
(
    SELECT
        LENGTH(log_text) - LENGTH(REPLACE(log_text, CHR(10), '')) AS l
    FROM
        jobs
) AS length
+-----------------------+--------+-------+
| avg | max | min |
|-----------------------+--------+-------|
| 2863.0497427101200686 | 165213 | 11 |
+-----------------------+--------+-------+
The max and min values seem extreme. However, the data includes a lot of
failed jobs, and the maximum value (165k log lines is _huge_!) came from a
bogus setup. Removing these from the equation and using only the successful
jobs gives us a not-so-different number:
SELECT
AVG(l.len), MAX(l.len), MIN(l.len)
FROM
(
SELECT
LENGTH(log_text) - LENGTH(REPLACE(log_text, CHR(10), '')) AS len,
STRPOS(log_text, 'BUTIDO:STATE:OK') AS okpos
FROM JOBS
) AS l
WHERE
l.okpos != 0
AND
l.len != 165213
+-----------------------+-------+-------+
| avg | max | min |
|-----------------------+-------+-------|
| 2661.7306791569086651 | 55422 | 66 |
+-----------------------+-------+-------+
Using the average (arithmetic mean) as a baseline, we decided to go for 4096
(2^12) preallocated elements in the buffer, which should be reasonable.
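One hypothetical way to frame the chosen constant: 4096 is exactly the next
power of two above the measured average of ~2662 lines (the patch itself
simply hardcodes 2^12; the rounding below is only an illustration):

```rust
fn main() {
    // Observed average log length from the query above, rounded down.
    let avg_lines: usize = 2662;

    // Rounding up to the next power of two yields the chosen
    // preallocation size of 4096 = 2^12.
    let prealloc = avg_lines.next_power_of_two();
    assert_eq!(prealloc, 4096);
    println!("preallocating {} elements", prealloc);
}
```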
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
Diffstat (limited to 'src/endpoint')
-rw-r--r-- | src/endpoint/scheduler.rs | 4 |
1 file changed, 4 insertions, 0 deletions
diff --git a/src/endpoint/scheduler.rs b/src/endpoint/scheduler.rs
index 8ec8323..64aa3a7 100644
--- a/src/endpoint/scheduler.rs
+++ b/src/endpoint/scheduler.rs
@@ -333,6 +333,10 @@ impl<'a> LogReceiver<'a> {
     async fn join(mut self) -> Result<String> {
         let mut success = None;
         let mut accu = vec![];
+
+        // Reserve a reasonable amount of elements.
+        accu.reserve(4096);
+
         let mut logfile = self.get_logfile().await.transpose()?;

         // The timeout for the log-receive-timeout