author | Matthias Beyer <matthias.beyer@atos.net> | 2021-05-18 09:42:05 +0200
---|---|---
committer | Matthias Beyer <matthias.beyer@atos.net> | 2021-05-18 12:25:46 +0200
commit | 416de5dfa882b3f71f758f2b1a3928a03d77be00 (patch) |
tree | c4d1d6a53552e3a864d34e4c63d9534a24783256 /src/endpoint |
parent | 03433b8875aec76bb2e7d5cc930b4281b26211d4 (diff) |
Preallocate a reasonably large buffer
This patch changes the LogReceiver to preallocate a reasonably large buffer for
holding logs in memory.
Because we have to write the log _in one piece_ to the database (there might
be a way to append to the log field, but that would be optimizing a problem
we shouldn't have; we should rather use another approach for storing logs),
we have to keep the log in memory until the job is finished.
For keeping the log of a job in memory, we use a `Vec`. After consulting the
documentation
https://doc.rust-lang.org/stable/std/collections/index.html#when-should-you-use-which-collection
https://doc.rust-lang.org/stable/std/collections/index.html#sequences
we found that `Vec` is appropriate here, although `VecDeque` might be an
option as well because of its O(1) insertion time (and we only ever insert).
Still, because `Vec` appends elements at the end in amortized constant time,
this should be sufficient.
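The amortized-append behaviour and the effect of reserving up front can be
sketched in plain Rust (a standalone illustration, not part of the patch):

```rust
fn main() {
    // Vec::push is amortized O(1): the backing buffer grows geometrically
    // when full, so appending n elements costs O(n) total, but individual
    // pushes occasionally pay for a reallocation plus a copy.
    let mut accu: Vec<String> = Vec::new();

    // Reserving up front performs a single allocation, so pushes up to
    // that capacity never reallocate.
    accu.reserve(4096);
    let cap = accu.capacity();
    assert!(cap >= 4096);

    for i in 0..4096 {
        accu.push(format!("log line {}", i));
    }

    // No reallocation happened: the capacity is unchanged.
    assert_eq!(accu.capacity(), cap);
    println!("len = {}, capacity = {}", accu.len(), accu.capacity());
}
```

`Vec::with_capacity(4096)` would achieve the same single allocation at
construction time; `reserve` is equivalent when called on an empty `Vec`.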
Preallocating a reasonably large number of elements could yield big benefits
at only a minor cost: if a job fails early (after only a few lines of log
output), some memory is wasted. Likewise, with a large number of jobs we
allocate a lot of memory before filling it up. But since we need that memory
anyway (assuming all jobs succeed), this is not really a disadvantage.
Now, what is "reasonably large"? This value might change later, but for now
I took it from our experience of using butido in practice:
SELECT
    AVG(length.l), MAX(length.l), MIN(length.l)
FROM
(
    SELECT
        LENGTH(log_text) - LENGTH(REPLACE(log_text, CHR(10), '')) AS l
    FROM
        jobs
) AS length
+-----------------------+--------+-------+
| avg | max | min |
|-----------------------+--------+-------|
| 2863.0497427101200686 | 165213 | 11 |
+-----------------------+--------+-------+
The max and min values seem extreme. However, the data includes a lot of
failed jobs, and the maximum value (165k log lines is _huge_!) came from a
bogus setup. Removing these from the equation and using only the successful
jobs gives us a not-so-different number:
SELECT
AVG(l.len), MAX(l.len), MIN(l.len)
FROM
(
SELECT
LENGTH(log_text) - LENGTH(REPLACE(log_text, CHR(10), '')) AS len,
STRPOS(log_text, 'BUTIDO:STATE:OK') AS okpos
FROM JOBS
) AS l
WHERE
l.okpos != 0
AND
l.len != 165213
+-----------------------+-------+-------+
| avg | max | min |
|-----------------------+-------+-------|
| 2661.7306791569086651 | 55422 | 66 |
+-----------------------+-------+-------+
Using the average (arithmetic mean) as a baseline, we decided to go for 4096
(2^12) preallocated elements in the buffer, which should be reasonable.
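One hypothetical way to frame the chosen constant: 4096 is exactly the next
power of two above the measured average of ~2662 lines (the patch itself
simply hardcodes 2^12; the rounding below is only an illustration):

```rust
fn main() {
    // Observed average log length from the query above, rounded down.
    let avg_lines: usize = 2662;

    // Rounding up to the next power of two yields the chosen
    // preallocation size of 4096 = 2^12.
    let prealloc = avg_lines.next_power_of_two();
    assert_eq!(prealloc, 4096);
    println!("preallocating {} elements", prealloc);
}
```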
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
Diffstat (limited to 'src/endpoint')
-rw-r--r-- | src/endpoint/scheduler.rs | 4 |
1 file changed, 4 insertions, 0 deletions
diff --git a/src/endpoint/scheduler.rs b/src/endpoint/scheduler.rs
index 8ec8323..64aa3a7 100644
--- a/src/endpoint/scheduler.rs
+++ b/src/endpoint/scheduler.rs
@@ -333,6 +333,10 @@ impl<'a> LogReceiver<'a> {
     async fn join(mut self) -> Result<String> {
         let mut success = None;
         let mut accu = vec![];
+
+        // Reserve a reasonable amount of elements.
+        accu.reserve(4096);
+
         let mut logfile = self.get_logfile().await.transpose()?;

         // The timeout for the log-receive-timeout