butido - Sources of butido

Age	Commit message (Collapse)	Author
2021-11-19	Update Copyright string to 2020-2022	Matthias Beyer
	Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-11-11	Restructure error output for more screen real estate	Matthias Beyer
	Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-11-11	Add line which can be used to print the job log with butido	Matthias Beyer
	Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-11-02	Fix missing fmt argument in error message	David Tolnay
	Without this, the error message would literally be "Running container {} failed" with curly braces in it instead of the container id. Signed-off-by: David Tolnay <dtolnay@gmail.com>
2021-09-16	Schedule to host where fewer containers are running	Matthias Beyer
	This is not perfect, of course, as we rather should check the load on the host rather than some artificial number of running containers (more failed jobs result in more running containers, which results in more scheduled to the host), but still, this is an improvement over the previous state because utilization did not express enough. So the issue was that the utilization only represents the utilization of an endpoint _from the current process_. If butido is called three times, the same host was selected for a job because the utilization in each butido process resulted in the same values for this endpoint. In practice, this resulted in a build for "gcc" to be scheduled to the same host when butido was executed three times for "gcc" (e.g. for different platforms). The endresult was a load of ~70 on a host where 24 was "100% load". This is not ideal. Checking the load of a host is not possible via the docker API, so this is what I came up with. Now, builds are scheduled to endpoints which have fewer running containers. This might not result in a perfect distribution (or even a "good" one), but still it might result in a better distribution of workload than in before this patch. Some completely non-artificial testing resulted in the expected behaviour. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net> Tested-by: Matthias Beyer <matthias.beyer@atos.net>
2021-08-17	Fix: Construct logfile name properly	Matthias Beyer
	Previously we constructed <log dir>/<job id>/.log but the job id shouldn't be a directory name but the filename of the logfile. This patch fixes this bug. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net> Tested-by: Matthias Beyer <matthias.beyer@atos.net>
2021-08-17	Add error context when opening logfile	Matthias Beyer
	Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-05-31	Fix: Insert shouldn't be done with LogItem::display()	Matthias Beyer
	This fixes a nasty bug; because we used the LogItem::display() function to build the log string that was put into the database, the database contained _colorized_ log lines. Because of this, our parsing of the log text later (when reading historical data from the database) failed to parse whether the job was successfull or not. The issue was actually introduced in b3a6458ce34e3065192208826b2fc85edd4761f9, where we changed the `LogItem::display()` implementation. With this fix in place, the log text in the database is corrected again (only for new logs that get put into the database). Fixes: b3a6458ce34e3065192208826b2fc85edd4761f9 ("Add color support in LogItem::display() impl") Signed-off-by: Matthias Beyer <matthias.beyer@atos.net> Tested-by: Matthias Beyer <matthias.beyer@atos.net>
2021-05-20	Update dependency: indicatif 0.15 -> 0.16.1	Matthias Beyer
	0.16.1 has been published which fixes the bug introduced by 0.16.0, hence update this dependency. This reverts commit ddbd9629b3188c9f08d023829683c40ab9e1448b. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-05-18	Preallocate a reasonable large buffer	Matthias Beyer
	This patch changes the LogReceiver to preallocate a reasonable large buffer for holding logs in memory. Because of the fact that we have to write the log _in one piece_ to the database (there might be a way to append to the log field, but that's optimization of a problem we shouldn't have - we rather should use another approach for storing logs), we have to keep the log in memory until the job is finished. For keeping the log of a job in memory, we use a `Vec`. After consulting the documentation https://doc.rust-lang.org/stable/std/collections/index.html#when-should-you-use-which-collection https://doc.rust-lang.org/stable/std/collections/index.html#sequences we found that `Vec` should be appropriate here, although `VecDeque` might be an option as well because of O(1) insertion time (and we're only inserting). Still, because `Vec` has amortized constant time for adding elements at the end, this should be sufficient. Preallocating a reasonable large amount of elements could yield big benefits with only minor disadvantage. If the job fails early (after only a few lines of log output), there's memory wasted. Also, if we have a large number of jobs, we allocate a huge amount of memory before filling it up. Because we need to have that memory anyways (considering all jobs succeed), that is not really a disadvantage. Now, what is "reasonable large"? This value might be changed later on, but for now, I took this value from experience we had when using butido in practice: select AVG(length.l), MAX(length.l), MIN(length.l) FROM ( SELECT LENGTH(log_text) - LENGTH(REPLACE(log_text, chr(10), '')) AS l FROM jobs ) AS length +-----------------------+--------+-------+ \| avg \| max \| min \| \|-----------------------+--------+-------\| \| 2863.0497427101200686 \| 165213 \| 11 \| +-----------------------+--------+-------+ The max and min values seem to be extreme. Now, the values contain a lot of failed jobs and the maximum value (165k log lines is _huge_!) was also a bogus setup. Removing these from the equation, using only the successful jobs, gives us a not-so-different number: SELECT AVG(l.len), MAX(l.len), MIN(l.len) FROM ( SELECT LENGTH(log_text) - LENGTH(REPLACE(log_text, CHR(10), '')) AS len, STRPOS(log_text, 'BUTIDO:STATE:OK') AS okpos FROM JOBS ) AS l WHERE l.okpos != 0 AND l.len != 165213 +-----------------------+-------+-------+ \| avg \| max \| min \| \|-----------------------+-------+-------\| \| 2661.7306791569086651 \| 55422 \| 66 \| +-----------------------+-------+-------+ Using the average (arithmetic mean) as a baseline, we decided to go for 4096 (2^12) preallocated elements in the buffer, which should be reasonable. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-05-17	Revert "Update dependency: indicatif 0.15 -> 0.16"	Matthias Beyer
	This reverts the dependency update because indicatif 0.16 introduced a reachable `unreachable!()` statement in their sourcecode that we actually hit if we `--hide-bars`. This reverts commit 6ceccb679d9c2d19389c6c6eef792d8db9086f31. Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-05-11	Add endpoint name and container id in progress message	Matthias Beyer
	This patch adds the endpoint name and a (7 character) appreviated container id in the progress message, so that the user can see where the package is built. This reduces the compile+debug loop by one step in the case where the user does not want to inspect the logs anyways, because they can jump to the right container right away and do not have to `butido db job -sL <job id>` to find out where the job was running. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net> Tested-by: Matthias Beyer <matthias.beyer@atos.net>
2021-05-11	Update dependency: indicatif 0.15 -> 0.16	Matthias Beyer
	Because the interfaces of indicatif have changed, this PR changes a lot of calls in the codebase. (Yay, moving all the things!) Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-04-08	Fix: Update the progress bar each second	Matthias Beyer
	This patch tries to work around a longer-than-1-sec blocking future while waiting for the channel .recv() call in the LogReceiver. The issue at hand is: If the channel does not produce log output for longer than 1 second, the progress bar won't be `tick()`ed for that amount of time. That leads to progress bars that seem to block (no update of the time in the progress bar output), which might confuse users. This patch works around that by wrapping the recv() in a timeout future and then catch the timeout, ping the progress bar and try to `recv()` again. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-03-05	Move endpoint connection setup to utility module	Matthias Beyer
	Because we can re-use this function in our commandline endpoint maintenance commands implementation. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-03-04	Remove relative speed setting, select endpoint by utilization instead	Matthias Beyer
	This patch removes the "speed" setting from the configuration, which was introduced to set a relative speed for each endpoint, with the idea that the scheduler then would select a faster node preferably. Instead, the utilization of an endpoint is now calculated (number of running jobs vs allowed maximum jobs on the endpoint), and the endpoint with lower utilization is selected. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-03-02	Implement scheduling with max jobs per endpoint	Matthias Beyer
	This patch implements support for max jobs per endpoint. The number of running jobs on one endpoint are tracked with a wrapper around the Endpoint object, which increases the job counter on allocation and decreases it on deallocation. This way, the scheduler can know how many jobs are running on one endpoint and select the next endpoint accordingly. The loading/comparing is not perfect, so it might happen that more jobs run on one endpoint than configured, but this is the first step into the right direction. Also, the selection happens on a tokio job which runs in a loop{}. Because this almost blocks the whole executor thread, we use `tokio::task::yield_now()` as soon as there is no free endpoint anymore, to yield the execution to another future to free resources for doing actual work, not scheduling. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net> Tested-by: Matthias Beyer <matthias.beyer@atos.net>
2021-03-01	Remove RwLock around Endpoint	Matthias Beyer
	Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-02-25	Multiple release stores	Matthias Beyer
	This patch adds the ability to have more than one release store. With this patch, a user can (has to) configure release store names in the configuration file, and can then specify one of the configured names to release the artifacts to. This way, different release "channels" can be served, for example a stable channel and a rolling release channel (although "channel" is not in our wording). The code was adapted to be able to fetch releases from multiple release directories, in the crate::db::find_artifact implementation, so that re-using artifacts works across all release directories. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-02-22	Remove MergedStores	Matthias Beyer
	The concept of the MergedStores type was okay in the beginning, but it got more and more complex to use properly and most of the time, we used the release/staging stores directly anyways. So this removes the MergedStores type, which is a preparation for the change to have multiple release stores. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-02-17	Pass MergedStores to endpoint/jobs rather than only staging store	Matthias Beyer
	This patch changes the code so that the MergedStores object is known in the endpoints and job execution code. This is necessary, because the jobs must be able to load artifacts from the release store as well, for example if a library was released in another submit and we can reuse it because the script for that library didn't change. For that, the interface of StoreRoot::join() was changed - -> Result<FullArtifactPath<'a>> + -> Result<Option<FullArtifactPath<'a>>> whereas it returns an Ok(None) if the artifact requested does not exist. This was necessary to try-joining a path on the store root of the staging store, and if there is no such file continue with try-joining the path on the release store. The calling code got a bit more complex with that change, though. Next, the MergedStores got a `derive(Clone)` because clone()ing it is cheap (two `Arc`s) and necessary to pass it easily to the jobs. Each instance of the code where the staging store was consulted was changed to consult the the release store as well. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-02-15	Add more error context	Matthias Beyer
	Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-02-15	Fix: Return script errors properly	Matthias Beyer
	This patch fixes error returning from the JobHandle::run() implementation. Somehow, in the many rewrites, the error returning resulted in code that did not differentiate between script run errors and scheduling errors. This patch fixes this by making the JobHandle::run() method return Result<Result<Vec<ArtifactPath>>> whereas the outer error is the scheduling result and the inner error is the script result. The calling code was adapted accordingly. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-02-08	Colorize error message	Matthias Beyer
	Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-02-08	Remove `Artifact` type	Matthias Beyer
	This patch follows-up on the shrinking of the `Artifact` type and removes it entirely. The type is not needed. Only the `ArtifactPath` type is needed, which is a thin wrapper around `PathBuf`, ensuring that the path is relative to the store root. The `Artifact` type used `pom` to parse the name and version of the package from the `ArtifactPath` object it contained, which resulted in the restriction that the path must always be <name>-<version>... Which should not be a requirement and actually caused issues with a package named "foo-bar" (as an example). Signed-off-by: Matthias Beyer <matthias.beyer@atos.net> Tested-by: Matthias Beyer <matthias.beyer@atos.net>
2021-02-06	Fix: Do not set message when setting progress	Matthias Beyer
	Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-02-02	Fix: Make sure that bar is moved to LogReceiver, drop it afterwards	Matthias Beyer
	This patch makes the bar moved to the LogReceiver instead of borrowing it to it. Because ProgressBar::clone() is cheap (according to indicatif documentation it is just an Arc<> holding the actual object), we can do this here without fearing the overhead. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-01-30	Update tokio: 0.2 -> 1.0, shiplift	Matthias Beyer
	Because tokio 1.0 does not ship with the Stream trait, this patch also introduces tokio_stream as new dependency. For more information, look here: https://docs.rs/tokio/1.0.3/tokio/stream/index.html Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-30	itertools: 0.9 -> 0.10	Matthias Beyer
	We don't need resiter::Map here anymore because itertools 0.10 provides a map_ok() extension. Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-25	Merge branch 'more-parallelism' into master	Matthias Beyer

2021-01-25	Make progress bar message format uniform	Matthias Beyer
	Signed-off-by: Matthias Beyer <matthias.beyer@atos.net> Tested-by: Matthias Beyer <matthias.beyer@atos.net>
2021-01-25	Let the JobHandle::run() return a Vec<Artifact>	Matthias Beyer
	Before that change, it returned the dbmodels::Artifact objects, for which we needed to fetch the filestore::Artifact again. This change removes that restriction (improving runtime, of course). Signed-off-by: Matthias Beyer <matthias.beyer@atos.net> Tested-by: Matthias Beyer <matthias.beyer@atos.net>
2021-01-25	Remove unnecessary qualification	Matthias Beyer
	Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-01-19	Fix clippy: calling `as_bytes()` on a string literal	Matthias Beyer
	Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-01-18	Post-format clippy cleanup	Matthias Beyer
	This patch cleans some rustfmt code that was formatted, but where clippy had something to complain: a scope was unnecessary here, so remove it. Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-18	Run `cargo fmt`	Matthias Beyer
	Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-01-15	Fix clippy: using `clone` on a `Copy` type	Matthias Beyer
	Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-15	Fix clippy: useless conversion to the same type: `anyhow::Error`	Matthias Beyer
	Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-15	Fix style: Remove trailing whitespace	Matthias Beyer
	Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-15	Fix clippy: calls to `std::mem::drop` with a reference instead of an owned ↵	Matthias Beyer
	value. Dropping a reference does nothing Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-14	Update implementation for dedicated release table	Matthias Beyer
	This patch updates the implementation for the dedicated release table. First of all, it updates the model types and adds the new `crate::db::models::Release` type for representing releases in the database (which are just pointers to the released artifact). The `db artifacts` command is updated by using a LEFT JOIN with the "releases" table. The `release` command is updated as well, though the change was a bit more complex. First of all, a LEFT OUTER JOIN was added which is essentially the same as the filter for un-released artifacts from before the change. Then, a `.select()` call was added, so we don't load all the objects we don't need anyways. This reduces overhead not only with the database querying but also in the code here, as we do not allocate objects from the database we destruct right away. The updating of the artifact object in the database was changed to generate a new release object in the "releases" table, of course. Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-01-13	Add LICENSE file and license headers	Matthias Beyer
	Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-04	Make use of JobHandle::create_job_run_error() to add verbose error message	Matthias Beyer
	Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-01-04	Add JobHandle::create_job_run_error() helper function for constructing nice ↵	Matthias Beyer
	error message Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2020-12-15	Refactor job running on endpoint	Matthias Beyer
	This commit refactors the job running in the endpoint implementation to be a multi-stage (multi-function-call) process. Before this patch, preparing, starting and running the container as well as shutdown of the container was one huge function. This was sub-optimal, because we had to accummulate information during the run and be extra-careful when returning from that function. Ultimatively, we did not succeed in doing a good job here, which resulted in data-loss if a script exited to early because the database-code was just not hit yet. Consider the following packaging script: exit 1 this does nothing. This does, in fact, not even notify butido that it errored. This is not a scientific scenario, because a valid script might exit in the first command (unintentionally) and thus result in the same as the above example. The issue here is, that this causes butido to continue operation as normal -> It tries to copy the artifacts from the container. This, of course, then fails and butido stops processing. Because of that, the job information is never written to the database, because butido exits before that can happen. With this patch, the whole chain of container-orchestration commands gets split up into multiple, chainable functions. There is a function to prepare the container, which returns information that can then be used to execute the container, which returns information that can be used to finalize the container. You get the idea. Because this is multi-staged now, information can be retrieved and processed _in between these steps_. This results in us being able to write information to the database as soon as possible, which is what we do now. Of course this is only the refactoring patch and butido runs with this. More fixes and minor tweaks in the whole processing chain might be required to make the whole process even smoother. But this can be the stepping stone for such improvements. Tested-by: Matthias Beyer <mail@beyermatthias.de> Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-14	Add more trace logging for database objects created	Matthias Beyer
	Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-10	Fix: Write job to database, even if it failed	Matthias Beyer
	This patch fixes an error where a failing (as in {{stage "ERR" "something"}}) job was not written to the database. The implementation and signatore of Endpoint::run_job() had to be altered for that, see code comment. Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-10	Remove feature: State message in "OK" case.	Matthias Beyer
	The "OK" case had a state message parameter. This was removed because "OK" does mean that everything went well. No need to have a message here. Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-10	Rewrite container error reporting	Matthias Beyer
	This patch rewrites the container error reporting to be more simple. The ContainerError type was rewritten to not wrap other errors anymore, but be the root error itself. It only has one variant now, because there's only one kind of error: "It didn't work". The reporting in the calling functions can now use anyhow::Result<_> instead of std::result::Result because of that. Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-09	Add package name and version in progress bar message	Matthias Beyer
	Signed-off-by: Matthias Beyer <mail@beyermatthias.de>