path: root/src/job
Age  Commit message  Author
2021-11-19  Update Copyright string to 2020-2022  Matthias Beyer
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-10-19  Remove unused field  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-08-12  Fix clippy: Remove needless borrows  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-04-08  Remove pub(super) on Job members  Matthias Beyer
Not yet perfectly nice, but almost there.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-03-01  Add feature to pass git author and git commit information to container  Matthias Beyer
This patch implements the feature to be able to pass author and commit hash information from the repository to the container. This can be used to set packager or package description commit hash inside the build container, if desired.
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-03-01  Refactor: Environment variable aggregation  Matthias Beyer
This patch refactors the collecting of the environment variables in the `RunnableJob::build_from_job()` implementation as well as in the `RunnableJob::environment()` implementation. This results in fewer allocations, especially but not only because the `RunnableJob::environment()` function returns an iterator now and all clone() calls were removed.
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
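To illustrate why returning an iterator saves allocations, here is a minimal sketch. It is not the actual butido code: the struct fields `job_env` and `package_env` and the `(&str, &str)` item type are assumptions made for the example.

```rust
/// Hypothetical stand-in for the real `RunnableJob`; the two fields are
/// assumptions for this sketch, not the fields of the real type.
pub struct RunnableJob {
    pub job_env: Vec<(String, String)>,
    pub package_env: Vec<(String, String)>,
}

impl RunnableJob {
    /// Yields borrowed name/value pairs. No intermediate `Vec` is built
    /// and nothing is cloned; the caller decides whether to collect.
    pub fn environment<'a>(&'a self) -> impl Iterator<Item = (&'a str, &'a str)> + 'a {
        self.job_env
            .iter()
            .chain(self.package_env.iter())
            .map(|(name, value)| (name.as_str(), value.as_str()))
    }
}
```

A caller that only needs to forward the variables can consume the iterator directly, so no owned copy of the environment is ever created.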
2021-03-01  Refactor: Collect environment resources before building resources for RunnableJob  Matthias Beyer
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-02-08  Remove `Artifact` type  Matthias Beyer
This patch follows up on the shrinking of the `Artifact` type and removes it entirely. The type is not needed. Only the `ArtifactPath` type is needed, which is a thin wrapper around `PathBuf`, ensuring that the path is relative to the store root.
The `Artifact` type used `pom` to parse the name and version of the package from the `ArtifactPath` object it contained, which resulted in the restriction that the path must always be <name>-<version>... That should not be a requirement, and it actually caused issues with a package named "foo-bar" (as an example).
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
Tested-by: Matthias Beyer <matthias.beyer@atos.net>
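A thin wrapper of the kind described above might look like the following sketch. The constructor name `new`, the `String` error type, and the accessor `as_path` are assumptions for illustration; the real `ArtifactPath` API may differ.

```rust
use std::path::{Path, PathBuf};

/// Sketch of a newtype around `PathBuf` that only accepts relative paths.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ArtifactPath(PathBuf);

impl ArtifactPath {
    /// Rejects absolute paths. The file name itself is not parsed, so a
    /// name such as "foo-bar-1.0.pkg" needs no special handling.
    pub fn new(path: PathBuf) -> Result<Self, String> {
        if path.is_relative() {
            Ok(ArtifactPath(path))
        } else {
            Err(format!("path is not relative: {}", path.display()))
        }
    }

    /// Read-only access to the wrapped path.
    pub fn as_path(&self) -> &Path {
        &self.0
    }
}
```

Because the invariant is checked once in the constructor, code that receives an `ArtifactPath` does not need to re-validate it.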
2021-02-06  Rewrite package organizational structure using DAG  Matthias Beyer
This patch reimplements the package orchestration functionality to rely on a DAG rather than a tree.

        A
       / \
      B   E
     / \   \
    C   D   F

Before this change, the structure the packages were organized in for a build was a tree. That worked reasonably well for the initial development of butido, because this is a simple case and the implementation is rather simple, too.

But packages and their dependencies are not always organized in a tree. Most of the time, they are organized in a DAG:

      .-> C -,
     /        \
    D          > A
     \        /
      `-> B -´

This is a real-world example: A could be a common crypto-library that I do not want to name here. B and C could be libraries that use said crypto-library, and D could be a program that uses B and C. Because said crypto-library takes rather long to build, building it twice and throwing one result away is a no-go. A DAG as organizational structure makes that issue go away entirely.

Also, we can later implement checks whether the DAG contains multiple versions of the same library, if that is undesirable.

The change itself is rather big, frankly because it is a non-trivial change to replace the whole data structure and its handling in the orchestrator code.

First of all, we introduce the "daggy" library, which provides the DAG implementation on top of the popular "petgraph" library.

The package `Tree` data structure was replaced by a package `Dag` data structure. This type implements the heavy lifting that is needed to load a package and all its dependencies from the `Repository` object.

The `JobTree` was also reimplemented, but as `daggy::Dag` provides a convenient `map()` function, its implementation, which transforms the package `Dag` into a job `Dag`, is rather trivial. `crate::job::Dag` then provides the convenience `iter()` function to iterate over all elements in the DAG and provide a `JobDefinition` object for each node. The topology in which we traverse the DAG is not an issue, as we need to create tasks for all `JobDefinition`s anyways, so we do not care about traversal topology at all.

The `crate::package::Package` type got a `Hash` implementation, which is necessary to keep track of the mappings while reading the DAG from the repository. The implementation does not create the edges between the nodes in the DAG right when inserting, but afterwards. To keep track of the `daggy::NodeIndex`es, it keeps a mapping Package -> NodeIndex in a `HashMap`. Thus, `Package` must implement `std::hash::Hash`.

Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
Tested-by: Matthias Beyer <mail@beyermatthias.de>
squash! Reimplement as DAG
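The two-pass construction described in this message (insert all nodes first while recording a Package -> index mapping, then add the edges) can be sketched as follows. This is a simplified illustration using only the standard library: plain `Vec`s and `usize` indices stand in for `daggy::Dag` and `daggy::NodeIndex`, and the `Package` fields, the `Dag` struct, and `build_dag` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical package type. It derives `Hash` so that it can serve
/// as a key in the index mapping, as the commit message requires.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Package {
    pub name: String,
    pub dependencies: Vec<String>,
}

/// Stand-in for a DAG: nodes plus (dependent, dependency) index pairs.
#[derive(Debug)]
pub struct Dag {
    pub nodes: Vec<Package>,
    pub edges: Vec<(usize, usize)>,
}

pub fn build_dag(packages: &[Package]) -> Dag {
    // Pass 1: insert every package once and remember its node index.
    let mut nodes: Vec<Package> = Vec::new();
    let mut index_of: HashMap<&Package, usize> = HashMap::new();
    for pkg in packages {
        if !index_of.contains_key(pkg) {
            index_of.insert(pkg, nodes.len());
            nodes.push(pkg.clone());
        }
    }

    // Pass 2: all nodes exist now, so edges can be resolved via the mapping.
    let mut edges = Vec::new();
    for (from, pkg) in nodes.iter().enumerate() {
        for dep in &pkg.dependencies {
            if let Some((_, &to)) = index_of.iter().find(|(p, _)| p.name == *dep) {
                edges.push((from, to));
            }
        }
    }

    Dag { nodes, edges }
}
```

For the D -> {B, C} -> A example above, the shared package A ends up as a single node with two incoming edges, so it would be built only once.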
2021-02-02  Add tracing output  Matthias Beyer
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-02-02  Fix: fn does not have to be async  Matthias Beyer
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-01-21  Fix clippy: Do not clone() copy type  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-21  Remove RunnableJob::package_environment()  Matthias Beyer
This functionality is not required anymore, as we put the whole package definition in the job script interpolation anyways.
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-01-21  Reimplement: Orchestrator::run()  Matthias Beyer
This patch reimplements the running of the computed jobs.

The old implementation was structured as follows:

1. Compute a Tree of dependencies for the requested package
2. Make sets of this tree (see below)
3. For each set
   3.1. Run set in parallel by submitting each job in the set to the scheduler
   3.2. Collect outputs and errors
   3.3. Record outputs and return errors (if any)

The complexity here was the computing of the JobSets but also the running of each job in a set in parallel. The code was non-trivial to understand. But that's not even the biggest concern with this approach.

Consider the following tree of jobs:

        A
       / \
      B   E
     / \   \
    C   D   F
           / \
          G   H
               \
                I

Each node here represents a package, the edges represent dependencies on the lower-hanging package.

This tree would result in 5 sets of jobs:

    [
        [ I ]
        [ G, H ]
        [ C, D, F ]
        [ B, E ]
        [ A ]
    ]

because each "layer" in the tree would be run in parallel. It can easily be seen that, in the tree above, the jobs for [ I, G, D, C ] can be run in parallel, because they do not have dependencies, yet the layered approach spreads them over three sets.

The reimplementation also has another (crucial) benefit: The implementation does not depend on a structure of artifact path names anymore. Before, the artifacts needed to have a name as follows:

    <name of the package>-<version of the package>.<something>

which was extremely restrictive. With the changes from this patch, the implementation does not depend on such a format anymore. Instead, dependencies are associated with a job by the outputs of the jobs run for the packages it depends on. That means, considering the above tree of packages:

    deps_of(B) = outputs_of(job_for(C)) + outputs_of(job_for(D))

in text: The dependencies of package B are the outputs of the job run for package C plus the outputs of the job run for package D.

With that change in place, the outputs of a job run for a package can yield arbitrary file names and as long as the build script for the package can process them, everything is fine.

The new algorithm, which solves that issue, is rather simple:

1. Hold a list of errors
2. Hold a list of artifacts that were built
3. Hold a list of jobs that were run
4. Iterate over all jobs, filtered by
   - If the job appears in the "already run jobs" list, ignore it
   - If a job has dependencies (on outputs of other jobs) that do not appear in the "already run jobs", ignore it (for now)
5. Run these jobs, and for each job:
   5.1. Take the job UUID and put it in the "already run jobs" list.
   5.2. Take the result of the job,
        5.2.1. if it is an error, put it in the "list of errors"
        5.2.2. if it is ok, put the artifact in the "list of artifacts"
6. If the list of errors is not empty, goto 9
7. If all jobs are in the "already run jobs" list, goto 9
8. goto 4
9. Return all artifacts and all errors

Because this approach is fundamentally different from the previous approach, a lot of things had to be rewritten:

- The `JobSet` type was completely removed
- There is a new type `crate::job::Tree` that gets built from the `crate::package::Tree`.
  It is a mapping of a UUID (the job UUID) to a `JobDefinition`. The `JobDefinition` type is
  - A Job
  - A list of UUIDs of other jobs, where this job depends on the outputs
  It is therefore a mapping of `Job -> outputs(jobs_of(dependencies))`.
  The `crate::job::Tree` type is now responsible for building a `Job` object for each `crate::package::Package` object from the `crate::package::Tree` object.
  Because the `crate::package::Tree` object contains all required packages for the complete build, the implementation of `crate::job::Tree::build_tree()` does not check sanity. It is assumed that the input tree to the function contains all mappings.
  Despite the name `crate::job::Tree` ("Tree"), the actual structure stored in the type is not a real tree.
- The `MergedStores::get_artifact_by_path()` function was adapted because in the previous implementation, it used `StagingStore::load_from_path()`, which tried to load the file from the filesystem and put it into the internal map, which failed if it was already there. The adaptation checks if the artifact already exists in the internal map and returns that object instead. (For the release store accordingly)
- The interface of the `RunnableJob::build_from_job()` function was adapted, as this function does not need to access the `MergedStores` object anymore to load dependency-Artifacts from the filesystem. Instead, these Artifacts are passed to the function now.
- The Orchestrator code
  - Got a type alias `JobResult` which represents the result of a job run, which is either
    - A number of artifacts (for optimization reasons with their associated database artifact entry)
    - or an error with the job uuid that failed (again, for optimization reasons)
  - Got an implementation of the algorithm described above
  - Got a new implementation of run_job(), which
    - Fetches the paths of dependency-artifacts from the database by using the job uuids from the JobDefinition object
    - Creates the RunnableJob object for that
    - Schedules the RunnableJob object in the scheduler
    - For each output artifact (database object representing it)
      - get the filesystem Artifact object for it

Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
Tested-by: Matthias Beyer <matthias.beyer@atos.net>
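The nine-step algorithm from this message can be sketched as a small loop. This is an illustration under stated assumptions, not the orchestrator code: job UUIDs are replaced by `u32` ids, artifacts by `String`s, jobs in one round run sequentially rather than in parallel, and `JobDefinition`, `run_all`, and the `run_job` callback are hypothetical names. The early `break` on an empty ready list is an extra guard for unsatisfiable dependencies that the message does not mention.

```rust
use std::collections::{HashMap, HashSet};

pub type JobId = u32; // stand-in for the job UUID

pub struct JobDefinition {
    pub name: String,
    pub dependencies: Vec<JobId>, // jobs whose outputs this job needs
}

/// Runs all jobs and returns (artifacts, errors), following steps 1 to 9.
pub fn run_all<F>(
    jobs: &HashMap<JobId, JobDefinition>,
    mut run_job: F,
) -> (Vec<String>, Vec<(JobId, String)>)
where
    F: FnMut(&JobDefinition, &[String]) -> Result<String, String>,
{
    let mut errors = Vec::new(); // step 1
    let mut artifacts = Vec::new(); // step 2
    let mut outputs: HashMap<JobId, String> = HashMap::new();
    let mut already_run: HashSet<JobId> = HashSet::new(); // step 3

    loop {
        // Step 4: jobs not yet run whose dependencies have all been run.
        let mut ready: Vec<JobId> = jobs
            .iter()
            .filter(|(id, def)| {
                !already_run.contains(*id)
                    && def.dependencies.iter().all(|d| already_run.contains(d))
            })
            .map(|(id, _)| *id)
            .collect();
        ready.sort_unstable(); // deterministic order for this sketch only
        if ready.is_empty() {
            break; // guard: nothing can make progress
        }

        // Step 5: run the ready jobs and record their results.
        for id in ready {
            let def = &jobs[&id];
            let deps: Vec<String> =
                def.dependencies.iter().map(|d| outputs[d].clone()).collect();
            already_run.insert(id);
            match run_job(def, &deps) {
                Ok(artifact) => {
                    outputs.insert(id, artifact.clone());
                    artifacts.push(artifact);
                }
                Err(e) => errors.push((id, e)),
            }
        }

        // Steps 6 and 7: stop on errors or when everything has been run.
        if !errors.is_empty() || already_run.len() == jobs.len() {
            break;
        }
    }
    (artifacts, errors) // step 9
}
```

Applied to the example tree, the first round picks C, D, G and I together, because none of them waits on another job.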
2021-01-21  impl From<Artifact> for JobResource  Matthias Beyer
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-01-18  Post-format clippy cleanup  Matthias Beyer
This patch cleans up some code that rustfmt formatted, but where clippy had something to complain about: a scope was unnecessary here, so remove it.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-18  Run `cargo fmt`  Matthias Beyer
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
2021-01-18  Fix clippy: Remove redundant clone() calls  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-15  Fix clippy: this `.into_iter()` call is equivalent to `.iter()` and will not consume the `Vec`  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-15  Allow module inception in this instance  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-15  Fix clippy: writing `&Vec<_>` instead of `&[_]` involves one more reference and cannot be used with non-Vec-based slices  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-15  Fix clippy: this `.into_iter()` call is equivalent to `.iter()` and will not consume the `Vec`  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-15  Fix clippy: `.map().collect()` can be replaced with `.try_for_each()`  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-15  Fix clippy: writing `&Vec<_>` instead of `&[_]` involves one more reference and cannot be used with non-Vec-based slices  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-15  Fix clippy: redundant clone  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-15  Fix clippy: you should consider adding a `Default` implementation for `package::tree::Tree`  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-15  Refactor: Chain iterators and collect at once  Matthias Beyer
This improves the collection building by reducing the asyncness. The stream is built first and then all futures are collected. Before this patch, two streams were collected and then appended. Not sure whether this improves runtime, but it certainly improves readability. Unfortunately, collecting the additional job resources right into the streams is not that easy and thus the resulting collection is simply extend()ed.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-13  Add LICENSE file and license headers  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2021-01-11  Fix: Add environment from job definition  Christoph Prokop
Signed-off-by: Christoph Prokop <christoph.prokop@atos.net>
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
Tested-by: Christoph Prokop <christoph.prokop@atos.net>
2020-12-14  Refactor: Move phase module to package  Matthias Beyer
This is the right scope anyways.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-08  Implement checking of allowed environment variables  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-08  Move aggregation of environment variables to helper function  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-08  Use EnvironmentVariableName type for names of ENV variables  Matthias Beyer
This makes the typing a bit more helpful by using a type for the name of environment variables.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
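A dedicated name type combined with an allow-list check (the subject of the 2020-12-08 commits above) might look like this sketch. Only the type name `EnvironmentVariableName` comes from the log; the `From<&str>` impl, the `check_allowed` function, and its signature are assumptions for illustration.

```rust
use std::collections::HashSet;

/// Newtype so that variable names cannot be mixed up with values.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct EnvironmentVariableName(String);

impl From<&str> for EnvironmentVariableName {
    fn from(s: &str) -> Self {
        EnvironmentVariableName(s.to_string())
    }
}

/// Hypothetical check: returns the names that are not on the allow-list.
pub fn check_allowed(
    allowed: &HashSet<EnvironmentVariableName>,
    given: &[(EnvironmentVariableName, String)],
) -> Result<(), Vec<EnvironmentVariableName>> {
    let denied: Vec<EnvironmentVariableName> = given
        .iter()
        .map(|(name, _value)| name)
        .filter(|name| !allowed.contains(*name))
        .cloned()
        .collect();
    if denied.is_empty() {
        Ok(())
    } else {
        Err(denied)
    }
}
```

With the newtype in place, a function signature such as `&[(EnvironmentVariableName, String)]` documents which half of the pair is the name.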
2020-12-07  Implement shebang overwriting  Matthias Beyer
This patch implements the shebang overwriting functionality that was in a TODO note. It adds a `Shebang` type for it, which is a String wrapper.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
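One way a String-wrapping `Shebang` type could be used to overwrite the first line of a script is sketched below. The function `overwrite_shebang` and its behaviour (replace an existing `#!` line, otherwise prepend) are assumptions; the log does not say how the real implementation treats scripts that already carry a shebang.

```rust
/// String wrapper, as described in the commit message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Shebang(pub String);

/// Hypothetical helper: puts `shebang` on the first line of `script`.
pub fn overwrite_shebang(script: &str, shebang: &Shebang) -> String {
    let body = if script.starts_with("#!") {
        // Drop the existing shebang line and keep everything after it.
        script.split_once('\n').map(|(_, rest)| rest).unwrap_or("")
    } else {
        script
    };
    format!("{}\n{}", shebang.0, body)
}
```

Wrapping the string in its own type keeps a configured interpreter line from being confused with arbitrary script text at call sites.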
2020-12-07  Remove obsolete TODO  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-07  Remove unused variable  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-07  Deny macro_use from external crate  Matthias Beyer
Diesel is an exception here, because the generated src/schema.rs file does not automatically contain the necessary imports. All imports were added where necessary.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-07  Remove Job::add_resources() (unused)  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-07  Remove passing of additional env variables  Matthias Beyer
This patch removes the passing around of additional environment variables that were specified on the commandline and adds them directly to the Job object instance upon creation. This does not result in a net loss of code, but in a net loss of complexity. For this to be possible, we had to derive Clone for `JobResource`, which we have to clone when creating the `Job` objects during the creation of the jobsets from the `Tree` object.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-04  Add strict script interpolation  Matthias Beyer
This patch adds strict script interpolation, which means that the script interpolation will result in an error if a variable is referenced that does not exist. Before this patch, referencing an absent variable resulted in an empty string, possibly causing an error at runtime. This feature is on by default.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
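The difference between strict and lenient interpolation can be shown with a tiny hand-written substitution routine. This is only a sketch of the idea: the `{{name}}` placeholder syntax and the `interpolate` function are assumptions for the example and are not taken from the log, which does not say which template engine is used.

```rust
use std::collections::HashMap;

/// Replaces `{{name}}` placeholders. In strict mode an unknown variable
/// is an error; in lenient mode it becomes an empty string.
pub fn interpolate(
    template: &str,
    vars: &HashMap<&str, &str>,
    strict: bool,
) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find("}}")
            .ok_or_else(|| "unterminated '{{'".to_string())?;
        let name = after[..end].trim();
        match vars.get(name) {
            Some(value) => out.push_str(value),
            None if strict => return Err(format!("unknown variable: {}", name)),
            None => {} // lenient: silently substitute nothing
        }
        rest = &after[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}
```

In lenient mode a misspelled variable silently yields a script such as `make ` with a missing argument, which is the kind of late runtime failure the strict default is meant to surface early.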
2020-12-03  Allow multiple sources per package  Matthias Beyer
This patch implements multiple (unnamed) sources per package. This means that a package can have an array of sources.

What was adapted to allow multiple sources per package:

* Downloads are made in parallel now
* The cache structure was changed to /<package>-<version>/<hash>.source
* The UI was changed to contain the full `Package` struct (as JSON object) in a UI format string

Tests were adapted.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
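The cache layout `/<package>-<version>/<hash>.source` can be expressed as a small path helper. The function name `source_cache_path` and its parameters are hypothetical; only the directory and file naming scheme comes from the message above.

```rust
use std::path::{Path, PathBuf};

/// Builds `<root>/<package>-<version>/<hash>.source`. Because the file
/// name is the hash, several sources of one package can share a directory.
pub fn source_cache_path(root: &Path, package: &str, version: &str, hash: &str) -> PathBuf {
    root.join(format!("{}-{}", package, version))
        .join(format!("{}.source", hash))
}
```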
2020-12-03  Cleanup imports  Matthias Beyer
This patch cleans the imports, removes the unused ones and moves imports, wherever possible, to the outer scope.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-12-03  Remove unused function: JobSet::len()  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-11-16  Pass package environment and package variables (as environment variables) to container  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-11-14  Rewrite to use tokio::sync::RwLock  Matthias Beyer
This patch rewrites the codebase to not use std::sync::RwLock, but tokio::sync::RwLock. tokio's RwLock is an async RwLock, which is what we want in an async-await context. The more I use tokio, the more I understand what you should do and what you shouldn't do.

Some parts of this patch are a rewrite, for example, JobSet::into_runables() was completely rewritten. That was necessary because the function used inside is `Runnable::build_from_job()`, which uses an RwLock internally and thus becomes `async` in this patch. Because of this, `JobSet::into_runables()` needed a complete rewrite as well. Because transforming the function to return an iterator of futures is way more difficult, this patch simply rewrites it to return a `Result<Vec<RunnableJob>>` instead.

Internally, tokio jobs are submitted via the `futures::stream::FuturesUnordered<_>` now. This is not the most performant implementation for the problem at hand, but it is a reasonably simple one. Optimization could happen here, of course.

Also, the implementation of resource preparation inside `RunnableJob::build_from_job()` got a rewrite using the same technique.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-11-11  Copy package source to container before running build  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-11-08  Remove unused imports, sort imports  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-11-07  Add JobSet::len()  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-11-07  Add runtime-dependencies in runnable job  Matthias Beyer
This patch adds the runtime dependencies into the runnable job. These might be required to execute tests for a package inside the container. Because of this we also copy them to the container. The patch also generalizes the preparing mechanism into a helper function and makes use of some nice iterator chaining to be more readable.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
2020-11-06  Add getter for RunnableJob::uuid  Matthias Beyer
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>