Age | Commit message | Author |
|
This patch reimplements the package orchestration functionality to rely
on a DAG rather than a tree.
Before this change, the structure the packages were organized in for a
build was a tree:
        A
       / \
      B   E
     / \   \
    C   D   F
That did work reasonably well for the initial development of butido,
because this is a simple case and the implementation is rather simple,
too.
But packages and their dependencies are not always organized in a tree.
Most of the time, they are organized in a DAG:
    .-> C -,
   /         \
  D           > A
   \         /
    `-> B -´
This is a real-world example: A could be a common crypto-library that I
do not want to name here.
B and C could be libraries that use said crypto-library, and D could
be a program that uses B and C.
Because said crypto-library takes rather long to build, building it
twice and throwing one result away is a no-go.
A DAG as organizational structure makes that issue go away entirely.
Also, we can later implement checks for whether the DAG contains
multiple versions of the same library, if that is undesirable.
The change itself is rather big, frankly because it is a non-trivial
change to replace the whole data structure and its handling in the
orchestrator code.
First of all, we introduce the "daggy" library, which provides the DAG
implementation on top of the popular "petgraph" library.
The package `Tree` datastructure was replaced by a package `Dag`
datastructure. This type implements the heavy-lifting that is needed to
load a package and all its dependencies from the `Repository` object.
The `JobTree` was also reimplemented, but as `daggy::Dag` provides a
convenient `map()` function, the implementation that transforms the
package `Dag` into a job `Dag` is rather trivial.
`crate::job::Dag` then provides the convenience `iter()` function to
iterate over all elements in the DAG, providing a `JobDefinition`
object for each node.
The topology in which we traverse the DAG is not an issue, as we need
to create tasks for all `JobDefinition`s anyway, so we do not care
about traversal order at all.
The `crate::package::Package` type got a `Hash` implementation, which
is necessary to keep track of the mappings while reading the DAG from
the repository.
The implementation does not create the edges between the nodes in the
DAG right when inserting them, but afterwards.
To keep track of the `daggy::NodeIndex`es, it keeps a mapping
    Package -> NodeIndex
in a `HashMap`. Thus, `Package` must implement `std::hash::Hash`.
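The two-phase construction can be sketched roughly like this. This is a minimal, self-contained model: a plain index map and edge list stand in for the real `daggy::Dag` and `daggy::NodeIndex` types, and `Package` is reduced to a name.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for `crate::package::Package`; the real type
// implements `std::hash::Hash` so it can be used as a HashMap key.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct Package {
    name: String,
}

// Minimal stand-in for `daggy::Dag`: a node list plus directed edges
// expressed as (dependent, dependency) index pairs.
struct Dag {
    nodes: Vec<Package>,
    edges: Vec<(usize, usize)>,
}

// Two-phase construction as described above: insert all nodes first,
// remembering each package's node index in a `Package -> NodeIndex`
// map, then add the edges afterwards by looking the indexes up.
fn build_dag(packages: &[Package], deps: &[(Package, Package)]) -> Dag {
    let mut dag = Dag { nodes: Vec::new(), edges: Vec::new() };
    let mut index_map: HashMap<Package, usize> = HashMap::new();

    for pkg in packages {
        let idx = dag.nodes.len();
        dag.nodes.push(pkg.clone());
        index_map.insert(pkg.clone(), idx);
    }
    for (dependent, dependency) in deps {
        dag.edges.push((index_map[dependent], index_map[dependency]));
    }
    dag
}

// Build the diamond from above: B and C depend on A, D depends on
// both B and C; returns (node count, edge count).
fn demo_dag() -> (usize, usize) {
    let p = |n: &str| Package { name: n.to_string() };
    let pkgs = [p("A"), p("B"), p("C"), p("D")];
    let deps = [
        (p("B"), p("A")),
        (p("C"), p("A")),
        (p("D"), p("B")),
        (p("D"), p("C")),
    ];
    let dag = build_dag(&pkgs, &deps);
    (dag.nodes.len(), dag.edges.len())
}

fn main() {
    println!("{:?}", demo_dag());
}
```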
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
Tested-by: Matthias Beyer <mail@beyermatthias.de>
|
|
This removes the "tree" column from the "submits" table.
This is because we do not store the build-tree in the database anymore.
We don't actually need this feature, and we can always re-build the
tree from an old commit in the repository.
Thus, it is not required anymore.
Also, it would be harder to do once the internal implementation
changes from a "tree" structure to a "DAG" structure.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
Because tokio 1.0 does not ship with the Stream trait, this patch also
introduces tokio_stream as new dependency.
For more information, look here:
https://docs.rs/tokio/1.0.3/tokio/stream/index.html
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Before this change, it returned `dbmodels::Artifact` objects, for
which we needed to fetch the `filestore::Artifact` again.
This change removes that restriction (improving runtime, of course).
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
Tested-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
This patch re-implements hashing using streams and buffered readers instead of
reading a full file to RAM before hashing it.
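The buffered, streaming style of hashing can be sketched as follows. A toy FNV-1a function stands in here for the real cryptographic hasher; the incremental-update pattern (read a chunk, feed it to the hasher state, repeat until EOF) is the same.

```rust
use std::io::{BufReader, Read};

fn hash_stream<R: Read>(input: R) -> u64 {
    // Wrap the source in a buffered reader and feed fixed-size chunks
    // to the hasher state, so the whole file never sits in RAM at once.
    let mut reader = BufReader::new(input);
    let mut state: u64 = 0xcbf29ce484222325; // FNV-1a offset basis
    let mut chunk = [0u8; 8192];
    loop {
        let n = reader.read(&mut chunk).expect("read failed");
        if n == 0 {
            break; // EOF
        }
        for &byte in &chunk[..n] {
            state ^= u64::from(byte);
            state = state.wrapping_mul(0x100000001b3); // FNV-1a prime
        }
    }
    state
}

fn main() {
    // A Cursor stands in for a file opened with File::open.
    let data = std::io::Cursor::new(b"example file contents".to_vec());
    println!("digest = {:016x}", hash_stream(data));
}
```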
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
This patch reimplements the running of the computed jobs.
The old implementation was structured as follows:
1. Compute a Tree of dependencies for the requested package
2. Make sets of this tree (see below)
3. For each set
3.1. Run set in parallel by submitting each job in the set to the scheduler
3.2. collect outputs and errors
3.3. Record outputs and return errors (if any)
The complexity here was the computing of the `JobSet`s, but also the
running of each job in a set in parallel.
The code was non-trivial to understand.
But that's not even the biggest concern with this approach.
Consider the following tree of jobs:
          A
         / \
        B   E
       / \   \
      C   D   F
             / \
            G   H
                 \
                  I
Each node here represents a package, the edges represent dependencies on the
lower-hanging package.
This tree would result in 5 sets of jobs:
    [
      [ I ]
      [ G, H ]
      [ C, D, F ]
      [ B, E ]
      [ A ]
    ]
because each "layer" in the tree would be run in parallel.
It can easily be seen that, in the tree above, the jobs for
[ I, G, D, C ] can be run in parallel, because they do not have
dependencies, yet the set-based approach never schedules them
together because they live in different sets.
The reimplementation also has another (crucial) benefit: The implementation does
not depend on a structure of artifact path names anymore.
Before, the artifacts needed to have a name as follows:
<name of the package>-<version of the package>.<something>
which was extremely restrictive.
With the changes from this patch, the implementation does not depend on such
a format anymore.
Instead, dependencies are associated with a job via the outputs of the
jobs run for the packages it depends on.
That means that, considering the above tree of packages:
deps_of(B) = outputs_of(job_for(C)) + outputs_of(job_for(D))
in text:
The dependencies of package B are the outputs of the job run for package C
plus the outputs of the job run for package D.
With that change in place, the outputs of a job run for a package can yield
arbitrary file names and as long as the build script for the package can process
them, everything is fine.
The new algorithm that solves this issue is rather simple:
1. Hold a list of errors
2. Hold a list of artifacts that were built
3. Hold a list of jobs that were run
4. Iterate over all jobs, filtered by
- If the job appears in the "already run jobs" list, ignore it
- If a job has dependencies (on outputs of other jobs) that do not appear in
the "already run jobs", ignore it (for now)
5. Run these jobs, and for each job:
5.1. Take the job UUID and put it in the "already run jobs" list.
5.2. Take the result of the job,
5.2.1. if it is an error, put it in the "list of errors"
5.2.2. if it is ok, put the artifact in the "list of artifacts"
6. if the list of errors is not empty, goto 9
7. if all jobs are in the "already run jobs" list, goto 9
8. goto 4
9. return all artifacts and all errors
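A minimal, sequential model of the loop above might look as follows. The `JobDefinition` shape and the integer ids are hypothetical; the real orchestrator uses UUIDs, submits the selected jobs to a scheduler in parallel, and collects real artifacts and errors.

```rust
use std::collections::HashSet;

// Hypothetical, heavily simplified job: an id plus the ids of jobs
// whose outputs it depends on.
#[derive(Clone)]
struct JobDefinition {
    id: u32,
    depends_on: Vec<u32>,
}

// Runs jobs in rounds: each round picks every job that has not run yet
// and whose dependencies have all run, until everything ran or an
// error occurred. "Running" a job here just records an artifact name.
fn run_all(jobs: &[JobDefinition]) -> (Vec<String>, Vec<String>) {
    let mut errors: Vec<String> = Vec::new();           // 1.
    let mut artifacts: Vec<String> = Vec::new();        // 2.
    let mut already_run: HashSet<u32> = HashSet::new(); // 3.

    loop {
        // 4. filter: skip jobs that already ran or have unmet deps
        let runnable: Vec<JobDefinition> = jobs
            .iter()
            .filter(|j| !already_run.contains(&j.id))
            .filter(|j| j.depends_on.iter().all(|d| already_run.contains(d)))
            .cloned()
            .collect();

        // guard against cycles / unsatisfiable deps in this toy model
        if runnable.is_empty() && already_run.len() != jobs.len() {
            errors.push("unsatisfiable or cyclic dependencies".into());
        }

        // 5. "run" the selected jobs (sequentially in this sketch)
        for job in runnable {
            already_run.insert(job.id);                        // 5.1.
            artifacts.push(format!("artifact-of-{}", job.id)); // 5.2.2.
        }

        // 6./7. stop on errors, or once every job has run
        if !errors.is_empty() || already_run.len() == jobs.len() {
            return (artifacts, errors); // 9.
        }
        // 8. otherwise, goto 4
    }
}

fn main() {
    // The diamond from above: B and C need A, D needs both B and C.
    let jobs = [
        JobDefinition { id: 0, depends_on: vec![] },
        JobDefinition { id: 1, depends_on: vec![0] },
        JobDefinition { id: 2, depends_on: vec![0] },
        JobDefinition { id: 3, depends_on: vec![1, 2] },
    ];
    let (artifacts, errors) = run_all(&jobs);
    println!("artifacts: {:?}, errors: {:?}", artifacts, errors);
}
```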
Because this approach is fundamentally different than the previous approach, a
lot of things had to be rewritten:
- The `JobSet` type was completely removed
- There is a new type `crate::job::Tree` that gets built from the
`crate::package::Tree`.
It is a mapping of a UUID (the job UUID) to a `JobDefinition`.
The `JobDefinition` type is
- a `Job`
- a list of UUIDs of other jobs whose outputs this job depends on
It is therefore a mapping of `Job -> outputs(jobs_of(dependencies))`.
The `crate::job::Tree` type is now responsible for building a `Job` object for
each `crate::package::Package` object from the `crate::package::Tree` object.
Because the `crate::package::Tree` object contains all required
packages for the complete build, the implementation of
`crate::job::Tree::build_tree()` does not check sanity.
It is assumed that the input tree to the function contains all mappings.
Despite the name `crate::job::Tree` ("Tree"), the actual structure stored in
the type is not a real tree.
- The `MergedStores::get_artifact_by_path()` function was adapted because in the
previous implementation, it used `StagingStore::load_from_path()`, which tried
to load the file from the filesystem and put it into the internal map, which
failed if it was already there.
The adaptation checks whether the artifact already exists in the
internal map and returns that object instead.
(For the release store accordingly)
- The interface of the `RunnableJob::build_from_job()` function was adapted, as
this function does not need to access the `MergedStores` object anymore to
load dependency-Artifacts from the filesystem.
Instead, these Artifacts are passed to the function now.
- The Orchestrator code
- Got a type alias `JobResult`, which represents the result of a job
run and is either
- A number of artifacts (for optimization reasons with their associated
database artifact entry)
- or an error with the job uuid that failed (again, for optimization
reasons)
- Got an implementation of the algorithm described above
- Got a new implementation of run_job(), which
- Fetches the paths of dependency-artifacts from the database by using
the job uuids from the JobDefinition object
- Creates the RunnableJob object for that
- Schedules the RunnableJob object in the scheduler
- For each output artifact (database object representing it)
- get the filesystem Artifact object for it
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
Tested-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
All other subcommands should be able to run on an unclean repository, but the
build command should not.
Thus, move this function call from main() to the build() implementation.
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
|
|
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
Because we continuously get blocking filesystem operations when the
implementation of the verification is async, simply remove the
asyncness here for now.
This does not decrease performance (yet), because the function is
called concurrently with other calls anyway.
It blocks the tokio worker thread though, thus maximum parallelism
might be = number of cores. :-(
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
This patch cleans up some code that rustfmt had formatted but clippy
still complained about: a scope was unnecessary here, so remove it.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
with Default::default()
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
with Default::default()
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
and cannot be used with non-Vec-based slices
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
succinctly expressed by calling `.find(..)` instead
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
`package::tree::Tree`
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
This patch adds a feature where a source entry in a package can be marked for
manual download.
This gives us the ability to mask downloads which are hidden behind
cruel JavaScript bullshit bloat where `curl` cannot access the remote
file.
Signed-off-by: Matthias Beyer <matthias.beyer@atos.net>
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
This patch reimplements the tree-printing using the 'ptree' crate.
Because ptree wants the tree item to implement `Clone`, a wrapper type
is added which then implements `Clone` and `ptree::TreeItem`.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
|
|
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|
|
This patch implements filtering for environment variables via a key-value pair.
The implementation is altered to build the database query with a
filter if the commandline flag was passed.
The implementation was also slightly changed to not fetch the Submit
object first, but filter for it.
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
|