diff options
author      Matthias Beyer <mail@beyermatthias.de>  2020-12-15 09:39:54 +0100
committer   Matthias Beyer <mail@beyermatthias.de>  2020-12-15 10:14:26 +0100
commit      f4592ab5d21f4e4dcd23f0153699b27b687bfc65 (patch)
tree        e19fb185708ae441e0c07aadabb7e07395383288 /src/endpoint/scheduler.rs
parent      352390565280165af60a6597247d4ed4008482f6 (diff)
Refactor job running on endpoint
This commit refactors the job running in the endpoint implementation to
be a multi-stage (multi-function-call) process.
Before this patch, preparing, starting, and running the container, as well
as shutting it down, all happened in one huge function.
This was sub-optimal, because we had to accumulate information during
the run and be extra careful when returning from that function.
Ultimately, we did not succeed in doing a good job here, which
resulted in data loss if a script exited too early, because the
database code was simply never reached.
Consider the following packaging script:
exit 1
this does nothing. In fact, it does not even notify butido that it
errored. This is not a contrived scenario, because a valid script might
(unintentionally) fail on its first command and thus produce the same
result as the example above.
The issue is that this causes butido to continue operating as
normal: it tries to copy the artifacts from the container. That, of
course, fails, and butido stops processing.
As a result, the job information is never written to the database,
because butido exits before that can happen.
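The pre-patch failure mode can be sketched as follows. This is a simplified,
hypothetical model (the names run_job_old, copy_artifacts, and write_job_to_db
are illustrative, not butido's actual API): the early `?` on artifact
collection returns before the database write is ever reached.

```rust
// Simplified sketch of the pre-patch, single-function flow (hypothetical
// names). If the script fails, copy_artifacts() errors and the `?` returns
// early, so write_job_to_db() is never reached and the job row is lost.

fn copy_artifacts(exit_code: i32) -> Result<Vec<String>, String> {
    if exit_code != 0 {
        Err(format!("no artifacts: script exited with {}", exit_code))
    } else {
        Ok(vec!["pkg.tar".to_string()])
    }
}

fn write_job_to_db(db: &mut Vec<String>) {
    db.push("job row".to_string());
}

fn run_job_old(script_exit_code: i32, db: &mut Vec<String>) -> Result<(), String> {
    // ... container preparation, start, and script execution elided ...
    let _artifacts = copy_artifacts(script_exit_code)?; // fails on nonzero exit
    write_job_to_db(db);                                // only reached on success
    Ok(())
}
```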
With this patch, the whole chain of container-orchestration commands
gets split up into multiple, chainable functions.
There is a function to prepare the container, which returns information
that can then be used to execute the container, which returns
information that can be used to finalize the container.
You get the idea.
Because the process is multi-staged now, information can be retrieved and
processed _in between these steps_.
This allows us to write information to the database as soon as it is
available, which is what we do now.
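The staged chain described above can be sketched like this. The type and
method names (PreparedContainer, start, execute_script, finalize, etc.) mirror
the ones mentioned in this message but are a simplified, hypothetical model of
the new API, not butido's exact implementation; the point is that the container
hash and script are in hand before finalization, so the DB write can happen
even when the script failed.

```rust
// Hypothetical staged container API: each stage consumes the previous one
// and returns the data needed for the next step, so callers can persist
// state (e.g. a DB record) between stages.

struct PreparedContainer { script: String }
struct StartedContainer  { script: String, hash: String }
struct ExecutedContainer { script: String, hash: String, exit_code: i32 }

#[derive(Debug)]
struct FinalizedContainer { artifacts: Vec<String>, exit_code: i32 }

fn prepare_container(script: &str) -> PreparedContainer {
    PreparedContainer { script: script.to_string() }
}

impl PreparedContainer {
    fn start(self) -> StartedContainer {
        StartedContainer { script: self.script, hash: "deadbeef".into() }
    }
}

impl StartedContainer {
    fn execute_script(self) -> ExecutedContainer {
        // A script like "exit 1" yields a nonzero exit code, but the hash
        // and script are still available for the database record.
        let code = if self.script.trim() == "exit 1" { 1 } else { 0 };
        ExecutedContainer { script: self.script, hash: self.hash, exit_code: code }
    }
}

impl ExecutedContainer {
    fn container_hash(&self) -> &str { &self.hash }
    fn script(&self) -> &str { &self.script }
    fn finalize(self) -> FinalizedContainer {
        let artifacts = if self.exit_code == 0 {
            vec!["pkg.tar".to_string()]
        } else {
            vec![]
        };
        FinalizedContainer { artifacts, exit_code: self.exit_code }
    }
}
```

A caller can now record the job between execute_script() and finalize(), so a
failing script no longer prevents the database write:

```rust
fn demo() -> FinalizedContainer {
    let executed = prepare_container("exit 1").start().execute_script();
    // Write the job row here, before artifact collection can fail:
    println!("DB record: hash={} script={:?}", executed.container_hash(), executed.script());
    executed.finalize()
}
```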
Of course, this is only the refactoring patch, and butido runs with it.
More fixes and minor tweaks in the whole processing chain might be
required to make the process even smoother, but this can be the
stepping stone for such improvements.
Tested-by: Matthias Beyer <mail@beyermatthias.de>
Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
Diffstat (limited to 'src/endpoint/scheduler.rs')
-rw-r--r--  src/endpoint/scheduler.rs  24
1 file changed, 15 insertions, 9 deletions
diff --git a/src/endpoint/scheduler.rs b/src/endpoint/scheduler.rs
index 74155e7..25165fa 100644
--- a/src/endpoint/scheduler.rs
+++ b/src/endpoint/scheduler.rs
@@ -138,8 +138,11 @@ impl JobHandle {
         let envs = self.create_env_in_db()?;
         let job_id = self.job.uuid().clone();
         trace!("Running on Job {} on Endpoint {}", job_id, ep.name());
-        let res = ep
-            .run_job(self.job, log_sender, self.staging_store);
+        let running_container = ep.prepare_container(self.job, self.staging_store.clone())
+            .await?
+            .start()
+            .await?
+            .execute_script(log_sender);

         let logres = LogReceiver {
             package_name: &package.name,
@@ -150,19 +153,22 @@ impl JobHandle {
             bar: &self.bar,
         }.join();

-        let (res, logres) = tokio::join!(res, logres);
-
-        trace!("Found result for job {}: {:?}", job_id, res);
+        let (run_container, logres) = tokio::join!(running_container, logres);

         let log = logres.with_context(|| anyhow!("Collecting logs for job on '{}'", ep.name()))?;
-        let (paths, container_hash, script) = res.with_context(|| anyhow!("Error during running job on '{}'", ep.name()))?;
-
-        let job = dbmodels::Job::create(&self.db, &job_id, &self.submit, &endpoint, &package, &image, &container_hash, &script, &log)?;
+        let run_container = run_container.with_context(|| anyhow!("Running container {} failed"))?;
+        let job = dbmodels::Job::create(&self.db, &job_id, &self.submit, &endpoint, &package, &image, &run_container.container_hash(), run_container.script(), &log)?;
         trace!("DB: Job entry for job {} created: {}", job.uuid, job.id);
         for env in envs {
             let _ = dbmodels::JobEnv::create(&self.db, &job, &env)?;
         }
-        let paths = paths?;
+
+        let res : crate::endpoint::FinalizedContainer = run_container
+            .finalize(self.staging_store.clone())
+            .await?;
+        trace!("Found result for job {}: {:?}", job_id, res);
+        let (paths, res) = res.unpack();
+        let _ = res.with_context(|| anyhow!("Error during running job on '{}'", ep.name()))?;

         // Have to do it the ugly way here because of borrowing semantics
         let mut r = vec![];