7. Services' Deployment

What do I want?

The infrastructure should be running now: the last steps are to deploy our services.

Problems

traefik discovers the different services and nodes by querying Consul. The refresh interval is set to 15 seconds by default.

This means that traefik may become aware of a change only after up to 15 seconds and will keep sending traffic to the old destination during this interval. This can of course result in errors (the service could have been migrated to another node or simply shut down on this node, etc.).
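For reference, assuming Traefik v2 with its Consul Catalog provider, this interval corresponds to the refreshInterval option, which can be tuned in traefik’s static configuration or on the command line (the value shown is simply the default):

--providers.consulcatalog.refreshInterval=15s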

Solutions

We have to instruct Nomad to delay the shutdown of an allocation after its service has been deregistered, so that traefik has time to stop routing traffic to it before the task actually stops. We will use this option:

shutdown_delay = "20s"
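For context, this option lives in the Nomad job file, at the task (or group) level. Here is a minimal sketch of where it sits, with placeholder group and task names:

group "http" {
  task "nginx" {
    driver = "docker"

    # Wait 20 seconds between deregistering the service from Consul
    # and actually stopping the task, so traefik stops routing to it first.
    shutdown_delay = "20s"
  }
}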

The services

Traefik

See here

This website

The container

The sources of this website are available here:

  • it’s a Hugo site, so there are only static files
  • the Docker image is an nginx daemon serving these files
  • the Nomad job’s source can also be found here

See here for the GitHub Actions workflow that creates the Docker image.

The job

The different steps to deploy this website are:

  • on GitHub Actions, pushing a new tag, which triggers:

    • the generation of the pages
    • copying them into a Docker image stored on GitHub
  • manually running the Nomad job (sketched below), which will:

    • download the Docker image from GitHub
    • deploy it on the node VMs:
      • respecting the constraints
      • declaring a service in Consul with tags allowing traefik:
        • to route the traffic
        • to apply middlewares and Let’s Encrypt’s certificates
      • using a random port (passed to the container through an environment variable)
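To make the list above more concrete, here is a minimal sketch of what such a job file could look like. It is only an illustration, assuming a recent Nomad version with group-level networking; the job name, datacenter, constraint, image path, hostname, certificate resolver and environment variable names are placeholders, not the actual values used for this site:

job "website" {
  datacenters = ["dc1"]

  # Example constraint: only run on a given class of nodes
  constraint {
    attribute = "${node.class}"
    value     = "worker"
  }

  group "http" {
    count = 2

    # Ask Nomad for a random port, labelled "http"
    network {
      port "http" {}
    }

    # Register the service in Consul with tags read by traefik
    service {
      name = "website"
      port = "http"
      tags = [
        "traefik.enable=true",
        "traefik.http.routers.website.rule=Host(`example.com`)",
        "traefik.http.routers.website.tls.certresolver=letsencrypt",
      ]
    }

    task "nginx" {
      driver = "docker"

      # Give traefik time to notice the deregistration before stopping
      shutdown_delay = "20s"

      config {
        image = "ghcr.io/owner/website:latest"
        ports = ["http"]
      }

      # The random port is passed to the container as an environment variable
      env {
        NGINX_PORT = "${NOMAD_PORT_http}"
      }
    }
  }
}

The traefik.* tags follow Traefik v2’s label conventions for its Consul Catalog provider; middleware tags (not shown here) work the same way.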


Run the jobs

To start a job, you only need to run its file:

nomad run <job name>.nomad

Check the status

Check the status of the different jobs:

nomad job status
nomad job status <job name>

One important part is the statistics about the deployed instances:

Latest Deployment
ID          = e243a9eb
Status      = failed
Description = Failed due to progress deadline

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
http        0       0         2        7       69        0
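The Description field above points at the job’s progress deadline: if no allocation becomes healthy within that window, Nomad marks the deployment as failed. This deadline is configured in the job’s update stanza; a minimal sketch with Nomad’s default values (not necessarily the ones used here):

update {
  # How long an allocation may take to become healthy before the
  # whole deployment is considered failed (default: 10 minutes).
  progress_deadline = "10m"

  # How long Nomad waits for a single allocation to be marked healthy.
  healthy_deadline = "5m"
}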

Check the logs

It is also possible to access the logs of the different allocations.

Identify the allocation’s id:

nomad job status <job name>

You should get a result like:

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
168711f9  867e4021  http        72       run      running  17h35m ago  17h30m ago
8b45c8c1  65dec082  http        72       run      running  17h35m ago  17h30m ago

For logs on stdout:

nomad alloc logs -f <alloc id>

For logs on stderr:

nomad alloc logs -f -stderr <alloc id>

Some screenshots

You should now end up with everything working together:

(Screenshot: Consul)

(Screenshot: Traefik dashboard)