GitHub Actions in Practice

How do you do, fellow reader? My name is Aleksei, I'm a newbie here at Commmune, and I'm writing this article on behalf of the Productivity team. We've just started our journey into the great automated and productive future, and here's our first task: replacing the old deployment flow. As a first step, we are vivisecting the current "CI" workflow based on GitHub Actions so that we can deliver every PR to prod as soon as it has passed all checks and auto tests.

Prerequisites

Our project is a TypeScript monorepo with a file structure like this:

client: Next.js app + Express-based static client
server: internal API server for the client
public-api: public API server
shared: shared code
db-layer: DB connection related code

We don't use a build system; we manage everything with npm, the standard Next.js compiler, and tsc for builds.

Old workflow

Our old workflow was pretty simple: it ran on every push to a PR and on every commit to the master and develop branches, with a single job under the hood:

name: CI

on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
  push:
    branches: [develop, master]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  static-code-analysis:
    runs-on: ubuntu-latest
    if: github.event.pull_request.draft == false
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3

      - name: Cache node modules
        id: cache-npm
        uses: actions/cache@v3
        env:
          cache-name: cache-node-modules
        with:
          path: |
            **/node_modules
          key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-build-${{ env.cache-name }}-
            ${{ runner.os }}-build-
            ${{ runner.os }}-
      - if: ${{ steps.cache-npm.outputs.cache-hit != 'true' }}
        name: Install dependencies on all repositories
        run: npm run all-install

      - name: Test client
        run: npm --prefix ./client run test

      - name: Test server
        run: npm --prefix ./server run test

      - name: Lint and type check
        run: npm run all-lint

      - name: Format check
        run: npm run check:prettier

      - name: Spell check
        run: npm run check:cspell

As you can see, there are several problems with this workflow:

  • Static analysis runs for the entire project, no matter what was changed
  • The same goes for the unit tests: the workflow runs the client tests even when you only change the README or the server
  • No e2e or integration tests run for the application, the internal API, or the public API. As a result, the quality of the code merged into the develop branch is low, and we often have to revert or hotfix.
  • Tests and linting run sequentially

New unit tests and linting workflow

The brand-new setup was designed to resolve the issues above. We've split our workflow by directory structure into five workflows, plus one more that marks the required checks as passed when a PR only touches files none of the other workflows cover (see "Path filtering and required jobs" below). A typical workflow looks like this:

name: Client CI

env:
  app_name: client

on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
    paths:
      - 'client/**'
      - 'shared/**'
      # negative pattern last, so markdown files under shared/ are excluded as well
      - '!**.md'
  push:
    paths:
      - 'client/**'
      - 'shared/**'
      - '!**.md'
    branches: [develop]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    if: github.event.pull_request.draft == false
    steps:
      - uses: actions/checkout@v3

      - name: Bootstrap
        uses: './.github/actions/bootstrap'

      - name: Formatting
        uses: './.github/actions/format'
        with:
          dir: ${{env.app_name}}

      - name: Test
        working-directory: ${{ env.app_name }}
        run: npm run test

Bootstrap and Formatting here are composite actions that bundle a few steps:

  • Bootstrap: install dependencies, restore cache, set timezone, set up Node.js
  • Formatting: ESLint + type checking, Prettier, CSpell
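
Here is a minimal sketch of what the Bootstrap composite action could look like; the concrete details (timezone value, cache key, conditional install) are assumptions rather than our exact implementation:

# .github/actions/bootstrap/action.yml
name: Bootstrap
description: Common setup steps shared by our workflows
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v3

    - name: Set timezone
      # assumption: tests expect JST
      run: sudo timedatectl set-timezone Asia/Tokyo
      shell: bash

    - name: Restore cache
      id: cache-npm
      uses: actions/cache@v3
      with:
        path: '**/node_modules'
        key: ${{ runner.os }}-build-${{ hashFiles('**/package-lock.json') }}

    - name: Install dependencies
      if: steps.cache-npm.outputs.cache-hit != 'true'
      run: npm run all-install
      shell: bash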

For the shared and db-layer folders, we just omit the testing part.

E2E workflow

We used a matrix approach to run our Cypress tests in parallel. So it looks like this (some unimportant parts were changed or removed):

name: E2E tests

on:
  pull_request_review:
    types: [submitted]
  push:
    branches: [develop]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  # To reduce costs, we run e2e tests just before merging to develop branch after 2 PR approvals
  should-run:
    if: github.event.review.state == 'approved'
    runs-on: ubuntu-latest
    outputs:
      approved: ${{ steps.approved.outputs.approved }}
    steps:
      # checkout is required to reference a local action by path
      - uses: actions/checkout@v3
      - uses: './.github/actions/approvals'
        id: approved
  # Before the e2e run we don't have any images or artifacts of our project.
  # So we build our Next app here and reuse it in our matrix jobs
  build:
    runs-on: ubuntu-latest
    needs: should-run
    if: ${{needs.should-run.outputs.approved == 'true'}}
    steps:
      - uses: actions/checkout@v3
      - name: Bootstrap
        uses: './.github/actions/bootstrap'

      - name: Build client
        run: npm run build

      - uses: actions/upload-artifact@v3
        with:
          name: dist
          path: client/dist
          if-no-files-found: error
  e2e:
    strategy:
      fail-fast: false
      matrix:
        spec: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: actions/checkout@v3

      - name: Bootstrap
        uses: './.github/actions/bootstrap'

      # For correct display of Japanese characters in Cypress
      - name: Install fonts-noto
        run: sudo apt install -y fonts-noto
        shell: bash

      # We don't use the `services` feature of GitHub Actions to run e.g. `mysql`, because `services` is not as configurable as a docker-compose file, and since we already use one for local development, just running the containers is pretty easy.
      - name: Run containers
        run: docker compose up -d

      # Run in the background and save logs to a file
      - name: Run server
        run: npm run server > server-log${{matrix.spec}} &

      - uses: actions/download-artifact@v3
        with:
          name: dist
          path: client/dist

      # Same with the client
      - name: Run client
        run: npm run client > client-log${{matrix.spec}} &

      - name: Wait for server and client
        run: |
          npm run wait -- http://localhost:15000/echo -t 120000
          npm run wait -- http://localhost:3000 -t 60000

      # Here a script splits all our test files into 10 parts
      # and passes one part as a list of parameters to `cypress run --spec`
      - name: Run tests
        uses: './.github/actions/e2e'
        with:
          dir: application
          split-into: 10
          file-num: ${{matrix.spec}}

      # Save screenshots of failed tests and logs of the client and server
      - name: Save screenshots
        if: failure()
        uses: actions/upload-artifact@v3
        with:
          name: screenshots
          if-no-files-found: ignore
          path: e2e/cypress/screenshots/**

      - name: Save logs
        if: failure()
        uses: actions/upload-artifact@v3
        with:
          name: logs
          path: ./*-log*
  e2e-success:
    runs-on: ubuntu-latest
    needs: e2e
    # always() keeps this job from being skipped when an e2e job fails,
    # so it can report the aggregated result to branch protection
    if: always()
    steps:
      - name: All tests ok
        if: ${{ !(contains(needs.*.result, 'failure')) }}
        run: exit 0
      - name: Some tests failed
        if: ${{ contains(needs.*.result, 'failure') }}
        run: exit 1

Unfortunately, you can't specify just the e2e job as a branch protection check: because it uses a matrix, it's effectively not one job but 10 different ones, and you would have to list all of them in the rule. Luckily, there is the e2e-success job (thanks, Stack Overflow), so we can use it as the required check and not worry about the matrix changing in the future.

Also note that for the correct display of Japanese characters in Cypress you need to install the Noto fonts, as the default Ubuntu image used by GitHub Actions runners doesn't include them.

Money-saving techniques

ChatGPT knows

Use it wisely: almost all the time you can ask it something like "please make it simple" or "simplify", and the suggested solution becomes much better. Reformulate your request if the answers seem strange or inappropriate. And of course, don't forget to use your own head :smile:

We've used it to generate some useful bash scripts for splitting the tests into chunks and counting the average number of PRs the team merges each month, and in my opinion it's one of the best things you can use ChatGPT for. The generated scripts are very precise and customizable, and Chat definitely knows more git commands than any of us.

For example, here is the generated script for counting the number of PRs (counted by squash commits to the develop branch).

# Set the start and end dates for the period to calculate the average commits
start_date="2022-03-08"
end_date="2023-03-08"

# Get the total number of commits during the period
total_commits=$(git log --oneline --after="$start_date" --before="$end_date" | wc -l)

# Calculate the number of weeks between the start and end dates
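# gdate is GNU date (installed via coreutils on macOS); on Linux, plain `date -d` works the same way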
start_timestamp=$(gdate -d "$start_date" +%s)
end_timestamp=$(gdate -d "$end_date" +%s)
total_weeks=$(( ($end_timestamp - $start_timestamp) / (7 * 24 * 3600) ))

# Calculate the average commits per week
average_commits=$(echo "scale=2; $total_commits / $total_weeks" | bc)

# Print the result
echo "The average number of commits per week between $start_date and $end_date is $average_commits."

And another one for splitting the tests; the variable values below are illustrative, the real script sets them based on how many chunks we want.

input_file=./tmp/spec_list   # illustrative temp file for the list of spec files
lines_per_file=10            # illustrative chunk size
find cypress/tests -type f -name "*.spec.ts" > $input_file
split --lines=$lines_per_file $input_file "./tmp/spec_" --numeric-suffixes=1

We've also tried asking ChatGPT to improve the workflow itself and asked it some complicated questions about GitHub Actions, but since its knowledge has a cutoff date, it can't help you with the latest changes.

Combine small jobs

You can combine the linting and testing jobs into one, because linting itself doesn't take much time, yet run as a separate job it still needs its own checkout and dependency install/restore from cache. So combining them doesn't make much difference in time, but it does save some money. The same applies to any small job, even one that performs its task multiple times. See the sketch below.
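
For illustration, a combined job like this (step order and script names taken from our own workflows) pays for checkout and dependency restore once, whereas two separate jobs would pay for them twice:

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: './.github/actions/bootstrap'
      # Linting is quick, so keeping it next to the tests avoids
      # a second checkout + dependency restore in another job
      - run: npm run all-lint
      - run: npm --prefix ./client run test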

Move everything outside of the matrix

This applies especially to builds and other expensive steps, but don't forget about the previous point either.

Use extensions and reuse existing Actions

For VS Code there is a GitHub Actions extension that shows you the errors in your workflow before the run (hello, Jenkins).

And of course, don't bother building your own solution if one already exists on the market, just use it.

Problems

Lots of actions use deprecated libs

A serious problem I faced during development was that many useful actions available on the Marketplace don't get updates for months or even years, and there are no fresh alternatives. So be sure to check the warnings on GitHub and be ready to replace such actions with your own solutions.

Path filtering and required jobs

Don't forget to add special workflows for skipped-but-required checks, or you won't be able to merge a PR that only changes files outside the path filters in your workflows.
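
Such a "skip" workflow shares the workflow name and job name with the real one but inverts the path filter, so the required check still reports success when only unrelated files change. A minimal sketch for the client (the exact filter list here is an assumption):

name: Client CI

on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
    paths-ignore:
      - 'client/**'
      - 'shared/**'

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - run: echo "No client-related changes, nothing to check"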

Testing might be expensive

You can enjoy the power of free GitHub Actions minutes on a personal account: recreate the structure of your project there and run similar workflows on this playground for testing purposes. It also shortens the feedback loop.

Next steps

We have done a lot of work to improve our workflows, but I know we can do better, especially in execution time, so here is what I suggest we do next.

Optimize e2e tests run

As you saw in the script, our current e2e test splitting is very simple: it just uses find and split without any extra optimization logic.

And it's not the best solution, because the execution time per job can vary significantly. We could analyze Cypress logs and execution times to distribute the tests across jobs more evenly.

Use a modern build system for monorepo

Right now our dependency management is kind of chaotic: our packages are split between several package.json files, and some packages use global dependencies together with local ones, so you need to install everything just to be sure you can build and run, e.g., only the client. It would also be nice to cache dependencies in the local environment and between CI builds in a more sophisticated way. We can achieve this by adopting one of the modern monorepo build systems such as Nx or Turborepo.
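
As a rough illustration (assuming we adopted Turborepo, kept the current package names, and configured the usual build pipeline with a ^build dependency), a CI step could then build only the client and what it depends on, reusing cached outputs:

      - name: Build client and its dependencies
        # Turborepo skips tasks whose inputs haven't changed since the cached run
        run: npx turbo run build --filter=client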

Optimization of run services and containers

Right now every e2e matrix job starts the containers, the API server, and the client from scratch and then waits for all of them to become ready, so the same setup cost is paid ten times per run. Making this startup cheaper, or sharing more of the preparation between the jobs, is another promising direction.

In conclusion

As you can see, we still have a lot of work to do and our CI is far from perfect, so these changes are just the start of something good. We will try to keep you posted on interesting findings from our GitHub Actions usage and on further improvements. Thank you for reading this far!

If you want to help us with this difficult task and are not afraid of a rapidly changing startup environment, please check out our open positions. We are trying to make our environment more international and foreigner-friendly, and at this stage your impact will be enormous. Let's build a great culture together!

commmune-careers.studio.site speakerdeck.com