1. Your build pipeline has a lot of hermetic actions.
2. To speed it up, you execute these actions remotely in isolated environments, cache the results, and reuse them when possible.
You might want to look into https://goo.gl/TB49ED and https://console.cloud.google.com/marketplace/details/google/... if you need a managed service to do just that.
This whole solution seems like a very custom way to use Docker. You can already create custom Docker images with specific content, use multi-stage builds to cache layers, split pipelines into sections that generate static assets and pull the latest ones based on a checksum of their inputs, etc. I think the cost of maintaining this solution is going to far outweigh that of just using existing tooling differently.
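The checksum-keyed asset cache mentioned above can be sketched in a few lines of shell. This is a minimal sketch under assumptions: `CACHE_DIR`, the choice of `package.json` as the input, and `build_assets` are placeholders for your real setup, not a prescription:

```shell
# Key cached static assets on a checksum of their inputs, so a
# rebuild happens only when those inputs actually change.
CACHE_DIR="${CACHE_DIR:-/tmp/asset-cache}"   # placeholder cache location
mkdir -p "$CACHE_DIR"

# Placeholder stand-in for the real asset build (webpack, etc.).
build_assets() { mkdir -p dist && echo "bundle" > dist/bundle.js; }

# Checksum the files that determine the output (here, just package.json).
key=$( (cat package.json 2>/dev/null || true) | sha256sum | cut -d' ' -f1)

if [ -f "$CACHE_DIR/assets-$key.tar" ]; then
  tar -xf "$CACHE_DIR/assets-$key.tar"       # cache hit: reuse old assets
else
  build_assets                               # cache miss: rebuild
  tar -cf "$CACHE_DIR/assets-$key.tar" dist/ # store under the input key
fi
```

Same idea as a Docker layer cache, just explicit: identical inputs produce an identical key, so the expensive step is skipped.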
Even if they don't use Docker to run applications in prod, it can be [ab]used to perform efficient build-layer (build-step) caching and distribution.
COPY Gemfile Gemfile.lock /src/
RUN bundle install
However, with a build you might have A and B feed into C. If A changes and B hasn't, you want to rebuild just A and get B from the cache.
A pattern that I like to use is two multi-stage build Dockerfiles: `dev` and `prod`.
The `dev` image has 2 stages. The first stage copies only files required by the package manager, such as `package.json` into another directory using a combination of `find` and `cp --parents`, then restores dependencies. The second stage copies the dependencies from the first stage and overlays source code. The `dev` image is then instantiated to run all tests.
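A rough sketch of that `dev` Dockerfile, assuming a Node.js project with a single `package.json` (the `find`/`cp --parents` trick generalizes this to nested manifests; the base image and commands here are assumptions):

```dockerfile
# Stage 1: restore dependencies using only the package-manager files,
# so this layer stays cached until the dependency list itself changes.
FROM node:18 AS deps
WORKDIR /src
COPY package.json package-lock.json ./
RUN npm ci

# Stage 2: reuse the restored dependencies, then overlay the source code.
FROM node:18 AS dev
WORKDIR /src
COPY --from=deps /src/node_modules ./node_modules
COPY . .
CMD ["npm", "test"]
```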
The `prod` image also has 2 stages. The first stage starts with the `dev` image and publishes a production bundle to a directory. The second stage starts with a clean image and copies the production bundle from the first stage.
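And a sketch of the corresponding `prod` Dockerfile; `myapp-dev` is a hypothetical tag for the `dev` image above, and nginx is just one example of a clean runtime base:

```dockerfile
# Stage 1: publish a production bundle inside the dev image.
FROM myapp-dev AS build
RUN npm run build        # assumed to emit static assets into /src/dist

# Stage 2: start from a clean image and copy only the bundle.
FROM nginx:alpine
COPY --from=build /src/dist /usr/share/nginx/html
```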
11 minutes is a great time reduction (11 minutes × 30 builds a day = 5.5 hours saved in total).
But I'm still not sure what counts as a slow build. I assume at some point there's an asymptotic curve of diminishing returns, where shaving off another minute increases the complexity of the pipeline dramatically (caching being a tricky example). So do y'all have any opinions on what makes a build slow for you?
We definitely need much more speed in running our pipeline.
The software is mostly C/C++ with a lot of internal dependencies.
Do you guys have any experience with that?
What is worth the complexity and what is not?
Build systems like Bazel provide a level of isolation that gives correctness guarantees. They also make it easy to statically compute a superset of the files needed for each step of the build, which allows integrating with a distributed execution service. Some related documentation/discussions can be found starting from https://docs.bazel.build/versions/master/remote-execution.ht... Another good source of information is the online videos from BazelCon: https://conf.bazel.build
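Concretely, the static input sets fall out of the declared dependency graph. A minimal BUILD file sketch (target and file names are hypothetical):

```starlark
# Because srcs/hdrs/deps are declared statically, Bazel can compute the
# exact input set of every compile action and hand it to a remote
# execution or cache service without running anything first.
cc_library(
    name = "util",
    srcs = ["util.cc"],
    hdrs = ["util.h"],
)

cc_binary(
    name = "app",
    srcs = ["main.cc"],
    deps = [":util"],
)
```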
This has been an increasingly difficult problem as more and more pipelines move to containers for testing and building. What other solutions have folks come up with?
Here are a few tricks I use.
- Use a fast build server. This sounds obvious, but people cut costs for the wrong reasons. CPU matters when you are running a build. This is the reason I never liked Travis CI: you could not pay them to give you faster servers, only more servers, and they used quite slow instances. When your laptop outperforms your CI server, something is deeply wrong.
- Run your CI/CD tooling in the same data center that your production and staging environments live in and avoid long network delays to move e.g. docker containers or other dependencies around the planet. Amazon is great for this as it has local mirrors for a lot of things that you probably need (e.g. ubuntu and red hat mirrors).
- Use build tools that do things concurrently. If you have multiple CPU cores and all but one of them are idling, that's lost time.
- Run tests in parallel. If you do this right, you can max out most of your CPU while your tests are running.
- Learn to test asynchronously and avoid sleep or other stopgap solutions where your test is basically waiting for something else to catch up, blocking a thread for many seconds in which it does absolutely nothing useful. People set timeouts conservatively, so most of that time is wasted. Consider polling instead.
- Avoid expensive cleanups in your integration tests. I've seen completely trivial database applications take twenty minutes to run a few integration tests because somebody decided it was a good idea to rebuild the database schema between tests. If your tests are dropping and recreating tables, you are going to add many seconds to your build time for every test you add.
- Randomize test data to avoid tests interacting with each other. Never re-use the same database ids or other identifiers, and avoid magical names. This lets you skip deleting data between tests and can save a lot of time. Also, your real-world system is likely to have more than one user, and part of the point of integration tests is finding issues caused by broken assumptions about people doing things at the same time.
- Dockerize your builds and use Docker layers to your advantage. E.g. dependency resolution is only needed if the file that lists the dependencies actually changed. If you are merging pull requests, you can avoid double work: right after a merge the branches are identical, and Docker will be able to make use of that.
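The poll-instead-of-sleep point above can be sketched in shell; `wait_until` is a hypothetical helper, and the interval and timeout are whatever your tests need:

```shell
# Retry a command with a short interval instead of one long,
# conservative sleep; return as soon as the condition holds.
# Usage: wait_until <timeout-seconds> <command...>
wait_until() {
  local timeout="$1"; shift
  local start
  start=$(date +%s)
  until "$@"; do
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      echo "timed out waiting for: $*" >&2
      return 1
    fi
    sleep 0.2   # short poll interval, not a multi-second blind wait
  done
}
```

Against a service that happens to come up in 1.3 seconds, a conservative `sleep 10` wastes 8.7 seconds in every test; the poll returns almost immediately. A readiness check might then look like `wait_until 30 curl -fsS http://localhost:8080/health` (hypothetical endpoint).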
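Randomized identifiers are cheap to generate; a sketch, where the naming scheme is just an example:

```shell
# Give every test its own identifiers so tests never fight over the
# same rows and no cleanup between tests is required.
new_test_id() {
  # timestamp (ns) plus $RANDOM makes collisions effectively impossible
  echo "test-$(date +%s%N)-$RANDOM"
}

# Hypothetical usage: each test creates its own user, so two tests
# (or two concurrent CI runs) can share one database safely.
user_id=$(new_test_id)
user_email="${user_id}@example.com"
```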
For reference, I have a Kotlin project that builds and compiles in about 3 minutes on my laptop. This includes running over 500 API integration tests against Elasticsearch (running as an ephemeral Docker container). None of the tests delete data (unless that is what we are testing). Our schema initializes just once.
A cold Docker build for this project on our CI server can take 15 minutes because it just takes that long to download remote docker layers, bootstrap all the stuff we need, download dependencies etc. However, most of our builds don't run cold and typically from commit to finished deploy takes around 6 minutes and it jumps straight into compiling and running tests. Our master branch deploys to a staging environment. When we merge master to our production branch to update production, the docker images start deploying almost immediately because it already built most of the layers it needs for the master branch and the branches are at this point identical. So a typical warm production push would jump straight to pushing out artifacts and be done in 2 minutes.
With other languages that build dependencies from source, doing that in a separate docker build step would probably be a good idea so you can cache the results as a separate docker layer.
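For a compiled language, that separate step is just its own cached layer. A sketch for a hypothetical Go project (base image and paths are assumptions):

```dockerfile
FROM golang:1.21 AS build
WORKDIR /src
# Dependency download is its own layer, reused until go.mod/go.sum change.
COPY go.mod go.sum ./
RUN go mod download
# Source edits only invalidate the layers from here down.
COPY . .
RUN go build -o /out/app .
```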
IIUC, Angular is considering (working on?) using Bazel under the hood to parallelize Angular builds.