Docker
We have two competing ways of building Docker images:
- Using Skaffold to launch in-cluster builds
- Using Google Cloud Build (possibly launched via Skaffold)
Neither approach currently works in all scenarios.
Really big Docker images (e.g. TensorFlow) run into GKE Autopilot cluster limits when building on GitHub; see GoogleContainerTools/skaffold#7701.
Concretely, GKE Autopilot has a limit (10Gi) on ephemeral storage.
Kaniko seems to rely on ephemeral storage in a fundamental way; see GoogleContainerTools/kaniko#2219.
Mounting an ephemeral volume doesn’t seem to help.
I haven’t fully tried GKE Standard clusters to see if they can support large ephemeral storage.
Using federated login, it should be possible to securely connect to GCP from GitHub runners to trigger builds in GCP, but we haven’t set that up yet.
Hydros relies on Skaffold files and Skaffold to trigger builds.
Google Cloud Build supports Secret Manager and Cloud KMS for injecting credentials.
Google Cloud Build supports triggers, but they require a Cloud Build file.
- The tags field could potentially be used in place of labels in Skaffold files with Hydros
- Note: Labels still aren’t supported in Skaffold configuration files; see GoogleContainerTools/skaffold#7425
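As a rough illustration of the tags idea, the sketch below builds a minimal Cloud Build config as a Python dict and prints it as JSON (Cloud Build accepts JSON as well as YAML build configs). The helper name make_build_config, the image name, and the tag values are all hypothetical, and how Hydros would actually consume the tags is left open here.

```python
import json


def make_build_config(image, tags):
    """Return a minimal Cloud Build config that builds and pushes one image.

    `tags` fills the Cloud Build `tags` field, which annotates the build
    itself; these are not Docker image tags.
    """
    return {
        "steps": [
            {
                # Standard Docker builder image provided by Cloud Build.
                "name": "gcr.io/cloud-builders/docker",
                "args": ["build", "-t", image, "."],
            }
        ],
        # Images listed here are pushed once all steps succeed.
        "images": [image],
        "tags": list(tags),
    }


if __name__ == "__main__":
    # Hypothetical image name and tag values, for illustration only.
    config = make_build_config(
        "us-west1-docker.pkg.dev/my-project/images/example:latest",
        ["hydros", "example-image"],
    )
    print(json.dumps(config, indent=2))
```

The output could be saved as cloudbuild.json and submitted with gcloud builds submit --config cloudbuild.json from the directory containing the Dockerfile.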
Decision
Let’s try to standardize on using GCB. The Autopilot limits seem like the biggest blocker, so getting GCB to work in all cases seems like the path of least resistance.
Kaniko On K8s
Running Kaniko on K8s is pretty straightforward. The main challenge is building the context e.g. on GCS. Hydros now handles that.
There should be some examples of trying to use Kaniko in aiengineering/gpuserving/kaniko_job.yaml
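For reference, a Kaniko Job on K8s might look roughly like the sketch below, written as a Python dict and emitted as JSON, which kubectl apply -f accepts. This is not the content of kaniko_job.yaml. The Job name, bucket path, and destination image are hypothetical, the ephemeral-storage request simply mirrors the 10Gi Autopilot limit discussed earlier, and credentials for GCS and the registry are assumed to come from the pod's service account.

```python
import json


def kaniko_job(name, context, destination, ephemeral_storage="10Gi"):
    """Return a Kubernetes Job manifest that runs the Kaniko executor once."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not retry failed builds automatically.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "kaniko",
                            "image": "gcr.io/kaniko-project/executor:latest",
                            "args": [
                                "--dockerfile=Dockerfile",
                                # Build context, e.g. a tarball uploaded to GCS.
                                f"--context={context}",
                                f"--destination={destination}",
                            ],
                            "resources": {
                                "requests": {
                                    "ephemeral-storage": ephemeral_storage
                                }
                            },
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket and image names, for illustration only.
    job = kaniko_job(
        "kaniko-example",
        "gs://my-bucket/contexts/example.tar.gz",
        "us-west1-docker.pkg.dev/my-project/images/example:latest",
    )
    print(json.dumps(job, indent=2))
```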
Docker and Google Artifact Registry
To authenticate to Artifact Registry, run
gcloud auth configure-docker
then edit /Users/jlewi/.docker/config.json (i.e. ~/.docker/config.json) and add the Artifact Registry hostnames under credHelpers:
{
  "auths": {},
  "credsStore": "desktop",
  "credHelpers": {
    "asia.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "us-west1-docker.pkg.dev": "gcloud",
    "pkg.dev": "gcloud"
  },
  "currentContext": "desktop-linux"
}
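The manual edit above could also be scripted. The following is a minimal sketch, assuming the config file is plain JSON as shown; the helper name add_cred_helpers is hypothetical, and the demo writes to a temporary file rather than the real Docker config.

```python
import json
import tempfile
from pathlib import Path


def add_cred_helpers(config_path, registries, helper="gcloud"):
    """Add `registry -> helper` entries to credHelpers in a Docker config file.

    Existing entries and all other keys are left untouched.
    Returns the updated config as a dict.
    """
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    helpers = config.setdefault("credHelpers", {})
    for registry in registries:
        # setdefault keeps any helper already configured for this registry.
        helpers.setdefault(registry, helper)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Demo against a throwaway file so the real config is not modified.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "config.json"
        demo.write_text(json.dumps({"credsStore": "desktop"}))
        updated = add_cred_helpers(demo, ["us-west1-docker.pkg.dev"])
        print(json.dumps(updated, indent=2))
```

To apply it for real, the call would be add_cred_helpers(Path.home() / ".docker" / "config.json", ["us-west1-docker.pkg.dev"]). Passing the registry hostname directly, as in gcloud auth configure-docker us-west1-docker.pkg.dev, should add the same credHelpers entry and may make the manual edit unnecessary.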