Cached Assets

Container-magic can download external resources (files, models, datasets) and cache them locally to avoid re-downloading on every build. Assets are defined at the root level of cm.yaml and copied into the image using copy: steps.

Use cases:

  • Machine learning models from HuggingFace or other sources
  • Large datasets
  • Pre-compiled binaries or libraries
  • Configuration files from remote sources

Configuration

Define assets at the root level of your cm.yaml:

names:
  image: my-project
  user: root

assets:
  - https://example.com/model.tar.gz
  - my-model.bin: https://huggingface.co/bert-base/resolve/main/model.safetensors

Each asset can be either:

  • A bare URL - the filename is derived from the URL path
  • A filename: url mapping - you choose the local filename
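
For a bare URL, the derived filename is just the last segment of the URL path. A minimal Python sketch of that derivation (illustrative only — not container-magic's actual code, and the function name is made up):

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

def derive_filename(url: str) -> str:
    """Take the last segment of the URL path as the local filename."""
    return PurePosixPath(urlparse(url).path).name

# e.g. https://example.com/model.tar.gz -> model.tar.gz
print(derive_filename("https://example.com/model.tar.gz"))
```

When the URL's last segment is not the name you want (as with the HuggingFace `model.safetensors` example above), use the filename: url form instead.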

Then use copy: steps to place them in the image:

stages:
  base:
    from: python:3-slim
    steps:
      - copy: model.tar.gz /models/model.tar.gz
      - copy: my-model.bin /models/bert.safetensors

How It Works

  1. Run cm build - assets are downloaded (if not cached)
  2. Files cached in .cm-cache/assets/<hash>/ with metadata
  3. Use copy: steps to place cached files into the image
  4. Subsequent builds reuse cached files, skipping downloads
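
The download-or-reuse flow above can be sketched roughly as follows. This is an illustrative Python sketch, not container-magic's implementation: it assumes the cache key is a hash of the URL (the real layout of .cm-cache/assets/<hash>/ and its metadata may differ), and the downloader is injected so the sketch stays self-contained:

```python
import hashlib
from pathlib import Path

def fetch_cached(url: str, filename: str, cache_dir: Path, download) -> Path:
    """Return a local copy of url, downloading only on a cache miss."""
    key = hashlib.sha256(url.encode()).hexdigest()  # assumed cache key
    asset_dir = cache_dir / "assets" / key
    target = asset_dir / filename
    if not target.exists():                         # cache miss: download once
        asset_dir.mkdir(parents=True, exist_ok=True)
        target.write_bytes(download(url))
    return target                                   # cache hit: reuse as-is
```

On the second and later builds the file already exists under the cache directory, so the download step is skipped entirely.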

Cache Management

cm cache list    # List cached assets with size and URL
cm cache path    # Show cache directory location
cm cache clear   # Clear all cached assets

Example: ML Model in Production Image

names:
  image: ml-service
  user: nonroot

assets:
  - model.bin: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/pytorch_model.bin

stages:
  base:
    from: pytorch/pytorch:latest
    steps:
      - pip:
          install:
            - transformers
            - flask
      - copy: model.bin /models/model.bin

  development:
    from: base

  production:
    from: base
    steps:
      - copy: workspace

Multiple Assets

names:
  image: ml-pipeline
  user: nonroot

assets:
  - tokenizer.json: https://example.com/tokenizer.json
  - model.safetensors: https://example.com/model.safetensors
  - config.json: https://example.com/config.json

stages:
  base:
    from: pytorch/pytorch:latest
    steps:
      - copy:
          - tokenizer.json /models/tokenizer.json
          - model.safetensors /models/model.safetensors
          - config.json /models/config.json

  development:
    from: base

  production:
    from: base

The copy: step also accepts a list, copying multiple files in one step. Each item uses the same source dest format as a single copy: step.
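
Conceptually, expanding such a list is just splitting each source dest pair and emitting one copy instruction per item. A hedged sketch, assuming a Dockerfile-like COPY backend (the function name and the assumption that copy: maps to COPY are illustrative, not confirmed by container-magic):

```python
def copy_steps_to_instructions(items):
    """Render a list of "source dest" copy items as COPY instructions.

    Illustrative only; assumes copy: steps translate to Dockerfile COPY.
    """
    instructions = []
    for item in items:
        source, dest = item.split(maxsplit=1)  # split on first whitespace
        instructions.append(f"COPY {source} {dest}")
    return instructions
```

For example, the three-item list above would expand to three COPY lines, one per asset.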