Reference

Arca

class arca.Arca(backend: Union[Callable, arca.backend.base.BaseBackend, str, arca.utils.NotSet] = NOT_SET, settings=None, single_pull=NOT_SET, base_dir=NOT_SET, ignore_cache_errors=NOT_SET)

Basic interface for communicating with the library; most basic operations should be possible from this class.

Available settings:

  • base_dir: Directory where cloned repositories and other files are stored (default: .arca)
  • single_pull: Clone/pull each repository only once per initialization (default: False)
  • ignore_cache_errors: Ignore all cache initialization errors (default: False)

cache_key(repo: str, branch: str, task: arca.task.Task, git_repo: git.repo.base.Repo) → str

Returns the key used for storing results in cache.

current_git_hash(repo: str, branch: str, git_repo: git.repo.base.Repo, short: bool = False) → str
Parameters:
  • repo – Repo URL
  • branch – Branch name
  • git_repo – Repo instance.
  • short – Should the short version be returned?
Returns:

Commit hash of the currently pulled version for the specified repo/branch

get_backend_instance(backend: Union[Callable, arca.backend.base.BaseBackend, str, arca.utils.NotSet]) → arca.backend.base.BaseBackend

Returns a backend instance, either from the argument or from the settings.

Raises: ArcaMisconfigured – If the instance is not a subclass of BaseBackend

get_files(repo: str, branch: str, *, depth: Optional[int] = 1, reference: Optional[pathlib.Path] = None) → Tuple[git.repo.base.Repo, pathlib.Path]

Either clones the repository if it’s not cloned already or pulls from origin. If single_pull is enabled, only pulls if the repo/branch combination hasn’t been pulled by this instance yet.

Parameters:
  • repo – Repo URL
  • branch – Branch name
  • depth – See run()
  • reference – See run()
Returns:

A Repo instance for the repo and a Path to the location where the repo is stored.

get_path_to_repo(repo: str) → pathlib.Path

Returns a Path to the location where all the branches from this repo are stored.

Parameters: repo – Repo URL
Returns: Path to where branches from this repository are cloned.

get_path_to_repo_and_branch(repo: str, branch: str) → pathlib.Path

Returns a Path to where this specific branch is stored on disk.

Parameters:
  • repo – Repo URL
  • branch – Branch name
Returns:

Path to where the specific branch from this repo is being cloned.

get_reference_repository(reference: Optional[pathlib.Path], repo: str) → Optional[pathlib.Path]

Returns a repository to use in the clone command, if there is one to be referenced: either provided by the user or generated from already cloned branches (master is preferred).

Parameters:
  • reference – Path to a local repository provided by the user or None.
  • repo – URL of the remote repository the reference is needed for.

get_repo(repo: str, branch: str, *, depth: Optional[int] = 1, reference: Optional[pathlib.Path] = None) → git.repo.base.Repo

Returns a Repo instance for the branch.

See run() for argument descriptions.

make_region() → dogpile.cache.region.CacheRegion

Returns a CacheRegion based on settings.

  • Firstly, a backend is selected. The default is NullBackend (dogpile.cache.backends.null.NullBackend).
  • Secondly, arguments for the backends are generated. The arguments can be passed to the setting as a dict or as a JSON string. If the arguments aren’t a dict or aren’t convertible to a dict, ArcaMisconfigured is raised.
  • Lastly, the cache is tested to check that it works.

All errors can be suppressed by the ignore_cache_errors setting.

Raises:
  • ModuleNotFoundError – In case dogpile has trouble importing the library needed for a backend.
  • ArcaMisconfigured – In case the cache is misconfigured in any way or the cache doesn’t work.
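The argument-handling step of make_region() (a dict, or a JSON string convertible to a dict, otherwise ArcaMisconfigured) can be sketched as follows. This is an illustration, not Arca's implementation: parse_backend_arguments is a hypothetical helper, and the local ArcaMisconfigured class only stands in for arca.exceptions.ArcaMisconfigured.

```python
import json
from typing import Any, Dict, Union


class ArcaMisconfigured(ValueError):
    """Local stand-in for arca.exceptions.ArcaMisconfigured."""


def parse_backend_arguments(value: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: normalize cache backend arguments to a dict."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            raise ArcaMisconfigured(f"Backend arguments aren't valid JSON: {exc}") from exc
        if isinstance(parsed, dict):
            return parsed
    # Anything else (including JSON lists or numbers) isn't convertible to a dict.
    raise ArcaMisconfigured("Backend arguments must be a dict or a JSON object string")
```

Accepting both forms lets the arguments come either from Python code or from a plain string setting, such as one read from an environment variable.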
pull_again(repo: Optional[str] = None, branch: Optional[str] = None) → None

When single_pull is enabled, tells Arca to pull again.

If repo and branch are not specified, pulls everything again.

Parameters:
  • repo – (Optional) Pull again all branches from a specified repository.
  • branch – (Optional) When repo is specified, pull again only this branch from that repository.
Raises:

ValueError – If branch is specified and repo is not.
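To make the single_pull bookkeeping concrete, here is a minimal sketch of how pulled repo/branch combinations could be recorded (as save_hash() does) and forgotten again (as pull_again() does). SinglePullRegistry and its methods are hypothetical; the sketch only mirrors the behaviour described in this section.

```python
from typing import Dict, Optional, Tuple


class SinglePullRegistry:
    """Hypothetical tracker of repo/branch combinations already pulled."""

    def __init__(self) -> None:
        # (repo, branch) -> commit hash recorded after the pull
        self._pulled: Dict[Tuple[str, str], str] = {}

    def save_hash(self, repo: str, branch: str, commit: str) -> None:
        self._pulled[(repo, branch)] = commit

    def should_pull(self, repo: str, branch: str) -> bool:
        return (repo, branch) not in self._pulled

    def pull_again(self, repo: Optional[str] = None, branch: Optional[str] = None) -> None:
        if branch is not None and repo is None:
            raise ValueError("A branch can't be specified without a repo.")
        if repo is None:
            self._pulled.clear()  # forget everything
        elif branch is None:
            # forget all branches of this repository
            for key in [key for key in self._pulled if key[0] == repo]:
                del self._pulled[key]
        else:
            self._pulled.pop((repo, branch), None)  # forget a single branch
```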

repo_id(repo: str) → str

Returns a unique identifier, derived from a repo URL, for the folder the repo will be pulled into.
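The exact scheme repo_id() uses isn't described here. One way to derive a folder-safe, unique identifier is to combine a readable slug of the URL with a short hash of the full URL; make_repo_id below is a hypothetical sketch of that idea, not Arca's algorithm.

```python
import hashlib
import re


def make_repo_id(repo: str) -> str:
    """Hypothetical: folder-safe identifier for a repository URL."""
    # Readable part: drop the scheme and a trailing ".git", keep safe characters only.
    slug = re.sub(r"^[a-z+]+://", "", repo)
    slug = re.sub(r"\.git/?$", "", slug)
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", slug).strip("_")
    # Unique part: a short hash of the full URL, so different URLs that
    # produce the same slug still map to different folders.
    digest = hashlib.sha256(repo.encode("utf-8")).hexdigest()[:12]
    return f"{slug}_{digest}"
```

The slug keeps the folder recognizable when browsing base_dir, while the hash guarantees uniqueness.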

run(repo: str, branch: str, task: arca.task.Task, *, depth: Optional[int] = 1, reference: Union[pathlib.Path, str, None] = None) → arca.result.Result

Runs the task using the configured backend.

Parameters:
  • repo – Target git repository
  • branch – Target git branch
  • task – Task which will be run in the target repository
  • depth – How many commits back should the repo be cloned in case the target repository isn’t cloned yet. Defaults to 1, must be bigger than 0. No limit will be used if None is set.
  • reference – A path to a repository from which the target repository is forked, used to save bandwidth; --dissociate is used if set.
Returns:

A Result instance with the output of the task.

Raises:
  • PullError – If the repository can’t be cloned or pulled
  • BuildError – If the task fails.
save_hash(repo: str, branch: str, git_repo: git.repo.base.Repo)

If single_pull is enabled, saves the current git hash of the specified repository/branch combination to indicate that it shouldn’t be pulled again.

should_cache_fn(value: arca.result.Result) → bool

Returns whether the result value should be cached. Always returns True by default; can be overridden.

static_filename(repo: str, branch: str, relative_path: Union[str, pathlib.Path], *, depth: Optional[int] = 1, reference: Union[pathlib.Path, str, None] = None) → pathlib.Path

Returns an absolute path to where a file from the repo was cloned to.

Parameters:
  • repo – Repo URL
  • branch – Branch name
  • relative_path – Relative path to the requested file
  • depth – See run()
  • reference – See run()
Returns:

Absolute path to the file in the target repository

Raises: FileOutOfRangeError – If relative_path leads outside the repository.
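The range check behind FileOutOfRangeError can be illustrated with pathlib: resolve the requested path against the repository root and reject anything that ends up outside it. A minimal sketch; resolve_in_repo is a hypothetical helper and the local exception class stands in for arca.exceptions.FileOutOfRangeError.

```python
from pathlib import Path
from typing import Union


class FileOutOfRangeError(ValueError):
    """Local stand-in for arca.exceptions.FileOutOfRangeError."""


def resolve_in_repo(repo_path: Path, relative_path: Union[str, Path]) -> Path:
    """Hypothetical: absolute path of relative_path inside repo_path."""
    root = repo_path.resolve()
    # resolve() collapses ".." segments and follows symlinks.
    target = (root / relative_path).resolve()
    try:
        target.relative_to(root)  # raises ValueError if target is outside root
    except ValueError:
        raise FileOutOfRangeError(f"{relative_path} leads outside the repository") from None
    return target
```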
validate_depth(depth: Optional[int]) → Optional[int]

Converts the depth to int and validates that the value can be used.

Raises: ValueError – If the provided depth is not valid
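Following the rules stated for the depth parameter of run() (converted to int, must be bigger than 0, None means no limit), the validation could look roughly like this. check_depth is a sketch, not Arca's code.

```python
from typing import Any, Optional


def check_depth(depth: Any) -> Optional[int]:
    """Sketch: None means no limit, otherwise a positive int is required."""
    if depth is None:
        return None
    try:
        depth = int(depth)
    except (TypeError, ValueError):
        raise ValueError(f"Depth {depth!r} can't be converted to int") from None
    if depth < 1:
        raise ValueError("Depth must be bigger than 0")
    return depth
```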
validate_reference(reference: Union[pathlib.Path, str, None]) → Optional[pathlib.Path]

Converts reference to Path

Raises: ValueError – If reference can’t be converted to Path.

validate_repo_url(repo: str)

Validates the repo URL: checks that it’s a valid git URL and that Arca can handle that type of repo URL.

Raises: ValueError – If the URL is not valid

Task

class arca.Task(entry_point: str, *, timeout: int = 5, args: Optional[Iterable[Any]] = None, kwargs: Optional[Dict[str, Any]] = None)

A class for defining tasks that run in the repositories. The task is defined by an entry point, timeout (5 seconds by default), arguments and keyword arguments. The class uses entrypoints.EntryPoint to load the callables. As opposed to EntryPoint, only objects are allowed, not modules.

Let’s presume we have this function in a package library.module:

def ret_argument(value="Value"):
    return value

This Task would return the default value:

>>> Task("library.module:ret_argument")

These two Tasks would return an overridden value:

>>> Task("library.module:ret_argument", args=["Overridden value"])
>>> Task("library.module:ret_argument", kwargs={"value": "Overridden value"})
hash

Returns a SHA1 hash of the Task for usage in cache keys.
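A hash used in cache keys has to be identical for equal tasks. One way to get that, sketched below under the assumption that args and kwargs are JSON-serializable, is to hash a canonical JSON dump of the entry point, arguments and keyword arguments. task_hash is hypothetical, and which fields Arca actually includes in the hash is not stated here.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, Optional


def task_hash(entry_point: str, args: Optional[Iterable[Any]] = None,
              kwargs: Optional[Dict[str, Any]] = None) -> str:
    """Hypothetical: stable SHA1 of a task definition."""
    payload = {
        "entry_point": entry_point,
        "args": list(args or []),
        "kwargs": dict(kwargs or {}),
    }
    # sort_keys makes the dump independent of kwargs insertion order
    serialized = json.dumps(payload, sort_keys=True)
    return hashlib.sha1(serialized.encode("utf-8")).hexdigest()
```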

Result

class arca.Result(result: Union[str, Dict[str, Any]])

For storing results of the tasks. Has three attributes: output, stdout and stderr.

output = None

The output of the task

stderr = None

What the function wrote to stderr

stdout = None

What the function wrote to stdout

Backends

Abstract classes

class arca.BaseBackend(**settings)

Bases: object

Abstract class for all the backends, implements some basic functionality.

Available settings:

  • requirements_location: Relative path to the requirements file in the target repositories. Setting to None makes Arca ignore requirements. (default is requirements.txt)
  • requirements_timeout: The maximum time in seconds allowed for installing requirements. (default is 5 minutes, 300 seconds)
  • pipfile_location: The folder containing Pipfile and Pipfile.lock. Pipenv files take precedence over requirements files. Setting to None makes Arca ignore Pipenv files. (default is the root of the repository)
  • cwd: Relative path to the required working directory. (default is "", the root of the repo)
get_requirements_information(path: pathlib.Path) → Tuple[arca.backend.base.RequirementsOptions, Optional[str]]

Returns the information needed to install requirements for a repository - what kind is used and the hash of contents of the defining file.

get_setting(key, default=NOT_SET)

Gets a setting for the key.

Raises: KeyError – If the key is not set and default isn’t provided.

get_settings_keys(key)

Parameters can be set through two settings keys: a specific setting (e.g. ARCA_DOCKER_BACKEND_KEY) or a general ARCA_BACKEND_KEY. This function returns the two keys that can be used for this setting.
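The two-key scheme can be sketched as follows, assuming the specific key is built from the backend class name in snake_case (what snake_case_backend_name describes) and that full keys are upper-cased with an ARCA_ prefix, as the Settings.get() example in Utils suggests. Both helpers are hypothetical and return the full key names shown in the text.

```python
import re
from typing import Tuple


def snake_case(name: str) -> str:
    """CamelCase -> camel_case (e.g. DockerBackend -> docker_backend)."""
    # Insert "_" before every uppercase letter that isn't the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def settings_keys(backend_class_name: str, key: str) -> Tuple[str, str]:
    """Hypothetical: (specific key, general key) for a backend setting."""
    specific = f"ARCA_{snake_case(backend_class_name)}_{key}".upper()
    general = f"ARCA_BACKEND_{key}".upper()
    return specific, general
```

Looking up the specific key first lets one backend override a value that is otherwise shared by all backends.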

static hash_file_contents(requirements_option: arca.backend.base.RequirementsOptions, path: pathlib.Path) → str

Returns a SHA256 hash of the contents of path combined with the Arca version.

inject_arca(arca)

After the backend is set for an Arca instance, the instance is injected into the backend, so that settings, files etc. can be accessed. Also runs the backend’s settings validation.

run(repo: str, branch: str, task: arca.task.Task, git_repo: git.repo.base.Repo, repo_path: pathlib.Path) → arca.result.Result

Executes the script and returns the result.

Must be implemented by subclasses.

Parameters:
  • repo – Repo URL
  • branch – Branch name
  • task – The requested Task
  • git_repo – A Repo of the repo/branch
  • repo_path – Path to the location where the repo is stored.
Returns:

The output of the task in a Result instance.

serialized_task(task: arca.task.Task) → Tuple[str, str]

Returns the name of the task definition file and its contents.

snake_case_backend_name

Returns the name of the backend class converted from CamelCase to snake_case (CamelCase -> camel_case).

class arca.backend.base.BaseRunInSubprocessBackend(**settings)

Bases: arca.backend.base.BaseBackend

Abstract class for backends which run scripts in subprocess.

get_or_create_environment(repo: str, branch: str, git_repo: git.repo.base.Repo, repo_path: pathlib.Path) → str

Abstract method that must be implemented in subclasses. It must return a str path to a Python executable which will be used to run the script.

See BaseBackend.run for argument descriptions.

run(repo: str, branch: str, task: arca.task.Task, git_repo: git.repo.base.Repo, repo_path: pathlib.Path) → arca.result.Result

Gets a path to a Python executable by calling the abstract method get_or_create_environment() and runs the task using subprocess.Popen.

See BaseBackend.run for argument descriptions.

Current environment

class arca.CurrentEnvironmentBackend(**settings)

Bases: arca.backend.base.BaseRunInSubprocessBackend

Uses the current Python to run the tasks, but they’re launched in a subprocess.

The requirements of the repository are completely ignored.

get_or_create_environment(repo: str, branch: str, git_repo: git.repo.base.Repo, repo_path: pathlib.Path) → str

Returns the path to the current Python executable.
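The division of labour between BaseRunInSubprocessBackend and CurrentEnvironmentBackend can be sketched in plain Python: the subclass only decides which interpreter to use, and the base class runs the script with it. The class names are hypothetical stand-ins, subprocess.run replaces the subprocess.Popen call mentioned above, and task serialization and Result parsing are left out.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class RunInSubprocessSketch(ABC):
    """Hypothetical stand-in for BaseRunInSubprocessBackend."""

    @abstractmethod
    def get_or_create_environment(self) -> str:
        """Return a path to the Python executable used to run scripts."""

    def run_script(self, script: str, timeout: float = 5) -> str:
        python = self.get_or_create_environment()
        completed = subprocess.run(
            [python, "-c", script],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,  # raises subprocess.TimeoutExpired when exceeded
            check=True,  # a failing script raises CalledProcessError
        )
        return completed.stdout.decode("utf-8")


class CurrentEnvironmentSketch(RunInSubprocessSketch):
    """Hypothetical stand-in for CurrentEnvironmentBackend."""

    def get_or_create_environment(self) -> str:
        return sys.executable  # the interpreter running this code
```

A venv-based backend would only need a different get_or_create_environment(), returning the interpreter inside the virtual environment.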

Python virtual environment

class arca.VenvBackend(**settings)

Bases: arca.backend.base.BaseRunInSubprocessBackend

Uses Python virtual environments (see venv); the tasks are then launched in a subprocess. The virtual environments are shared across repositories when they have exactly the same requirements. If the target repository doesn’t have requirements, a virtual environment is still used, just with no extra packages installed.

There are no extra settings for this backend.

get_or_create_environment(repo: str, branch: str, git_repo: git.repo.base.Repo, repo_path: pathlib.Path) → str

Handles the requirements in the target repository and returns a path to an executable of the virtualenv.

get_or_create_venv(path: pathlib.Path) → pathlib.Path

Gets the location of the virtualenv from get_virtualenv_path(), checks if it exists already, creates it and installs requirements otherwise. The virtualenvs are stored in a folder based on the Arca base_dir setting.

Parameters: path – Path to the cloned repository.

get_virtualenv_path(requirements_option: arca.backend.base.RequirementsOptions, requirements_hash: Optional[str]) → pathlib.Path

Returns the path to the virtualenv for the current state of the repository.
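Because virtual environments are shared between repositories with identical requirements, their location only has to depend on the requirements, not on the repository. A hypothetical sketch of such a path scheme; the folder names venvs and no_requirements and the venv_ prefix are assumptions, not Arca’s actual layout.

```python
from pathlib import Path
from typing import Optional


def venv_path_for(base_dir: Path, requirements_hash: Optional[str]) -> Path:
    """Hypothetical: one virtualenv per distinct set of requirements."""
    venvs = base_dir / "venvs"
    if requirements_hash is None:
        # repositories without requirements all share one bare virtualenv
        return venvs / "no_requirements"
    return venvs / f"venv_{requirements_hash}"
```

If the returned path doesn’t exist yet, the environment could then be created with venv.EnvBuilder(with_pip=True).create(path) before installing the requirements.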

Docker

class arca.DockerBackend(**kwargs)

Bases: arca.backend.base.BaseBackend

Runs the tasks in Docker containers.

Available settings:

  • python_version - set a specific version, current env. python version by default
  • keep_container_running - stop the container right away (default) or keep it running
  • apt_dependencies - a list of dependencies to install via apt-get
  • disable_pull - build all locally
  • inherit_image - instead of using the default base Arca image, use this one
  • use_registry_name - use this registry to store images with requirements and dependencies
  • registry_pull_only - only use the registry to pull images, don’t push updated images

build_image(image_name: str, image_tag: str, repo_path: pathlib.Path, requirements_option: arca.backend.base.RequirementsOptions, dependencies: Optional[List[str]])

Builds an image for specific requirements and dependencies, based on the settings.

Parameters:
  • image_name – How the image should be named
  • image_tag – What tag the image should have.
  • repo_path – Path to the cloned repository.
  • requirements_option – How requirements are set in the repository.
  • dependencies – List of dependencies (in the formalized format)
Returns:

The Image instance.

Return type:

docker.models.images.Image

build_image_from_inherited_image(image_name: str, image_tag: str, repo_path: pathlib.Path, requirements_option: arca.backend.base.RequirementsOptions)

Builds an image with installed requirements from the inherited image (or just tags the image if there are no requirements).

See build_image() for parameter descriptions.

Return type: docker.models.images.Image

check_docker_access()

Creates a DockerClient for the instance and checks the connection.

Raises: BuildError – If docker isn’t accessible by the current user.

container_running(container_name)

Finds out if a container with name container_name is running.

Returns: Container if it’s running, None otherwise.
Return type: Optional[docker.models.containers.Container]

get_arca_base(pull=True)

Returns the name and tag of the image that has the basic build dependencies and pyenv installed, but no Python. (Builds or pulls the image if it doesn’t exist locally.)

get_container_name(repo: str, branch: str, git_repo: git.repo.base.Repo)

Returns the name of the container used for the repo.

get_dependencies() → Optional[List[str]]

Returns the apt_dependencies setting converted to a standardized format.

Raises: ArcaMisconfigured – If the dependencies can’t be converted into a list of strings
Returns: List of dependencies, None if there are none.

get_dependencies_hash(dependencies)

Returns a SHA1 hash of the dependencies for usage in image names/tags.

get_image(image_name, image_tag)

Returns an Image instance for the provided name and tag.

Return type: docker.models.images.Image

get_image_for_repo(repo: str, branch: str, git_repo: git.repo.base.Repo, repo_path: pathlib.Path)

Returns an image for the specific repo (based on settings and requirements).

  1. Checks if the image already exists locally
  2. Tries to pull it from registry (if use_registry_name is set)
  3. Builds the image
  4. Pushes the image to the registry so it’s available next time (if registry_pull_only is not set)

See run() for parameter descriptions.

Return type: docker.models.images.Image
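The four steps of get_image_for_repo() form a fallback chain. The sketch below expresses it with plain callables so it runs without Docker; resolve_image and all of its parameter names are hypothetical. In DockerBackend the corresponding work is presumably done by methods documented here, such as get_image(), try_pull_image_from_registry(), build_image() and push_to_registry().

```python
from typing import Callable, Optional, TypeVar

Image = TypeVar("Image")


def resolve_image(
    get_local: Callable[[], Optional[Image]],
    pull_from_registry: Optional[Callable[[], Optional[Image]]],
    build: Callable[[], Image],
    push: Optional[Callable[[Image], None]],
) -> Image:
    """Hypothetical fallback chain: local -> registry -> build (-> push)."""
    image = get_local()  # 1. already present locally
    if image is not None:
        return image
    if pull_from_registry is not None:  # 2. a registry is configured
        image = pull_from_registry()
        if image is not None:
            return image
    image = build()  # 3. build it
    if push is not None:  # 4. a registry is set and registry_pull_only is not
        push(image)
    return image
```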
get_image_name(repo_path: pathlib.Path, requirements_option: arca.backend.base.RequirementsOptions, dependencies: Optional[List[str]]) → str

Returns the name for images with installed requirements and dependencies.

get_image_tag(requirements_option: arca.backend.base.RequirementsOptions, requirements_hash: Optional[str], dependencies: Optional[List[str]]) → str

Returns the tag for images with the dependencies and requirements installed.

64-byte hexadecimal strings cannot be used as docker tags, so the prefixes are necessary. When an image has both dependencies and requirements, their hashes are hashed together again to keep the final tag short.

Prefixes:

  • Image type:
    • i – Inherited image
    • a – Arca base image
  • Requirements:
    • r – Does have some kind of requirements
    • s – Doesn’t have requirements
  • Dependencies:
    • d – Does have dependencies
    • e – Doesn’t have dependencies

Possible outputs:

  • Inherited images:
    • ise – no requirements
    • ire_<hash(requirements)> – with requirements
  • From Arca base image:
    • <Arca version>_<Python version>_ase – no requirements and no dependencies
    • <Arca version>_<Python version>_asd_<hash(dependencies)> – only dependencies
    • <Arca version>_<Python version>_are_<hash(requirements)> – only requirements
    • <Arca version>_<Python version>_ard_<hash(hash(dependencies) + hash(requirements))> – both requirements and dependencies
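The naming scheme above fits in a small function. This is a sketch only: make_image_tag is hypothetical, it takes already computed hashes as strings, it uses SHA-256 for the combined hash (an assumption; the text only says the hashes are hashed again), and it composes the prefix from the letters listed above (i/a, r/s, d/e), so an inherited image with requirements gets the ire prefix.

```python
import hashlib
from typing import Optional


def make_image_tag(arca_version: str, python_version: str, inherited: bool,
                   requirements_hash: Optional[str],
                   dependencies_hash: Optional[str]) -> str:
    """Sketch of the tag scheme built from the prefix letters."""
    prefix = "i" if inherited else "a"
    prefix += "r" if requirements_hash else "s"
    prefix += "d" if dependencies_hash else "e"

    if inherited:
        # inherited images can't have apt dependencies, so only r/s varies
        return f"{prefix}_{requirements_hash}" if requirements_hash else prefix

    prefix = f"{arca_version}_{python_version}_{prefix}"
    if requirements_hash and dependencies_hash:
        # hashing both hashes again keeps the tag short
        combined = hashlib.sha256(
            (dependencies_hash + requirements_hash).encode("utf-8")
        ).hexdigest()
        return f"{prefix}_{combined}"
    if requirements_hash or dependencies_hash:
        return f"{prefix}_{requirements_hash or dependencies_hash}"
    return prefix
```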
get_image_with_installed_dependencies(image_name: str, dependencies: Optional[List[str]]) → Tuple[str, str]

Returns the name and tag of an image, based on the Arca Python image, with the dependencies defined by apt_dependencies installed.

Parameters:
  • image_name – Name of the image that will ultimately be used.
  • dependencies – List of dependencies in the standardized format.
get_inherit_image() → Tuple[str, str]

Parses the inherit_image setting, checks if the image is present locally and pulls it otherwise.

Returns: The name and the tag of the image.
Raises: ArcaMisconfigured – If the image can’t be pulled from registries.

get_install_requirements_dockerfile(name: str, tag: str, repo_path: pathlib.Path, requirements_option: arca.backend.base.RequirementsOptions) → str

Returns the content of a Dockerfile that will install requirements based on the repository, prioritizing Pipfile or Pipfile.lock and falling back on requirements.txt files

get_or_build_image(name: str, tag: str, dockerfile: Union[str, Callable[..., str]], *, pull=True, build_context: Optional[pathlib.Path] = None)

A proxy for commonly built images, returns them from the local system if they exist, tries to pull them if pull isn’t disabled, otherwise builds them by the definition in dockerfile.

Parameters:
  • name – Name of the image
  • tag – Image tag
  • dockerfile – Dockerfile text or a callable (no arguments) that produces Dockerfile text
  • pull – If the image is not present locally, allow pulling from registry (default is True)
  • build_context – A path to a folder. If it’s provided, docker will build the image in the context of this folder (e.g. if ADD is needed).

get_python_base(python_version, pull=True)

Returns the name and tag of an image with specified python_version installed, if the image doesn’t exist locally, it’s pulled or built (extending the image from get_arca_base()).

get_python_version() → str

Returns either the specified version from settings or platform.python_version()

image_exists(image_name, image_tag)

Returns whether the image exists locally.

push_to_registry(image, image_tag: str)

Pushes a local image to a registry based on the use_registry_name setting.

Raises: PushToRegistryError – If the push fails.

run(repo: str, branch: str, task: arca.task.Task, git_repo: git.repo.base.Repo, repo_path: pathlib.Path) → arca.result.Result

Gets or builds an image for the repo, gets or starts a container for the image and runs the script.

Parameters:
  • repo – Repository URL
  • branch – Branch name
  • task – Task to run.
  • git_repo – Repo of the cloned repository.
  • repo_path – Path to the cloned location.

start_container(image, container_name: str, repo_path: pathlib.Path)

Starts a container with the image and name container_name and copies the repository into the container.

Return type: docker.models.containers.Container

stop_containers()

Stops all containers used by this instance of the backend.

tar_files(path: pathlib.Path) → bytes

Returns a tar with the git repository.

tar_runner()

Returns a tar with the runner script.

tar_task_definition(name: str, contents: str) → bytes

Returns a tar with the task definition.

Parameters:
  • name – Name of the file
  • contents – Contents of the definition, utf-8
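Returning tar bytes fits the Docker SDK, whose Container.put_archive() accepts tar data. Below is a minimal in-memory version of a single-file archive like the one tar_task_definition() returns, using only the standard library tarfile module. make_task_tar is an illustrative sketch, not Arca’s implementation.

```python
import io
import tarfile
import time


def make_task_tar(name: str, contents: str) -> bytes:
    """Sketch: a tar archive (as bytes) holding one UTF-8 text file."""
    data = contents.encode("utf-8")
    info = tarfile.TarInfo(name=name)
    info.size = len(data)  # the size must be set before adding the file
    info.mtime = int(time.time())
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        tar.addfile(info, io.BytesIO(data))
    # closing the TarFile writes the end-of-archive blocks but keeps buffer open
    return buffer.getvalue()
```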
try_pull_image_from_registry(image_name, image_tag)

Tries to pull an image with the tag image_tag from the registry set by use_registry_name. After the image is pulled, it’s tagged with image_name:image_tag so the lookup can be done locally next time.

Returns: An Image instance if the image exists, None otherwise.
Return type: Optional[docker.models.images.Image]

validate_configuration()

Validates the provided settings.

  • Checks inherit_image format.
  • Checks use_registry_name format.
  • Checks that apt_dependencies is not set when inherit_image is set.
Raises: ArcaMisconfigured – If some of the settings aren’t valid.

Vagrant

class arca.VagrantBackend(**kwargs)

Bases: arca.backend.docker.DockerBackend

Uses Docker in Vagrant.

Inherits settings from DockerBackend:

  • python_version
  • apt_dependencies
  • disable_pull
  • inherit_image
  • use_registry_name
  • keep_containers_running - applies for containers inside the VM, default being True for this backend

Adds new settings:

  • box - what Vagrant box to use (must include docker >= 1.8 or no docker), ailispaw/barge being the default
  • provider - what provider Vagrant should use, virtualbox being the default
  • quiet - Keeps the extra vagrant logs quiet, True being the default
  • keep_vm_running - Keeps the VM up until stop_vm() is called, False being the default
  • destroy - Destroy the VM (instead of halt) when stopping it, False being the default
ensure_vm_running(vm_location)

Gets or creates a Vagrantfile in vm_location and calls vagrant up if the VM is not running.

fabric_task

Returns a fabric task which executes the script in the Vagrant VM

get_vm_location() → pathlib.Path

Returns a directory where a Vagrantfile should be - folder called vagrant in the Arca base dir.

init_vagrant(vagrant_file)

Creates a Vagrantfile in the target dir, with only the base image pulled. Copies the runner script to the directory so it’s accessible from the VM.

inject_arca(arca)

Apart from the usual validation, it also creates a log file for this instance.

run(repo: str, branch: str, task: arca.task.Task, git_repo: git.repo.base.Repo, repo_path: pathlib.Path)

Starts up a VM, builds a docker image and gets it into the VM, runs the script over SSH and returns the result. Stops the VM if keep_vm_running is not set.

stop_containers()

Can’t be used in this backend and raises an exception. Stop the entire VM instead.

stop_vm()

Stops or destroys the VM used to launch tasks.

validate_configuration()

Runs arca.DockerBackend.validate_configuration() and checks extra:

  • box format
  • provider format
  • use_registry_name is set and registry_pull_only is not enabled.

Exceptions

exception arca.exceptions.ArcaException

Bases: Exception

A base exception from which all exceptions raised by Arca are subclassed.

exception arca.exceptions.ArcaMisconfigured

Bases: ValueError, arca.exceptions.ArcaException

An exception for all cases of misconfiguration.

exception arca.exceptions.BuildError(*args, extra_info=None, **kwargs)

Bases: arca.exceptions.ArcaException

Raised if the task fails.

extra_info = None

Extra information about what failed

exception arca.exceptions.BuildTimeoutError(*args, extra_info=None, **kwargs)

Bases: arca.exceptions.BuildError

Raised if the task takes too long.

exception arca.exceptions.FileOutOfRangeError

Bases: ValueError, arca.exceptions.ArcaException

Raised if relative_path in Arca.static_filename leads outside the repository.

exception arca.exceptions.PullError

Bases: arca.exceptions.ArcaException

Raised if repository can’t be cloned or pulled.

exception arca.exceptions.PushToRegistryError(*args, full_output=None, **kwargs)

Bases: arca.exceptions.ArcaException

Raised if pushing of images to Docker registry in DockerBackend fails.

full_output = None

Full output of the push command

exception arca.exceptions.RequirementsMismatch(*args, diff=None, **kwargs)

Bases: ValueError, arca.exceptions.ArcaException

Raised if the target repository has extra requirements compared to the current environment, when the requirements_strategy of CurrentEnvironmentBackend is set to arca.backends.RequirementsStrategy.raise.

diff = None

The extra requirements

exception arca.exceptions.TaskMisconfigured

Bases: ValueError, arca.exceptions.ArcaException

Raised if Task is incorrectly defined.

Utils

class arca.utils.LazySettingProperty(*, key=None, default=NOT_SET, convert: Callable = None)

For defining properties for the Arca class and for the backends. The property is evaluated lazily when accessed, getting the value from settings using the instance’s method get_setting. The property can be overridden by the constructor.

exception SettingsNotReady
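A lazily evaluated, overridable setting property maps naturally onto a Python descriptor. The sketch below is hypothetical and is not arca.utils.LazySettingProperty: on first access it calls the owner’s get_setting(), caches the converted value in the instance __dict__, and because it is a non-data descriptor, a value assigned by the constructor simply shadows it. The toy Backend class is also hypothetical.

```python
from typing import Any, Callable, Optional

_MISSING = object()  # sentinel so that None can be a legitimate default


class LazySetting:
    """Hypothetical non-data descriptor for lazily loaded settings."""

    def __init__(self, *, key: Optional[str] = None, default: Any = _MISSING,
                 convert: Optional[Callable[[Any], Any]] = None) -> None:
        self.key = key
        self.default = default
        self.convert = convert

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name
        if self.key is None:
            self.key = name  # by default, the setting key is the attribute name

    def __get__(self, instance: Any, owner: type) -> Any:
        if instance is None:
            return self
        if self.default is _MISSING:
            value = instance.get_setting(self.key)
        else:
            value = instance.get_setting(self.key, self.default)
        if self.convert is not None:
            value = self.convert(value)
        instance.__dict__[self.name] = value  # cached; now shadows the descriptor
        return value


class Backend:
    """Toy owner class providing get_setting()."""

    timeout = LazySetting(default=5, convert=int)

    def __init__(self, settings: dict, **overrides: Any) -> None:
        self._settings = settings
        self.__dict__.update(overrides)  # constructor overrides skip the lookup

    def get_setting(self, key: str, default: Any = _MISSING) -> Any:
        if key in self._settings:
            return self._settings[key]
        if default is _MISSING:
            raise KeyError(key)
        return default
```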
class arca.utils.NotSet

For default values which can’t be None.

class arca.utils.Settings(data: Optional[Dict[str, Any]] = None)

A class for handling Arca settings.

get(*keys, default: Any = NOT_SET) → Any

Returns values from the settings in the order of keys, the first value encountered is used.

Example:

>>> settings = Settings({"ARCA_ONE": 1, "ARCA_TWO": 2})
>>> settings.get("one")
1
>>> settings.get("one", "two")
1
>>> settings.get("two", "one")
2
>>> settings.get("three", "one")
1
>>> settings.get("three", default=3)
3
>>> settings.get("three")
Traceback (most recent call last):
...
KeyError:

Parameters:
  • keys – One or more keys to get from settings. If multiple keys are provided, the value of the first key that has a value is returned.
  • default – If none of the keys are set, return this value.
Returns:

A value from the settings or the default.

Raises:
  • ValueError – If no keys are provided.
  • KeyError – If none of the keys are set and no default is provided.
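The lookup rules shown in the example (keys upper-cased and prefixed with ARCA_, the first key that is set wins, then the default, then KeyError) fit in a few lines. SettingsSketch is a minimal reproduction of the doctest above, not the library’s code; the ARCA_ prefixing rule is inferred from the example.

```python
from typing import Any, Dict, Optional

_MISSING = object()  # sentinel so that None can be a legitimate default


class SettingsSketch:
    """Sketch of a settings store with ARCA_-prefixed keys."""

    PREFIX = "ARCA"

    def __init__(self, data: Optional[Dict[str, Any]] = None) -> None:
        self.data = dict(data or {})

    def get(self, *keys: str, default: Any = _MISSING) -> Any:
        if not keys:
            raise ValueError("At least one key must be provided.")
        for key in keys:  # the first key that has a value wins
            full_key = f"{self.PREFIX}_{key.upper()}"
            if full_key in self.data:
                return self.data[full_key]
        if default is _MISSING:
            raise KeyError(keys)
        return default
```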
arca.utils.get_hash_for_file(repo: git.repo.base.Repo, path: Union[str, pathlib.Path]) → str

Returns the hash for the specified path.

Equivalent to git rev-parse HEAD:X

Parameters:
  • repo – The repo to check in
  • path – The path to a file or folder to get hash for
Returns:

The hash

arca.utils.get_last_commit_modifying_files(repo: git.repo.base.Repo, *files) → str

Returns the hash of the last commit which modified some of the files (or files in those folders).

Parameters:
  • repo – The repo to check in.
  • files – List of files to check
Returns:

Commit hash.

arca.utils.is_dirty(repo: git.repo.base.Repo) → bool

Returns whether the repo has been modified (including untracked files).

arca.utils.load_class(location: str) → type

Loads a class from a string and returns it.

>>> from arca.utils import load_class
>>> load_class("arca.backend.BaseBackend")
<class 'arca.backend.base.BaseBackend'>
Raises: ArcaMisconfigured – If the class can’t be loaded.
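Loading a class from a dotted string is commonly done with importlib. import_class is a sketch under that assumption: it raises ValueError (ArcaMisconfigured subclasses ValueError) and additionally checks that the loaded object really is a class.

```python
import importlib


def import_class(location: str) -> type:
    """Sketch: import "package.module.ClassName" and return the class."""
    module_name, _, class_name = location.rpartition(".")
    if not module_name:
        raise ValueError(f"{location!r} isn't a dotted path")
    try:
        module = importlib.import_module(module_name)
        cls = getattr(module, class_name)
    except (ImportError, AttributeError) as exc:
        raise ValueError(f"Can't load class {location!r}") from exc
    if not isinstance(cls, type):
        raise ValueError(f"{location!r} isn't a class")
    return cls
```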