The Repo Process

Introduction

What is this?

Illustration of an exemplary repo structure:

User home folder
└── Documents
    ├── main-repo
    │   ├── sub-repo-1
    │   └── sub-repo-2
    ├── peer-repo-1
    └── peer-repo-2

Repo process (repo.process is the name in the shell) is a command that iterates over a set of repos, so that one can interact with them as one. It accomplishes this by generating scripts, configs, or similar artifacts that enable other software to work on the repos as a unit.

This is mainly done by providing a program that takes a root repo and a command pattern as arguments. Repo process iterates over all sub repositories, generates a command for each repo via the given pattern, and executes the generated command in each sub repo.
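The core loop can be sketched in Python as follows. This is a minimal illustration, not the actual implementation: the function name, the {repo} placeholder, and the assumption that sub repos are direct Git subfolders of the root are all made up for this sketch.

```python
import subprocess
from pathlib import Path


def process_repos(meta_repo: Path, command_pattern: str) -> list:
    """Run a command, derived from command_pattern, in every sub repo
    of the given meta repo. '{repo}' in the pattern is replaced by the
    sub repo's path. Returns the list of executed commands."""
    executed = []
    for sub in sorted(meta_repo.iterdir()):
        # Treat every direct subfolder that is itself a Git repo as a sub repo.
        if sub.is_dir() and (sub / ".git").exists():
            command = command_pattern.replace("{repo}", str(sub))
            subprocess.run(command, shell=True, check=True, cwd=sub)
            executed.append(command)
    return executed
```

The real repo.process works via the shell in the same spirit: the pattern is expanded once per sub repo, and the resulting command is executed with the sub repo as the working directory.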

The repos are assumed to be organized in a tree structure. Therefore, the root repo (here called the meta repo) contains the sub repos, but usually does not provide version control for them (for Git repos, the subs are generally listed in .gitignore). By default, every meta repo stores a list of all subs in a file under version control.
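The top level of a meta repo might therefore contain something like the following. The file name subs.txt is a made-up example for illustration; the actual name of the list file may differ.

```
# .gitignore of the meta repo: the subs themselves are not versioned here.
sub-repo-1/
sub-repo-2/

# subs.txt (illustrative name): the versioned list of all subs.
sub-repo-1
sub-repo-2
```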

In order to make this VCS independent, additional repo.* commands are provided that are built on top of repo process. For instance, repo.commit.all is defined to commit all file changes of the current repo and its subs.

The initial blog article, which explains the reasoning behind repo process, can be found here.

TODO: The program should support a dry mode, where an sh script is generated that contains all commands needed to achieve a given goal.

Implementation Details

Every abstraction leaks.

The repo process is shell centric: every direct interaction with the VCS, and many other interactions, are done via shell calls. This makes it possible to replace any command as one likes.

All actions done via repo process and its companion commands are performed on the current branch by default, in order to simplify things for now.

Repo process and its additional commands are intended to be installed via command.managed.install (see the install instructions for details). In order to get default Git implementations for the abstract commands (repo.*), the Os State Interface lib GPL 2 has to be installed as well, via command.repository.register and user.bin.configure.sh.

Alternatively, the repo process commands can be used directly, as they have no additional dependencies; only Python 3 is required. Just make sure that every required command is present in the environment's PATH and that the file suffixes are trimmed.

Recommended Repo Organization

Support processing a tree of repositories (a meta repo) and thereby allow working on all repos as one (e.g. in order to back up everything).

The following tree structure is recommended for the meta repo, in order to maximize its adaptability while still keeping a relatively simple folder structure: the tree should have only 3 levels of repo root folders that are processed by repo process. The first level consists of one folder and is the root of the meta repo.

The second level splits the repositories into organisational units, such as private and public repositories. A minimal number of second level repositories is recommended, in order to ease administration. If there is no need for such an organisation, the second level may be omitted.

The third level contains the roots of all repos containing the actual data. There should be no repository roots deeper than the third level, except where they are managed by the backend (e.g. Git submodules). Only third level repositories should be assumed to be fully publicly portable, because a flat meta repo structure is the easiest to support on hosting platforms (e.g. GitHub, GitLab, sourcehut, etc.).

The first and second level repositories are used only to organize the third level repositories for the user. They are portable, but it is generally harder to migrate them to other platforms.
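Put together, the three levels might look like the following tree. All names are illustrative.

```
meta-repo                      (level 1: root of the meta repo)
├── private                    (level 2: organisational unit)
│   └── net.example.notes      (level 3: repos with the actual data)
└── public
    ├── net.example.library
    └── net.example.website
```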

It is encouraged to use globally unique names for each repo, in order to minimize the number of second level repositories. The Java package naming convention is a good starting point for that.

Alternatives

Of course, this is a case of not-invented-here syndrome.

Of course, there is similar software, but before repo process was created, no fitting alternative providing the following functionality was found. Keep in mind that the missing functionality may well have been available in alternative software when used in creative ways; unfortunately, such ways may have been overlooked.

  1. Easy switching between different remotes.
  2. Easy way of nesting meta repos, so that it is easily and safely possible to use this process to synchronize public sub repos with public servers, without risking publishing private repos by accident. Simultaneously, synchronization of private and public repos with private servers should be easy and consist of only one manual step, in order to minimize user error. In other words, synchronization with private servers should be simple, while synchronization with public servers should not endanger private repos.
  3. Support different implementations for common tasks, as there can always be important details that need to be considered. Implementations should be easy to change, because adapting all existing synchronization scripts can be hard.
  4. Ensure that it is easy to migrate from the chosen system to another one, by keeping the software simple and replaceable. There is no guarantee that Git will be widely available in 30 years, or that there will not be a standard for managing multiple repos in the future.

Here are some alternatives; some of them are viable and some are not:

GRM — Git Repository Manager

This software is the closest to feature parity when looking at the requirements. It was implemented a few years after repo process and was therefore not considered when the process was created.

It can search for repos on remotes and in the local file system, although nested repos are not explicitly supported yet.

For the time being, the following command can create a config file of a meta repo for GRM. Keep in mind that in the output, every instance of trees:, except the very first one, has to be removed, because the output of the command is the concatenation of the config files of each Git repo.

repo.process --command='grm repos find local $(pwd) --format yaml' > ../config.yml
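The required post-processing can be sketched as follows. This is a minimal sketch that only drops repeated top-level trees: lines from the concatenated output; real GRM config files may need more careful merging.

```python
def merge_grm_configs(concatenated: str) -> str:
    """Remove every 'trees:' line except the first one from the
    concatenation of per-repo GRM config files."""
    result = []
    seen_trees = False
    for line in concatenated.splitlines():
        if line.strip() == "trees:":
            # Keep only the first occurrence of the 'trees:' key.
            if seen_trees:
                continue
            seen_trees = True
        result.append(line)
    return "\n".join(result)
```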

Javascript Project Meta

The Javascript project meta seems to be a similar tool.

The major downside of this is that repo nesting is not explicitly supported. Nesting could be achieved by creating a dedicated config file for each level and each sub meta repo. I also think that complete support for each remote would have to be implemented by creating a new config file for each meta repo and each remote.

In other words, in order to use this software for nested repos with multiple remotes, one would probably have to create config files on the fly. This could probably be done via repo process, so future compatibility is possible.

Google's Mono Repo

Google uses a single repository for most of its source code, but as I understand it, that repository is not available to the public.

Microsoft's Big Repo(s)

Microsoft seems to use a large monorepo and has a special tool for that. VFS for Git was the first version and seems to be deprecated. It has been replaced by Scalar, which offers similar functionality with a different technical approach.


  • SPDX-License-Identifier: EPL-2.0 OR GPL-2.0-or-later
  • SPDX-FileCopyrightText: Contributors To The net.splitcells.* Projects