Migrating My Repositories From Self Hosted GitLab to Self Hosted Gitea

GitLab has been a great source code management (SCM) tool over the past four years. I have tried to expand it beyond just simple repositories, but it never quite stuck. Three years ago, I migrated from Gitea to GitLab simply because I wanted a tool with a built-in container registry, which - at the time - Gitea did not have. That has changed. Gitea now also has Gitea Actions and plans to become federated. GitLab has also started to feel a little clunky. The one thing I remember about Gitea was how quick it was. It feels like now is a good time to go back. This is how I did it.

One by One Repository Migration

This was actually what I was hoping to do; however, it does not sound like it will be possible, and I will not be investigating it deeply. From what I have read, there are issues with HTTPS-to-HTTP redirection of payloads and how Gitea handles it. With multiple proxies and a Kubernetes configuration in the mix, this isn't something I want to play with. Next plan.

Mass Migration Using a Six-Year-Old Script

I came across the up2early/MigrateGitlabToGogs repository, the migration tool recommended by Gitea for moving from GitLab to Gitea. It is six years old, but on the plus side, it doesn't have any issues listed!

The utility migrates repositories one by one. Let's start by writing a script to automate the ... script.

OK - long story short, I bailed. This script seemed to execute the migration similarly to how the built-in migration tool works, and the results were coming back null.

Time to pivot to automating a manual task.

Mass Migration Through Cloning and Pushing

As I manually worked through what the steps were going to be, I came up with this list of tasks.

  1. Find all the repositories
  2. Loop through the repositories
  3. Clone the individual repository
  4. Change directory and move into the repository folder
  5. Remove the original origin
  6. Add the new origin
  7. Push the code base to the "new" origin
  8. Remove the local code base

For anyone trying this themselves, note the limitation of this approach: only the git data moves - issues, merge requests, and other metadata stay behind. There are also two required setup steps before it can be automated.

Need to Enable Push-to-Create?!

Using this method with Gitea, we are going to be pushing to create repositories. For security, this is a function that is best temporarily enabled just for the migration.

Below is the environment variable that has to be set. These are the settings for my Kubernetes deployment; modify the values for whatever configuration you require.

  - name: GITEA__REPOSITORY__ENABLE_PUSH_CREATE_USER
    value: "true"
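If you are not running in Kubernetes, the same setting lives in Gitea's app.ini - the GITEA__SECTION__KEY environment variables map onto configuration file sections, so the equivalent (to the best of my understanding) is:

```ini
[repository]
ENABLE_PUSH_CREATE_USER = true
```

Remember to turn it back off once the migration is done.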

Save Local Credentials

Note: This may not be best practice for all use cases; proceed with caution. You are storing credentials in plain text.

After you have logged in once, you should be able to store your credentials.

git config --global credential.helper store

You can confirm they are stored by looking at the .git-credentials file.

more ~/.git-credentials
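The store helper keeps one URL per line with the credentials embedded. After logging in to both hosts, the file should look something like this (the username and token below are placeholders, not real values):

```
https://username:<token>@gitlab.snld.ca
https://username:<token>@git.snld.ca
```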

What Do All These Steps Look Like Manually?

If we break down the above steps one by one, this is what the commands will look like using an example repository.

git clone https://gitlab.snld.ca/seanland/example.git
cd example
git remote remove origin
git remote add origin https://git.snld.ca/seanland/example.git
# We have to figure out what branch to use
git branch --show-current
# Default will always be private, just in case - repositories can be made public after
git push -o repo.private=true -u origin "$(git branch --show-current)"
cd ..
# Remove the directory (this may require elevated permissions)
sudo rm -r example

Let's Find a Way to Automate This

I have decided to do this in two steps:

  1. Get a list of all the repositories in GitLab. The base code was AI generated.

import requests

GITLAB_BASE_URL = ""  # e.g. "https://gitlab.example.com/api/v4"
PERSONAL_ACCESS_TOKEN = ""

# Define the headers for authentication
headers = {
    "Private-Token": PERSONAL_ACCESS_TOKEN
}

# Function to fetch project data from GitLab
def fetch_projects():
    projects = []
    page = 1

    while True:
        response = requests.get(f"{GITLAB_BASE_URL}/projects", headers=headers, params={"per_page": 100, "page": page})
        if response.status_code != 200:
            print(f"Error fetching data: {response.status_code}, {response.text}")
            break

        data = response.json()
        if not data:
            break

        projects.extend(data)
        page += 1

    return projects

# Function to write repository URLs to a file
def write_repo_urls_to_file(projects, filename="repositories.txt"):
    with open(filename, "w") as file:
        for project in projects:
            file.write(f"{project['http_url_to_repo']}\n")
    print(f"Repository URLs have been written to {filename}")

# Main execution
if __name__ == "__main__":
    projects = fetch_projects()
    write_repo_urls_to_file(projects)
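One caveat on an instance shared with other users: GET /projects returns every project you can see, not just your own. GitLab's API supports owned and archived query filters for this. A small, hypothetical helper for building the query parameters (the filter names are standard GitLab API parameters; the helper itself is my addition):

```python
def project_params(page, per_page=100, owned=True, include_archived=False):
    """Build query parameters for GitLab's GET /projects endpoint.

    'owned' and 'archived' are standard GitLab API filters; restricting
    to owned, non-archived projects avoids pulling in repositories you
    merely have access to.
    """
    return {
        "per_page": per_page,
        "page": page,
        "owned": str(owned).lower(),
        "archived": str(include_archived).lower(),
    }
```

Pass the result as the params argument in fetch_projects if you need the narrower listing.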
  2. Run the clone-and-push steps against the list of repositories. You will also need to change the hostnames used for 'new_origin' to fit your needs. The base code was also generated by AI - I am not claiming to have written this from scratch. I was surprised when both scripts mostly worked.
import subprocess
import os

def run_command(command):
    """Runs a shell command and returns the output."""
    result = subprocess.run(command, shell=True, text=True, capture_output=True)
    if result.returncode != 0:
        print(f"Error executing command: {command}\n{result.stderr}")
        return None
    return result.stdout.strip()

def process_repository(url):
    # Extract repository name from URL
    repo_name = url.split('/')[-1].replace('.git', '')

    # Clone the repository
    print(f"Cloning {url}...")
    run_command(f"git clone {url}")

    if os.path.exists(repo_name):
        os.chdir(repo_name)

        # Remove old origin and add new one
        print("Updating remote origin...")
        run_command("git remote remove origin")

        # Replace 'gitlab' with 'git' in the URL to form the new origin URL
        new_origin = url.replace("gitlab.snld.ca", "git.snld.ca")
        run_command(f"git remote add origin {new_origin}")

        # Determine the current branch
        print("Determining the current branch...")
        current_branch = run_command("git branch --show-current")
        if not current_branch:
            print("Unable to determine the current branch. Skipping push operation.")
        else:
            # Push the current branch to the new remote as a private repository
            print(f"Pushing to {new_origin} on branch {current_branch}...")
            run_command(f"git push -o repo.private=true -u origin {current_branch}")

        # Move back to the base directory and remove the cloned repo
        os.chdir("..")
        print(f"Removing the directory {repo_name}...")
        run_command(f"sudo rm -r {repo_name}")

def main():
    # Path to the text file containing the list of URLs
    file_path = 'repositories.txt'  # Make sure to set the correct path to your file

    try:
        with open(file_path, 'r') as file:
            urls = file.readlines()
            for url in urls:
                url = url.strip()  # Remove newline characters
                if url:  # Make sure it's not an empty line
                    process_repository(url)
    except FileNotFoundError:
        print(f"File not found: {file_path}")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    main()
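Two small pieces of logic in process_repository - deriving the repository name and rewriting the origin URL - are worth getting right. The plain string replace works for my hostnames, but swapping only the hostname component with urllib.parse is a little safer if the old host string could ever appear in a project path. A sketch, assuming my hostnames (gitlab.snld.ca to git.snld.ca):

```python
from urllib.parse import urlparse, urlunparse

def repo_name_from_url(url):
    """Extract the bare repository name from a clone URL."""
    return url.rstrip("/").split("/")[-1].removesuffix(".git")

def rewrite_origin(url, old_host="gitlab.snld.ca", new_host="git.snld.ca"):
    """Swap only the hostname component, leaving the path untouched."""
    parts = urlparse(url)
    return urlunparse(parts._replace(netloc=parts.netloc.replace(old_host, new_host)))

print(repo_name_from_url("https://gitlab.snld.ca/seanland/example.git"))  # example
print(rewrite_origin("https://gitlab.snld.ca/seanland/example.git"))
```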

What's Left?

Setting GitLab Repositories to Archived

Run a script to archive all of the projects in the GitLab instance. Again, I whipped some words into an LLM. This isn't a learn-to-program exercise; it's a method to solve a problem.

import requests

# Configuration
base_url = ""
access_token = ""

# Headers for the API request
headers = {
    'PRIVATE-TOKEN': access_token
}

# Function to list all projects
def list_all_projects():
    projects = []
    page = 1
    per_page = 100  # Number of projects per page

    while True:
        response = requests.get(
            f"{base_url}/projects",
            headers=headers,
            params={'per_page': per_page, 'page': page}
        )
        response.raise_for_status()  # Raises an error for bad responses

        current_projects = response.json()
        if not current_projects:
            break

        projects.extend(current_projects)
        page += 1

    return projects

# Function to archive a project
def archive_project(project_id):
    response = requests.post(
        f"{base_url}/projects/{project_id}/archive",
        headers=headers
    )
    response.raise_for_status()
    if response.status_code == 201:
        print(f"Project ID {project_id} archived successfully.")
    else:
        print(f"Failed to archive project ID {project_id}.")

def main():
    projects = list_all_projects()
    print(f"Found {len(projects)} projects.")

    for project in projects:
        print(f"Archiving project: {project['name']} (ID: {project['id']})")
        archive_project(project['id'])

if __name__ == "__main__":
    main()
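If something goes wrong after archiving, GitLab also exposes a matching POST /projects/:id/unarchive endpoint for rolling back. The endpoint paths are simple enough to build as a pure helper (my addition, not part of the script above; base_url is the same API root):

```python
def archive_url(base_url, project_id, unarchive=False):
    """Build the GitLab archive/unarchive endpoint URL for a project.

    POST /projects/:id/archive and POST /projects/:id/unarchive are both
    part of the standard GitLab Projects API.
    """
    action = "unarchive" if unarchive else "archive"
    return f"{base_url}/projects/{project_id}/{action}"

print(archive_url("https://gitlab.snld.ca/api/v4", 42, unarchive=True))
```

Swapping archive_project to POST to the unarchive URL restores a project to a writable state.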

Now, you should be unable to push new code to any of the repositories!

In Summary

We have three different scripts:

  1. The lister: one to list all of your repositories
  2. The bait and switch: the clone n' push
  3. The sleeper: archiving all of your repositories

It is by no means perfect, but it's more than I had at my disposal, and it meets all the requirements for my goals. Now that the migration off of GitLab is complete, I can finally begin the migration off of Jenkins.