Migrating My Repositories From Self Hosted GitLab to Self Hosted Gitea
GitLab has been a great source code management (SCM) tool over the past four years. I have extended it beyond just simple repositories; however, it just hasn't stuck. Three years ago, I migrated from Gitea to GitLab simply because I wanted a tool with a built-in container registry, which, at the time, Gitea did not have. That has changed. Gitea also now has Gitea Actions and plans to become federated. GitLab, meanwhile, has started to feel a little clunky. The one thing I recall about Gitea is how quick it was. It feels like now is a good time to go back. This is how I did it.
One by One Repository Migration
This was actually what I was hoping to do; however, it does not sound like it will be possible, and I will not be investigating it deeply. From what I have read, there are issues with HTTPS-to-HTTP redirection of payloads and how Gitea handles it. With multiple proxies and a Kubernetes configuration in the mix, this isn't something I want to play with. Next plan.
Mass Migration Using Six Year Old Script
I came across the up2early/MigrateGitlabToGogs repository, the migration tool recommended by Gitea for migrating from GitLab to Gitea. It is six years old, but on the plus side, it doesn't have any issues listed!
The utility migrates repositories one by one. Let's start by writing a script to automate the ... script.
OK, long story short: I bailed. This script seemed to execute the migration the same way the built-in migration tool does, and the results were returning null.
Time to pivot to automating a manual task.
Mass Migration Through Cloning and Pushing
As I manually worked through what all the steps were going to be, I came up with this list of tasks.
- Find all the repositories
- Loop through the repositories
- Clone the individual repository
- Move into the repository folder
- Remove the original origin
- Add the new origin
- Push the code base to the "new" origin
- Remove the local code base
Here are the limitations of doing it this way, for anyone following along.
- Obviously, well, hopefully obviously, you can only do this with repositories you have (token) access to
- This is only going to bring over the selected branches (I am doing only the default)
- It will also not work for repositories in an organization
- You will lose everything not in the repository. This is fine for me, might not be for you!
There are two prerequisite steps before automating this.
Need to Enable Push-to-Create?!
Using this method with Gitea, we are going to be pushing to create repositories. For security, this is a function best enabled temporarily, just for the migration.
Below is the environment variable that has to be set. These are the settings for my Kubernetes deployment; adjust the values for whatever configuration you require.
```yaml
- name: GITEA__REPOSITORY__ENABLE_PUSH_CREATE_USER
  value: "true"
```
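This works in Kubernetes because Gitea maps `GITEA__SECTION__KEY` environment variables onto its app.ini configuration. If you run Gitea outside of Kubernetes, the same setting can go directly into app.ini instead:

```ini
; Equivalent setting in Gitea's app.ini
[repository]
ENABLE_PUSH_CREATE_USER = true
```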
Save Local Credentials
Note: This may not be best practice for all use cases; proceed with caution, since you are storing credentials in plain text.
After you have logged in once, you should be able to store your credentials.
```shell
git config --global credential.helper store
```
You can confirm they are stored by looking at the .git-credentials file.
```shell
more ~/.git-credentials
```
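For reference, each line in that file is a URL with the credentials embedded in it. With my Gitea host, a stored entry would look something like this (placeholder username and token):

```
https://<username>:<token>@git.snld.ca
```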
What Do All These Steps Look Like Manually?
If we break down the steps above one by one, this is what the commands look like, using an example repository.
```shell
git clone https://gitlab.snld.ca/seanland/example.git
cd example
git remote remove origin
git remote add origin https://git.snld.ca/seanland/example.git
# We have to figure out which branch to push
git branch --show-current
# Default to private, just in case; repositories can be made public after
git push -o repo.private=true -u origin master
cd ..
# Remove the directory (this may require elevated permissions)
sudo rm -r example
```
Let's Find a Way to Automate This
I have decided to do this in two steps:
- Get a list of all the repositories in GitLab. The base code was AI generated.
```python
import requests

GITLAB_BASE_URL = ""
PERSONAL_ACCESS_TOKEN = ""

# Define the headers for authentication
headers = {
    "Private-Token": PERSONAL_ACCESS_TOKEN
}

# Function to fetch project data from GitLab
def fetch_projects():
    projects = []
    page = 1
    while True:
        response = requests.get(f"{GITLAB_BASE_URL}/projects", headers=headers, params={"per_page": 100, "page": page})
        if response.status_code != 200:
            print(f"Error fetching data: {response.status_code}, {response.text}")
            break
        data = response.json()
        if not data:
            break
        projects.extend(data)
        page += 1
    return projects

# Function to write repository URLs to a file
def write_repo_urls_to_file(projects, filename="repositories.txt"):
    with open(filename, "w") as file:
        for project in projects:
            file.write(f"{project['http_url_to_repo']}\n")
    print(f"Repository URLs have been written to {filename}")

# Main execution
if __name__ == "__main__":
    projects = fetch_projects()
    write_repo_urls_to_file(projects)
```
- Run the migration steps against the list of repositories. You will need to change the `new_origin` replacement to fit your setup. This base code was also AI generated; I am not claiming to have written it from scratch, and I was surprised when both scripts mostly worked.
```python
import subprocess
import os

def run_command(command):
    """Runs a shell command and returns the output."""
    result = subprocess.run(command, shell=True, text=True, capture_output=True)
    if result.returncode != 0:
        print(f"Error executing command: {command}\n{result.stderr}")
        return None
    return result.stdout.strip()

def process_repository(url):
    # Extract repository name from URL
    repo_name = url.split('/')[-1].replace('.git', '')

    # Clone the repository
    print(f"Cloning {url}...")
    run_command(f"git clone {url}")

    if os.path.exists(repo_name):
        os.chdir(repo_name)

        # Remove old origin and add new one
        print("Updating remote origin...")
        run_command("git remote remove origin")

        # Replace 'gitlab' with 'git' in the URL to form the new origin URL
        new_origin = url.replace("gitlab.snld.ca", "git.snld.ca")
        run_command(f"git remote add origin {new_origin}")

        # Determine the current branch
        print("Determining the current branch...")
        current_branch = run_command("git branch --show-current")
        if not current_branch:
            print("Unable to determine the current branch. Skipping push operation.")
        else:
            # Push the current branch to the new remote as a private repository
            print(f"Pushing to {new_origin} on branch {current_branch}...")
            run_command(f"git push -o repo.private=true -u origin {current_branch}")

        # Move back to the base directory and remove the cloned repo
        os.chdir("..")
        print(f"Removing the directory {repo_name}...")
        run_command(f"sudo rm -r {repo_name}")

def main():
    # Path to the text file containing the list of URLs
    file_path = 'repositories.txt'  # Make sure to set the correct path to your file
    try:
        with open(file_path, 'r') as file:
            urls = file.readlines()
        for url in urls:
            url = url.strip()  # Remove newline characters
            if url:  # Make sure it's not an empty line
                process_repository(url)
    except FileNotFoundError:
        print(f"File not found: {file_path}")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    main()
```
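If you want to sanity-check the script before pointing it at real repositories, the two derivations it relies on are pure string operations and can be exercised in isolation. This is just a sketch; the host names match my instances, so substitute your own:

```python
# Standalone check of the two string transformations the migration script
# relies on: extracting the local directory name and rewriting the origin host.
# Host names below are from my setup; substitute your own.

def repo_name_from_url(url: str) -> str:
    """Derive the local directory name that git clone will create."""
    return url.split('/')[-1].replace('.git', '')

def rewrite_origin(url: str) -> str:
    """Point the origin at the Gitea host instead of the GitLab host."""
    return url.replace("gitlab.snld.ca", "git.snld.ca")

if __name__ == "__main__":
    url = "https://gitlab.snld.ca/seanland/example.git"
    print(repo_name_from_url(url))  # example
    print(rewrite_origin(url))      # https://git.snld.ca/seanland/example.git
```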
What's Left?
- I had to manually migrate an organization. There were only nine repositories, so I simply used the script to clone them and remove the origin, then manually added the new origin and pushed the repositories to Gitea.
- I need to add context to all the repositories: descriptions, issues (if I want), wikis (if I want) and Gitea Actions (which I will be doing!)
- Set the GitLab repositories to Archived so I do not accidentally push code to them.
Setting GitLab Repositories to Archived
Run a script to archive all of the projects in the GitLab instance. Again, I whipped some words into an LLM; this isn't a learning-to-program task, more a method to solve a problem.
```python
import requests

# Configuration
base_url = ""
access_token = ""

# Headers for the API request
headers = {
    'PRIVATE-TOKEN': access_token
}

# Function to list all projects
def list_all_projects():
    projects = []
    page = 1
    per_page = 100  # Number of projects per page
    while True:
        response = requests.get(
            f"{base_url}/projects",
            headers=headers,
            params={'per_page': per_page, 'page': page}
        )
        response.raise_for_status()  # Raises an error for bad responses
        current_projects = response.json()
        if not current_projects:
            break
        projects.extend(current_projects)
        page += 1
    return projects

# Function to archive a project
def archive_project(project_id):
    response = requests.post(
        f"{base_url}/projects/{project_id}/archive",
        headers=headers
    )
    response.raise_for_status()
    if response.status_code == 201:
        print(f"Project ID {project_id} archived successfully.")
    else:
        print(f"Failed to archive project ID {project_id}.")

def main():
    projects = list_all_projects()
    print(f"Found {len(projects)} projects.")
    for project in projects:
        print(f"Archiving project: {project['name']} (ID: {project['id']})")
        archive_project(project['id'])

if __name__ == "__main__":
    main()
```
Now, you should be unable to push new code to all the repositories!
In Summary
We have three different scripts.
- One to list all of your repositories
- The bait and switch script; the clone n' push.
- The sleeper: archiving all of your repositories
It is by no means perfect, but it's more than I had at my disposal, and it meets all the requirements for my goals. Now that the migration off of GitLab is complete, I can begin the migration off of Jenkins.