How to Set Up a Raspberry Pi Running Ubuntu 20.04 for Archiving

There are certain videos and podcasts that I do not have the chance to follow on a regular basis.  I like being able to go through local files and watch or listen to them when I am ready.

We will be configuring Pony - one of the Ants - to do the following tasks:
- run a cron job to execute an archive script
- use youtube-dl to archive a list of videos or audio
- configure a hard drive for additional storage
- configure a Samba service to share it on a network

This will leave us with a hard drive full of video and audio clips based on the supplied URLs.

Setting Up Youtube-dl

First we will be installing youtube-dl.  It will require Python 2.6, 2.7 or 3.2+ in order to work.

## Download the latest youtube-dl
wget https://youtube-dl.org/downloads/latest/youtube-dl

## Install ffmpeg, this is for media conversion
sudo apt-get install ffmpeg

## Move the youtube-dl executable to the local bin folder
sudo mv youtube-dl /usr/local/bin/

## Change permissions for youtube-dl in local bin folder
sudo chmod 755 /usr/local/bin/youtube-dl

Please be aware, youtube-dl may not work if it is not up to date.  Please ensure it is always up to date.

You can test the functionality by typing this:

youtube-dl --version
2020.07.28 ## Successful Output

## If you get this error, you may have Ubuntu 20.04 installed.
/usr/bin/env: ‘python’: No such file or directory

## In order to fix the above error, you can install the following package:
sudo apt-get install python-is-python3
## This installs a "python" symlink that points to "python3"

At this point, youtube-dl should be successfully installed and working on your machine.
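
If you want a quick sanity check before moving on, here is a minimal sketch that reports which of the commands used in this post are not on your PATH.  The name missing_commands is a hypothetical helper I made up for illustration; it only looks commands up and does not install anything.

```shell
# missing_commands is a hypothetical helper name used for this sketch.
# It prints every command from its arguments that cannot be found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Check the three commands this post relies on.
# No output means everything was found.
missing_commands python youtube-dl ffmpeg
```

If "python" shows up in the output, that is the same situation as the error above, and the python-is-python3 package should take care of it.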

Configuring The Additional Space

We will now set up the hard drive for additional storage space.  In this case we are using a brand new 4 TB desktop drive in an external enclosure.  This drive will also need a partition; that will be taken care of below as well.

Let's find out which disk we need to use.

## Get a listing of the disks
sudo fdisk -l

## I removed the other outputs, but you get the idea.
## Notice the disk you are looking for.
Disk /dev/sdb: 3.65 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: 004-2CV104
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Once you are sure you have the correct disk, take note of the device name.  In this case, we are using the /dev/sdb disk.  Now to create the partition table and partition.

## Run fdisk with the selected disk as the parameter
sudo fdisk /dev/sdb

## Execute 'g' to create a new GPT partition table
Command (m for help): g
Created a new GPT disklabel (GUID: 81032A18-E282-C041-XXXX-XXXXXXXXXXXXX).

## Create the Partition. In this case, we want to use the entire disk
Command (m for help): n
Partition number (1-128, default 1): 1
First sector (2048-7814037134, default 2048): 2048
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-7814037134, default 7814037134):

Created a new partition 1 of type 'Linux filesystem' and of size 3.7 TiB.

## Use 'p' to verify the creation of the new partition
Command (m for help): p
Disk /dev/sdb: 3.65 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: 004-2CV104
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 81032A18-E282-C041-81DC-A8E27BA63CAF

Device Start End Sectors Size Type
/dev/sdb1 2048 7814037134 7814035087 3.7T Linux filesystem

## Now write the changes with the 'w' command
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

Now that the partition has been created, it needs to be formatted.

sudo mkfs.ext4 -F /dev/sdb1
mke2fs 1.45.5 (07-Jan-2020)
Creating filesystem with 976754385 4k blocks and 244195328 inodes
Filesystem UUID: 6b77145e-bb34-4b6f-acd5-b89abd60e8fa
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848, 512000000, 550731776, 644972544

Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done

Create a directory to mount the disk to.  I will be making a directory called 'desktop' in my home directory.

## Ensure I am in the home directory
cd ~
mkdir desktop

In order to have the disk auto mount, I will be modifying the fstab file and mounting the drive to the newly created directory.

## Use your favourite text editor to modify the fstab file (/etc/fstab)
## Add this line to the bottom of the file.
/dev/sdb1 /home/ubuntu/desktop ext4 defaults 0 0

Adjust the line to match your own device, mount point and filesystem.
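
One thing to keep in mind: device names like /dev/sdb1 can change if you plug in another USB drive, so you may prefer to reference the filesystem by its UUID instead.  Below is a minimal sketch that prints such an fstab line.  The name fstab_line is a hypothetical helper; it only prints text and does not touch /etc/fstab.  The UUID is the one reported by mkfs.ext4 above; you can look up yours with "sudo blkid /dev/sdb1".

```shell
# fstab_line is a hypothetical helper name used for this sketch.
# It prints an fstab entry that identifies the filesystem by UUID.
# Arguments: filesystem UUID, mount point.
fstab_line() {
  uuid="$1"
  mount_point="$2"
  printf 'UUID=%s %s ext4 defaults 0 0\n' "$uuid" "$mount_point"
}

# UUID taken from the mkfs.ext4 output earlier in this post.
fstab_line "6b77145e-bb34-4b6f-acd5-b89abd60e8fa" "/home/ubuntu/desktop"
# prints: UUID=6b77145e-bb34-4b6f-acd5-b89abd60e8fa /home/ubuntu/desktop ext4 defaults 0 0
```

After editing the file, running "sudo mount -a" should mount everything listed in fstab and will complain about a bad entry while you still have a working session.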

Reboot the machine to confirm your changes.  A word of warning: be careful when modifying the fstab file.  A mistake here can cause boot issues, so always back up the file and double-check your entries.

Once the system has rebooted, you can run the following command to verify your changes.

df -k
Filesystem 1K-blocks Used Available Use% Mounted on
## Remove the others
/dev/sdb1 3844640564 90140 3649183164 1% /home/ubuntu/desktop

Creating a Cron Job to Automate the Downloads

Using a basic script and the crontab we can set up an automated downloader.

Firstly, you will notice that the owner of the "desktop" directory is root.  This is personally what I want, in case I decide to modify the permissions or users that utilize the drive.  The fstab file can be modified if you do want the user to be the owner of the drive.  This post is a good start for that type of configuration.

Confirmation of the ownership of the directory below.

ls -ltr
total 16
drwxrwxr-x 2 ubuntu ubuntu 4096 Jul 23 18:14 portable
drwxr-xr-x 3 root root 4096 Aug 17 02:30 desktop

This means we will have to create the directories prior to executing the scripts, unless you want the scripts to be executed by root, which we will not be doing.

Create the directories in "desktop" and change the ownership of the new directories.

## go to desktop directory
cd desktop

## make the video directory
sudo mkdir video

## make the log directory - will get to that later
sudo mkdir log

## change ownership of the new directories - ubuntu is the username
sudo chown ubuntu:ubuntu video log

## confirm the changes.
ls -ltr
total 12
drwxr-xr-x 2 ubuntu ubuntu 4096 Aug 17 17:49 video
drwxr-xr-x 4 ubuntu ubuntu 4096 Aug 17 17:50 log

We are going to set up the crontab to run the script once per week, during the night.  We are going to make the crontab run as the local user.

## Open the crontab for your user
crontab -u ubuntu -e

## Add the below line to the crontab
3 3 * * 4 /bin/bash /home/ubuntu/podcasts-download.sh > /home/ubuntu/desktop/log/podcasts-download.log 2>&1

The script will execute at 03:03 every Thursday (day 4 of the week in cron, where 0 is Sunday).

Let me explain the crontab configuration:
[3 3 * * 4] - this is the schedule; when to execute the script
[/bin/bash] - this is the program to run
[/home/ubuntu/podcasts-download.sh] - this is what the "program" runs
[>] - this is redirecting and overwriting the output to the next parameter
[/home/ubuntu/desktop/log/podcasts-download.log] - this is the file receiving the output of the executed command
[2>&1] - this is a redirection of stderr to stdout.

In summary, once a week the script will execute and overwrite the log file with its output.
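
If you want to double-check how a crontab line breaks down, here is a minimal sketch that splits an entry into its five schedule fields and the command.  The name describe_cron is a hypothetical helper I made up for illustration; it is not part of cron and it does not validate the values.

```shell
# describe_cron is a hypothetical helper name used for this sketch.
# It splits one crontab line into the five schedule fields and the command.
# The body runs in a subshell so the variables do not leak into your session.
describe_cron() (
  # read assigns one word per variable; the last variable gets the remainder
  read -r minute hour dom month dow command <<EOF
$1
EOF
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$dom" "$month" "$dow"
  printf 'command=%s\n' "$command"
)

describe_cron '3 3 * * 4 /bin/bash /home/ubuntu/podcasts-download.sh'
# prints: minute=3 hour=3 day-of-month=* month=* day-of-week=4
# prints: command=/bin/bash /home/ubuntu/podcasts-download.sh
```

Changing the fifth field to * would make the same job run every night instead of once a week.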

Build the Script!

We will create a basic script run by the cron job.  It will make a directory, enter the directory, run the youtube-dl command and repeat.

First, we will need the PATH parameter in the script, so the job knows where to run the executable from.  (I did this the lazy way; you can improve it if you like - I may down the road.)

## run echo "$PATH" to get the parameters
echo "$PATH"
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

## Copy the output and add them to the top of your script.

Once you have obtained the $PATH information, add it to the top of your script in place of the data below.  This is a sample of the script you can use to archive podcasts.

PATH=$PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
export PATH

# <PODCAST-NAME>
echo "Moving to <PODCAST> directory..."
mkdir -p "/home/ubuntu/desktop/Archive/Podcasts/<PODCAST-NAME>"
cd "/home/ubuntu/desktop/Archive/Podcasts/<PODCAST-NAME>"
echo "Getting <PODCAST>..."
youtube-dl -x --audio-format mp3 "<PODCAST-URL>"
echo "<PODCAST> Updated."

There are changes that have to be made to the above script in order for it to be functional:
- <PODCAST-NAME> should be replaced with a version of the name that has no spaces and no special characters, just to limit the chance of error down the road.  Ex. Sean's Podcast should be seans-podcast.
- <PODCAST> is just the name of the podcast; I would still avoid special characters - specifically the double quotation mark.
- <PODCAST-URL> is the URL you are looking to archive.

Again, since this is a sample script, please ensure you do make all the correct modifications for it to function.  If you are looking to archive something other than an mp3, or your directory structure is different, these are obvious changes that have to be made.  
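
If you end up archiving more than a couple of podcasts, repeating that block gets tedious.  Here is a minimal sketch of the same steps wrapped in a function.  The names archive_podcast, ARCHIVE_ROOT and YTDL are hypothetical and only exist in this sketch.  I have also added youtube-dl's --download-archive option, which, as I understand it, records what has already been downloaded in a text file so weekly runs skip those items; drop it if you do not want that behaviour.

```shell
#!/bin/bash
# ARCHIVE_ROOT and YTDL are hypothetical variable names used for this sketch.
# Both can be overridden from the environment.
ARCHIVE_ROOT="${ARCHIVE_ROOT:-/home/ubuntu/desktop/Archive/Podcasts}"
YTDL="${YTDL:-youtube-dl}"

# archive_podcast is a hypothetical helper: make the directory, enter it,
# run the downloader, report back. Arguments: directory name, URL.
archive_podcast() {
  name="$1"
  url="$2"
  dir="$ARCHIVE_ROOT/$name"

  echo "Moving to $name directory..."
  mkdir -p "$dir" || return 1

  # Run in a subshell so the caller's working directory is left unchanged
  (
    cd "$dir" || exit 1
    echo "Getting $name..."
    "$YTDL" -x --audio-format mp3 --download-archive downloaded.txt "$url"
  ) || return 1

  echo "$name Updated."
}

# Example call, using the same placeholder as the sample script:
# archive_podcast "seans-podcast" "<PODCAST-URL>"
```

The directory under ARCHIVE_ROOT has to be writable by the user running the cron job, the same as the directories created earlier.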

At this point, you have a scheduled job and a script to execute.  For the first time, manually execute the script in order to both test it and start the initial download.

## Go to the directory where the script is and bash it!
bash podcasts-download.sh

You should start to see some words fly by!  This could take some time.  Watch it in the background and open up a new terminal session to continue on.

Using Samba Service to Share Content

We will set up a basic Samba share where anyone can access the share with read-only permissions.  Since I am using Ubuntu, I will be installing Samba from the standard Ubuntu repositories via APT.

## Do the typical update/upgrade and then install the Samba package
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install samba

Now we have to configure the permissions and the share.

## Navigate to the samba directory
cd /etc/samba/

## Always good to make a backup :)
sudo cp smb.conf smb.conf.bk
sudo nano smb.conf

## Now edit the file with the share information at the bottom!

[Archive]
comment = Archive
path = /home/ubuntu/desktop/
browseable = yes
read only = yes
guest ok = yes

## Save and Exit
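
If you would rather not edit the file by hand, here is a minimal sketch that appends the same share definition to a config file, but only when it is not already there.  The name add_archive_share is a hypothetical helper for illustration; it takes the path of the file to modify, so you can try it on a copy of smb.conf first.

```shell
# add_archive_share is a hypothetical helper name used for this sketch.
# It appends the [Archive] share to the given file unless it already exists.
add_archive_share() {
  conf="$1"

  # Skip if a section named [Archive] is already present
  if grep -q '^\[Archive\]' "$conf"; then
    echo "Share [Archive] already present in $conf"
    return 0
  fi

  cat >> "$conf" <<'EOF'

[Archive]
comment = Archive
path = /home/ubuntu/desktop/
browseable = yes
read only = yes
guest ok = yes
EOF
  echo "Share [Archive] added to $conf"
}

# Example, working on a copy in the home directory:
# cp /etc/samba/smb.conf ~/smb.conf.new && add_archive_share ~/smb.conf.new
```

However you edit the file, "testparm -s" should print the parsed configuration and flag syntax problems, and "sudo systemctl restart smbd" makes Samba pick up the new share.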

Next, you can simply map the drive based on the configurations.

Setting Up a Mapped Network Drive

Any sort of firewall rules will have to be in place prior to mapping the drive.  Once it is successfully completed you should have a lovely new drive like below.

[Image: Nice New Network Drive]

Conclusion

Once all this is complete, you will have an auto-updating drive of information!  With some tweaks you can have it updating daily, weekly, monthly; whatever you desire.  No matter which way you look at it, you now have easily accessible local backups of your favourite internet content!