This post is part of the Reproducible Research series.
Usage notes for data storage within university constraints.
The University of Iceland has an “unlimited” OneDrive setup for staff and graduate
students. On its own this is generally not very helpful, but in conjunction with
rclone [1] and dvc it can be made more useful.
Like most of my newer projects, pixi is used to manage the tooling required,
along with git-subrepo for handling local changes to projects.
As before, we will assume gnome-keyring is present; since rclone supports
secrets stored in password managers, we have [2]:
secret-tool store --label="dvc" service "rclone" project "gprd_dimer"
This can then simply be set as RCLONE_PASSWORD_COMMAND and used directly:
export RCLONE_PASSWORD_COMMAND='secret-tool lookup service "rclone" project "gprd_dimer"'
rclone config --ask-password=false
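Before handing the command to rclone, it can be worth confirming that the lookup actually prints something. A minimal sketch, assuming the command is safe to run through `sh -c` (rclone itself parses and runs the command string directly, so this is only an approximation); `check_password_command` is a hypothetical helper name, not part of rclone:

```shell
# Hypothetical helper: confirm the configured command prints a non-empty secret.
# Assumption: running the command via `sh -c` is close enough to how rclone runs it.
check_password_command() {
    if [ -z "${RCLONE_PASSWORD_COMMAND:-}" ]; then
        echo "RCLONE_PASSWORD_COMMAND is not set" >&2
        return 1
    fi
    # Capture the secret without ever echoing it back to the terminal.
    secret=$(sh -c "$RCLONE_PASSWORD_COMMAND" 2>/dev/null) || secret=""
    if [ -z "$secret" ]; then
        echo "password command returned nothing" >&2
        return 1
    fi
    echo "password command OK"
}
```

Calling `check_password_command` after the `export` above should then fail loudly if the keyring entry is missing, rather than leaving rclone to prompt.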
We don’t want duplication, so the question is whether to use the artifact uploaded directly to an S3 instance, or to keep it local and let DVC handle putting it in a bucket if need be.
Rather than use the local NAS approach of letting DVC take an absolute path to
an rclone mount point, it is a bit nicer to use an rclone-served WebDAV
instance instead.
pixi add dvc-webdav
# In one terminal
rclone serve webdav HIOneDrive:/.dvcstore --vfs-cache-mode full --addr unix:///tmp/dvc.hi.socket
# In another
dvc remote add -d HIOneDrive webdavs:///tmp/dvc.hi.socket
Here we assume that rclone is configured as per the OneDrive instructions.
Since there are no sensitive details in any of the above commands, it is alright
to have DVC store this configuration and track it with git.
Annoyingly, it turns out that even with ssl_verify false, an SSL record layer
error will be thrown if the WebDAV instance doesn’t offer SSL.
dvc remote modify hionedrive ssl_verify false
dvc push
Collecting |0.00 [00:00, ?entry/s]
Pushing
ERROR: unexpected error - [SSL] record layer failure (_ssl.c:1000): [SSL] record layer failure (_ssl.c:1000)
This means that we need to set up a key pair. The defaults are fine, since it will only be used locally:
# Keep hitting enter to accept the defaults
openssl ecparam -out ec_key.pem -name prime256v1 -genkey
openssl req -new -key ec_key.pem -x509 -nodes -days 365 -out cert.pem
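The commands above prompt for each certificate field. A non-interactive variant might look like the sketch below, which also checks that the certificate is not about to expire (it is only valid for 365 days). The `/CN=localhost` subject, the scratch directory, and the 30-day threshold are assumptions made for illustration and are not part of the original setup:

```shell
# Assumption: a scratch directory, so existing key material is never overwritten.
certdir=$(mktemp -d)

# Same key and certificate as above, but -subj answers the prompts up front.
# Assumption: CN=localhost is an illustrative subject for a local-only server.
openssl ecparam -out "$certdir/ec_key.pem" -name prime256v1 -genkey
openssl req -new -key "$certdir/ec_key.pem" -x509 -nodes -days 365 \
    -subj "/CN=localhost" -out "$certdir/cert.pem"

# -checkend exits 0 if the certificate is still valid N seconds from now.
if openssl x509 -in "$certdir/cert.pem" -noout \
    -checkend $((30 * 24 * 3600)) >/dev/null; then
    cert_status="valid"
else
    cert_status="expiring"
fi
echo "certificate is $cert_status"
```

Since DVC is run with ssl_verify off, the subject itself is not checked; the expiry check is mostly a reminder to regenerate the pair once a year.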
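The key pair can now be used to serve the WebDAV instance over TLS, here on localhost:9677 (so the DVC remote URL needs to point at webdavs://localhost:9677 rather than the socket path):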
rclone serve webdav HIOneDrive:/.dvcstore --vfs-cache-mode full --addr localhost:9677 --cert cert.pem --key ec_key.pem
It is important to note that DVC doesn’t care about the exact key pair. With
the current setup, browsers will (correctly) raise
net::ERR_CERT_AUTHORITY_INVALID warnings [3], but DVC will now merrily accept
the connection (since ssl_verify is still off):
dvc push
Collecting |0.00 [00:00, ?entry/s]
Pushing
196 files pushed
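dvc idiosyncrasies
Be careful with patterns in .gitignore, else dvc may complain that the relevant
.dvc file is ignored by git. For a tracked directory such as best_trial_8/,
don’t ignore best_trial* but use best_trial*/, since the trailing slash
restricts the pattern to directories and leaves best_trial_8.dvc visible to git.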
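The difference between the two patterns can be checked with `git check-ignore` in a throwaway repository. This is a minimal sketch; `is_ignored` is a hypothetical helper, and the file names `best_trial_8.dvc` and `best_trial_8/data.bin` are assumed for illustration:

```shell
# Hypothetical helper: succeed if PATTERN (written to .gitignore) ignores PATH
# inside a temporary repository containing best_trial_8/ and best_trial_8.dvc.
is_ignored() {
    pattern=$1
    path=$2
    repo=$(mktemp -d)
    git init -q "$repo"
    mkdir -p "$repo/best_trial_8"
    touch "$repo/best_trial_8/data.bin" "$repo/best_trial_8.dvc"
    printf '%s\n' "$pattern" > "$repo/.gitignore"
    # check-ignore exits 0 when the path is ignored, 1 when it is not.
    git -C "$repo" check-ignore -q "$path" && status=0 || status=$?
    rm -rf "$repo"
    return "$status"
}

# The broad pattern also swallows the .dvc file that git needs to track.
is_ignored 'best_trial*' best_trial_8.dvc && echo "best_trial* hides best_trial_8.dvc"
# The directory-only pattern ignores the data but leaves the .dvc file alone.
is_ignored 'best_trial*/' best_trial_8.dvc || echo "best_trial*/ keeps best_trial_8.dvc"
```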
This isn’t necessarily the most robust setup as written, given that the
University reclaims access pretty aggressively; however, when coupled with the
Materials Archive and a strong compression technique for personal storage via
borg, it is fairly resilient.