
Storing data with DVC and OneDrive

This post is part of the Reproducible Research series.

Usage notes for data storage within university constraints.

Background

The University of Iceland provides an “unlimited” OneDrive allocation for staff and graduate students. On its own this is generally not very helpful, but in conjunction with rclone [1] and dvc it can be made more useful.

Baseline

Like most of my newer projects, pixi is used to manage the tooling required, along with git-subrepo for handling local changes to projects.

As before, we will assume gnome-keyring is present. Since rclone supports secrets stored in password managers, we can store one with secret-tool [2]:

secret-tool store --label="dvc" service "rclone" project "gprd_dimer"

The matching lookup command can then be assigned to RCLONE_PASSWORD_COMMAND and used directly:

export RCLONE_PASSWORD_COMMAND='secret-tool lookup service "rclone" project "gprd_dimer"'
rclone config --ask-password=false
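
Before handing the lookup to rclone, it can help to confirm that the command actually returns a non-empty secret. The following is a minimal sketch: check_password_command is a hypothetical helper, not part of rclone, and it runs the command through sh -c, which only approximates how rclone itself invokes it.

```shell
# Hypothetical helper: confirm RCLONE_PASSWORD_COMMAND yields a non-empty secret.
# The secret itself is never printed, only a status word.
check_password_command() {
    local secret
    if [ -z "${RCLONE_PASSWORD_COMMAND:-}" ]; then
        echo "unset"
        return 1
    fi
    # Run via sh -c as an approximation of how rclone would invoke the command.
    if secret=$(sh -c "$RCLONE_PASSWORD_COMMAND" 2>/dev/null) && [ -n "$secret" ]; then
        echo "ok"
    else
        echo "empty"
        return 1
    fi
}
```

Calling check_password_command after the export should print ok if the keyring entry exists and is unlocked.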

We don’t want duplication, so the question is whether to use the artifact uploaded directly to an S3 instance, or to keep it local and let DVC handle putting it in a bucket if need be.

Rather than use the local NAS approach of giving DVC an absolute path to an rclone mount point, it is a bit nicer to have rclone serve a WebDAV instance instead.

pixi add dvc-webdav
# In one terminal
rclone serve webdav HIOneDrive:/.dvcstore --vfs-cache-mode full --addr unix:///tmp/dvc.hi.socket
# In another
dvc remote add -d HIOneDrive webdavs:///tmp/dvc.hi.socket

This assumes that rclone is configured as per the OneDrive instructions. Since there are no sensitive details in any of the above commands, it is alright to have DVC store this configuration and track it with git.

Handling SSL errors

Annoyingly, it turns out that even with ssl_verify false, an SSL record layer error will be thrown if the WebDAV instance doesn’t offer SSL.

dvc remote modify hionedrive ssl_verify false
dvc push
Collecting                                                          |0.00 [00:00,    ?entry/s]
Pushing
ERROR: unexpected error - [SSL] record layer failure (_ssl.c:1000): [SSL] record layer failure (_ssl.c:1000)

This means we need to set up a key pair. Accepting the defaults is OK since it will only be used locally:

# Keep hitting enter to accept the defaults
openssl ecparam -out ec_key.pem -name prime256v1 -genkey
openssl req -new -key ec_key.pem -x509 -nodes -days 365 -out cert.pem
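
For scripted setups the prompts can be skipped by passing the subject on the command line. This is a sketch, assuming a throwaway certificate for localhost is acceptable; make_local_cert and its directory argument are hypothetical names, while the openssl flags are the same as above plus -subj.

```shell
# Hypothetical helper: create the same EC key and self-signed certificate
# non-interactively inside the directory given as $1 (default: current dir).
make_local_cert() {
    local dir="${1:-.}"
    openssl ecparam -out "$dir/ec_key.pem" -name prime256v1 -genkey || return 1
    # -subj supplies the certificate subject up front, so no prompts appear.
    openssl req -new -key "$dir/ec_key.pem" -x509 -nodes -days 365 \
        -subj "/CN=localhost" -out "$dir/cert.pem" || return 1
    # Restrict the private key to the current user.
    chmod 600 "$dir/ec_key.pem"
}
```

Running make_local_cert . in the project directory produces the same two files as the interactive commands. Unlike the DVC remote configuration, the key should probably stay out of git.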

This can now be used to serve the WebDAV instance:

rclone serve webdav HIOneDrive:/.dvcstore --vfs-cache-mode full --addr localhost:9677 --cert cert.pem --key ec_key.pem
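
The remote added earlier pointed at the Unix socket, so it presumably needs to be repointed at the new address before pushing. This is a sketch, assuming the remote name hionedrive and the localhost:9677 address from the command above; point_remote_at and the DVC override variable are hypothetical and exist only so the commands can be dry-run with DVC=echo.

```shell
# Hypothetical helper: point an existing DVC remote ($1) at a new URL ($2)
# and keep certificate verification off for the self-signed certificate.
point_remote_at() {
    local name="$1" url="$2"
    # DVC defaults to the real dvc binary; set DVC=echo to print the commands instead.
    ${DVC:-dvc} remote modify "$name" url "$url" || return 1
    ${DVC:-dvc} remote modify "$name" ssl_verify false
}
```

For the setup in this post the call would be point_remote_at hionedrive webdavs://localhost:9677.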

It is important to note that DVC doesn’t care about the exact key pair. With the current setup browsers will (correctly) raise net::ERR_CERT_AUTHORITY_INVALID warnings [3], but DVC will now merrily accept the connection (since ssl_verify is still off):

dvc push
Collecting                                                          |0.00 [00:00,    ?entry/s]
Pushing
196 files pushed
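
The two-terminal workflow can also be folded into a single script that starts the server, pushes, and shuts the server down again. This is a minimal sketch rather than a tested recipe for this setup: run_with_server and SERVER_WARMUP are hypothetical names, and the fixed sleep is a crude stand-in for properly waiting until the port accepts connections.

```shell
# Hypothetical helper: run a server command (first argument, a single string)
# in the background, run the remaining arguments as the foreground command,
# then stop the server and return the foreground command's exit status.
run_with_server() {
    local serve_cmd="$1"
    shift
    # exec replaces the wrapper shell, so $! is the server's own PID.
    sh -c "exec $serve_cmd" &
    local pid=$!
    # Crude wait for the server to come up; adjust or replace with a port check.
    sleep "${SERVER_WARMUP:-2}"
    local rc=0
    "$@" || rc=$?
    # Stop the server whether or not the foreground command succeeded.
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true
    return "$rc"
}
```

With the serve command from above passed as the first (quoted) argument, run_with_server "rclone serve webdav ..." dvc push would push and then tear the server down, returning the exit status of dvc push.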

Other dvc idiosyncrasies

Conclusions

This isn’t necessarily the most robust setup as written, given that the University reclaims access pretty aggressively. However, when coupled with the Materials Archive and strongly compressed personal storage via borg, it is fairly resilient.

