Flask WebDAV filesystem connector

This Flask application provides a basic interface between WebDAV on one side and an arbitrary filesystem interface on the other.

Compliance

The WebDAV implementation complies with RFC 4918 at WebDAV compliance class 1. For partial file updates it follows the strategy of Apache's ModDAV, using Content-Range headers as defined in RFC 9110.
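As an illustration of the Content-Range values involved in a partial update, the following minimal sketch builds and parses a header value in the RFC 9110 `bytes first-last/complete-length` form. The helper names `build_content_range` and `parse_content_range` are hypothetical and are not taken from this repository; how the connector itself parses the header is not shown here.

```python
import re
from typing import Optional, Tuple

# RFC 9110 byte content range: "bytes <first>-<last>/<complete-length or *>"
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def build_content_range(offset: int, length: int, total: Optional[int] = None) -> str:
    """Build a Content-Range value for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last = offset + length - 1  # the last byte position is inclusive
    complete = "*" if total is None else str(total)
    return f"bytes {offset}-{last}/{complete}"


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Return (first, last, total); total is None when the length is '*'."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        raise ValueError("last byte position precedes first byte position")
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total
```

A client writing 100 bytes at offset 0 of a file of unknown final size would send a PUT request with the header value returned by `build_content_range(0, 100)`.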

Supported Filesystems

The operations which are currently supported on the WebDAV side are:

  • file: read, write, (partial) update, delete
  • directory: listings, creation, deletion
  • lock: (not supported at the moment)

Python filesystem

The Python filesystem uses the regular built-in filesystem operations provided by Python itself. It supports file reads, writes, (partial) updates and deletes, as well as directory listing, creation and deletion. The implementation relies mainly on calls from the pathlib library.

Invenio

Invenio is a repository platform for storing experiment data. Through this connector it supports file reads, writes, (partial) updates and deletes, as well as directory listing, creation and deletion.
The Invenio data repository itself supports only complete reads and writes, because it stores data in object-based Amazon S3 storage. To compensate for this limitation, two additional data abstraction layers are used to provide WebDAV partial updates:

  • REST data layer for data retrieval and upload
  • Caching layer for temporary data local storage
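The idea behind providing partial updates on top of a store that only accepts whole objects can be sketched as a read-modify-write cycle. This is an illustrative sketch only: `WholeObjectStore` and `partial_update` are hypothetical names, the store is an in-memory stand-in for the REST layer, and the connector's actual implementation may differ.

```python
class WholeObjectStore:
    """Stand-in for a repository that only supports complete reads and writes."""

    def __init__(self) -> None:
        self._objects = {}  # key -> bytes

    def get(self, key: str) -> bytes:
        # A missing object is treated as empty in this sketch.
        return self._objects.get(key, b"")

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data


def partial_update(store: WholeObjectStore, key: str, offset: int, chunk: bytes) -> None:
    """Apply a partial write by fetching, patching and re-uploading the whole object."""
    data = bytearray(store.get(key))          # 1. download the complete object
    if offset > len(data):
        # Writing past the end: pad the gap with zero bytes.
        data.extend(b"\0" * (offset - len(data)))
    data[offset:offset + len(chunk)] = chunk  # 2. patch the requested range locally
    store.put(key, bytes(data))               # 3. upload the complete object again
```

Every partial update costs a full download and a full upload in this naive form, which is why a local cache between the two steps pays off.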

Note

There is always a tradeoff between space usage on the local device and the amount of data transferred to the data repository. For example, if the machine does not have enough space to store temporary data, data must be transferred over the network more frequently to save local space.

The size of the local data storage (the upper limit on occupied space) and the time to live of cached data can be changed in the config to create an optimal environment for the application.
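The interaction of a size limit and a time to live can be sketched with a small in-memory cache. This is an assumption-laden illustration, not the connector's caching layer: the class name `BoundedTTLCache`, its parameters `max_bytes` and `ttl_seconds`, and the least-recently-used eviction policy are all hypothetical.

```python
import time
from collections import OrderedDict


class BoundedTTLCache:
    """Cache limited by total size in bytes and by entry age in seconds."""

    def __init__(self, max_bytes: int, ttl_seconds: float, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.ttl_seconds = ttl_seconds
        self._clock = clock             # injectable so the sketch can be tested
        self._entries = OrderedDict()   # key -> (stored_at, data), oldest first
        self._used = 0

    def _remove(self, key: str) -> None:
        _, data = self._entries.pop(key)
        self._used -= len(data)

    def put(self, key: str, data: bytes) -> None:
        if len(data) > self.max_bytes:
            raise ValueError("object is larger than the whole cache")
        if key in self._entries:
            self._remove(key)
        # Evict least recently used entries until the new object fits.
        while self._used + len(data) > self.max_bytes:
            self._remove(next(iter(self._entries)))
        self._entries[key] = (self._clock(), data)
        self._used += len(data)

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, data = entry
        if self._clock() - stored_at > self.ttl_seconds:
            self._remove(key)           # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as recently used
        return data
```

A smaller `max_bytes` or a shorter `ttl_seconds` saves local space but causes more cache misses, each of which means another transfer from the repository.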

How to run

There are, in general, two methods to install / deploy the application.

Docker and self-build

To install the service using this method, you must have Docker installed.

First, clone the repository from GitHub and build the Docker image yourself:

# clone the repository from GitHub
git clone <REPO URL>
# change directory to the cloned folder
cd flask-webdav

# build the image, Docker build tag will be "flask-webdav"
docker build . -t flask-webdav

These steps create a new Docker image with the tag flask-webdav.

Warning

The Docker image build compiles its own Python interpreter.
Be aware that this uses a considerable amount of disk space (around 1 GB) and takes some time (around 15 minutes)!

Create a Docker Compose file docker-compose.yaml that uses the flask-webdav image:

services:
  flask-webdav:
    image: flask-webdav
    restart: unless-stopped
    environment:
      FLASK_RUN_PORT: 8001
    ports:
      - "8001:8000"

With the mapping above, the service is accessible on host port 8001 (Docker Compose port mappings use the form host:container). To change the host port, the line - "8001:8000" must be changed to

- "<desired port>:8000"

Run the service with

docker compose up

Standalone installation

To install the service using this method, you must have Python 3.11 or later installed.

First, clone the repository from GitHub and create a Python virtual environment:

# clone the repository from GitHub
git clone <REPO URL>
# change directory to the cloned folder
cd flask-webdav

# create virtual environment in the folder "venv"
python -m venv venv
# use the newly created Python virtual environment
source ./venv/bin/activate 
# install the requirements
python -m pip install -r requirements.txt

Then run the flask command, which starts app.py:

# change directory to src
cd src
flask --app app run

The service runs on port 8001. To change this, edit the following call in app.py:

app.run(host="0.0.0.0", port=<desired_port>, debug=False)

Warning

Be aware that this method does not use uWSGI or any other production WSGI server. This is not recommended, as it can be a security risk!

Configuration

To configure the service, there are two main files in which you can set the variables to the desired values.

config.toml

This file configures the app itself. Each variable is accompanied by a description that explains its purpose.

uwsgi.ini

This file configures the uWSGI server (only applicable in the Docker installation). Configuring this server is recommended for experienced users only.

Regular users may be interested in two of the variables:

# number of processes on which the service runs (not to be confused with threads)
processes = 4
# Unix domain socket
socket = /tmp/uwsgi.sock

The Unix socket can be replaced by the http directive, which serves the application on a specific port instead of the Unix socket.

# the local address and port of the service
http = 127.0.0.1:8001

Extensibility

The abstract class AbstractFileSystem is the interface through which support for additional filesystems can be added. It defines various filesystem-like calls for the communication; the most commonly used ones (not an exhaustive list) are open(), close(), read(), write() and seek().
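To give a feel for what such an extension might look like, here is a minimal sketch with an in-memory backend. The base class below is a stand-in named `AbstractFileSystemSketch`; it only borrows the method names listed above, and its signatures are assumptions. The real AbstractFileSystem in this repository may define different parameters and additional methods.

```python
from abc import ABC, abstractmethod


class AbstractFileSystemSketch(ABC):
    """Hypothetical stand-in for the connector's abstract interface."""

    @abstractmethod
    def open(self, path: str) -> None: ...

    @abstractmethod
    def seek(self, offset: int) -> None: ...

    @abstractmethod
    def read(self, size: int = -1) -> bytes: ...

    @abstractmethod
    def write(self, data: bytes) -> int: ...

    @abstractmethod
    def close(self) -> None: ...


class InMemoryFileSystem(AbstractFileSystemSketch):
    """Toy backend that keeps file contents in a dictionary."""

    def __init__(self) -> None:
        self._files = {}   # path -> bytearray
        self._path = None  # currently opened path
        self._pos = 0      # current position in the opened file

    def open(self, path: str) -> None:
        self._files.setdefault(path, bytearray())  # create the file if missing
        self._path, self._pos = path, 0

    def seek(self, offset: int) -> None:
        self._pos = offset

    def read(self, size: int = -1) -> bytes:
        buf = self._files[self._path]
        end = len(buf) if size < 0 else self._pos + size
        data = bytes(buf[self._pos:end])
        self._pos += len(data)
        return data

    def write(self, data: bytes) -> int:
        buf = self._files[self._path]
        if self._pos > len(buf):  # pad the gap when writing past the end
            buf.extend(b"\0" * (self._pos - len(buf)))
        buf[self._pos:self._pos + len(data)] = data
        self._pos += len(data)
        return len(data)

    def close(self) -> None:
        self._path = None
```

A partial update then maps to `open()`, `seek()` to the start of the range, `write()` of the range body and `close()`.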

Supported usage

The application was created with further usage in mind, and this list is open to extension.

Onedata

Onedata is a data management system designed for heavy computations and big datasets. It supports a wide variety of data operations, which makes it suitable for larger research projects.

Setup

To use this application in the Onedata environment, it must first be set up correctly. The setup may vary between Onedata versions (this setup works for version 21.02). Onedata uses this connector as a Storage backend on Oneprovider. To create the Storage backend, open the Onezone interface, navigate to Clusters, select a Oneprovider, navigate to Storage backends and choose Add storage backend (Onezone -> Clusters -> Storage backends -> Add storage backend). Configuration example:

| Key | Value | Explanation | Exemplary values |
|---|---|---|---|
| Type | WebDAV | WebDAV interface is used for communication | WebDAV |
| Name | user defined | Name of the Storage backend defined by the user; can be anything | webdav-backend |
| Endpoint | user defined | URL with scheme (http/https), port and path to this connector application | https://webdav.sbo.sk:8000/invenio |
| Credentials | user defined | TBD | TBD |
| Range write support | ModDAV | Apache's ModDAV is used for partial reads and writes to the storage | ModDAV |
| Connection pool size | 1 | Maximum number of parallel connections | 1 |
| Timeout [ms] | 12 000 000 | Number of milliseconds Onedata is willing to wait for a response | 12 000 000 |

After adding a new Storage backend by pressing Add, it can be assigned to a space in the usual way (Space support).

Restrictions

For optimal use, a few restrictions need to be observed.

  • Connection pool size: At the moment, the connector supports at most one parallel connection, because threading is not supported.
  • Timeout: The best tradeoff was found to be 1 200 seconds (20 minutes). The longest response times in communication with e.g. Invenio occur on file retrieval, especially when a partial read is requested at the end of the file, because the whole file must be downloaded before the data can be provided. If the file is large enough, the response may take a long time.

Note

It is possible to compute how long it takes to download a whole file from a network location.
The formula is t = ( s * 8 ) / b, where t is the time in seconds, s is the size of the file in bytes and b is the bandwidth of the network connection (download speed) in bits per second. This formula gives a lower bound for the time t. In reality the download takes longer, because the real throughput of the network is always lower than the bandwidth b.
E.g. for a 60 GiB file at 1 Gbit/s, t = (60 * 1024 * 1024 * 1024 * 8) / 1 000 000 000, so t = 515 seconds (~9 minutes) to download the whole file. At 100 Mbit/s it would be 5154 seconds (~86 minutes or ~1.5 hours), which would require a longer timeout or a different domain (smaller individual files).
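The estimate above can be reproduced with a few lines of Python. The function name `min_download_seconds` is only illustrative; the result is the same lower bound, so a real timeout should leave extra headroom.

```python
def min_download_seconds(size_bytes: int, bandwidth_bits_per_s: float) -> float:
    """Lower bound on the download time: t = (s * 8) / b."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return (size_bytes * 8) / bandwidth_bits_per_s


GIB = 1024 ** 3  # bytes in one GiB

# 60 GiB file over 1 Gbit/s and over 100 Mbit/s
t_fast = min_download_seconds(60 * GIB, 1_000_000_000)
t_slow = min_download_seconds(60 * GIB, 100_000_000)

print(f"1 Gbit/s:   {t_fast:.0f} s (~{t_fast / 60:.0f} min)")
print(f"100 Mbit/s: {t_slow:.0f} s (~{t_slow / 60:.0f} min)")
```

Comparing the result with the configured timeout (converted to seconds) shows quickly whether a given file size is realistic for a given link.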
