BioVault is a free, open-source, permissionless network for collaborative genomics.
Built with end-to-end encryption, secure enclaves, and data visitation, BioVault lets researchers and participants share insights without ever sharing raw data.
curl -sSL https://raw.githubusercontent.com/openmined/biovault/main/install.sh | bash
Run bv check and make sure you have the dependencies listed below.
bv check
BioVault Dependency Check
=========================
Checking java... (version 23) ✓ Found
Checking docker... ✓ Found (running)
Checking nextflow... ✓ Found
Checking syftbox... ✓ Found
=========================
✓ All dependencies satisfied!
On some systems, such as macOS and Google Colab, you can run bv setup and bv will help you install the dependencies.
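If bv check reports a missing dependency, it can help to confirm what is actually on your PATH. The following is a minimal sketch, not how bv check is implemented: check_dep is a hypothetical helper, and the command names are assumed to match the names printed in the check output. It only tests for presence and does not verify versions or whether Docker is running.

```shell
# Hypothetical helper (not part of bv): report whether a command is on PATH.
check_dep() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "Checking $1... Found"
    else
        echo "Checking $1... Missing"
        return 1
    fi
}

# Assumption: the executables are named as in the bv check output.
missing=0
for dep in java docker nextflow syftbox; do
    check_dep "$dep" || missing=$((missing + 1))
done
echo "Missing dependencies: $missing"
```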
SyftBox requires setup and authentication.
- Development Guide - Setup and testing instructions
- Security - How BioVault protects your data with SyftBox permissions
For development setup and commands, see DEV.md.
The bv CLI provides commands to manage BioVault projects, data, messaging, and utilities.
Global flags
- -v, --verbose: Increase log verbosity
- --config <path>: Use a specific config file
Top-level commands
- bv update: Check for updates and install the latest
- bv init [email]: Initialize a new BioVault repo; email is optional (detected from SYFTBOX_EMAIL if omitted)
- bv info: Show system information
- bv check: Check for required dependencies
- bv setup: Setup environment for known systems (e.g., Google Colab)
- bv project create [--name <name>] [--folder <path>]: Create a new project scaffold
- bv run <project_folder> <participant_source> [--test] [--download] [--dry-run] [--with-docker=<bool>] [--work-dir <dir>] [--resume]
  - participant_source can be a local file path, Syft URL, or HTTP URL (with optional #fragment)
  - --with-docker defaults to true
- bv sample-data fetch [--participant-ids id1,id2,...] [--all]: Fetch sample data
- bv sample-data list: List available sample data
- bv participant add [--id <ID>] [--aligned <file>]: Add a participant record
- bv participant list: List participants
- bv participant delete <ID>: Delete a participant
- bv participant validate [--id <ID>]: Validate participant files (all if omitted)
- bv biobank list: List biobanks in SyftBox
- bv biobank publish [--participant-id <ID>] [--all] [--http-relay-servers host1,host2,...]: Publish participants
- bv biobank unpublish [--participant-id <ID>] [--all]: Unpublish participants
- bv config email <email>: Set email address
- bv config syftbox [--path <config.json>]: Set SyftBox config path
- bv fastq combine <input_folder> <output_file> [--validate] [--no-prompt] [--stats-format tsv|yaml|json]: Combine/validate FASTQ files
- bv submit <project_path> <destination>: Submit a project (destination is datasite email or full Syft URL)
- bv samplesheet create <input_dir> <output_file> [--file_filter <pattern>] [--extract_cols <pattern>] [--ignore]: Create sample sheet CSV from files
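The three kinds of participant_source accepted by bv run can be told apart by their prefix. The sketch below is only an illustration of that distinction and not bv's own parsing: classify_source and source_fragment are hypothetical helpers, the syft:// scheme prefix is an assumption, and the example URL is a placeholder.

```shell
# Hypothetical helper: classify a participant_source string.
# Assumption: Syft URLs start with syft://.
classify_source() {
    src=${1%%'#'*}                  # drop an optional #fragment
    case $src in
        syft://*)            echo "syft" ;;
        http://*|https://*)  echo "http" ;;
        *)                   echo "file" ;;
    esac
}

# Hypothetical helper: print the text after '#', or nothing if absent.
source_fragment() {
    case $1 in
        *'#'*) echo "${1#*'#'}" ;;
        *)     echo "" ;;
    esac
}

classify_source "participants.yaml"
classify_source "https://example.com/participants.yaml#P1"
source_fragment "https://example.com/participants.yaml#P1"
```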
Inbox and messaging
- bv inbox: Interactive inbox (default; uses single-key shortcuts)
  - Shortcuts: ?/h Help, n New, s Sync, v Change view, q Quit, 1..5 Tabs (Inbox, Sent, All, Unread, Projects)
  - Arrow keys navigate; Enter opens the selected message or Quit
- bv inbox --plain [--sent] [--all] [--unread] [--projects] [--type <text|project|request>] [--from <sender>] [--search <term>]
  - Non-interactive list output with filters
- bv message send <recipient> <message> [-s|--subject <subject>]: Send a message
- bv message reply <message_id> <body>: Reply to a message
- bv message read <message_id>: Read a specific message
- bv message delete <message_id>: Delete a message
- bv message list [--unread]: List messages (optionally only unread)
- bv message thread <thread_id>: View a message thread
- bv message sync: Sync messages (check for new and update ACKs)
Examples
- Initialize and set email:
  bv init [email protected]
- Create a new project:
  bv project create --name demo --folder ./demo
- Run a project with test data:
  bv run ./demo participants.yaml --test --download
- Combine FASTQs:
  bv fastq combine ./fastq_pass ./combined/output.fastq.gz --validate
- Interactive inbox:
  bv inbox (press ? for shortcuts)
- Plain inbox list:
  bv inbox --plain --unread
- Create sample sheet from genotype files:
  # Extract participant IDs from filenames matching a pattern
  bv samplesheet create test_dir output.csv --extract_cols="{participant_id}_X_X_GSAv3-DTC_GRCh38-{date}.txt"
  # Example with files: 103704_X_X_GSAv3-DTC_GRCh38-07-01-2025.txt
  # Produces CSV:
  # participant_id,genotype_file_path
  # 103704,/absolute/path/test_dir/103704_X_X_GSAv3-DTC_GRCh38-07-01-2025.txt
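To see what the --extract_cols pattern in the sample sheet example does to a single filename, here is a rough shell equivalent. It is a sketch for this one pattern only, not bv's pattern parser: samplesheet_row is a hypothetical helper, and it assumes the participant ID is everything before the first underscore.

```shell
# Hypothetical helper: emit one CSV row for a file named like
# {participant_id}_X_X_GSAv3-DTC_GRCh38-{date}.txt
samplesheet_row() {
    path=$1
    name=${path##*/}              # strip leading directories
    participant_id=${name%%_*}    # text before the first underscore
    printf '%s,%s\n' "$participant_id" "$path"
}

echo "participant_id,genotype_file_path"
samplesheet_row "/absolute/path/test_dir/103704_X_X_GSAv3-DTC_GRCh38-07-01-2025.txt"
```

Run as written, this prints the same header and row as the CSV shown in the example.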
The bv files commands provide a flexible workflow for importing genomic data files with automatic participant ID extraction and file type detection.
bv files scan /path/to/data
Output:
Scan Results: /path/to/data
Extensions Found:
.txt 323 files 6701.8 MB
.csv 4 files 32.6 MB
Total: 332 files
bv files suggest-patterns /path/to/data --ext .txt
Output:
Detected Patterns:
1. {parent} - Directory name as participant ID
Example: huE922FC/...
Sample extractions:
huE922FC/... → participant ID: huE922FC
huBF0F93/... → participant ID: huBF0F93
bv files import /path/to/data --ext .txt --pattern {parent} --dry-run
Output shows sample participant ID extractions without importing.
bv files export-csv /path/to/data --ext .txt --pattern {parent} -o genotype-files.csv
Output:
Found 323 files
✓ Exported 323 files to genotype-files.csv
bv files detect-csv genotype-files.csv -o genotype-files.csv
Output:
Detecting file types from genotype-files.csv
Processing 323 files
Detecting... 323/323
✓ Updated CSV written to genotype-files.csv
bv files import-csv genotype-files.csv
Output:
CSV Import Preview: genotype-files.csv
Files to import: 323
- {parent} - Use parent directory name as participant ID
- {filename} - Use filename as participant ID
- Custom patterns can extract from any part of the file path
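As a rough illustration of what the two built-in patterns resolve to for a given path, consider the sketch below. Both helpers are hypothetical and are not bv code; the file name genome.txt is made up, and dropping the extension for {filename} is an assumption, since the text above does not say whether bv keeps it.

```shell
# Hypothetical helper: {parent} -> name of the directory containing the file.
id_from_parent() {
    basename "$(dirname "$1")"
}

# Hypothetical helper: {filename} -> file name.
# Assumption: the final extension is dropped.
id_from_filename() {
    name=$(basename "$1")
    echo "${name%.*}"
}

id_from_parent "/path/to/data/huE922FC/genome.txt"    # prints huE922FC
id_from_filename "/path/to/data/huBF0F93.txt"         # prints huBF0F93
```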
- bv files scan <path> - Scan directory and show file type statistics
- bv files suggest-patterns <path> --ext <extension> - Analyze files and suggest participant ID extraction patterns
- bv files import <path> --ext <ext> --pattern <pattern> [--dry-run] - Preview or import files with pattern
- bv files export-csv <path> --ext <ext> --pattern <pattern> -o <output.csv> - Export file list with participant IDs to CSV
- bv files detect-csv <input.csv> -o <output.csv> - Detect file types and update CSV
- bv files import-csv <file.csv> - Import files from CSV into BioVault database
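The CSV-based steps (export-csv, detect-csv, import-csv) can be chained in a small wrapper so that a failure in one step stops the rest. This is a sketch only: import_via_csv is a hypothetical function name, and it uses just the flags listed above.

```shell
# Hypothetical wrapper: run the CSV-based import steps in order,
# stopping at the first command that fails.
import_via_csv() {
    data_dir=$1   # directory to scan
    ext=$2        # file extension, e.g. .txt
    pattern=$3    # participant ID pattern, e.g. {parent}
    csv=$4        # CSV file to write and then import

    bv files export-csv "$data_dir" --ext "$ext" --pattern "$pattern" -o "$csv" &&
    bv files detect-csv "$csv" -o "$csv" &&
    bv files import-csv "$csv"
}
```

For example, import_via_csv /path/to/data .txt '{parent}' genotype-files.csv runs the same three commands as the walkthrough above. Running bv files import with --dry-run first remains a sensible way to check the pattern before exporting.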
If you need to run multiple SyftBox instances, check out sbenv, which helps you isolate them on your machine:
https://github.com/openmined/sbenv
BioVault automatically detects when it is running inside an activated sbenv environment and targets that isolated SyftBox instance for all its operations.