GQVis Dataset: Natural Language to Genomics Visualization

This repository contains the code for generating the GQVis dataset available on Hugging Face.

The code generates a collection of natural language queries on genomics data, each paired with a visualization specification written in the Gosling grammar.

📂 Dataset on Hugging Face: HIDIVE/GQVis

🚀 Overview

Template Generation creates abstract questions and specifications with placeholders for samples, entities, and locations, as well as constraints on those samples and entities.
Data-schema/All-schema are the dataset schemas we define, retrieved from 4DN, ENCODE, and Chromoscope.
Template Expansion reifies the template questions/specifications against the provided schemas, covering every combination that satisfies the constraints.
Paraphraser uses an LLM framework to paraphrase the input questions so that they cover different levels of expertise and formality.
Multi-step defines links, chains, and scripts to generate multi-step queries.
Alt-Gosling exports bulk Alt-Gosling text based on the resulting .csv file.
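
To make the Template Generation and Template Expansion steps concrete, here is a minimal sketch of the idea, not the code in template_expansion.py. The schema layout, the placeholder names ({sample}, {entity}, {location}), the constraint key entity_type, and the simplified spec dictionary are all assumptions made for illustration; the spec shown is a toy stand-in, not a valid Gosling specification.

```python
from itertools import product

# Hypothetical schema: each sample maps entity names to a field type.
SCHEMA = {
    "sample_a": {"peak": "quantitative", "strand": "nominal"},
    "sample_b": {"expression": "quantitative"},
}
# Hypothetical list of genomic locations to substitute.
LOCATIONS = ["chr1", "chr2"]

# Hypothetical template: a question and a toy spec with placeholders,
# plus one constraint restricting which entities may be substituted.
TEMPLATE = {
    "question": "What does {entity} look like in {sample} on {location}?",
    "spec": {"mark": "bar", "x": "{location}", "y": "{entity}"},
    "constraints": {"entity_type": "quantitative"},
}


def fill(value, bindings):
    """Recursively substitute placeholders in strings and nested dicts."""
    if isinstance(value, str):
        return value.format(**bindings)
    if isinstance(value, dict):
        return {key: fill(item, bindings) for key, item in value.items()}
    return value


def expand(template, schema, locations):
    """Return one question/spec pair per binding that meets the constraint."""
    wanted_type = template["constraints"]["entity_type"]
    rows = []
    for sample, entities in schema.items():
        for (entity, entity_type), location in product(entities.items(), locations):
            if entity_type != wanted_type:
                continue  # binding violates the template constraint
            bindings = {"sample": sample, "entity": entity, "location": location}
            rows.append(
                {
                    "question": fill(template["question"], bindings),
                    "spec": fill(template["spec"], bindings),
                }
            )
    return rows


rows = expand(TEMPLATE, SCHEMA, LOCATIONS)
for row in rows:
    print(row["question"], row["spec"])
```

In this sketch the nominal entity is filtered out by the constraint, so only the quantitative entities are combined with each location. The actual pipeline may represent templates and constraints differently.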

🗂️ Folder Structure

.
├── datasets/        # Source structured data files
├── ideogram_data/   # Ideogram data for template expansion
├── location_data/   # Retrieve location for genomic intervals
├── misc/            # Helper code for our paper 
├── multi-step/      # Contains code for multi-step generation and linking
├── paraphraser.py   # LLM code to paraphrase questions
├── template_expansion.py   # Code to reify template questions
├── template_generation.py  # Code to create abstract questions and Gosling specifications
└── README.md        # This file
