Downloader Module

class omrdatasettools.Downloader.Downloader

The class for downloading OMR datasets. It downloads the selected dataset from GitHub and extracts it to a specified directory.

Downloader.download_and_extract_dataset(dataset: OmrDataset, destination_directory: str | Path, tmp_directory: Path | None = None)

Starts the download of the dataset and extracts it into the specified directory.

Parameters:
  • dataset – The dataset that should be downloaded

  • destination_directory – The target directory into which the dataset is extracted

  • tmp_directory – Optional directory to which the compressed dataset is downloaded
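
The roles of the three parameters can be illustrated with a standard-library sketch. This is not the library's implementation: fetch_and_extract is a hypothetical helper, the archive is assumed to be a zip file, and falling back to the system temp directory when tmp_directory is omitted is an assumption as well.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path
from typing import Optional, Union


def fetch_and_extract(url: str,
                      destination_directory: Union[str, Path],
                      tmp_directory: Optional[Path] = None) -> Path:
    """Hypothetical helper: download a zip archive and unpack it."""
    destination = Path(destination_directory)
    destination.mkdir(parents=True, exist_ok=True)

    # Assumption: use the system temp directory if none is given.
    tmp = Path(tmp_directory) if tmp_directory is not None else Path(tempfile.gettempdir())
    tmp.mkdir(parents=True, exist_ok=True)

    # The compressed file is stored in the temporary directory ...
    archive_path = tmp / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as response, open(archive_path, "wb") as out:
        shutil.copyfileobj(response, out)

    # ... and its contents are extracted into the destination directory.
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(destination)
    return destination
```

Because urllib also accepts file:// URLs, the helper can be tried offline against a local zip file, for example with Path("archive.zip").resolve().as_uri().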

Examples

>>> from omrdatasettools import Downloader, OmrDataset
>>> downloader = Downloader()
>>> downloader.download_and_extract_dataset(OmrDataset.Homus_V2, "data")

Downloader.download_images_from_mei_annotation(dataset: OmrDataset, dataset_directory: str, base_url: str)

Crawls the images of an Edirom dataset, if provided with the respective URL. To avoid repetitive crawling, this URL has to be provided manually. If you are interested in these datasets, please contact the authors.

Examples

>>> from omrdatasettools import Downloader, OmrDataset
>>> downloader = Downloader()
>>> downloader.download_and_extract_dataset(OmrDataset.Edirom_Bargheer, "data/Bargheer")
>>> downloader.download_images_from_mei_annotation(OmrDataset.Edirom_Bargheer, "data/Bargheer",
...     "INSERT_DATASET_URL_HERE")

or

>>> downloader.download_and_extract_dataset(OmrDataset.Edirom_FreischuetzDigital, "data/Freischuetz")
>>> downloader.download_images_from_mei_annotation(OmrDataset.Edirom_FreischuetzDigital, "data/Freischuetz",
...     "INSERT_DATASET_URL_HERE")

OmrDataset Module

class omrdatasettools.OmrDataset.OmrDataset(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

The available OMR datasets that can be automatically downloaded with Downloader.py

AudioLabs_v1 = 24

The AudioLabs v1 dataset (a.k.a. Measure Bounding Box Annotation) from https://www.audiolabs-erlangen.de/resources/MIR/2019-ISMIR-LBD-Measures, Copyright 2019 by Frank Zalkow, Angel Villar Corrales, TJ Tsai, Vlora Arifi-Müller, and Meinard Müller under CC BY-NC-SA 4.0 license.

AudioLabs_v2 = 25

The AudioLabs v2 dataset, enhanced with staves, staff measures and the original system measures. The annotations are available in CSV, JSON and COCO format.

Audiveris = 1

The Audiveris OMR dataset from https://github.com/Audiveris/omr-dataset-tools, Copyright 2017 by Hervé Bitteur under AGPL-3.0 license

Baro = 2

The Baro Single Stave dataset from http://www.cvc.uab.es/people/abaro/datasets.html, Copyright 2019 Arnau Baró, Pau Riba, Jorge Calvo-Zaragoza, and Alicia Fornés under CC-BY-NC-SA 4.0 license

Capitan = 3

The Capitan dataset from http://grfia.dlsi.ua.es/, License unspecified, free for research purposes

ChoiAccidentals = 26

The Accidentals detection dataset by Kwon-Young Choi from https://www-intuidoc.irisa.fr/en/choi_accidentals/, License unspecified.

CvcMuscima_MultiConditionAligned = 4

Custom version of the CVC-MUSCIMA dataset that contains all images in grayscale, binary and with the following staff-line augmentations: interrupted, kanungo, thickness-variation-v1/2, y-variation-v1/2, typeset-emulation and whitespeckles (all data augmentations that could be aligned automatically). The grayscale images differ from those in the WriterIdentification dataset in that they were aligned to the images from the Staff-Removal dataset. This is the recommended dataset for object detection, as the MUSCIMA++ annotations can be used with a variety of underlying images. See https://github.com/apacha/CVC-MUSCIMA to learn more.

CvcMuscima_StaffRemoval = 5

The larger version of the CVC-MUSCIMA dataset for staff removal in black and white with augmentations from http://www.cvc.uab.es/cvcmuscima/index_database.html, Copyright 2012 Alicia Fornés, Anjan Dutta, Albert Gordo and Josep Lladós under CC-BY-NC-SA 4.0 license

CvcMuscima_WriterIdentification = 6

The smaller version of the CVC-MUSCIMA dataset for writer identification in grayscale from http://www.cvc.uab.es/cvcmuscima/index_database.html, Copyright 2012 Alicia Fornés, Anjan Dutta, Albert Gordo and Josep Lladós under CC-BY-NC-SA 4.0 license

DeepScores_V1_Extended = 21

The DeepScores dataset (version 1) with extended vocabulary from https://tuggeluk.github.io/downloads/, License unspecified.

DeepScores_V1_Extended_100_Pages = 20

Subselection of 100 pages from the DeepScores dataset (version 1) with extended vocabulary from https://tuggeluk.github.io/downloads/, License unspecified.

DeepScores_V2_Complete = 23

The complete DeepScores dataset (version 2) from https://zenodo.org/records/4012193, under CC BY 4.0 license.

WARNING: The size of this dataset is over 80GB!
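
Given the size stated in the warning, it can be worth checking free disk space before starting the download. The following is a minimal sketch using only the standard library; has_free_space is a hypothetical helper and the 80 GiB threshold is only a rough lower bound, since the compressed archive and the extracted files both need room.

```python
import shutil
from pathlib import Path


def has_free_space(directory, required_bytes: int) -> bool:
    """Return True if the filesystem holding `directory` has enough free space."""
    usage = shutil.disk_usage(Path(directory))
    return usage.free >= required_bytes


# Assumption: 80 GiB as a rough lower bound; real requirements are higher
# because the archive and its extracted contents coexist on disk.
REQUIRED_BYTES = 80 * 1024 ** 3

if not has_free_space(".", REQUIRED_BYTES):
    print("Not enough free space for DeepScores_V2_Complete in this directory.")
```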

DeepScores_V2_Dense = 22

Subselection of 1714 pages from the DeepScores dataset (version 2) with extended vocabulary from https://zenodo.org/records/4012193, under CC BY 4.0 license.

DoReMi = 27

DoReMi dataset from https://github.com/steinbergmedia/DoReMi/, License unspecified.

Edirom_Bargheer = 7

Edirom dataset. All rights reserved

Edirom_FreischuetzDigital = 8

Edirom datasets on Freischuetz from https://freischuetz-digital.de/edition.html. All rights reserved.

Fornes = 9

The Fornes Music Symbols dataset from http://www.cvc.uab.es/~afornes/, License unspecified - citation requested

Homus_V1 = 10

The official HOMUS dataset from http://grfia.dlsi.ua.es/homus/, License unspecified.

Homus_V2 = 11

The improved version of the HOMUS dataset with several bugs fixed from https://github.com/apacha/Homus.

MScoreLib_All = 30

The full MScoreLib corpus from http://mscorelib.com/, scores entered manually by humans, License unspecified.

MScoreLib_Prokofiev = 32

MScoreLib corpus of Prokofiev music, converted with SharpEye and PhotoScore, from http://mscorelib.com/, License unspecified.

MScoreLib_Scriabin = 31

MScoreLib corpus of Scriabin music, converted with SharpEye and PhotoScore, from http://mscorelib.com/, License unspecified.

MuscimaPlusPlus_Images = 14

The subset of 140 images from the CVC-MUSCIMA dataset that were used for the MUSCIMA++ dataset.

MuscimaPlusPlus_MeasureAnnotations = 15

A subset of the MUSCIMA++ annotations that contains bounding-box annotations for staves, staff measures and system measures. It was semi-automatically constructed from existing annotations and manually verified for correctness. The annotations are available in a plain JSON format as well as in the COCO format.

MuscimaPlusPlus_V1 = 12

The MUSCIMA++ dataset from https://ufal.mff.cuni.cz/muscima, Copyright 2017 Jan Hajic jr. under CC-BY-NC-SA 4.0 license.

MuscimaPlusPlus_V2 = 13

The second version of the MUSCIMA++ dataset from https://github.com/OMR-Research/muscima-pp.

OpenOmr = 16

The OpenOMR Symbols dataset from https://sourceforge.net/projects/openomr/, Copyright 2013 by Arnaud F. Desaedeleer under GPL license.

OpenScoreLieder = 28

OpenScore Lieder corpus from https://github.com/OpenScore/Lieder, CC-0 license.

OpenScoreStringQuartets = 29

OpenScore StringQuartet corpus from https://github.com/OpenScore/StringQuartets, CC-0 license.

Printed = 17

The Printed Music Symbols dataset from https://github.com/apacha/PrintedMusicSymbolsDataset, Copyright 2017 by Alexander Pacha under MIT license.

Rebelo1 = 18

The Rebelo dataset (part 1) with music symbols from http://www.inescporto.pt/~arebelo/index.php, Copyright 2017 by Ana Rebelo under CC BY-SA 4.0 license

Rebelo2 = 19

The Rebelo dataset (part 2) with music symbols from http://www.inescporto.pt/~arebelo/index.php, Copyright 2017 by Ana Rebelo under CC BY-SA 4.0 license

dataset_download_urls() → Dict[str, str]

Returns a mapping from enum keys to the download URLs of all datasets.

get_dataset_download_url() → str

Returns the URL of the selected dataset. Example usage: OmrDataset.Fornes.get_dataset_download_url()

get_dataset_filename() → str

Returns the name of the downloaded zip file of a dataset. Example usage: OmrDataset.Fornes.get_dataset_filename()
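
The three accessors above follow a common pattern: an enum whose members look themselves up in a shared mapping. The sketch below illustrates that pattern with a stand-in enum. DemoDataset, its members and the example.org URLs are hypothetical, and keying the mapping by member name and deriving the filename from the URL are assumptions, not a description of how OmrDataset is implemented.

```python
from enum import Enum, auto
from typing import Dict


class DemoDataset(Enum):
    """Stand-in enum; not part of omrdatasettools."""
    Alpha = auto()
    Beta = auto()

    def dataset_download_urls(self) -> Dict[str, str]:
        # Placeholder URLs, keyed by member name (assumption).
        return {
            "Alpha": "https://example.org/datasets/alpha.zip",
            "Beta": "https://example.org/datasets/beta.zip",
        }

    def get_dataset_download_url(self) -> str:
        # Each member resolves its own entry in the shared mapping.
        return self.dataset_download_urls()[self.name]

    def get_dataset_filename(self) -> str:
        # Assumption: the filename is the last path segment of the URL.
        return self.get_dataset_download_url().rsplit("/", 1)[-1]
```

With this layout, DemoDataset.Beta.get_dataset_download_url() and DemoDataset.Beta.get_dataset_filename() mirror the call style shown in the example usages above.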