vCard (.VCF) #6888

Open
RokeJulianLockhart opened this issue Jun 14, 2024 · 4 comments · May be fixed by #6941
Labels
Add Language Good First Issue This is a great opportunity to start contributing to Linguist

Comments

@RokeJulianLockhart RokeJulianLockhart added Add Language Good First Issue This is a great opportunity to start contributing to Linguist labels Jun 14, 2024
@DecimalTurn
Contributor

DecimalTurn commented Jul 16, 2024

There seem to be two data formats using the .vcf extension:

  • vCard, which has a syntax similar to iCalendar.
  • Variant Call Format (VCF), which is a tab-delimited text format (source), i.e. tab-separated values (TSV).

In both cases, they have enough entries to be added to Linguist:

To distinguish the two with a heuristic, a simple check for BEGIN:VCARD at the top of the file should be enough.
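
As a rough illustration of that check, here is a minimal Python sketch. It is not Linguist's actual heuristic: the function name `guess_vcf_flavor`, the two patterns, and the returned labels are all hypothetical. Treating a leading `##fileformat=VCF` line as the Variant Call Format marker is an assumption based on the sample posted later in this thread.

```python
import re

# Hypothetical patterns for illustration only, not Linguist's real rules.
VCARD_RE = re.compile(r"\A\s*BEGIN:VCARD\b", re.IGNORECASE)
VARIANT_CALL_RE = re.compile(r"\A##fileformat=VCF")


def guess_vcf_flavor(text: str) -> str:
    """Guess which format a .vcf file holds by looking at the top of its content."""
    # Drop a byte order mark if the file starts with one.
    text = text.lstrip("\ufeff")
    if VCARD_RE.search(text):
        return "vCard"
    if VARIANT_CALL_RE.search(text):
        return "Variant Call Format"
    # Neither marker was found at the top of the file.
    return "unknown"
```

Anchoring both patterns at the start of the content keeps the check cheap and avoids matching a `BEGIN:VCARD` string that merely appears somewhere inside a genomics file.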

@DecimalTurn
Contributor

@Alhadis, it seems that Variant Call Format files use two # characters (##) to mark metadata lines at the top of the file and a single # to mark the column header line. Do you think that justifies treating it separately from regular TSV data, or could we simply add .vcf to the extensions associated with TSV?

Related PR: #4959

##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=LV,Number=1,Type=Integer,Description="Level in the snarl tree (0=top level)">
##INFO=<ID=PS,Number=1,Type=String,Description="ID of variant corresponding to parent snarl">
##INFO=<ID=AT,Number=R,Type=String,Description="Allele Traversal as path in graph">
##contig=<ID=a,length=630>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	b
a	281	>1>9	AGCCGGGGCAGAAAGTTCTTCCTTGAATGTGGTCATCTGCATTTCAGCTCAGGAATCCTGCAAAAGACAG	CTGTCTTTTGCAGGATTCCTGTGCTGAAATGCAGATGACCGCATTCAAGGAAGAACTATCTGCCCCGGCT	60.0	.	AC=1;AF=1;AN=1;AT=>1>2>3>4>5>6>7>8>9,>1<8>10<6>11<4>12<2>9;NS=1;LV=0	GT	1

sample source
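
To make the `##` versus `#` distinction concrete, here is a minimal Python sketch that splits a file like the sample above into metadata lines, column names, and tab-separated records. The names `ParsedVcf` and `split_vcf` are hypothetical, and the sketch relies only on the line prefixes described in this comment, not on the full format specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ParsedVcf:
    metadata: list[str] = field(default_factory=list)  # "##" lines, prefix removed
    columns: list[str] = field(default_factory=list)   # names from the single "#" line
    records: list[list[str]] = field(default_factory=list)  # tab-separated data rows


def split_vcf(text: str) -> ParsedVcf:
    """Classify each line by its prefix: "##" metadata, "#" header, else data."""
    parsed = ParsedVcf()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("##"):
            parsed.metadata.append(line[2:])
        elif line.startswith("#"):
            parsed.columns = line[1:].split("\t")
        else:
            parsed.records.append(line.split("\t"))
    return parsed


sample = "##fileformat=VCFv4.2\n##contig=<ID=a,length=630>\n#CHROM\tPOS\tID\na\t281\t>1>9\n"
parsed = split_vcf(sample)
print(parsed.columns)  # ['CHROM', 'POS', 'ID']
```

The `##` test has to come before the `#` test, since every metadata line also starts with a single `#`.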

@lildude
Member

lildude commented Jul 17, 2024

I'd suggest adding it to TSV unless it has a unique syntax that necessitates a different grammar for highlighting.

@DecimalTurn
Contributor

DecimalTurn commented Jul 17, 2024

Syntax highlighting for Variant Call Format files would be OK, but it won't work with the table "preview" mode.

The first one below works because there is no comment/metadata using # at the top.

image

(source)

Even with one row starting with # at the top, the table preview mode still works:

image
(source)

But as soon as there is more than one row at the top of the file starting with #, the table preview doesn't work.

image
(source)

Because all Variant Call Format files have multiple rows of metadata at the start, the table preview will never work if we just add .vcf to the list of extensions for TSV.
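
To see where a given file falls relative to the behaviour observed above, the rows starting with # at the top of the file can be counted. This is a minimal Python sketch with a hypothetical function name. The "more than one row breaks the preview" threshold comes only from the screenshots in this thread, not from any documented GitHub rule.

```python
def count_leading_hash_rows(text: str) -> int:
    """Count consecutive rows starting with "#" at the top of the file."""
    count = 0
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # first data row reached; stop counting
        count += 1
    return count


def preview_expected_to_work(text: str) -> bool:
    # Assumption taken from the observations above: zero or one leading
    # "#" row still rendered as a table, more than one did not.
    return count_leading_hash_rows(text) <= 1
```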

@DecimalTurn DecimalTurn linked a pull request Jul 17, 2024 that will close this issue
10 tasks