add_qiime_labels.py – Takes a directory, a metadata mapping file, and a column name that contains the fasta file names that SampleIDs are associated with, combines all files that have valid fasta extensions into a single fasta file, with valid QIIME fasta labels.¶
Description:
A metadata mapping file with SampleIDs and fasta file names (just the file name itself, not the full or relative filepath) is used to generate a combined fasta file with valid QIIME labels based upon the SampleIDs specified in the mapping file.
See: http://qiime.org/documentation/file_formats.html#metadata-mapping-files for details about the metadata file format.
Example mapping file: #SampleID BarcodeSequence LinkerPrimerSequence InputFileName Description Sample.1 AAAACCCCGGGG CTACATAATCGGRATT seqs1.fna sample.1 Sample.2 TTTTGGGGAAAA CTACATAATCGGRATT seqs2.fna sample.2
This script is to handle situations where fasta data comes already demultiplexed into a one fasta file per sample basis. Only alters the fasta label to add a QIIME compatible label at the beginning.
Example: With the metadata mapping file above, and an specified directory containing the files seqs1.fna and seqs2.fna, the first line from the seqs1.fna file might look like this: >FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_ AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA
and in the output combined fasta file would be written like this >Sample.1_0 FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_ AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA
No changes are made to the sequences.
Usage: add_qiime_labels.py [options]
Input Arguments:
Note
[REQUIRED]
- -m, --mapping_fp
- SampleID to fasta file name mapping file filepath
- -i, --fasta_dir
- Directory of fasta files to combine and label.
- -c, --filename_column
- Specify column used in metadata mapping file for fasta file names.
[OPTIONAL]
- -o, --output_dir
- Required output directory for log file and corrected mapping file, log file, and html file. [default: .]
- -n, --count_start
- Specify the number to start enumerating sequence labels with. [default: 0]
Output:
A combined_seqs.fasta file will be created in the output directory, with the sequences assigned to the SampleID given in the metadata mapping file.
Example:
Specify fasta_dir as the input directory of fasta files, use the metadata mapping file example_mapping.txt, with the metadata fasta file name column specified as InputFileName, start enumerating with 1000000, and output the data to the directory combined_fasta
add_qiime_labels.py -i fasta_dir -m example_mapping.txt -c InputFileName -n 1000000 -o combined_fasta