Sequence Format Converter

Convert DNA or amino acid sequences between common bioinformatics formats with this converter.

How to Use

  1. Enter your DNA or amino acid sequence in the input area.
  2. Enter the sequence name, description, and accession number (if applicable) in the corresponding input fields.
  3. Select the input format from the dropdown menu.
  4. Select the desired output format from the dropdown menu.
  5. Click the “Convert” button.
  6. The converted sequence will be displayed in the output area.
  7. Click the “Download” button to save the converted sequence to a file. The file extension is chosen automatically from the selected output format.

Supported Formats

This converter supports the following sequence formats:

  • FASTA: A simple text-based format for nucleotide or amino acid sequences. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line must begin with a greater-than (“>”) symbol.
  • EMBL: A comprehensive format for storing nucleotide sequence data. An EMBL format file can contain multiple sequences, each with detailed annotations. Sequence data is preceded by ID, AC, DE, and SQ lines, and the sequence itself is often split into lines of 60 characters. Each sequence ends with “//”.
  • GCG: A format used by the Genetics Computer Group (GCG) software package. A GCG format file usually contains a single sequence with annotations. The start of the sequence is marked by a line ending with two dots (“..”).
  • GenBank: A widely used format for storing nucleotide and amino acid sequence data. Like EMBL, GenBank files can contain multiple sequences with annotations. Each sequence begins after the “ORIGIN” keyword and ends with “//”.
  • IG/Stanford: A format used by the IntelliGenetics (IG) software. IG format files can contain multiple sequences, each with comments (lines beginning with “;”), a name line, and the sequence itself, terminated by “1” (linear) or “2” (circular).
  • Plain/Raw: A simple format containing only the sequence characters (IUPAC characters and spaces). No headers or annotations are included. A plain sequence file can hold only one sequence.
  • Pretty: The sequence is formatted for readability, typically by adding spaces every 10 characters.
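
The format descriptions above can be made concrete with a short Python sketch. It is only an illustration of the FASTA and Pretty rules as described here, not the code this converter runs. The function names (parse_fasta, to_fasta, to_pretty), the 60-character FASTA line width, and the six-groups-per-line Pretty layout are assumptions chosen for the example.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence data: drop internal spaces and collect the pieces.
            chunks.append(line.replace(" ", ""))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_fasta(description, sequence, width=60):
    """Write one record as FASTA, wrapping sequence lines at `width` (assumed)."""
    body = "\n".join(sequence[i:i + width] for i in range(0, len(sequence), width))
    return f">{description}\n{body}\n"


def to_pretty(sequence, group=10, groups_per_line=6):
    """Insert a space every `group` characters; line layout is an assumption."""
    blocks = [sequence[i:i + group] for i in range(0, len(sequence), group)]
    lines = [
        " ".join(blocks[i:i + groups_per_line])
        for i in range(0, len(blocks), groups_per_line)
    ]
    return "\n".join(lines)
```

For example, parse_fasta(">seq1 demo\nACGT\nACGT\n") yields one record whose sequence is the two data lines joined together, and passing that sequence to to_pretty groups it into blocks of 10 characters.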

Note: This converter performs basic format conversions only. For more advanced manipulation or analysis of sequence data, use specialized bioinformatics tools. Output in some formats (such as GCG) may need further adjustment to meet the requirements of specific software, and checksums and other metadata may not be fully accurate, so always double-check the output, especially for critical applications. Input format detection is basic and may not correctly identify every variation of a format; it is best to select the input format explicitly.
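
To show why automatic detection is fragile, here is a minimal Python sketch of a first-line heuristic based on the format markers listed under Supported Formats. It is not this converter's actual detection logic; the function name guess_format and the returned labels are hypothetical, and the checks (a leading “>”, “LOCUS”, “ID” followed by three spaces, a leading “;”, or a line ending in “..”) are deliberately simple.

```python
def guess_format(text):
    """Guess a sequence format from simple textual markers (heuristic only)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    first = lines[0]
    if first.startswith(">"):
        return "fasta"      # FASTA description line
    if first.startswith("LOCUS"):
        return "genbank"    # GenBank records open with a LOCUS line
    if first.startswith("ID   "):
        return "embl"       # EMBL records open with an ID line
    if first.startswith(";"):
        return "ig"         # IG/Stanford comment line
    if any(ln.rstrip().endswith("..") for ln in lines):
        return "gcg"        # GCG header line ends with two dots
    # Only letters, spaces, gaps, and stop symbols: treat as a plain sequence.
    if all(ch.isalpha() or ch in " -*" for ln in lines for ch in ln):
        return "plain"
    return "unknown"
```

A heuristic like this can be fooled easily, for instance by a file with leading comments or an unusual header layout, which is why selecting the input format explicitly is the safer choice.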