Retrogenomics WWU Muenster
Tutorial of n-way
A simple procedure to generate multiple (n)-way alignments in a few steps.
  1. Select a Target genome
  2. Enter the coordinates of loci of interest, or select TEs from preexisting RepeatMasker tables.
  3. Select the Query genomes.
  4. Run the application. To run a direct search, both the Target and the Server RM file must be from the same species, e.g., human. To run a reverse search, the Server RM file should be uploaded from one of the query sequences. All parameters are visualized in the Parameter Tutorial.
The n-way Results Tutorial shown here represents three different runs of the Chiroptera example as outlined in the manuscript. In these runs, the lesser dawn bat, the great roundleaf bat, or the common vampire bat was selected as the target to conduct a multidirectional screening. Listed are all post-selected diagnostic loci that show perfect presence/absence patterns (Display perfect). Presence or absence is shown for all investigated species as (+) in green or (-) in yellow, respectively. All data can be saved as Excel files and subsequently sorted and analyzed in Excel. The loci can be re-aligned (up to 1,000 in one step) using MUSCLE-based optimization, which may correct the presence/absence states. All individual loci can be downloaded as aligned FASTA files.
Illustration of the parameters that can be optimized to perform presence/absence screening and sequence extraction of orthologous loci. Genomic coordinates are given above the lines. The area of insertion is marked on the target genome by flanking upward-pointing arrows (dotted lines). Downward-pointing arrows flank the bed-inserts (solid lines).
Two parameters can be set to fine-tune the search criteria and improve searches. The preset-insert denotes user-defined input coordinates of interest, for example, RepeatMasker coordinates from the Target (or Query) genome. The bed-insert denotes the LASTZ-generated insert in the derived 2-way alignment (a sequence insert in one species opposite a gap in the second species). In ideal cases, depending on the quality of the alignments and the distribution of evolutionary changes, the preset-insert is identical to the bed-insert.
  • Min flanks (direct and reverse search; default 50 nt): determines the minimum length of the conserved block flanks surrounding a preset-insert.
  • Insert/gap size (direct and reverse search; default 10 nt): determines the maximum size of the opposite gap, based on the coordinates of the gap regions.
  • Max distance (direct and reverse search for the distance method; reverse search for the overlap method; default 20 nt): determines the maximum allowed distance between preset-insert coordinates and bed-insert coordinates.
  • Max difference ratio (reverse search; default 0.3): allows no more than 30% length variation between the preset-insert (query) and the bed-inserts of another query.
  • Max overlap (direct and reverse search; overlap search method only; default 25 nt): determines the maximum overlap of preset-inserts with the flanks of the bed-inserts.
  • Min coverage (direct and reverse search; overlap search method only; default 0.7): requires that at least 70% of the preset-inserts and bed-inserts overlap.
  • Extract extension (extraction; default 500 nt): extends the flanks of bed-insert/gap loci in both the 5' and 3' directions, where available.
  • MUSCLE-based optimization (default: off): based on the MUSCLE sequence realignment results, the presence/absence symbols are corrected for all sequences in a specified length range.
  • Min target length (default 10 nt): MUSCLE-based optimization is applied to all preset-inserts of at least 10 nt.
  • Max target length (default 250 nt): MUSCLE-based optimization is applied to all preset-inserts of at most 250 nt.
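
To make the thresholds above more concrete, the following Python sketch shows one way a preset-insert could be compared with a bed-insert. It is an illustration only and not the n-way implementation: the `Interval` class and all function names are hypothetical, coordinates are assumed to be half-open intervals in nt, and the exact formulas are assumptions (distance measured per boundary, length variation and coverage both taken relative to the longer of the two inserts).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical half-open interval [start, end) on one sequence, in nt."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def within_max_distance(preset: Interval, bed: Interval, max_distance: int = 20) -> bool:
    """Assumed reading of 'Max distance': each boundary of the preset-insert
    lies within max_distance nt of the matching bed-insert boundary."""
    return (abs(preset.start - bed.start) <= max_distance
            and abs(preset.end - bed.end) <= max_distance)


def length_difference_ratio(a: Interval, b: Interval) -> float:
    """Assumed reading of 'Max difference ratio': length difference
    divided by the length of the longer insert."""
    longer = max(a.length, b.length)
    return abs(a.length - b.length) / longer if longer else 0.0


def coverage(preset: Interval, bed: Interval) -> float:
    """Assumed reading of 'Min coverage': shared span divided by the longer
    insert, so the value is a lower bound on the covered fraction of both."""
    shared = max(0, min(preset.end, bed.end) - max(preset.start, bed.start))
    longer = max(preset.length, bed.length)
    return shared / longer if longer else 0.0


def muscle_eligible(preset: Interval, min_len: int = 10, max_len: int = 250) -> bool:
    """Length window for MUSCLE-based optimization (Min/Max target length)."""
    return min_len <= preset.length <= max_len


# Example with made-up coordinates and the default thresholds from the list above.
preset = Interval(1000, 1300)
bed = Interval(1010, 1290)
passes_distance = within_max_distance(preset, bed)            # default 20 nt
passes_ratio = length_difference_ratio(preset, bed) <= 0.3    # default 0.3
passes_coverage = coverage(preset, bed) >= 0.7                # default 0.7
```

Under these assumptions, loosening Max distance or lowering Min coverage accepts bed-inserts whose boundaries deviate more from the preset-insert, which may help with lower-quality alignments at the cost of more candidates to inspect.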