The increasing use of retroviral vector-mediated gene transfer and recent reports of insertional mutagenesis in mice and humans have created intense interest in characterizing vector integrations at the genomic level. Techniques to determine insertion sites, which rely mainly on time-consuming manual data processing and compilation, are therefore commonly applied in gene therapy laboratories. Because high variability in processing methods hampers comparison of data, there is an urgent need to process the data arising from such analyses systematically.

Sequences obtained from integration site analysis are judged authentic only if the part of the query sequence that matches the genome is flanked by the 5′LTR sequence on one side and the adapter sequence on the other. We therefore developed the IntegrationSeq tool. In this task, different methods were implemented for converting ABI sequence trace files into high-quality sequences and for recognizing and deleting the LTR and adapter parts of the isolated clones. If neither a primer nor an LTR can be found, the sequence is discarded. If the LTR is found on the complementary strand, the integration sequence is reversed. The remaining sequence between the primer and LTR positions is taken as the integration sequence and written to a sequence output file.
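The trimming logic described above can be illustrated with a minimal Python sketch. It is not the IntegrationSeq implementation: the function names are hypothetical, the LTR and adapter are located by exact string matching (a real tool would presumably tolerate sequencing errors), a read is kept only when both flanks are present, and "reversed" is interpreted here as taking the reverse complement.

```python
from typing import Optional

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_integration_sequence(read: str, ltr: str, adapter: str) -> Optional[str]:
    """Return the sequence between LTR and adapter, or None if the read is discarded.

    Assumption: exact matching of both flanks; both must be present.
    """
    read, ltr, adapter = read.upper(), ltr.upper(), adapter.upper()

    # If the LTR is not on the given strand, try the complementary strand.
    if ltr not in read:
        read = reverse_complement(read)
        if ltr not in read:
            return None  # no LTR on either strand: discard

    start = read.find(ltr) + len(ltr)   # first base after the LTR
    end = read.find(adapter, start)     # first adapter base after the LTR
    if end == -1:
        return None                     # adapter missing: discard

    insert = read[start:end]
    return insert or None               # an empty insert is not a valid sequence


if __name__ == "__main__":
    # Toy sequences for illustration only, not real LTR or adapter sequences.
    toy_read = "GG" + "TTTCA" + "ACGTACGT" + "CCTAA" + "TT"
    print(extract_integration_sequence(toy_read, "TTTCA", "CCTAA"))
```

Requiring both flanks mirrors the authenticity criterion stated at the start of the paragraph; a more permissive variant could keep reads in which only one flank is found.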

We validated the IntegrationSeq tool using 259 trace files originating from integration site analysis (LM-PCR). Sequences can be trimmed by IntegrationSeq, which increases the yield of valid integration sequences: detection proved more sensitive (100%) than conventional analysis (94.3%) and 15 times faster, while the specificities were equal (both 100%). Valid integration sequences are further processed with IntegrationMap for automatic genomic mapping. IntegrationMap runs 50 times faster than conventional methods and retrieves detailed information about whether integrations are located in or close to genes, the name of the gene, the exact localization within the transcriptional unit, and further parameters such as the distance from the transcription start site to the integration. Further information, e.g. data about CpG islands, LINEs or SINEs and their distances to the integration, is also displayed. Output files generated by the task were found to be 99.8% identical to results retrieved by conventional mapping with the Ensembl alignment tool.
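The kind of annotation reported by the mapping step, whether an integration lies in a gene and how far it is from the transcription start site, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the IntegrationMap code: the `Gene` record, the function names and the gene names are hypothetical, coordinates are assumed to lie on a single chromosome, and the distance is signed so that negative values mean upstream of the transcription start site.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; start <= end, strand is +1 or -1."""
    name: str
    start: int
    end: int
    strand: int


def tss(gene: Gene) -> int:
    """Transcription start site: the start on the + strand, the end on the - strand."""
    return gene.start if gene.strand == 1 else gene.end


def distance_from_tss(gene: Gene, position: int) -> int:
    """Signed distance from TSS to integration; negative means upstream."""
    return (position - tss(gene)) * gene.strand


def in_gene(gene: Gene, position: int) -> bool:
    """True if the integration lies within the gene boundaries."""
    return gene.start <= position <= gene.end


def nearest_gene(genes: Iterable[Gene], position: int) -> Gene:
    """Gene whose TSS is closest to the integration position."""
    return min(genes, key=lambda g: abs(distance_from_tss(g, position)))


if __name__ == "__main__":
    # Placeholder genes and coordinates for illustration only.
    genes = [Gene("GENE_A", 1000, 5000, 1), Gene("GENE_B", 8000, 9000, -1)]
    hit = nearest_gene(genes, 9200)
    print(hit.name, distance_from_tss(hit, 9200), in_gene(hit, 9200))
```

Distances to other features mentioned above, such as CpG islands, LINEs or SINEs, could be computed in the same way from their start and end coordinates.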

Using both tools, IntegrationSeq and IntegrationMap, a validated, fast and standardized high-throughput analysis of insertion sites can be achieved for the first time.
