-How do I search for Rat reads corresponding to a genomic sequence?
-I have an accession number, how do I find out which rat chromosome my sequence is on?
-How do I find my sequence, and what is it I get when I have found a match?
-When I search the rat genome at BCM or at Ensembl the results look different, why?
-What is a scaffold?
-When I search the rat sequence at the NCBI, the contigs list several BAC clones - how do I find the one or two clones that contain my query sequence?
-How do I get the cDNA for accession XM_nnnnnn?
-How do I obtain a Rat Genomic clone (BAC/PAC) from a particular chromosome carrying a specific gene?

How do I search for Rat reads corresponding to a genomic sequence?

The BAC fisher is designed to work with a set of reads with associated quality. The BAC fisher expects results to extend to the end of a read, so other types of sequences do not work well.

The BAC fisher can be run for an mRNA or other expressed sequence (this implementation does a BLAST of the expressed sequence and takes the reads from that BLAST search to seed a BAC fisher run).

For a large genomic clone, you first need to identify the corresponding genomic clone or clones. BLAST is the tool to use for this. You can search the rat data, either BAC alone or BAC+WGS assemblies, from our website (choose the BLAST tab at the top of this page). Use short sequences from the beginning, middle and end of a genomic clone rather than the entire genomic sequence, since the size limit on this search is 10kb. For one particular query with 4 short sequences from a BAC, all 4 sequences matched the BCM project gabz (identified as atlas_gabz.fa in the output, with a link to the assembled fasta sequence). If the BAC set alone had been searched, the results would be linked to project_gabz.fa.
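The advice above (search with short pieces from the beginning, middle and end of a clone) can be sketched in Python. This is only an illustration: the function name pick_probes and the default probe length of 500 bases are assumptions, and the only figure taken from this page is the 10kb search limit, applied here to each probe.

```python
MAX_QUERY_LENGTH = 10_000  # the 10kb search limit mentioned above


def pick_probes(sequence: str, probe_len: int = 500) -> list[str]:
    """Return short probes from the start, middle and end of a sequence.

    probe_len is a hypothetical default; choose whatever length suits
    your query, as long as it stays under the search limit.
    """
    if probe_len < 1 or probe_len > MAX_QUERY_LENGTH:
        raise ValueError("probe_len must be between 1 and 10000 bases")
    # Remove whitespace and newlines, e.g. from pasted FASTA sequence lines.
    seq = "".join(sequence.split()).upper()
    if len(seq) <= probe_len:
        return [seq]  # already short enough to search directly
    mid_start = (len(seq) - probe_len) // 2
    return [
        seq[:probe_len],                       # beginning of the clone
        seq[mid_start:mid_start + probe_len],  # middle of the clone
        seq[-probe_len:],                      # end of the clone
    ]
```

Each returned probe can then be pasted into the BLAST form as a separate query.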

The reads for project gabz can be downloaded from the Trace Archive at the NCBI. Use the query "center_name = 'BCM' and center_project = 'GABZ'" in the search window; the URL for the search is below.

This found 765 traces in the archive. These are only BAC traces. To get the WGS reads, you need to attach them to the BAC reads.

For this, download (save as TAR) the quality and fasta files from the NCBI trace archive and load them into the BAC fisher.
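If you need the Trace Archive query for several projects, the query text shown above can be assembled with a small helper. This is a minimal sketch: the function name trace_query is hypothetical, and upper-casing the project code is an assumption based on the single example given here (project gabz, queried as 'GABZ').

```python
def trace_query(center_name: str, center_project: str) -> str:
    """Build the query text to paste into the Trace Archive search window."""
    # Assumption: project codes are queried in upper case, as in the
    # 'GABZ' example above.
    return (
        f"center_name = '{center_name}' "
        f"and center_project = '{center_project.upper()}'"
    )


print(trace_query("BCM", "gabz"))
```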

I have an accession number, how do I find out which rat chromosome my sequence is on?

Search Entrez at the NCBI with the clone name from that accession (for instance AC118441 is clone CH230-259K23). This search returns all of the genomic contigs from the BACtig containing that clone, as well as the BAC clone itself, and the BAC end sequences from that clone. In this case, there are 24 records: 21 WGS contigs, 1 BAC clone, and 2 BAC end sequences. All of the WGS contigs list the chromosome that they map to in the title line.

How do I find my sequence, and what is it I get when I have found a match?

You can search the rat genome using BLAST from a number of different sites, and in several different forms.

BLAST searches at BCM search superbactigs, enriched BACs or BACs. The genome is most complete in the superbactig search, the enriched BACs provide the most complete clone-based information, and the BACs provide the most reliable clone-based information (since WGS reads that would extend beyond the ends of the BAC are not included). Searches of the genome provided by Ensembl query the genome sequence data that is found in the superbactigs.

There are formatting differences between these databases.

Individual sequence contigs have a numeric designation like RNOR01100761, an accession in GenBank like AABR01100617 and BCM internal identifiers like gkyx_gbqzContig90. Searching using BLAST from the BCM site links these internal identifiers to the fasta file of the entire BACtig that that sequence contig came from.

All of this information can be linked through Entrez searches. The RNOR01100761 number and the four-letter BCM BAC project identifier (the gvkb in the atlas_gvkb... name) both retrieve sequences with clone names associated. If you search NCBI with one of these designations (the gvkb part of the BCM scaffold or the RNOR01100761 from Ensembl), the accessioned sequence piece is returned. The sequence contigs (RNOR...) list all the BAC clones that were in the scaffold containing that contig. The clone scaffold lists the clone that that sequence came from, and the sequence contains the scaffolded sequence within that BAC. The BCM access is more useful if you want the clone association and the sequence of the clone. The Ensembl search gives the location in the chromosome coordinates.

When I search the rat genome at BCM or at Ensembl the results look different, why?

The main difference one sees as an end user is that there are different programs (WU-BLAST2 and NCBI-BLAST2) and different parameter settings used in the search. The net result is that the Ensembl site shows more of the sequence in an alignment (the cut-offs are less stringent at Ensembl). For example, a search found this alignment at BCM:


Query: 707   aagctcccacagatggctgccttctttggttgatcccagtccttggagggccatcagtcg 766
             ||||||||||||||| |||||  ||  |||||||||||||||  |||| || |||||||
Sbjct: 81366 aagctcccacagatgtctgccc-ctcaggttgatcccagtcccaggagtgctatcagtct 81308

Query: 767   gggatggttagtgtgggaggagatggaggaatggatccaggagctggagtgtctggtcca 826
             ||  ||||| | ||||||||  |  |||| ||||||||||||| |||| |||||||||||
Sbjct: 81307 ggattggtt-gagtgggagggaacagaggcatggatccaggaggtggaatgtctggtcca 81249

Query: 827   aacttttgatcagttcatgtgccttttgatgtagcag 863
             |||||||||||||  |||||||| |||||||||||||
Sbjct: 81248 aacttttgatcaggccatgtgccctttgatgtagcag 81212

The same search produced this at Ensembl:


Query:   581 CACAAGAATACTGAGAGAGCCA-AGCGGCCTGATGTGT-TTGATGGTGACCGTCACCGGT 638
             CACA  AAT CT AG GAGC   AG  G  TG  G GT TTGATG T  C   C C  GT
Sbjct: 10293 CACAT-AAT-CTCAG-GAGCATCAGGAGAATGGAGGGTCTTGATGCTTGCATGCTCTTGT 10349

Query:   639 -G-CAGCATGATGAATGCGTTAGTGGTAGTTATTAGGCATCTGCTA-CAGGTTCAAGGTT 695
              G C GCAT A  A    G   GT  TAGT AT AG  AT   CT  CAG T    GGTT
Sbjct: 10350 AGACTGCAT-AGCAGAATGAATGTCTTAGTGAT-AGATATTATCTGTCAGCTGTG-GGTT 10406

Query:   696 AAAGAGCCTGGAAGCTCCCACAGATGGCTGCCTTCTTTGGTTGATCCCAGTCCTTGGAGG 755
              A GA   T  AAGCTCCCACAGATG CTGCC  CT  GGTTGATCCCAGTCC  GGAG
Sbjct: 10407 CA-GAT--TA-AAGCTCCCACAGATGTCTGCCC-CTCAGGTTGATCCCAGTCCCAGGAGT 10461

Query:   756 GCCATCAGTCGGGGATGGTTAGTGTGGGAGGAGATGGAGGAATGGATCCAGGAGCTGGAG 815
             GC ATCAGTC GG  TGGTT G GTGGGAGG  A  GAGG ATGGATCCAGGAG TGGA
Sbjct: 10462 GCTATCAGTCTGGATTGGTT-GAGTGGGAGGGAACAGAGGCATGGATCCAGGAGGTGGAA 10520

Query:   816 TGTCTGGTCCAAACTTTTGATCAGTTCATGTGCCTTTTGATGTAGCAGGAAGGAGTACCC 875
             TGTCTGGTCCAAACTTTTGATCAG  CATGTGCC TTTGATGTAGCAG  AG AGT C C
Sbjct: 10521 TGTCTGGTCCAAACTTTTGATCAGGCCATGTGCCCTTTGATGTAGCAGACAG-AGTTCTC 10579

Query:   876 TTTAAGAGCAG---TGCTGAGGCCAGCAGGCCAGCTTGCGTTCCCCTGCAATCTCTGGCA 932
             T  AAGA CA    TGCTGAGGCCAGCAGG CAG TTGC TTCCC  GCA  CTCTGGCA
Sbjct: 10580 TCCAAGATCATCTATGCTGAGGCCAGCAGGGCAG-TTGCTTTCCCA-GCATGCTCTGGCA 10637

Query:   933 GGTATCATCCAAACTTGTGTGGG-TGGATCTTCTTGC-TG-TGCTTC-TTAGT-ATCTC- 986
               T T A C A A   G  TGGG TGG T TTC   C TG T C TC TT GT ATCT
Sbjct: 10638 CATGTGAAC-ACATGAGCATGGGCTGGGT-TTCAACCCTGGTCCATCCTTTGTCATCTGT 10695

Query:   987 -ATGGTCTGTCCTAGTGATGATCTCCTTAGGTTGT 1020
              ATGGT TG C T  TGATG T TCCT AG  TGT
Sbjct: 10696 GATGGT-TGGCTTGTTGATGCT-TCCT-AGTCTGT 10727

Note that the best part of the alignment is included in both reports (query position 707-863), but the more gap-ridden parts of the Ensembl alignment (581-706, 864-1020) are excluded from the BCM report.
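To compare how stringent two reports are, you can count identical positions and gaps in each aligned block. The Python sketch below is an illustration, not the scoring used by either BLAST program; the function name alignment_stats is hypothetical. It takes the Query and Sbjct strings of one block, with "-" marking a gap.

```python
def alignment_stats(query: str, subject: str) -> dict:
    """Count identical columns and gap columns in one aligned block."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    pairs = list(zip(query.upper(), subject.upper()))
    matches = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return {
        "length": len(pairs),
        "matches": matches,
        "gaps": gaps,
        "identity": matches / len(pairs) if pairs else 0.0,
    }


# Last block of the BCM alignment (query positions 827-863).
block = alignment_stats(
    "aacttttgatcagttcatgtgccttttgatgtagcag",
    "aacttttgatcaggccatgtgccctttgatgtagcag",
)
print(block["matches"], block["length"], block["gaps"])  # prints: 34 37 0
```

Running the same function over the outer Ensembl blocks shows the lower identity and higher gap count that the stricter cut-offs at BCM exclude.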

What is a scaffold?

In general, a scaffold is the invisible structure that gives the position of each sequence contig in relation to other sequence contigs in a sequence assembly. Usually this structure is due to the relationships between pairs of end sequences from different sized clones (BACs, fosmids, 10kb, 2kb). In the case of the rat sequence, there is also BAC association information that can be used to scaffold sequence contigs within a BAC in the scaffold.
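One way to picture a scaffold is as an ordered list of contigs with an estimated gap after each one. The Python sketch below is only a toy model under that assumption: the class name Placement, the contig names and the gap sizes are all hypothetical, and real assemblies also track orientation and uncertainty in the gap estimates.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    contig: str     # contig name (hypothetical)
    length: int     # contig length in bases
    gap_after: int  # estimated gap to the next contig, e.g. from clone end pairs


def contig_offsets(placements: list[Placement]) -> dict[str, int]:
    """Return the 0-based start of each contig within the scaffold."""
    offsets = {}
    position = 0
    for placement in placements:
        offsets[placement.contig] = position
        position += placement.length + placement.gap_after
    return offsets


scaffold = [
    Placement("contigA", 1000, 200),
    Placement("contigB", 500, 50),
    Placement("contigC", 300, 0),
]
print(contig_offsets(scaffold))
```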

When I search the rat sequence at the NCBI, the contigs list several BAC clones - how do I find the one or two clones that contain my query sequence?

Search the HTGS division of GenBank to identify the individual BAC sequence that contains your sequence of interest. Each BAC has an accessioned sequence in GenBank, although the sequence may vary slightly from the genomic assembly because the assembly methods differ.

How do I get the cDNA for accession XM_nnnnnn?

The XM_ accessions refer to a reference sequence generated by an automatic process by the NCBI from the rat genomic sequence generated by the Rat Genome Sequencing Consortium. The genomic sequence is available, and the contig is listed in the description of the cDNA (NW_044113), but the cDNA sequence is virtual. To get the BAC clones associated with the genomic sequence, follow the instructions below.

  1. From the mRNA reference sequence listing in Entrez (XM_236628), remember the title line information including the LOC number (LOC300999), then click on the corresponding genomic sequence fragment.
  2. Change the display to GenBank format, rather than the graphical format.
  3. Search in the page using your browser for the LOC number (LOC300999). Find the location in the genomic contig (1711412..1724945).
  4. Click on the "Get Subsequence" button to the right on the top of the page and enter the location in the "from" and "to" windows. This will give you the genomic sequence associated with that gene.
  5. This genomic sequence was generated from a combined Whole Genome Shotgun and BAC based sequencing method. To find the BAC clones associated with this sequence, you can use the BLAST search at the NCBI. Enter the genomic contig accession (NW_044113) and the subsequence coordinates (1711412..1724945), and search the HTGS database (limit to Rattus norvegicus if you would like). You could also use the BLAST search page linked to the Rat Genome Project pages here. The clone that contains this sequence is AC115564, which is clone CH230-133L24.
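If you have downloaded the contig sequence and want to cut out the gene region yourself, the location from step 3 can be parsed and applied as sketched below. GenBank locations of the form start..end are 1-based and include both ends. The function names are hypothetical, and the sketch handles only a simple start..end location, not complemented or joined ones.

```python
def parse_location(location: str) -> tuple[int, int]:
    """Parse a simple 'start..end' location into 1-based inclusive integers."""
    start_text, end_text = location.split("..")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid location: {location}")
    return start, end


def subsequence(contig: str, start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive) of a contig sequence."""
    return contig[start - 1:end]


start, end = parse_location("1711412..1724945")
print(end - start + 1)  # length of the gene region: 13534 bases
```

This gives the values to enter in the "from" and "to" windows in step 4, and the length of the region you should expect back.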

How do I obtain a Rat Genomic clone (BAC/PAC) from a particular chromosome carrying a specific gene?

To find the BAC that corresponds to a particular region of the rat genome, search the HTGS databases at the NCBI using BLASTN or MegaBLAST, or search the BAC or BAC + WGS assemblies here. Title lines at the NCBI indicate the clones that are in the assembly.

 
       

BCM HGSC