Rat Data Access FAQ
How do I search for Rat reads corresponding to a genomic sequence?
I have an accession number, how do I find out which rat chromosome my sequence is on?
How do I find my sequence, and what is it I get when I have found a match?
When I search the rat genome at BCM or at Ensembl, the results look different. Why?
How do I get the cDNA for accession XM_nnnnnn?
How Do I obtain a Rat Genomic clone (BAC/PAC) from a particular chromosome carrying a specific gene?
How do I search for Rat reads corresponding to a genomic sequence?
For a large genomic sequence, you first need to identify the corresponding genomic clone or clones; BLAST is the tool to use for this. From our website (choose the BLAST link in the sidebar) you can search the rat data either as BACs alone or as BAC+WGS assemblies. Because this search has a size limit of 10 kb, use short sequences from the beginning, middle and end of the genomic clone rather than its entire sequence. In one example, a query with 4 short sequences from a BAC matched BCM project gabz with all 4 sequences (identified as atlas_gabz.fa in the output, with a link to the assembled FASTA sequence). Had the BAC set alone been searched, the results would have been linked to project_gabz.fa.
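Preparing the short query pieces described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the clone sequence is already available as a plain string; the function names, the 500-base piece length and the FASTA line width are illustrative choices, not settings of the BCM BLAST page.

```python
def query_pieces(seq: str, piece_len: int = 500, limit: int = 10_000) -> dict:
    """Return short pieces from the start, middle and end of a long sequence.

    Hypothetical helper: piece_len is an illustrative default, and `limit`
    mirrors the 10 kb size limit of the BCM BLAST search.
    """
    seq = "".join(seq.split()).upper()  # drop whitespace and line breaks
    if piece_len > limit:
        raise ValueError("piece_len exceeds the search size limit")
    if len(seq) <= piece_len:
        return {"whole": seq}  # short enough to submit as is
    mid_start = (len(seq) - piece_len) // 2
    return {
        "start": seq[:piece_len],
        "middle": seq[mid_start:mid_start + piece_len],
        "end": seq[-piece_len:],
    }


def to_fasta(name: str, pieces: dict, width: int = 60) -> str:
    """Format the pieces as multi-record FASTA text for pasting into BLAST."""
    lines = []
    for label, piece in pieces.items():
        lines.append(f">{name}_{label}")
        lines.extend(piece[i:i + width] for i in range(0, len(piece), width))
    return "\n".join(lines) + "\n"
```

For example, `to_fasta("myBAC", query_pieces(bac_seq))` produces three FASTA records named myBAC_start, myBAC_middle and myBAC_end, each well under the size limit.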
The reads for project gabz can be downloaded from the Trace Archive at the NCBI. Enter the query "center_name = 'BCM' and center_project = 'GABZ'" in the search window.
This query found 765 traces in the archive. These are BAC traces only; to include the WGS reads, you need to retrieve them and combine them with the BAC reads.
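The step from a BLAST hit name to a Trace Archive query can be sketched in Python. This is a minimal illustration, assuming hit names follow the two examples in this answer (atlas_gabz.fa and project_gabz.fa); the function names are hypothetical, and the code only builds the query text, it does not contact the NCBI.

```python
import re


def project_from_hit(name: str) -> str:
    """Extract the four-letter BCM project code from a BLAST hit name.

    Assumes names shaped like 'atlas_gabz.fa' (BAC+WGS search) or
    'project_gabz.fa' (BAC-only search), as in this FAQ's example.
    """
    m = re.fullmatch(r"(?:atlas|project)_([a-z]{4})\.fa", name)
    if m is None:
        raise ValueError(f"unrecognised hit name: {name!r}")
    return m.group(1)


def trace_query(project: str, center: str = "BCM") -> str:
    """Build a Trace Archive query string in the form quoted in this FAQ.

    The FAQ's example writes the project code in upper case, so it is
    upper-cased here.
    """
    return f"center_name = '{center}' and center_project = '{project.upper()}'"
```

With the hit from the example above, `trace_query(project_from_hit("atlas_gabz.fa"))` yields the query string quoted in this answer.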
I have an accession number, how do I find out which rat chromosome my sequence is on?
Search Entrez at the NCBI with the clone name from that accession (for instance, AC118441 is clone CH230-259K23). The search returns all of the genomic contigs from the BACtig containing that clone, as well as the BAC clone itself and the BAC end sequences from that clone. In this case there are 24 records: 21 WGS contigs, 1 BAC clone and 2 BAC end sequences. Each WGS contig lists the chromosome it maps to in its title line.
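If the Entrez results are saved as a list of title lines, the chromosome assignments can be tallied with a few lines of Python. This is a sketch only: it assumes that WGS contig titles contain the phrase "chromosome N" (the exact NCBI title wording is not given in this FAQ), and records without such a phrase, such as the BAC clone and BAC end records, are simply skipped.

```python
import re
from collections import Counter

# Assumed title wording: "... chromosome 1 ...", "... chromosome X ...".
CHROM_RE = re.compile(r"\bchromosome\s+(\d{1,2}|X|Y)\b", re.IGNORECASE)


def chromosome_votes(titles: list) -> Counter:
    """Tally the chromosome named in each title line that names one."""
    votes = Counter()
    for title in titles:
        m = CHROM_RE.search(title)
        if m:
            votes[m.group(1).upper()] += 1
    return votes
```

For example, two invented titles mentioning "chromosome 1" plus one BAC title with no chromosome give `Counter({'1': 2})`; `votes.most_common(1)` then names the most likely chromosome.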
How do I find my sequence, and what is it I get when I have found a match?
You can search the rat genome using BLAST from a number of different sites, and in several different forms.
BLAST searches at BCM (see the link in the sidebar) can be run against superbactigs, enriched BACs or BACs. The genome is most complete in the superbactig search; the enriched BACs provide the most complete clone-based information; and the BACs provide the most reliable clone-based information, since they do not include WGS reads that would extend beyond the ends of the BAC. Ensembl's genome searches query the genome sequence data found in the superbactigs.
These databases also differ in how they name and format their sequences.
Individual sequence contigs have a numeric designation such as RNOR01100761, a GenBank accession such as AABR01100617, and a BCM internal identifier such as gkyx_gbqzContig90. A BLAST search at the BCM site links these internal identifiers to the FASTA file of the entire BACtig from which that sequence contig came.
All of this information can be linked through Entrez searches. Both the RNOR01100761 number and the four-letter BCM BAC project identifier (the gvkb in the atlas_gvkb... name) retrieve sequences with the associated clone names. If you search the NCBI with one of these designations (the gvkb part of the BCM scaffold name, or the RNOR01100761 from Ensembl), the accessioned sequence piece is returned. The sequence contigs (RNOR...) list all the BAC clones that were in the scaffold containing that contig. The clone scaffold lists the clone that the sequence came from, and its sequence contains the scaffolded sequence within that BAC. The BCM search is more useful if you want the clone association and the sequence of the clone; the Ensembl search gives the location in chromosome coordinates.
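The identifier forms above can be told apart mechanically. The Python sketch below infers its patterns only from the examples in this answer (RNOR01100761, AABR01100617, gkyx_gbqzContig90, atlas_gvkb...), so treat them as assumptions rather than official formats. It returns the kind of identifier and the term this FAQ suggests searching at the NCBI, or None for BCM internal contig names, which the FAQ links through BCM BLAST rather than Entrez.

```python
import re
from typing import Optional, Tuple

# Patterns inferred from this FAQ's examples only (an assumption).
PATTERNS = [
    ("ensembl_contig", re.compile(r"RNOR\d{8}")),            # RNOR01100761
    ("genbank_wgs", re.compile(r"[A-Z]{4}\d{8}")),           # AABR01100617
    ("bcm_contig", re.compile(r"[a-z]{4}_[a-z]{4}Contig\d+")),  # gkyx_gbqzContig90
    ("bcm_scaffold", re.compile(r"atlas_([a-z]{4}).*")),     # atlas_gvkb...
]


def classify_id(identifier: str) -> Tuple[str, Optional[str]]:
    """Return (kind, NCBI search term or None) for a rat sequence identifier."""
    for kind, pattern in PATTERNS:  # order matters: RNOR is checked first
        m = pattern.fullmatch(identifier)
        if m is None:
            continue
        if kind == "bcm_scaffold":
            return kind, m.group(1)  # search with the four-letter project code
        if kind == "bcm_contig":
            return kind, None        # BCM-internal; use BCM BLAST instead
        return kind, identifier      # searchable as given
    raise ValueError(f"unrecognised identifier: {identifier!r}")
```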
When I search the rat genome at BCM or at Ensembl, the results look different. Why?
The main difference an end user sees is that the two sites use different programs (WU-BLAST2 and NCBI-BLAST2) and different parameter settings for the search. The net result is that the Ensembl site shows more of the sequence in an alignment, because its cut-offs are less stringent. For example, one search found this alignment at BCM:
Query: 707   aagctcccacagatggctgccttctttggttgatcccagtccttggagggccatcagtcg 766
             ||||||||||||||| |||||  ||  |||||||||||||||  |||| || |||||||
Sbjct: 81366 aagctcccacagatgtctgccc-ctcaggttgatcccagtcccaggagtgctatcagtct 81308

Query: 767   gggatggttagtgtgggaggagatggaggaatggatccaggagctggagtgtctggtcca 826
             ||  ||||| | ||||||||  |  |||| ||||||||||||| |||| |||||||||||
Sbjct: 81307 ggattggtt-gagtgggagggaacagaggcatggatccaggaggtggaatgtctggtcca 81249

Query: 827   aacttttgatcagttcatgtgccttttgatgtagcag 863
             |||||||||||||  |||||||| |||||||||||||
Sbjct: 81248 aacttttgatcaggccatgtgccctttgatgtagcag 81212
The same search produced this at Ensembl:
Query: 581   CACAAGAATACTGAGAGAGCCA-AGCGGCCTGATGTGT-TTGATGGTGACCGTCACCGGT 638
             CACA  AAT CT AG GAGC   AG  G  TG  G GT TTGATG T  C   C C  GT
Sbjct: 10293 CACAT-AAT-CTCAG-GAGCATCAGGAGAATGGAGGGTCTTGATGCTTGCATGCTCTTGT 10349

Query: 639   -G-CAGCATGATGAATGCGTTAGTGGTAGTTATTAGGCATCTGCTA-CAGGTTCAAGGTT 695
              G C GCAT A  A    G   GT  TAGT AT AG  AT   CT  CAG T    GGTT
Sbjct: 10350 AGACTGCAT-AGCAGAATGAATGTCTTAGTGAT-AGATATTATCTGTCAGCTGTG-GGTT 10406

Query: 696   AAAGAGCCTGGAAGCTCCCACAGATGGCTGCCTTCTTTGGTTGATCCCAGTCCTTGGAGG 755
              A GA   T  AAGCTCCCACAGATG CTGCC  CT  GGTTGATCCCAGTCC  GGAG
Sbjct: 10407 CA-GAT--TA-AAGCTCCCACAGATGTCTGCCC-CTCAGGTTGATCCCAGTCCCAGGAGT 10461

Query: 756   GCCATCAGTCGGGGATGGTTAGTGTGGGAGGAGATGGAGGAATGGATCCAGGAGCTGGAG 815
             GC ATCAGTC GG  TGGTT G GTGGGAGG  A  GAGG ATGGATCCAGGAG TGGA
Sbjct: 10462 GCTATCAGTCTGGATTGGTT-GAGTGGGAGGGAACAGAGGCATGGATCCAGGAGGTGGAA 10520

Query: 816   TGTCTGGTCCAAACTTTTGATCAGTTCATGTGCCTTTTGATGTAGCAGGAAGGAGTACCC 875
             TGTCTGGTCCAAACTTTTGATCAG  CATGTGCC TTTGATGTAGCAG  AG AGT C C
Sbjct: 10521 TGTCTGGTCCAAACTTTTGATCAGGCCATGTGCCCTTTGATGTAGCAGACAG-AGTTCTC 10579

Query: 876   TTTAAGAGCAG---TGCTGAGGCCAGCAGGCCAGCTTGCGTTCCCCTGCAATCTCTGGCA 932
             T  AAGA CA    TGCTGAGGCCAGCAGG CAG TTGC TTCCC  GCA  CTCTGGCA
Sbjct: 10580 TCCAAGATCATCTATGCTGAGGCCAGCAGGGCAG-TTGCTTTCCCA-GCATGCTCTGGCA 10637

Query: 933   GGTATCATCCAAACTTGTGTGGG-TGGATCTTCTTGC-TG-TGCTTC-TTAGT-ATCTC- 986
               T T A C A A   G  TGGG TGG T TTC   C TG T C TC TT GT ATCT
Sbjct: 10638 CATGTGAAC-ACATGAGCATGGGCTGGGT-TTCAACCCTGGTCCATCCTTTGTCATCTGT 10695

Query: 987   -ATGGTCTGTCCTAGTGATGATCTCCTTAGGTTGT 1020
              ATGGT TG C T  TGATG T TCCT AG  TGT
Sbjct: 10696 GATGGT-TGGCTTGTTGATGCT-TCCT-AGTCTGT 10727
Note that the best part of the alignment (query positions 707-863) is included in both reports, but the more gap-ridden parts of the Ensembl alignment (query positions 581-706 and 864-1020) are excluded from the BCM report.
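Comparing the two reports by hand is easier with a small helper that rebuilds the match line and counts identities from two aligned rows. This Python sketch assumes gaps are written as "-" and compares bases case-insensitively; it is not how either BLAST program computes its scores or cut-offs, which are score-based rather than identity-based.

```python
def midline(query: str, sbjct: str) -> str:
    """Build an NCBI-style match line: '|' where the aligned bases agree."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        "|" if q.upper() == s.upper() and q != "-" else " "
        for q, s in zip(query, sbjct)
    )


def identity(query: str, sbjct: str) -> tuple:
    """Return (identical columns, alignment columns, percent identity)."""
    matches = midline(query, sbjct).count("|")
    cols = len(query)
    return matches, cols, 100.0 * matches / cols
```

Applied to the first BCM block (query 707-766) it counts 50 identities in 60 columns; applied to the first Ensembl block (query 581-638) it counts 35 in 60, which gives a feel for how much weaker the flanking stretches shown only by Ensembl are.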
What is a scaffold?
In general, a scaffold is the underlying structure that gives the position of each sequence contig relative to the other sequence contigs in a sequence assembly. Usually this structure is derived from pairs of end sequences read from the two ends of clones of different sizes (BACs, fosmids, 10 kb and 2 kb clones). In the case of the rat sequence, there is also BAC association information, which can be used to scaffold the sequence contigs that lie within a BAC.
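How a pair of end sequences positions two contigs can be written down as simple arithmetic. The Python sketch below assumes the simplest case (both contigs in forward orientation, one end read on each, a known clone insert size); the names are hypothetical, and real assemblers combine many such pairs and handle orientation and insert-size variation.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int


def estimate_gap(insert_size: int, read_a_start: int, contig_a_len: int,
                 read_b_end: int) -> int:
    """Estimate the gap between contig A and the following contig B.

    The forward end read starts at read_a_start on A and the reverse end
    read ends at read_b_end on B (0-based, end-exclusive), so the clone
    spans (contig_a_len - read_a_start) + gap + read_b_end bases.
    """
    return insert_size - (contig_a_len - read_a_start) - read_b_end


def layout(contigs: list, gaps: list) -> dict:
    """Place contigs on scaffold coordinates given the gap after each one."""
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need one gap between each adjacent pair of contigs")
    starts, pos = {}, 0
    for i, contig in enumerate(contigs):
        starts[contig.name] = pos
        pos += contig.length + (gaps[i] if i < len(gaps) else 0)
    return starts
```

For a 10 kb clone whose forward read starts 5,000 bases before the end of a 50,000-base contig and whose reverse read ends 3,000 bases into the next contig, the estimated gap is 10,000 - 5,000 - 3,000 = 2,000 bases, placing the second contig at scaffold position 52,000. A negative estimate would indicate that the two contigs overlap.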
When I search the rat sequence at the NCBI, the contigs list several BAC clones - how do I find the one or two clones that contain my query sequence?
Search the HTGS division of GenBank to identify the individual BAC sequence that contains your sequence of interest. Each BAC has an accessioned sequence in GenBank, although that sequence may differ slightly from the genomic assembly because the assembly methods differ. For how to obtain the clones themselves, see the last question in this FAQ.
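If you save BLAST results as a tabular hit table, the candidate BACs can be ranked automatically. This Python sketch assumes the default 12-column layout of BLAST+ tabular output (-outfmt 6), ranks subjects by their single best bit score, and skips comment lines starting with "#"; it is a convenience, not a replacement for inspecting the alignments.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_subjects(tabular: str, top: int = 2) -> list:
    """Return the `top` (subject, best bit score) pairs, highest first."""
    best = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        score = float(rec["bitscore"])
        sid = rec["sseqid"]
        best[sid] = max(score, best.get(sid, score))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

With `top=2`, the result names the one or two BAC sequences most likely to contain the query; ranking by the single best HSP rather than the summed score is a deliberate simplification.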
How do I get the cDNA for accession XM_nnnnnn?
XM_ accessions are reference sequences generated automatically by the NCBI from the genomic sequence produced by the Rat Genome Sequencing Consortium. The cDNA sequence is therefore virtual (predicted from the genome), but the genomic sequence is available: the genomic contig is listed in the description of the XM_ record (for example, NW_044113). To get the BAC clones associated with that genomic sequence, follow the instructions below.
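Picking the genomic contig accession out of a record description can be done with a regular expression. A minimal Python sketch, assuming the contig appears as an NW_ accession with an optional version suffix, as in NW_044113; the sample description in the check below is invented.

```python
import re

# RefSeq genomic contig accession: "NW_", digits, optional ".version".
NW_RE = re.compile(r"\bNW_\d+(?:\.\d+)?\b")


def genomic_contigs(description: str) -> list:
    """List the NW_ contig accessions mentioned in a record description."""
    # The group is non-capturing, so findall returns whole accessions.
    return NW_RE.findall(description)
```

Each accession returned can then be searched at the NCBI to reach the genomic contig and, from its record, the associated BAC clones.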
How do I obtain a Rat Genomic clone (BAC/PAC) from a particular chromosome carrying a specific gene?
To find the BAC that corresponds to a particular region of the rat genome, search the HTGS databases at the NCBI using BLASTN or MegaBLAST, or search the BAC or BAC+WGS assemblies on this site (choose the BLAST link in the sidebar). The title lines of the NCBI records indicate which clones are in the assembly.
