
SRA format: the format of SRA files

Date: 2023-10-15 09:56:09
Tags: SRA, format, fastq, accession, application, reads, sra, EGA, data

http://www.ebi.ac.uk/ena/about/sra_format

Read metadata format

Metadata is represented using XML documents. For detailed information about the metadata XMLs, please refer to SRA XML 1.5 metadata format. For examples of how to prepare the XMLs, please refer to Preparing SRA XML metadata. The following metadata objects are used:

| Metadata object | Description | Example |
| --- | --- | --- |
| Study | A study groups together data submitted to the archive. Please use the study accession number when citing data submitted into ENA. | ERP000016 |
| Submission | A submission contains submission actions to be performed by the archive. A submission can add more objects to the archive, update already submitted objects or make objects publicly available. | ERA000092 |
| Sample | A sample contains information about the sequenced samples. Samples are associated with checklists, which define the attributes used to annotate the samples, and experiments or analysis objects. | ERS000081 |
| Experiment | An experiment contains information about the sequencing experiments including library and instrument detail. | ERX000398 |
| Run | Runs are part of experiments and contain sequencing reads submitted in data files (e.g. BAM or CRAM). Each run can contain all or part of the results for a particular experiment. | ERR003990 |
| Analysis | An analysis contains secondary analysis results computed from the primary sequencing results (e.g. VCFs with sequence variations or BAMs with sequence alignments). | ERZ000001 |
| DAC | A European Genome-phenome Archive (EGA) data access committee (DAC). Required for authorized access submissions. | EGAC00001000001 |
| Policy | A European Genome-phenome Archive (EGA) data access policy. Required for authorized access submissions. | EGAP00001000001 |
| Dataset | A European Genome-phenome Archive (EGA) data set. Required for authorized access submissions. | EGAD00001000001 |

Accession number format

Each metadata object is assigned a unique accession number by the archive. The accession numbers can be used to retrieve data and metadata, either through the EB-eye search available at the top of all EBI web pages or through the free text search on the ENA home page. The metadata is then retrieved and displayed through the ENA Browser, as in the examples in the table above.

Accession numbers associated with read data start with 'ER' when assigned by EBI, and with 'SR' and 'DR' when assigned by NCBI and DDBJ, respectively. The third letter of the accession number indicates the type of the metadata object. EGA accession numbers start with 'EGA', with the fourth letter indicating the type of the metadata object.

The accession numbers have a fixed number of digits after the letters: six for ENA and eleven for EGA.

| Metadata object | Accession prefix | Number of digits | Example |
| --- | --- | --- | --- |
| Submission | ERA, SRA, DRA | 6 | ERA000092 |
| Sample | ERS, SRS, DRS | 6 | ERS000081 |
| Study | ERP, SRP, DRP | 6 | ERP000016 |
| Experiment | ERX, SRX, DRX | 6 | ERX000398 |
| Run | ERR, SRR, DRR | 6 | ERR003990 |
| Analysis | ERZ, SRZ, DRZ | 6 | ERZ000001 |
| EGA Submission | EGA | 11 | EGA00001000001 |
| EGA Sample | EGAN | 11 | EGAN00001000001 |
| EGA Study | EGAS | 11 | EGAS00001000001 |
| EGA Experiment | EGAX | 11 | EGAX00001000001 |
| EGA Run | EGAR | 11 | EGAR00001000001 |
| EGA Analysis | EGAZ | 11 | EGAZ00001000001 |
| EGA DAC | EGAC | 11 | EGAC00001000001 |
| EGA Policy | EGAP | 11 | EGAP00001000001 |
| EGA Data Set | EGAD | 11 | EGAD00001000001 |
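
The prefix rules above can be checked mechanically. The following is a minimal sketch, not an official ENA tool: the function name `parse_accession` and the lookup tables are hypothetical and simply mirror the accession table. One assumption goes beyond the text: the pattern accepts six or more digits for ER/SR/DR accessions, because accessions issued later can be longer than the six digits described here.

```python
import re

# First letter of a read-data accession -> assigning archive (from the text).
ARCHIVES = {"E": "EBI", "S": "NCBI", "D": "DDBJ"}

# Third letter of a read-data accession -> metadata object type.
READ_TYPES = {
    "A": "Submission", "S": "Sample", "P": "Study",
    "X": "Experiment", "R": "Run", "Z": "Analysis",
}

# Fourth letter of an EGA accession -> metadata object type.
# The table lists the bare "EGA" prefix for submissions, hence the "" key.
EGA_TYPES = {
    "": "Submission", "N": "Sample", "S": "Study", "X": "Experiment",
    "R": "Run", "Z": "Analysis", "C": "DAC", "P": "Policy", "D": "Dataset",
}

# Assumption: "{6,}" tolerates accessions longer than the six digits in the text.
READ_PATTERN = re.compile(r"([ESD])R([ASPXRZ])(\d{6,})")
EGA_PATTERN = re.compile(r"EGA([NSXRZCPD]?)(\d{11})")


def parse_accession(accession):
    """Return (archive, object type) for an accession, or None if unrecognised."""
    m = READ_PATTERN.fullmatch(accession)
    if m:
        return ARCHIVES[m.group(1)], READ_TYPES[m.group(2)]
    m = EGA_PATTERN.fullmatch(accession)
    if m:
        return "EGA", EGA_TYPES[m.group(1)]
    return None


if __name__ == "__main__":
    for acc in ("ERR003990", "SRP000016", "EGAD00001000001"):
        print(acc, parse_accession(acc))
```

A caller would typically use the returned object type to decide which kind of record to request, and treat `None` as an input error.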

Archive generated fastq file format

Once made public, data submitted to ENA are available for download using FTP and Aspera. Detailed data download instructions are available on the ENA website. Currently, both the submitted data files and the archive generated fastq files are made available for download. The naming and format of the generated fastq files are described below.

In general, one fastq file is created for each application read in a run. Please refer to the table below for full details:

| Number of application reads | Fastq files | Description |
| --- | --- | --- |
| 1 | <run accession>.fastq.gz | For experiments with a single application read, all reads are made available in one fastq file. |
| 2 | <run accession>_1.fastq.gz, <run accession>_2.fastq.gz, <run accession>.fastq.gz | For paired experiments with two application reads, reads are made available in 1-3 fastq files. If a paired experiment is submitted with both application reads, the first reads will be in <run accession>_1.fastq.gz, the second reads will be in <run accession>_2.fastq.gz, and any unpaired reads will be in <run accession>.fastq.gz. If a paired experiment is submitted containing only unpaired reads, only a single file will be created: <run accession>.fastq.gz. |
| > 2 | <run accession>_N.fastq.gz | For experiments with more than two application reads (e.g. Complete Genomics or strobed PacBio), one fastq file is created for each application read; however, no empty fastq files are created. |
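
The naming rules in this table can be sketched as a small helper. This is an illustration under stated assumptions rather than archive code: `candidate_fastq_files` is a hypothetical name, and it returns the file names that may exist for a run. Because no empty files are created, the files actually present can be a subset of this list.

```python
def candidate_fastq_files(run_accession, n_application_reads):
    """List the fastq file names the archive may generate for a run.

    The result is an upper bound: empty files are never created, so some
    of the returned names may be absent for a given run.
    """
    if n_application_reads < 1:
        raise ValueError("a run needs at least one application read")

    if n_application_reads == 1:
        # Single layout: everything goes into one file.
        return [f"{run_accession}.fastq.gz"]

    # One numbered file per application read: _1, _2, ..., _N.
    files = [
        f"{run_accession}_{i}.fastq.gz"
        for i in range(1, n_application_reads + 1)
    ]
    if n_application_reads == 2:
        # Paired layout: unpaired reads, if any, go into the unnumbered file.
        files.append(f"{run_accession}.fastq.gz")
    return files


if __name__ == "__main__":
    print(candidate_fastq_files("ERR005143", 2))
```

For a paired run that was submitted with only unpaired reads, just the last name in the returned list would exist.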

 

The fastq file format is:

@<run accession>.<spot index> <spot name>[/<read index>]
<bases>
+
<phred qualities, ASCII encoded starting with '!' (33)>

| Field | Description |
| --- | --- |
| <run accession> | The Run accession. A spot is identified uniquely by the combination of the Run accession and the spot index. |
| <spot index> | A positive integer assigned to the spots in the order in which they appear in the run. A spot is identified uniquely by the combination of the Run accession and the spot index. |
| <spot name> | The spot name as it was provided by the submitter. |
| <read index> | A positive integer assigned to the application reads in the order in which they appear in the spot: /1 for the first application read and /2 for the second application read. |
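
As a rough illustration of the record layout above, the sketch below splits a header line into its fields and converts a quality string to Phred scores using the '!' (33) offset. The names `parse_header` and `decode_qualities` are hypothetical. The regular expression assumes the run accession contains no '.' and treats a trailing `/<digits>` as the read index, so a submitter-provided spot name that itself ends in `/<digits>` would be misread.

```python
import re

# @<run accession>.<spot index> <spot name>[/<read index>]
HEADER = re.compile(
    r"@(?P<run>[^.\s]+)\.(?P<spot>\d+) (?P<name>.*?)(?:/(?P<read>\d+))?"
)


def parse_header(line):
    """Split a fastq header line into run, spot index, spot name, read index."""
    m = HEADER.fullmatch(line.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not an archive-style fastq header: {line!r}")
    read = m.group("read")
    return {
        "run": m.group("run"),
        "spot_index": int(m.group("spot")),
        "spot_name": m.group("name"),
        # None for single layout, where the /<read index> suffix is absent.
        "read_index": int(read) if read is not None else None,
    }


def decode_qualities(quality_line, offset=33):
    """Convert an ASCII quality string to Phred scores ('!' maps to 0)."""
    return [ord(ch) - offset for ch in quality_line]


if __name__ == "__main__":
    print(parse_header("@ERR005143.1 ID49_20708_20H04AAXX_R1:7:1:41:356/1"))
    print(decode_qualities("??>>"))
```

The run accession and spot index together identify a spot, so the pair `(run, spot_index)` is a natural dictionary key when matching first and second reads.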

Single layout example:

@ERR000017.1 IL6_554:7:1:249:322
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
??????????????????????????????>>>>>>

Paired example (first read):

@ERR005143.1 ID49_20708_20H04AAXX_R1:7:1:41:356/1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

Paired example (second read):

@ERR005143.1 ID49_20708_20H04AAXX_R1:7:1:41:356/2
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

SOLiD color example:

The first base is included before the SOLiD colors.

@ERR000451.1 VAB_S0103_20080915_542_14_17_70_F3
T33023230203102103223330020300233001
+
T%245719<.6353&:%0#$1%&%2(--27*%&%,

From: https://www.cnblogs.com/emanlee/p/3428073.html
