Help


Algorithms


Storage-D integrates multiple well-validated algorithms, including those proposed by Church et al., Goldman et al., Erlich et al. and Ping et al., as well as a recently developed “Wukong” algorithm. Users can select any of these algorithms to encode and decode their data.


For a detailed introduction to the algorithm proposed by Church et al., see "Church, G. M., Gao, Y. & Kosuri, S. Next-generation digital information storage in DNA. Science. 2012; 337(6102):1628."


For a detailed introduction to the algorithm proposed by Goldman et al., see "Goldman, N. et al. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature. 2013; 494(7435):77-80."


For a detailed introduction to the algorithm proposed by Erlich et al., see "Erlich, Y. & Zielinski, D. DNA Fountain enables a robust and efficient storage architecture. Science. 2017; 355(6328):950-954."


For a detailed introduction to the algorithm proposed by Ping et al., see "Ping Z, Chen S, Zhou G, Huang X, Zhu S, Zhang H, Lee H, Lan Z, Cui J, Chen T, Zhang W, Yang H, Xu X, Church G, Shen Y. Towards Practical and Robust DNA-Based Data Archiving Using ‘Yin-Yang Codec’ System. Nature Computational Science. 2022; 2, 234–242."


For a detailed introduction to the "Wukong" algorithm, see "Huang X, Cui J, Qiang W, Ye J, Wang Y, Xie X, Li Y, Dai J. Storage-D: a user-friendly tool that enables practical and personalized data storage into DNA."


Alternatively, users can contact junbiao.dai@siat.ac.cn or huangxl@siat.ac.cn for more details.


Parameters for data encoding


  • Upload
  • Users can upload data in any format to the server for encoding and decoding. In general, smaller files are processed faster. Parameter choices also affect the overall speed: a shorter encoded length, a longer homopolymer length or a wider GC range significantly improves running speed, whereas adding error correction code, redundancy or a codec pin generally increases running time.



  • Preview
  • Storage-D lets users preview text and image data. Other types of data can be encoded directly with the integrated algorithms.



  • DNA Length
  • “Encoded length” refers to the length of the DNA sequence into which the uploaded data is encoded. Because of how the different algorithms work, the final encoded DNA length may differ by 1 to 2 nt from the “length” that users enter on the server.


  • Homopolymer
  • The length of single-nucleotide homopolymer runs strongly affects the success of the downstream biochemical experiments in DNA data storage. Storage-D controls the homopolymer runs in the encoded DNA sequence through the internal logic of each algorithm.


  • "Min GC%" and "Max GC%"
  • The GC content directly affects the success of DNA synthesis, PCR amplification and DNA sequencing. Storage-D controls the GC content of the encoded DNA sequence through the internal logic of its integrated algorithms. "Min GC%" is the minimum GC content of the encoded DNA sequence, and "Max GC%" is the maximum GC content of the encoded DNA sequence.


  • Codec Pin
  • Users who want to encode their data with a private codec pin can select “codec pin” together with the “Wukong” algorithm. If the codec pin is set to “User Defined”, users should record the pin value, because it will be needed for data decoding.


  • ECC (RS)
  • Storage-D currently employs Reed-Solomon (RS) code to correct errors that occur during data storage. With “ECC (RS)”, users can choose to add any number of bytes of RS code when encoding their data. “ECC (RS)” is available with all algorithms except the one proposed by Goldman et al., which uses four-fold redundancy for error correction.


  • Flanking Sequence
  • Storage-D employs a PCR-based random-access strategy in its encoded DNA sequence design, using a designed random-access flanking sequence. In practical DNA data storage applications, users can use the reverse complement of the flanking sequence as primers to access the file by PCR amplification.


  • Redundancy
  • “Redundancy” refers to the overall redundancy that users would like to add to their encoded DNA sequences. Data can be damaged during long-term storage, and errors can occur during DNA synthesis, PCR amplification and DNA sequencing; added redundancy helps recover the data more accurately. When “Redundancy” is selected, Storage-D introduces 1/3 additional redundancy into the encoded DNA sequence by an “XOR” conversion. This option is available for the algorithms proposed by Church et al. and Ping et al. and for the “Wukong” algorithm. The algorithms proposed by Goldman et al. and Erlich et al. introduce redundancy through their own internal logic and are not covered by this option.


  • E.mail
  • Users can choose to have their “encoding” results sent to their e-mail address.

  • Download
  • Users can download their “encoding” results from the server.
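
As a rough illustration of the “Homopolymer”, "Min GC%"/"Max GC%" and “Flanking Sequence” settings described above, the following Python sketch checks a DNA sequence against a homopolymer limit and a GC range, and derives the reverse complement of a flanking sequence for use as a primer. It is not Storage-D's own code: the function names, the example sequence and the thresholds are hypothetical and chosen only for illustration.

```python
import re


def max_homopolymer(seq: str) -> int:
    """Return the length of the longest run of a single repeated nucleotide."""
    runs = re.findall(r"A+|C+|G+|T+", seq.upper())
    return max((len(run) for run in runs), default=0)


def gc_percent(seq: str) -> float:
    """Return the GC content of the sequence as a percentage (0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, e.g. to derive a primer from a flanking sequence."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(complement)[::-1]


def check_sequence(seq: str, max_run: int, min_gc: float, max_gc: float) -> bool:
    """True if the sequence respects the homopolymer limit and the GC range.

    max_run, min_gc and max_gc are hypothetical stand-ins for the
    "Homopolymer", "Min GC%" and "Max GC%" settings.
    """
    return max_homopolymer(seq) <= max_run and min_gc <= gc_percent(seq) <= max_gc


if __name__ == "__main__":
    example = "ACGTTTGCA"  # made-up sequence, not Storage-D output
    print(max_homopolymer(example))            # longest single-nucleotide run
    print(round(gc_percent(example), 2))       # GC content in percent
    print(check_sequence(example, 3, 40, 60))  # passes both constraints?
    print(reverse_complement("AACG"))          # primer-style reverse complement
```

A check like this can be run on downloaded sequences before ordering synthesis, using the same values that were entered on the server.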
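
To give an intuition for how an “XOR” conversion can add redundancy, the sketch below appends one parity block, the byte-wise XOR of two data blocks, after every pair of blocks, so that any one block of a triple can be rebuilt from the other two. This is only an assumption about the general idea: how Storage-D actually groups sequences is not described here, and the function names and the pairing scheme are hypothetical.

```python
from __future__ import annotations


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def add_xor_parity(blocks: list[bytes]) -> list[bytes]:
    """After every pair of blocks, emit a parity block equal to their XOR.

    Assumed scheme for illustration only: with pairs of data blocks,
    the parity blocks make up one third of the output.
    """
    out: list[bytes] = []
    for i in range(0, len(blocks) - 1, 2):
        first, second = blocks[i], blocks[i + 1]
        out += [first, second, xor_bytes(first, second)]
    if len(blocks) % 2:
        # An unpaired final block is kept without parity in this sketch.
        out.append(blocks[-1])
    return out


if __name__ == "__main__":
    a, b = b"\x0f\xf0", b"\xff\x00"
    a_out, b_out, parity = add_xor_parity([a, b])
    # If b is lost, XOR of the surviving block and the parity restores it.
    print(xor_bytes(a_out, parity) == b)
```

Because XOR is its own inverse, the same `xor_bytes` call serves for both creating the parity block and recovering a lost block.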


Parameters for data decoding


Data decoding is the reverse process of data encoding. Like data encoding, it has “upload file”, “preview”, “algorithm” and “E.mail” options.

Notably, to decode their data, users need the “parameter” they used for encoding, which is shown on the first line of their encoding result. The server accepts a FASTA-format DNA sequence file containing the “encoding parameters” for decoding.

In addition, if users encoded their data with a defined codec pin using the “Wukong” algorithm, this codec pin must also be entered on the server for decoding.