The basmati rice genome

The Purugganan lab at NYU, in collaboration with Oxford Nanopore Technologies and the New York Genome Center, has sequenced and assembled the genome of basmati rice, an iconic variety grown in the Indian subcontinent. The paper was published in Genome Biology, and the accompanying data were deposited in Zenodo. We will explore some of the basic assembly statistics and compare them to the values reported in the paper.

Please download the genome file we will be working with today from here.

Import the genome

We will be working with a polished but unscaffolded version of the genome assembly of the Pakistani variety Basmati 334. Genome assemblies are usually stored in FASTA format. Here is what it looks like:

>header_1
ATCGATCTAGCGATCGAGCTATATATATCCCGCGTAG
>header_2
TAGCGATAGCGGGCATCGATTCAACGCTAGCTGATGC

Note: sequences (but not headers) may be split across multiple lines, but this is not the case in the file we will be working with. In our file, each header and each sequence is a single line of text.
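
If a FASTA file does wrap its sequences, the lines belonging to each record have to be joined before their characters can be counted. The following is a minimal sketch of one way to do this in base R. The toy input `fasta_lines` and all variable names are made up for illustration, and the sketch assumes that every header is followed by at least one sequence line.

```r
# toy multi-line FASTA held in memory (invented data, not the Basmati file)
fasta_lines <- c(">header_1", "ATCGATCT", "AGCGATCG",
                 ">header_2", "TAGCGATA", "GCGGG")

# flag header lines, then number the records with a running count of headers
is_header <- startsWith(fasta_lines, ">")
record_id <- cumsum(is_header)

# group the sequence lines by record and glue each group into one string
joined <- vapply(split(fasta_lines[!is_header], record_id[!is_header]),
                 paste, character(1), collapse = "")
names(joined) <- substring(fasta_lines[is_header], 2)

joined
nchar(joined) # sequence length per record
```

After this step each record is a single string, which is the layout of the file used in this exercise.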

In a perfect genome assembly (e.g. that of C. elegans), the number of sequences will equal the number of chromosomes. However, chromosome-level assemblies are still somewhat rare, and genomes are usually assembled as a larger number of disconnected fragments. These fragments are called “contigs” because they represent contiguous assemblies of shorter sequences.

Let us import the Basmati genome file and examine the number of contigs. FASTA files are text files but they are not really organized as tables. Therefore, we will be using the function readLines() instead of the more familiar read.table() or read.csv() (although you may be able to use those, too). readLines() reads a text file line-by-line and returns a character vector where each element is a line from the file.

genome <- readLines("Basmati334.basmati.not_scaffolded_singleline.fa")

str(genome)
##  chr [1:376] ">contig_1" ...
# use substr() to only display the first 10 characters of each element, because your laptop will likely freeze otherwise (most sequences are very long)
substr(head(genome),
       start = 1,
       stop = 10) 
## [1] ">contig_1"  "AATTTTAGTT" ">contig_2"  "GAGGGGAAGG" ">contig_3" 
## [6] "CACTCCAAAC"
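
Because headers and sequences alternate in a single-line FASTA file, the number of contigs should be half the number of lines. Here is a small sketch of that sanity check. It uses a throwaway file created with tempfile() so that it runs without the Basmati data; the file content and the `toy_` names are invented.

```r
# write a tiny single-line FASTA to a temporary file (invented content)
toy_path <- tempfile(fileext = ".fa")
writeLines(c(">contig_1", "AATTTTAGTT", ">contig_2", "GAGGGGAAGG"), toy_path)

toy_genome <- readLines(toy_path)

# count the header lines directly ...
n_headers <- sum(startsWith(toy_genome, ">"))
# ... and compare with the "two lines per contig" expectation
n_headers == length(toy_genome) / 2

unlink(toy_path) # remove the temporary file
```

Applied to the `genome` vector above, which has 376 elements, the same reasoning gives 188 contigs.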

ePDF: How long are the contigs?

Let us create a data.frame with contig names in the first column and contig lengths in the second column. First, extract the contig names.

# contig names are the odd elements of the genome vector
# how would you extract all the odd elements?
contig_names <- genome[c(TRUE,FALSE)] # but there are other ways, too
head(contig_names)
## [1] ">contig_1" ">contig_2" ">contig_3" ">contig_4" ">contig_5" ">contig_6"
# get rid of ">" by splitting each element into the "before >" and "after >" part
contig_names_split <- strsplit(contig_names,
                               split = ">") # split each element by ">"
head(contig_names_split)
## [[1]]
## [1] ""         "contig_1"
## 
## [[2]]
## [1] ""         "contig_2"
## 
## [[3]]
## [1] ""         "contig_3"
## 
## [[4]]
## [1] ""         "contig_4"
## 
## [[5]]
## [1] ""         "contig_5"
## 
## [[6]]
## [1] ""         "contig_6"
class(contig_names_split)
## [1] "list"
contig_names_split_unlisted <- unlist(contig_names_split) # convert list to vector
head(contig_names_split_unlisted)
## [1] ""         "contig_1" ""         "contig_2" ""         "contig_3"
class(contig_names_split_unlisted)
## [1] "character"
contig_names_split_unlisted_cleaned <- contig_names_split_unlisted[c(FALSE,TRUE)] # only keep even elements
head(contig_names_split_unlisted_cleaned)
## [1] "contig_1" "contig_2" "contig_3" "contig_4" "contig_5" "contig_6"
# a simpler way - get rid of the first character in each element (since ">" is always just 1 character)
contig_names_split_unlisted_cleaned <- substring(contig_names,2)
head(contig_names_split_unlisted_cleaned)
## [1] "contig_1" "contig_2" "contig_3" "contig_4" "contig_5" "contig_6"
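
Yet another option, sketched here on a made-up vector, is to remove the leading ">" with a regular expression. Unlike substring(), sub() with the pattern "^>" leaves an element untouched if it does not start with ">", which can be a safer default for files of unknown origin.

```r
# the last element lacks ">" on purpose
toy_headers <- c(">contig_1", ">contig_2", "contig_3")

# "^>" matches a ">" only at the very start of the string
cleaned_regex <- sub("^>", "", toy_headers)
cleaned_regex

# substring() drops the first character regardless of what it is
cleaned_substring <- substring(toy_headers, 2)
cleaned_substring
```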

Now, extract contig sequences and calculate their length.

# sequences are the even elements of the genome vector
sequences <- genome[c(FALSE,TRUE)]

# calculate the length (the number of characters) of each sequence
sequences_length <- nchar(sequences)
head(sequences_length)
## [1]  498296 9857291   67462    8941  199118  192537

Create a data.frame with contig names in the first column and contig lengths in the second column.

basmati <- data.frame("contig_name" = contig_names_split_unlisted_cleaned,
                      "contig_length" = sequences_length)

head(basmati)
##   contig_name contig_length
## 1    contig_1        498296
## 2    contig_2       9857291
## 3    contig_3         67462
## 4    contig_4          8941
## 5    contig_5        199118
## 6    contig_6        192537

How are the contig lengths distributed? Plot a histogram and a horizontal distribution of individual data points in one plot using ggplot2.

library(ggplot2)

ggplot(data = basmati,
       mapping = aes(x = contig_length)) +
  geom_histogram() +
  geom_jitter(mapping = aes(y = -20),
              height = 10,
              width = 0)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

This is essentially an empirical probability density function (ePDF).
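
A histogram is nothing more than a count of observations per bin. As an illustration, the sketch below bins the six contig lengths printed earlier by order of magnitude, using base R's hist() with plot = FALSE. Binning on a log10 scale is an assumption made for this example, not something the exercise requires.

```r
# the first six contig lengths printed by head(sequences_length)
toy_lengths <- c(498296, 9857291, 67462, 8941, 199118, 192537)

# bin on a log10 scale: (3,4], (4,5], (5,6], (6,7]
h <- hist(log10(toy_lengths), breaks = 3:7, plot = FALSE)
h$counts # 1 1 3 1

# dividing by the number of observations turns the counts into an
# empirical probability for each bin
h$counts / sum(h$counts)
```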

eCDF: How long are the top XX% of the contigs?

Let us calculate “by hand” an empirical cumulative distribution function (eCDF) of the contig lengths and plot it. The X axis will be the same as above. The main difference is that the Y axis must now contain the cumulative fraction of contigs as we are moving left-to-right (i.e. from smallest to largest contig length).

library(dplyr)

# sort contigs by length and add a column containing the cumulative fraction of contigs (in our case 1/188, 2/188, 3/188, etc.) using dplyr
basmati_ecdf <-
  basmati %>%
  arrange(contig_length) %>%
  mutate(cumulative_fraction_contigs = (1:n()) / n() )

# plot using geom_point
ggplot(data = basmati_ecdf,
       mapping = aes(x = contig_length,
                     y = cumulative_fraction_contigs)) +
  geom_point()

# compare to the built-in function ecdf() to make sure that we got it right
plot(ecdf(basmati_ecdf$contig_length))
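
The comparison can also be made numerically instead of by looking at two plots. This is a minimal sketch on the six contig lengths printed earlier; the `toy_lengths`, `by_hand` and `built_in` names are made up for the example.

```r
toy_lengths <- c(498296, 9857291, 67462, 8941, 199118, 192537)

# the "by hand" version: rank of each sorted value divided by the count
sorted_lengths <- sort(toy_lengths)
by_hand <- seq_along(sorted_lengths) / length(sorted_lengths)

# ecdf() returns a function; evaluating it at a length x gives the
# fraction of values that are less than or equal to x
built_in <- ecdf(toy_lengths)(sorted_lengths)

all.equal(by_hand, built_in)
```

The two versions agree as long as no two contigs have exactly the same length. With ties, ecdf() gives both tied values the higher fraction, while the rank-based column gives them different fractions.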

How long are the top 50% of the contigs? (The desired answer is something like “longer than XXX kb”). First, draw a line on the plot corresponding to 50% of the contigs and determine by eye approximately where it intersects the eCDF. Then, look at the table we generated and determine the exact number. Plot this number as a vertical line.

# copy the ggplot code from the previous chunk
# and add a horizontal line drawn at 50%
ggplot(data = basmati_ecdf,
       mapping = aes(x = contig_length,
                     y = cumulative_fraction_contigs)) +
  geom_point() +
  geom_hline(yintercept = 0.5)

# print basmati_ecdf and determine what is the shortest contig in the top 50%
basmati_ecdf
##     contig_name contig_length cumulative_fraction_contigs
## 1     contig_81             9                 0.005319149
## 2     contig_24          7433                 0.010638298
## 3    contig_153          7528                 0.015957447
## 4     contig_42          8105                 0.021276596
## 5      contig_4          8941                 0.026595745
## 6    contig_101          9480                 0.031914894
## 7     contig_25          9952                 0.037234043
## 8     contig_20         11700                 0.042553191
## 9     contig_62         12032                 0.047872340
## 10   contig_183         14212                 0.053191489
## 11   contig_181         15949                 0.058510638
## 12    contig_53         16427                 0.063829787
## 13   contig_145         18745                 0.069148936
## 14    contig_36         24859                 0.074468085
## 15    contig_47         29542                 0.079787234
## 16    contig_27         31210                 0.085106383
## 17    contig_39         32749                 0.090425532
## 18    contig_17         33362                 0.095744681
## 19    contig_35         33483                 0.101063830
## 20   contig_169         37647                 0.106382979
## 21    contig_41         40016                 0.111702128
## 22    contig_49         41336                 0.117021277
## 23    contig_16         41376                 0.122340426
## 24    contig_51         44177                 0.127659574
## 25    contig_40         44786                 0.132978723
## 26    contig_54         44788                 0.138297872
## 27    contig_13         45535                 0.143617021
## 28    contig_21         48692                 0.148936170
## 29    contig_11         49007                 0.154255319
## 30    contig_43         52809                 0.159574468
## 31    contig_22         55304                 0.164893617
## 32    contig_33         56366                 0.170212766
## 33    contig_23         61183                 0.175531915
## 34     contig_8         63575                 0.180851064
## 35    contig_18         64234                 0.186170213
## 36    contig_29         67282                 0.191489362
## 37     contig_3         67462                 0.196808511
## 38   contig_156         69065                 0.202127660
## 39    contig_28         71871                 0.207446809
## 40    contig_26         72890                 0.212765957
## 41    contig_50         75157                 0.218085106
## 42   contig_186         76681                 0.223404255
## 43    contig_38         77801                 0.228723404
## 44    contig_10         81664                 0.234042553
## 45   contig_187         83191                 0.239361702
## 46    contig_46         87276                 0.244680851
## 47     contig_9         91244                 0.250000000
## 48    contig_12        100238                 0.255319149
## 49   contig_175        100570                 0.260638298
## 50   contig_173        102007                 0.265957447
## 51    contig_30        104186                 0.271276596
## 52   contig_180        107649                 0.276595745
## 53    contig_32        108243                 0.281914894
## 54    contig_48        109033                 0.287234043
## 55   contig_149        110428                 0.292553191
## 56    contig_37        112909                 0.297872340
## 57    contig_34        113587                 0.303191489
## 58   contig_160        113942                 0.308510638
## 59    contig_45        119599                 0.313829787
## 60   contig_143        120182                 0.319148936
## 61   contig_178        143039                 0.324468085
## 62    contig_97        145297                 0.329787234
## 63     contig_7        146029                 0.335106383
## 64    contig_72        148012                 0.340425532
## 65    contig_19        149136                 0.345744681
## 66   contig_182        154743                 0.351063830
## 67   contig_171        155222                 0.356382979
## 68   contig_188        166611                 0.361702128
## 69    contig_52        172586                 0.367021277
## 70   contig_177        178566                 0.372340426
## 71   contig_172        187082                 0.377659574
## 72     contig_6        192537                 0.382978723
## 73     contig_5        199118                 0.388297872
## 74   contig_179        213575                 0.393617021
## 75   contig_184        222827                 0.398936170
## 76   contig_146        223805                 0.404255319
## 77   contig_127        242673                 0.409574468
## 78   contig_168        245319                 0.414893617
## 79   contig_161        252103                 0.420212766
## 80   contig_107        266072                 0.425531915
## 81   contig_158        271912                 0.430851064
## 82   contig_121        274679                 0.436170213
## 83   contig_155        278187                 0.441489362
## 84   contig_123        284254                 0.446808511
## 85   contig_151        322212                 0.452127660
## 86   contig_170        330676                 0.457446809
## 87   contig_185        348122                 0.462765957
## 88   contig_162        361942                 0.468085106
## 89    contig_68        380643                 0.473404255
## 90   contig_148        395160                 0.478723404
## 91   contig_159        421338                 0.484042553
## 92   contig_150        430447                 0.489361702
## 93   contig_174        441564                 0.494680851
## 94     contig_1        498296                 0.500000000
## 95   contig_129        539136                 0.505319149
## 96   contig_142        539438                 0.510638298
## 97    contig_76        555924                 0.515957447
## 98   contig_110        596148                 0.521276596
## 99   contig_163        600324                 0.526595745
## 100  contig_109        614511                 0.531914894
## 101   contig_84        624327                 0.537234043
## 102  contig_133        627997                 0.542553191
## 103  contig_114        647674                 0.547872340
## 104   contig_92        653668                 0.553191489
## 105  contig_147        668131                 0.558510638
## 106  contig_157        718679                 0.563829787
## 107   contig_82        720367                 0.569148936
## 108   contig_58        757613                 0.574468085
## 109   contig_80        823800                 0.579787234
## 110   contig_57        866722                 0.585106383
## 111   contig_86        890409                 0.590425532
## 112  contig_132        918977                 0.595744681
## 113  contig_134        919952                 0.601063830
## 114  contig_140       1008158                 0.606382979
## 115   contig_78       1033031                 0.611702128
## 116   contig_73       1044897                 0.617021277
## 117   contig_66       1137797                 0.622340426
## 118  contig_116       1191750                 0.627659574
## 119  contig_126       1229814                 0.632978723
## 120  contig_167       1231404                 0.638297872
## 121  contig_137       1311625                 0.643617021
## 122   contig_96       1312965                 0.648936170
## 123  contig_115       1356278                 0.654255319
## 124   contig_95       1466788                 0.659574468
## 125  contig_112       1532608                 0.664893617
## 126  contig_100       1548493                 0.670212766
## 127  contig_164       1585953                 0.675531915
## 128  contig_108       1589164                 0.680851064
## 129   contig_89       1767502                 0.686170213
## 130   contig_79       1968141                 0.691489362
## 131   contig_64       1991304                 0.696808511
## 132  contig_144       2068535                 0.702127660
## 133  contig_119       2114382                 0.707446809
## 134   contig_15       2192592                 0.712765957
## 135  contig_103       2205000                 0.718085106
## 136   contig_75       2214126                 0.723404255
## 137   contig_94       2597398                 0.728723404
## 138  contig_166       2650669                 0.734042553
## 139   contig_99       2734327                 0.739361702
## 140  contig_117       2742382                 0.744680851
## 141  contig_111       3007642                 0.750000000
## 142  contig_105       3073012                 0.755319149
## 143   contig_56       3199752                 0.760638298
## 144   contig_87       3410783                 0.765957447
## 145   contig_63       3414174                 0.771276596
## 146   contig_93       3482043                 0.776595745
## 147  contig_102       3559400                 0.781914894
## 148  contig_136       3648715                 0.787234043
## 149  contig_113       3690720                 0.792553191
## 150   contig_91       3772550                 0.797872340
## 151  contig_139       3796564                 0.803191489
## 152  contig_165       3839796                 0.808510638
## 153  contig_154       3905687                 0.813829787
## 154  contig_125       3909978                 0.819148936
## 155  contig_104       3928836                 0.824468085
## 156   contig_88       4178305                 0.829787234
## 157  contig_120       4327600                 0.835106383
## 158  contig_135       4340191                 0.840425532
## 159   contig_44       4405158                 0.845744681
## 160  contig_131       4605396                 0.851063830
## 161  contig_122       4871081                 0.856382979
## 162  contig_118       5145525                 0.861702128
## 163  contig_141       5267630                 0.867021277
## 164  contig_106       5343470                 0.872340426
## 165   contig_60       5367701                 0.877659574
## 166   contig_65       5479328                 0.882978723
## 167  contig_138       5718025                 0.888297872
## 168  contig_176       6112879                 0.893617021
## 169   contig_67       6316586                 0.898936170
## 170  contig_130       6344805                 0.904255319
## 171   contig_77       6483238                 0.909574468
## 172  contig_128       6953941                 0.914893617
## 173   contig_85       7478188                 0.920212766
## 174   contig_69       7969657                 0.925531915
## 175   contig_70       8515110                 0.930851064
## 176  contig_152       8752696                 0.936170213
## 177   contig_14       8941791                 0.941489362
## 178   contig_71       8990282                 0.946808511
## 179   contig_59       9018131                 0.952127660
## 180   contig_55       9647745                 0.957446809
## 181    contig_2       9857291                 0.962765957
## 182  contig_124      10203661                 0.968085106
## 183   contig_61      10722564                 0.973404255
## 184   contig_31      11117648                 0.978723404
## 185   contig_83      12223136                 0.984042553
## 186   contig_98      12370970                 0.989361702
## 187   contig_74      16390624                 0.994680851
## 188   contig_90      17040366                 1.000000000
# does this relate to any distribution metric that you are familiar with?
median(basmati_ecdf$contig_length)
## [1] 518716
# copy the ggplot code above
# and add a vertical line drawn at the value you just determined 
ggplot(data = basmati_ecdf,
       mapping = aes(x = contig_length,
                     y = cumulative_fraction_contigs)) +
  geom_point() +
  geom_hline(yintercept = 0.5) +
  geom_vline(xintercept = median(basmati_ecdf$contig_length))
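
With an even number of contigs (188 here), the median falls between the two middle rows of the sorted table, which is why 518716 is not the length of any contig. A quick check using the two middle values from the printed table (rows 94 and 95), followed by the same rule on a made-up vector:

```r
# contig_1 and contig_129, rows 94 and 95 of the sorted table
middle_two <- c(498296, 539136)
mean(middle_two) # 518716, the value returned by median() above

# the same rule on a toy vector of even length
toy <- c(10, 20, 30, 40)
toy_median <- median(toy)
# the 50% quantile is the median
toy_q50 <- quantile(toy, probs = 0.5, names = FALSE)
c(toy_median, toy_q50)
```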

The N statistics

In genome biology, the most common way to report the contiguity of an assembly is not the median contig length, but the N statistics, e.g. N50, N90 etc. According to Wikipedia, “given a set of contigs, the N50 is defined as the sequence length of the shortest contig at 50% of the total genome length.” Do not worry if this sounds somewhat confusing. It will become much clearer once we visualize it below.
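
To make the definition concrete, here is a minimal sketch that computes N statistics for a made-up set of six contigs. The helper n_stat() is a hypothetical function written for this example, not one taken from a package. It sorts the contigs from longest to shortest and returns the length of the contig at which the running total first reaches the requested fraction of the total length.

```r
n_stat <- function(lengths, fraction = 0.5) {
  sorted <- sort(lengths, decreasing = TRUE)
  running_total <- cumsum(sorted)
  # first contig at which the running total reaches the target share
  sorted[which(running_total >= fraction * sum(sorted))[1]]
}

toy_contigs <- c(80, 70, 50, 40, 30, 20) # total length 290

# 80 + 70 = 150, which is at least 145 (half of 290), so N50 is 70
n50 <- n_stat(toy_contigs, 0.5)
# 80 + 70 + 50 + 40 + 30 = 270, which is at least 261, so N90 is 30
n90 <- n_stat(toy_contigs, 0.9)
c(N50 = n50, N90 = n90)
```

In words: half of the bases in this toy assembly sit in contigs that are at least 70 units long.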

In fact, the idea of N statistics is inspired by the concept of eCDF. The main difference is that instead of a cumulative fraction of contigs (in our case, 1/188, 2/188, 3/188 etc.), we plot a cumulative length of contigs. Let us calculate the cumulative length and plot it.

# add a column containing the cumulative length of contigs normalized to the total length of the assembly
# hint: use the function cumsum()
basmati_ecdf_n <-
  basmati_ecdf %>%
  mutate(cumulative_length_contigs = cumsum(contig_length) / sum(contig_length) )

# plot the empirical cumulative length function
ggplot(data = basmati_ecdf_n,
       mapping = aes(x = contig_length,
                     y = cumulative_length_contigs)) +
  geom_point()

How long are the contigs that contain the top 50% of all bases? Add a horizontal line at y=0.5 and determine where it intersects the function.

# copy the ggplot code from the previous chunk
# and draw a horizontal line at 50%
ggplot(data = basmati_ecdf_n,
       mapping = aes(x = contig_length,
                     y = cumulative_length_contigs)) +
  geom_point() +
  geom_hline(yintercept = 0.5)
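
The intersection can also be located in code rather than by eye. The sketch below applies an ascending sort and a normalized cumsum() to a made-up set of contigs, then picks the first row at or above 0.5. The data frame `toy` borrows the column names used in this exercise, but its contents are invented, and it uses base R only so that it runs without dplyr.

```r
toy <- data.frame(contig_name = paste0("toy_", 1:6),
                  contig_length = c(80, 70, 50, 40, 30, 20))

# sort from shortest to longest, then accumulate the share of bases
toy <- toy[order(toy$contig_length), ]
toy$cumulative_length_contigs <-
  cumsum(toy$contig_length) / sum(toy$contig_length)

# first row whose cumulative share of bases is at or above 50%
crossing <- toy[which(toy$cumulative_length_contigs >= 0.5)[1], ]
crossing
```

Counting from the longest contig downward instead picks out the same contig, unless the running total lands exactly on 50%.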

Determine the exact length of the first contig above the 50% line by looking at the table.

basmati_ecdf_n
##     contig_name contig_length cumulative_fraction_contigs
## 1     contig_81             9                 0.005319149
## 2     contig_24          7433                 0.010638298
## 3    contig_153          7528                 0.015957447
## 4     contig_42          8105                 0.021276596
## 5      contig_4          8941                 0.026595745
## 6    contig_101          9480                 0.031914894
## 7     contig_25          9952                 0.037234043
## 8     contig_20         11700                 0.042553191
## 9     contig_62         12032                 0.047872340
## 10   contig_183         14212                 0.053191489
## 11   contig_181         15949                 0.058510638
## 12    contig_53         16427                 0.063829787
## 13   contig_145         18745                 0.069148936
## 14    contig_36         24859                 0.074468085
## 15    contig_47         29542                 0.079787234
## 16    contig_27         31210                 0.085106383
## 17    contig_39         32749                 0.090425532
## 18    contig_17         33362                 0.095744681
## 19    contig_35         33483                 0.101063830
## 20   contig_169         37647                 0.106382979
## 21    contig_41         40016                 0.111702128
## 22    contig_49         41336                 0.117021277
## 23    contig_16         41376                 0.122340426
## 24    contig_51         44177                 0.127659574
## 25    contig_40         44786                 0.132978723
## 26    contig_54         44788                 0.138297872
## 27    contig_13         45535                 0.143617021
## 28    contig_21         48692                 0.148936170
## 29    contig_11         49007                 0.154255319
## 30    contig_43         52809                 0.159574468
## 31    contig_22         55304                 0.164893617
## 32    contig_33         56366                 0.170212766
## 33    contig_23         61183                 0.175531915
## 34     contig_8         63575                 0.180851064
## 35    contig_18         64234                 0.186170213
## 36    contig_29         67282                 0.191489362
## 37     contig_3         67462                 0.196808511
## 38   contig_156         69065                 0.202127660
## 39    contig_28         71871                 0.207446809
## 40    contig_26         72890                 0.212765957
## 41    contig_50         75157                 0.218085106
## 42   contig_186         76681                 0.223404255
## 43    contig_38         77801                 0.228723404
## 44    contig_10         81664                 0.234042553
## 45   contig_187         83191                 0.239361702
## 46    contig_46         87276                 0.244680851
## 47     contig_9         91244                 0.250000000
## 48    contig_12        100238                 0.255319149
## 49   contig_175        100570                 0.260638298
## 50   contig_173        102007                 0.265957447
## 51    contig_30        104186                 0.271276596
## 52   contig_180        107649                 0.276595745
## 53    contig_32        108243                 0.281914894
## 54    contig_48        109033                 0.287234043
## 55   contig_149        110428                 0.292553191
## 56    contig_37        112909                 0.297872340
## 57    contig_34        113587                 0.303191489
## 58   contig_160        113942                 0.308510638
## 59    contig_45        119599                 0.313829787
## 60   contig_143        120182                 0.319148936
## 61   contig_178        143039                 0.324468085
## 62    contig_97        145297                 0.329787234
## 63     contig_7        146029                 0.335106383
## 64    contig_72        148012                 0.340425532
## 65    contig_19        149136                 0.345744681
## 66   contig_182        154743                 0.351063830
## 67   contig_171        155222                 0.356382979
## 68   contig_188        166611                 0.361702128
## 69    contig_52        172586                 0.367021277
## 70   contig_177        178566                 0.372340426
## 71   contig_172        187082                 0.377659574
## 72     contig_6        192537                 0.382978723
## 73     contig_5        199118                 0.388297872
## 74   contig_179        213575                 0.393617021
## 75   contig_184        222827                 0.398936170
## 76   contig_146        223805                 0.404255319
## 77   contig_127        242673                 0.409574468
## 78   contig_168        245319                 0.414893617
## 79   contig_161        252103                 0.420212766
## 80   contig_107        266072                 0.425531915
## 81   contig_158        271912                 0.430851064
## 82   contig_121        274679                 0.436170213
## 83   contig_155        278187                 0.441489362
## 84   contig_123        284254                 0.446808511
## 85   contig_151        322212                 0.452127660
## 86   contig_170        330676                 0.457446809
## 87   contig_185        348122                 0.462765957
## 88   contig_162        361942                 0.468085106
## 89    contig_68        380643                 0.473404255
## 90   contig_148        395160                 0.478723404
## 91   contig_159        421338                 0.484042553
## 92   contig_150        430447                 0.489361702
## 93   contig_174        441564                 0.494680851
## 94     contig_1        498296                 0.500000000
## 95   contig_129        539136                 0.505319149
## 96   contig_142        539438                 0.510638298
## 97    contig_76        555924                 0.515957447
## 98   contig_110        596148                 0.521276596
## 99   contig_163        600324                 0.526595745
## 100  contig_109        614511                 0.531914894
## 101   contig_84        624327                 0.537234043
## 102  contig_133        627997                 0.542553191
## 103  contig_114        647674                 0.547872340
## 104   contig_92        653668                 0.553191489
## 105  contig_147        668131                 0.558510638
## 106  contig_157        718679                 0.563829787
## 107   contig_82        720367                 0.569148936
## 108   contig_58        757613                 0.574468085
## 109   contig_80        823800                 0.579787234
## 110   contig_57        866722                 0.585106383
## 111   contig_86        890409                 0.590425532
## 112  contig_132        918977                 0.595744681
## 113  contig_134        919952                 0.601063830
## 114  contig_140       1008158                 0.606382979
## 115   contig_78       1033031                 0.611702128
## 116   contig_73       1044897                 0.617021277
## 117   contig_66       1137797                 0.622340426
## 118  contig_116       1191750                 0.627659574
## 119  contig_126       1229814                 0.632978723
## 120  contig_167       1231404                 0.638297872
## 121  contig_137       1311625                 0.643617021
## 122   contig_96       1312965                 0.648936170
## 123  contig_115       1356278                 0.654255319
## 124   contig_95       1466788                 0.659574468
## 125  contig_112       1532608                 0.664893617
## 126  contig_100       1548493                 0.670212766
## 127  contig_164       1585953                 0.675531915
## 128  contig_108       1589164                 0.680851064
## 129   contig_89       1767502                 0.686170213
## 130   contig_79       1968141                 0.691489362
## 131   contig_64       1991304                 0.696808511
## 132  contig_144       2068535                 0.702127660
## 133  contig_119       2114382                 0.707446809
## 134   contig_15       2192592                 0.712765957
## 135  contig_103       2205000                 0.718085106
## 136   contig_75       2214126                 0.723404255
## 137   contig_94       2597398                 0.728723404
## 138  contig_166       2650669                 0.734042553
## 139   contig_99       2734327                 0.739361702
## 140  contig_117       2742382                 0.744680851
## 141  contig_111       3007642                 0.750000000
## 142  contig_105       3073012                 0.755319149
## 143   contig_56       3199752                 0.760638298
## 144   contig_87       3410783                 0.765957447
## 145   contig_63       3414174                 0.771276596
## 146   contig_93       3482043                 0.776595745
## 147  contig_102       3559400                 0.781914894
## 148  contig_136       3648715                 0.787234043
## 149  contig_113       3690720                 0.792553191
## 150   contig_91       3772550                 0.797872340
## 151  contig_139       3796564                 0.803191489
## 152  contig_165       3839796                 0.808510638
## 153  contig_154       3905687                 0.813829787
## 154  contig_125       3909978                 0.819148936
## 155  contig_104       3928836                 0.824468085
## 156   contig_88       4178305                 0.829787234
## 157  contig_120       4327600                 0.835106383
## 158  contig_135       4340191                 0.840425532
## 159   contig_44       4405158                 0.845744681
## 160  contig_131       4605396                 0.851063830
## 161  contig_122       4871081                 0.856382979
## 162  contig_118       5145525                 0.861702128
## 163  contig_141       5267630                 0.867021277
## 164  contig_106       5343470                 0.872340426
## 165   contig_60       5367701                 0.877659574
## 166   contig_65       5479328                 0.882978723
## 167  contig_138       5718025                 0.888297872
## 168  contig_176       6112879                 0.893617021
## 169   contig_67       6316586                 0.898936170
## 170  contig_130       6344805                 0.904255319
## 171   contig_77       6483238                 0.909574468
## 172  contig_128       6953941                 0.914893617
## 173   contig_85       7478188                 0.920212766
## 174   contig_69       7969657                 0.925531915
## 175   contig_70       8515110                 0.930851064
## 176  contig_152       8752696                 0.936170213
## 177   contig_14       8941791                 0.941489362
## 178   contig_71       8990282                 0.946808511
## 179   contig_59       9018131                 0.952127660
## 180   contig_55       9647745                 0.957446809
## 181    contig_2       9857291                 0.962765957
## 182  contig_124      10203661                 0.968085106
## 183   contig_61      10722564                 0.973404255
## 184   contig_31      11117648                 0.978723404
## 185   contig_83      12223136                 0.984042553
## 186   contig_98      12370970                 0.989361702
## 187   contig_74      16390624                 0.994680851
## 188   contig_90      17040366                 1.000000000
##     cumulative_length_contigs
## 1                2.328254e-08
## 2                1.925207e-05
## 3                3.872663e-05
## 4                5.969385e-05
## 5                8.282376e-05
## 6                1.073480e-04
## 7                1.330934e-04
## 8                1.633607e-04
## 9                1.944868e-04
## 10               2.312525e-04
## 11               2.725118e-04
## 12               3.150076e-04
## 13               3.635000e-04
## 14               4.278089e-04
## 15               5.042326e-04
## 16               5.849713e-04
## 17               6.696913e-04
## 18               7.559971e-04
## 19               8.426159e-04
## 20               9.400067e-04
## 21               1.043526e-03
## 22               1.150460e-03
## 23               1.257498e-03
## 24               1.371781e-03
## 25               1.487641e-03
## 26               1.603505e-03
## 27               1.721302e-03
## 28               1.847265e-03
## 29               1.974044e-03
## 30               2.110658e-03
## 31               2.253727e-03
## 32               2.399543e-03
## 33               2.557820e-03
## 34               2.722285e-03
## 35               2.888455e-03
## 36               3.062510e-03
## 37               3.237031e-03
## 38               3.415699e-03
## 39               3.601625e-03
## 40               3.790188e-03
## 41               3.984616e-03
## 42               4.182985e-03
## 43               4.384253e-03
## 44               4.595513e-03
## 45               4.810724e-03
## 46               5.036503e-03
## 47               5.272546e-03
## 48               5.531857e-03
## 49               5.792026e-03
## 50               6.055913e-03
## 51               6.325437e-03
## 52               6.603920e-03
## 53               6.883939e-03
## 54               7.166001e-03
## 55               7.451673e-03
## 56               7.743763e-03
## 57               8.037607e-03
## 58               8.332369e-03
## 59               8.641765e-03
## 60               8.952670e-03
## 61               9.322705e-03
## 62               9.698581e-03
## 63               1.007635e-02
## 64               1.045925e-02
## 65               1.084506e-02
## 66               1.124537e-02
## 67               1.164692e-02
## 68               1.207793e-02
## 69               1.252441e-02
## 70               1.298635e-02
## 71               1.347032e-02
## 72               1.396840e-02
## 73               1.448351e-02
## 74               1.503602e-02
## 75               1.561246e-02
## 76               1.619143e-02
## 77               1.681921e-02
## 78               1.745384e-02
## 79               1.810602e-02
## 80               1.879433e-02
## 81               1.949776e-02
## 82               2.020834e-02
## 83               2.092799e-02
## 84               2.166334e-02
## 85               2.249689e-02
## 86               2.335233e-02
## 87               2.425291e-02
## 88               2.518923e-02
## 89               2.617394e-02
## 90               2.719619e-02
## 91               2.828617e-02
## 92               2.939972e-02
## 93               3.054202e-02
## 94               3.183109e-02
## 95               3.322581e-02
## 96               3.462130e-02
## 97               3.605945e-02
## 98               3.760166e-02
## 99               3.915466e-02
## 100              4.074437e-02
## 101              4.235947e-02
## 102              4.398407e-02
## 103              4.565957e-02
## 104              4.735058e-02
## 105              4.907900e-02
## 106              5.093818e-02
## 107              5.280174e-02
## 108              5.476164e-02
## 109              5.689277e-02
## 110              5.913494e-02
## 111              6.143838e-02
## 112              6.381573e-02
## 113              6.619559e-02
## 114              6.880365e-02
## 115              7.147605e-02
## 116              7.417914e-02
## 117              7.712256e-02
## 118              8.020556e-02
## 119              8.338703e-02
## 120              8.657261e-02
## 121              8.996571e-02
## 122              9.336229e-02
## 123              9.687091e-02
## 124              1.006654e-01
## 125              1.046302e-01
## 126              1.086361e-01
## 127              1.127388e-01
## 128              1.168499e-01
## 129              1.214224e-01
## 130              1.265138e-01
## 131              1.316653e-01
## 132              1.370164e-01
## 133              1.424862e-01
## 134              1.481584e-01
## 135              1.538626e-01
## 136              1.595904e-01
## 137              1.663098e-01
## 138              1.731669e-01
## 139              1.802405e-01
## 140              1.873349e-01
## 141              1.951155e-01
## 142              2.030652e-01
## 143              2.113428e-01
## 144              2.201663e-01
## 145              2.289986e-01
## 146              2.380065e-01
## 147              2.472145e-01
## 148              2.566535e-01
## 149              2.662012e-01
## 150              2.759606e-01
## 151              2.857821e-01
## 152              2.957155e-01
## 153              3.058193e-01
## 154              3.159342e-01
## 155              3.260979e-01
## 156              3.369070e-01
## 157              3.481023e-01
## 158              3.593301e-01
## 159              3.707260e-01
## 160              3.826400e-01
## 161              3.952412e-01
## 162              4.085524e-01
## 163              4.221795e-01
## 164              4.360028e-01
## 165              4.498887e-01
## 166              4.640635e-01
## 167              4.788557e-01
## 168              4.946694e-01
## 169              5.110101e-01
## 170              5.274238e-01
## 171              5.441956e-01
## 172              5.621851e-01
## 173              5.815308e-01
## 174              6.021479e-01
## 175              6.241761e-01
## 176              6.468188e-01
## 177              6.699508e-01
## 178              6.932082e-01
## 179              7.165376e-01
## 180              7.414959e-01
## 181              7.669962e-01
## 182              7.933925e-01
## 183              8.211312e-01
## 184              8.498920e-01
## 185              8.815127e-01
## 186              9.135157e-01
## 187              9.559174e-01
## 188              1.000000e+00
# N50 = 6316586 bp, or about 6.32 Mb (contig_67, row 169: the first row where the cumulative fraction reaches 0.5)
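
Rather than reading the value off the printed table, the N50 row can also be picked out programmatically. The following is a minimal sketch on made-up numbers, assuming a data frame sorted by ascending contig length with a column holding the cumulative fraction of the total assembly length. The data frame `toy` and the column names `contig_length` and `cumulative_fraction` are hypothetical placeholders, not the names used in this tutorial.

```r
# Toy data: hypothetical contigs and lengths, not taken from the Basmati assembly
toy <- data.frame(
  contig = c("a", "b", "c", "d", "e"),
  contig_length = c(10, 20, 30, 40, 110)
)

# Sort by ascending length, as in the table printed above
toy <- toy[order(toy$contig_length), ]

# Cumulative fraction of the total assembly length
toy$cumulative_fraction <- cumsum(toy$contig_length) / sum(toy$contig_length)

# First row at which the cumulative fraction reaches 0.5
n50_row <- toy[which(toy$cumulative_fraction >= 0.5)[1], ]
n50_row$contig_length
```

N50 is more commonly defined by sorting from the longest contig downwards. The two directions pick the same contig unless a cumulative sum lands exactly on half of the assembly length.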

Is this the same value as the N50 reported in the abstract of the paper?

Why do genome biologists prefer N statistics (such as N50) to eCDF-based metrics, such as the median?
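
As a starting point for thinking about this question, here is a small sketch comparing the two kinds of summary on a toy set of contig lengths. The values and the helper function `n50()` are hypothetical and only serve as an illustration; they are not part of the Basmati analysis.

```r
# Toy contig lengths: many short contigs and two long ones (made-up values)
toy_lengths <- c(2, 3, 4, 5, 6, 30, 60)

# Hypothetical helper: sort from longest to shortest, accumulate, and return
# the length of the contig at which the running total first reaches half of
# the total assembly length
n50 <- function(lengths) {
  sorted <- sort(lengths, decreasing = TRUE)
  sorted[which(cumsum(sorted) >= sum(sorted) / 2)[1]]
}

n50(toy_lengths)     # weighted by how much sequence each contig contributes
median(toy_lengths)  # every contig counts equally, however short it is
```

In this toy set the median is 5, while the N50 is 60. Most of the contigs are short, but most of the sequence sits in the two long ones.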