I programmatically confirmed that the data I stored earlier is equal to the last dataset Shameema sent for checking.
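A minimal sketch of what such a check can look like in base R. The objects `stored` and `received` are hypothetical toy stand-ins, not the actual datasets, and this is not necessarily the exact check that was run.

```r
# Toy stand-ins for the stored copy and the dataset received for checking.
stored <- data.frame(
  PlotName = c("Bukit Timah Primary", "Bukit Timah Secondary"),
  PlotCensusNumber = c(1L, 2L),
  stringsAsFactors = FALSE
)
received <- stored

# identical() is strict: values, types and attributes must all match.
same_strict <- identical(stored, received)

# all.equal() tolerates tiny numeric differences. It returns TRUE or a
# character vector describing the mismatches, so wrap it in isTRUE().
same_tolerant <- isTRUE(all.equal(stored, received))

same_strict
same_tolerant
```

If `identical()` fails but `all.equal()` passes, the difference is usually in storage type or attributes rather than in the values themselves.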

Surprisingly, the counts by PlotName and PlotCensusNumber differ between Shameema's computation (first table) and mine (second table).

#> # A tibble: 10 x 3
#>    plotname              plotcensusnumber  count
#>    <chr>                            <dbl>  <dbl>
#>  1 Bukit Timah Secondary               1.  6700.
#>  2 Bukit Timah Secondary               2.  6723.
#>  3 Bukit Timah Secondary               3.  7603.
#>  4 Bukit Timah Primary                 1. 13472.
#>  5 Bukit Timah Primary                 2. 14343.
#>  6 Bukit Timah Primary                 3. 15177.
#>  7 Bukit Timah Primary                 4. 16122.
#>  8 Bukit Timah Primary                 5. 18738.
#>  9 Bukit Timah Primary                 6. 18637.
#> 10 Bukit Timah Big Trees               1. 11019.
#> # A tibble: 60 x 3
#> # Groups:   PlotName, PlotCensusNumber [60]
#>    PlotName              PlotCensusNumber     n
#>    <chr>                            <int> <int>
#>  1 Bukit Timah Secondary                1  6700
#>  2 Bukit Timah Secondary                2  6723
#>  3 Bukit Timah Secondary                3  7603
#>  4 Bukit Timah Primary                  1 13461
#>  5 Bukit Timah Primary                  2 14328
#>  6 Bukit Timah Primary                  3 15151
#>  7 Bukit Timah Primary                  4 16091
#>  8 Bukit Timah Primary                  5 18691
#>  9 Bukit Timah Primary                  6 18589
#> 10 Bukit Timah Primary                 NA   178
#> 11 Bukit Timah Big Trees                1 11019
#> 12 41569                               NA     1
#> 13 36565                               NA     2
#> 14 35285                               NA     2
#> 15 33763                               NA     2
#> # ... with 45 more rows
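For reference, a count like the second table can be sketched with dplyr as follows. The data frame `vft` and its values are hypothetical toy data, and the sketch assumes dplyr is installed. The point it illustrates is that `count()` keeps `NA` as a group of its own, which is why rows with a missing PlotCensusNumber appear in the summary.

```r
library(dplyr)

# Toy data (hypothetical values), including one row with a missing census number.
vft <- tibble(
  PlotName = c(
    "Bukit Timah Primary", "Bukit Timah Primary",
    "Bukit Timah Primary", "Bukit Timah Secondary"
  ),
  PlotCensusNumber = c(1L, 1L, NA, 1L)
)

# count() treats NA as its own group, so rows with a missing
# PlotCensusNumber are reported instead of silently dropped.
counts <- count(vft, PlotName, PlotCensusNumber)
counts
```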

Most importantly, in my data some values of PlotCensusNumber are NA.

#> Observations: 356
#> Variables: 32
#> $ PlotName         <chr> "Bukit Timah Primary", "10849", "Bukit Timah ...
#> $ PlotCensusNumber <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ Tag              <chr> "C3-22552", NA, "C3-22552", NA, "C3-22552", N...
#> $ DBHID            <int> 4849, NA, 24650, NA, 26259, NA, 27524, NA, 82...
#> $ PlotID           <int> 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, ...
#> $ StemID           <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ StemNumber       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ StemTag          <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ PrimaryStem      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ CensusID         <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ DBH              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ LargeStem        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ Family           <chr> "Clusiaceae", NA, "Clusiaceae", NA, "Clusiace...
#> $ Genus            <chr> "Calophyllum", "main", "Calophyllum", NA, "Ca...
#> $ SpeciesName      <chr> "ferrugineum", "2", "ferrugineum", "3", "ferr...
#> $ Mnemonic         <chr> "CALOBM", "1", "CALOBM", "2", "CALOBM", "3", ...
#> $ Subspecies       <chr> "NULL", "18", "NULL", "21", "NULL", "19", "NU...
#> $ SpeciesID        <int> 112, NA, 112, NA, 112, NA, 112, NA, 112, NA, ...
#> $ SubspeciesID     <chr> "NULL", "1993-05-04", "NULL", "1995-12-18", "...
#> $ QuadratName      <chr> "C3", "12177", "C3", "13135", "C3", "NULL", "...
#> $ QuadratID        <int> 26, NA, 26, NA, 26, NA, 26, NA, 26, NA, 26, N...
#> $ PX               <dbl> 42.6, 1.0, 42.6, 1.0, 42.6, 1.0, 42.6, 1.0, 4...
#> $ PY               <dbl> 57.5, NA, 57.5, NA, 57.5, NA, 57.5, NA, 57.5,...
#> $ QX               <dbl> 2.6, NA, 2.6, NA, 2.6, NA, 2.6, NA, 2.6, NA, ...
#> $ QY               <dbl> 17.5, NA, 17.5, NA, 17.5, NA, 17.5, NA, 17.5,...
#> $ TreeID           <int> 10849, NA, 10849, NA, 10849, NA, 10849, NA, 1...
#> $ HOM              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ ExactDate        <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
#> $ Date             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ ListOfTSM        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ HighHOM          <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ Status           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
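A subset like the one summarised above can be obtained by keeping only the rows where PlotCensusNumber is missing. The sketch below uses base R on hypothetical toy data (`vft` and the tag "C3-22553" are made up for illustration).

```r
# Toy data: two rows with a missing census number, one complete row.
vft <- data.frame(
  PlotName = c("Bukit Timah Primary", "10849", "Bukit Timah Primary"),
  PlotCensusNumber = c(NA, NA, 1L),
  Tag = c("C3-22552", NA, "C3-22553"),
  stringsAsFactors = FALSE
)

# Keep only rows whose PlotCensusNumber is NA.
problematic <- vft[is.na(vft$PlotCensusNumber), ]

nrow(problematic)
str(problematic)
```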

Here is the problematic data, with NAs represented as empty values.

Any idea what is going on? Should this data be discarded?

Follow up

Share problematic tags

On Tue, Mar 6, 2018 at 4:07 PM, Shameema Jafferjee Esufali shameemaesufali@gmail.com wrote:

Mauro, can you send me a single tag number with this problem

Problematic Tags.

#>  [1] "C3-22552"  NA          "D5-23889"  "F1-25034"  "H5-27859" 
#>  [6] "J1-29089"  "J1-29090"  "1346"      "J1-29151"  "A2-20306" 
#> [11] "7984"      "A4-20692"  "C4-22724"  "D3-23527"  "I4-28608" 
#> [16] "J4-29758"  "A2-20307"  "D5-23817"  "D5-23937"  "E2-24315" 
#> [21] "E2-24316"  "E5-24823"  "G2-26207"  "G5-26926"  "J1-29150" 
#> [26] "J3-29541"  "J4-29678"  "A3-20441"  "A4-20671"  "C1-22166" 
#> [31] "D3-23528"  "I5-28847"  "I1-28102"  "E3-24462"  "C4-22641" 
#> [36] "458NPARKS" "455NPARKS" "456NPARKS" "453NPARKS" "446NPARKS"
#> [41] "445NPARKS" "444NPARKS" "439NPARKS" "437NPARKS" "A5-20899" 
#> [46] "A4-20695"
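A list like this can be pulled from the problematic rows with `unique()`. The sketch below uses a hypothetical toy data frame named `problematic`. Note that `unique()` keeps one `NA` entry, which matches the `NA` in the output above.

```r
# Toy stand-in for the rows with a missing PlotCensusNumber.
problematic <- data.frame(
  Tag = c("C3-22552", NA, "C3-22552", NA, "D5-23889"),
  stringsAsFactors = FALSE
)

# Distinct tags, in order of first appearance; NA is kept once.
tags <- unique(problematic$Tag)

# Drop the NA entry when only real tag values are wanted.
tags_known <- tags[!is.na(tags)]

tags
tags_known
```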

Get alternative dataset but it has unexpected names

On Tue, Mar 6, 2018 at 6:15 PM, Shameema Jafferjee Esufali shameemaesufali@gmail.com wrote:

Mauro, I ran an export with NA fill in null values and got something that works in R. Please find attached.

save(ViewNA, file = "/home/asiaplots/CTFSRPackage/bukittimah/ViewNA.rdata")
table(ViewNA$Plot, ViewNA$PlotCensusNumber)
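One caveat when reading a cross-tabulation like this: by default `table()` leaves out `NA` values, so missing census numbers would not be visible in it. The sketch below shows the difference on hypothetical toy data (`ViewNA_toy` is made up for illustration and is not the attached dataset).

```r
# Toy data with one missing census number.
ViewNA_toy <- data.frame(
  Plot = c("Bukit Timah Primary", "Bukit Timah Primary", "Bukit Timah Secondary"),
  PlotCensusNumber = c(1L, NA, 1L),
  stringsAsFactors = FALSE
)

# Default behaviour: the row with NA is excluded from the table.
default_tab <- table(ViewNA_toy$Plot, ViewNA_toy$PlotCensusNumber)

# useNA = "ifany" adds an NA level whenever missing values are present.
with_na_tab <- table(
  ViewNA_toy$Plot, ViewNA_toy$PlotCensusNumber,
  useNA = "ifany"
)

sum(default_tab)
sum(with_na_tab)
```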

New data had different names:

On Sat, Mar 10, 2018 at 3:36 AM, Mauro Lepore maurolepore@gmail.com wrote:

Hi Shameema,

Compared to other ViewFullTables, the dataset you attached (ViewNA.rdata) has different column names. Can you fix that and resend?

Get new data. Names are fixed. Check if data is correct.

https://goo.gl/GzQv13

From: Shameema Jafferjee Esufali shameemaesufali@gmail.com
Date: Fri, Mar 9, 2018 at 7:55 PM

I attach the csv file and corresponding rdata file

Great! Now the data has the expected column names.

#> character(0)
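An empty result like `character(0)` is what `setdiff()` returns when no names differ. A sketch of such a comparison follows, with hypothetical short name vectors standing in for the real column names. It is an assumption that the check above was done this way.

```r
# Hypothetical name vectors; the real tables have many more columns.
expected_names <- c("PlotName", "PlotCensusNumber", "Tag")
new_names <- c("Tag", "PlotName", "PlotCensusNumber")

# Names expected but absent from the new data.
missing_from_new <- setdiff(expected_names, new_names)

# Names present in the new data but not expected.
unexpected_in_new <- setdiff(new_names, expected_names)

missing_from_new
unexpected_in_new
```

Checking both directions matters because `setdiff()` is not symmetric: a one-way check would miss extra columns in the new data.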

And the count of observations per PlotCensusNumber equals what Shameema computed.

Shameema's computation.

#> # A tibble: 10 x 3
#>    plotname              plotcensusnumber  count
#>    <chr>                            <dbl>  <dbl>
#>  1 Bukit Timah Secondary               1.  6700.
#>  2 Bukit Timah Secondary               2.  6723.
#>  3 Bukit Timah Secondary               3.  7603.
#>  4 Bukit Timah Primary                 1. 13472.
#>  5 Bukit Timah Primary                 2. 14343.
#>  6 Bukit Timah Primary                 3. 15177.
#>  7 Bukit Timah Primary                 4. 16122.
#>  8 Bukit Timah Primary                 5. 18738.
#>  9 Bukit Timah Primary                 6. 18637.
#> 10 Bukit Timah Big Trees               1. 11019.

My computation (same)

#> # A tibble: 10 x 3
#> # Groups:   PlotName, PlotCensusNumber [10]
#>    PlotName              PlotCensusNumber     n
#>    <chr>                            <int> <int>
#>  1 Bukit Timah Secondary                1  6700
#>  2 Bukit Timah Secondary                2  6723
#>  3 Bukit Timah Secondary                3  7603
#>  4 Bukit Timah Primary                  1 13472
#>  5 Bukit Timah Primary                  2 14343
#>  6 Bukit Timah Primary                  3 15177
#>  7 Bukit Timah Primary                  4 16122
#>  8 Bukit Timah Primary                  5 18738
#>  9 Bukit Timah Primary                  6 18637
#> 10 Bukit Timah Big Trees                1 11019
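Rather than comparing the two tables by eye, the match can be checked in code. The sketch below rebuilds both tables from the values shown above, aligns the column names and types (one table uses lowercase names and doubles, the other integers), and lists any rows that disagree. The object names `theirs`, `mine`, and `mismatches` are hypothetical.

```r
plots <- c(
  rep("Bukit Timah Secondary", 3),
  rep("Bukit Timah Primary", 6),
  "Bukit Timah Big Trees"
)
census <- c(1:3, 1:6, 1)
n <- c(6700, 6723, 7603, 13472, 14343, 15177, 16122, 18738, 18637, 11019)

# Shameema's table: lowercase names, double columns.
theirs <- data.frame(
  plotname = plots, plotcensusnumber = as.numeric(census), count = n,
  stringsAsFactors = FALSE
)
# My table: CamelCase names, integer columns.
mine <- data.frame(
  PlotName = plots, PlotCensusNumber = as.integer(census), n = as.integer(n),
  stringsAsFactors = FALSE
)

# Align names and types before comparing.
theirs_std <- data.frame(
  PlotName = theirs$plotname,
  PlotCensusNumber = as.integer(theirs$plotcensusnumber),
  n = as.integer(theirs$count),
  stringsAsFactors = FALSE
)

# Full join on the grouping columns, then keep rows that disagree
# or that exist in only one of the two tables.
both <- merge(
  mine, theirs_std,
  by = c("PlotName", "PlotCensusNumber"),
  suffixes = c("_mine", "_theirs"), all = TRUE
)
mismatches <- both[
  is.na(both$n_mine) | is.na(both$n_theirs) | both$n_mine != both$n_theirs,
]

nrow(mismatches)
```

A result of zero rows in `mismatches` means every plot and census combination has the same count in both tables.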