6 fileIndex

indexFTP recursively lists all the files on an FTP-server (using RCurl::getURL).
From those paths, createIndex generates fileIndex and gridIndex. It also downloads all (ca. 84) description files with metadata and creates metaIndex and geoIndex from them.
All indexes are updated irregularly with the internal function updateIndexes.

selectDWD helps to query the fileIndex.

The DWD often (but irregularly) updates or expands datasets, at which point the filenames in historical folders change.
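
As a rough illustration of why a stored index goes stale, the following base-R sketch compares an older file listing with a fresh one. The file names are made up for the example (they are not real DWD paths) and no rdwd function is involved; in practice the fresh listing would come from indexFTP.

```r
# Hypothetical file names, for illustration only (not real DWD paths).
# old_index: paths as stored in an outdated index
# new_index: paths as currently listed on the server
old_index <- c("demo/hist/st_00001_1990_2022_hist.zip",
               "demo/hist/st_00002_1985_2022_hist.zip")
new_index <- c("demo/hist/st_00001_1990_2023_hist.zip",
               "demo/hist/st_00002_1985_2022_hist.zip")

# Paths that would now fail with "cannot open URL":
outdated <- setdiff(old_index, new_index)
# Paths that the old index does not know about yet:
added <- setdiff(new_index, old_index)

outdated
added
```

Any path in outdated would fail on download, which is the situation described in the next paragraph.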

If the index turns out to be outdated (“download.file errors: […] cannot open URL”), check the latest commits to see whether the GitHub development version already has an updated index (install with rdwd::updateRdwd()).
If not, please let me know and I will update it.

Meanwhile, either use current=TRUE in selectDWD:

# all files at a given path, with current file index (RCurl required):
links <- selectDWD(res="monthly", var="more_precip", per="hist", current=TRUE)

or, for repetitive usage, create your own file index (for a certain subfolder):

# recursively list files on the FTP-server:
files <- indexFTP("hourly/sun") # use dir="some_path" to save the output elsewhere
berryFunctions::headtail(files, 5, na=TRUE)
# create and use a personal file index:
cursun <- createIndex(files)
head(cursun)
sunlink <- selectDWD("Potsdam", res="hourly", var="sun", per="r", findex=cursun)


# with other FTP servers, this should also work...
funet <- indexFTP(base="ftp.funet.fi/pub/standards/w3/TR/xhtml11/", folder="")
p <- RCurl::getURL(    "ftp.funet.fi/pub/standards/w3/TR/xhtml11/",
                       verbose=TRUE, ftp.use.epsv=TRUE, dirlistonly=TRUE)

6.1 metaIndex

selectDWD also uses a complete data.frame with meta information, metaIndex (derived from the “Beschreibung” files in fileIndex).

# All metadata at all folders:
data(metaIndex)
str(metaIndex, vec.len=2)
## 'data.frame':    149238 obs. of  12 variables:
##  $ Stations_id  : int  1 1 1 1 1 ...
##  $ von_datum    : Date, format: "1891-01-01" "1891-01-01" ...
##  $ bis_datum    : Date, format: "1986-06-30" "1986-06-30" ...
##  $ Stationshoehe: int  478 478 478 478 478 ...
##  $ geoBreite    : num  47.8 47.8 ...
##  $ geoLaenge    : num  8.85 8.85 ...
##  $ Stationsname : chr  "Aach" "Aach" ...
##  $ Bundesland   : chr  "Baden-Wuerttemberg" "Baden-Wuerttemberg" ...
##  $ res          : chr  "annual" "annual" ...
##  $ var          : chr  "more_precip" "more_precip" ...
##  $ per          : chr  "historical" "recent" ...
##  $ hasfile      : logi  TRUE FALSE TRUE ...
View(data.frame(sort(unique(rdwd:::metaIndex$Stationsname)))) # ca 6k entries

dataDWD (which reads files through readDWD) can correctly read such a data.frame from any folder on the FTP server:

# file with station metadata for a given path:
m_link <- selectDWD(res="monthly", var="more_precip", per="hist", meta=TRUE)
m_link <- grep("\\.txt$", m_link, value=TRUE) # escape the dot to match a literal ".txt"
print_short(m_link) # (Monatswerte = monthly values, Beschreibung = description)
## [1] "---/monthly/more_precip/historical/RR_Monatswerte_Beschreibung_Stationen.txt"
meta_monthly_rain <- dataDWD(m_link)
str(meta_monthly_rain)
## 'data.frame':    5603 obs. of  8 variables:
##  $ Stations_id  : int  1 2 3 4 6 7 8 9 10 12 ...
##  $ von_datum    : int  18910101 19140101 18440101 18980101 19821101 19360101 19090101 19920601 19380101 19310101 ...
##  $ bis_datum    : int  19860630 20061231 20110331 19791031 20220430 19960131 19911231 20101231 20050831 20061231 ...
##  $ Stationshoehe: int  478 138 202 243 455 473 307 200 378 61 ...
##  $ geoBreite    : num  47.8 50.8 50.8 50.8 48.8 ...
##  $ geoLaenge    : num  8.85 6.1 6.09 6.12 10.06 ...
##  $ Stationsname : chr  "Aach" "Aachen (Kläranlage)" "Aachen" "Aachen-Brand" ...
##  $ Bundesland   : chr  "Baden-Württemberg" "Nordrhein-Westfalen" "Nordrhein-Westfalen" "Nordrhein-Westfalen" ...
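
In the output above, von_datum and bis_datum are integers in YYYYMMDD form, whereas metaIndex stores them as Date. A minimal base-R sketch of that conversion, using a few values copied from the output above; the helper name int2date is hypothetical and not an rdwd function:

```r
# Hypothetical helper (not part of rdwd): YYYYMMDD integers -> Date
int2date <- function(x) as.Date(as.character(x), format="%Y%m%d")

# First three stations from the str() output above:
meta <- data.frame(Stations_id = c(1L, 2L, 3L),
                   von_datum   = c(18910101L, 19140101L, 18440101L),
                   bis_datum   = c(19860630L, 20061231L, 20110331L))
meta$von_datum <- int2date(meta$von_datum)
meta$bis_datum <- int2date(meta$bis_datum)

# With Date columns, record lengths can be computed directly (approximate years):
meta$years <- as.numeric(meta$bis_datum - meta$von_datum) / 365.25
str(meta)
```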

Meta files may list stations for which there are actually no files. These refer to nonpublic datasets (the DWD cannot publish all datasets because of copyright restrictions). To request those, please contact the DWD directly.
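
The hasfile column of metaIndex (see the str output further up) marks whether a listed entry actually has a file. A small sketch of filtering on it, using a toy data.frame with illustrative values instead of the real metaIndex, so the rows below are assumptions and not DWD data:

```r
# Toy stand-in for a few metaIndex rows (illustrative values, not real DWD data)
toy <- data.frame(Stations_id = c(1L, 1L, 2L),
                  res         = "annual",
                  var         = "more_precip",
                  per         = c("historical", "recent", "historical"),
                  hasfile     = c(TRUE, FALSE, TRUE))

# Keep only the entries for which a file is available:
available <- toy[toy$hasfile, ]
available

# Number of available files per station:
n_files <- tapply(toy$hasfile, toy$Stations_id, sum)
n_files
```

The same subsetting should carry over to the real metaIndex, since it has the same hasfile column.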

The from and to dates do not always reflect the real time period with available data. They are read from the DWD _Beschreibung_Stationen.txt files; see also issue 32.

For up-to-date metaIndexes, check for updates in the development version (rdwd::updateRdwd()), prompt me to update it, or use your own version:

ll <- selectDWD("", c("hourly","daily"), c("wind","kl"), "r", meta=TRUE)
ll <- grep("\\.txt$", ll, value=TRUE) # escape the dot to match a literal ".txt"
ll <- ll[!grepl("mn4",ll)]
ll <- sub(dwdbase, "", ll)
ll
## [1] "/daily/kl/recent/KL_Tageswerte_Beschreibung_Stationen.txt"     
## [2] "/hourly/wind/recent/FF_Stundenwerte_Beschreibung_Stationen.txt"
ind <- createIndex(ll, dir=tempdir(), meta=TRUE, checkwarn=FALSE)
ind$metaIndex$hasfile <- TRUE
metaInfo(3987, mindex=ind$metaIndex)
## rdwd station id 3987 with 2 files.
## Name: Potsdam, State: Brandenburg
## For up-to-date info, see https://bookdown.org/brry/rdwd/fileindex.html#metaindex
##      res  var    per hasfile       from         to     lat    long ele
## 1  daily   kl recent    TRUE 1893-01-01 2024-05-12 52.3812 13.0622  81
## 2 hourly wind recent    TRUE 1893-01-01 2024-05-12 52.3812 13.0622  81