6 fileIndex

If you find the fileIndex to be outdated (“download.file errors: […] cannot open URL”), run
rdwd::updateRdwd()
If the issue persists, please let me know and I will update it.
Meanwhile, either use current=TRUE in selectDWD:

# all files at a given path, with current file index (RCurl required):
links <- selectDWD(res="monthly", var="more_precip", per="hist", current=TRUE)

or, for repeated use, create your own file index (for a certain subfolder):

# recursively list files on the FTP-server:
files <- indexFTP("hourly/sun") # use dir="some_path" to save the output elsewhere
berryFunctions::headtail(files, 5, na=TRUE)
# create and use a personal file index:
cursun <- createIndex(files)
head(cursun)
sunlink <- selectDWD("Potsdam", res="hourly", var="sun", per="r", findex=cursun)

6.1 background

indexFTP recursively lists all the files on an FTP-server (using RCurl::getURL).
From those paths, createIndex generates fileIndex and gridIndex. It also downloads all (ca. 84) description files with metadata and creates metaIndex and geoIndex from them.
All indexes are updated irregularly with the internal function updateIndexes.
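
To illustrate the idea of turning a list of paths into a queryable index, here is a minimal base-R sketch. It is not rdwd's actual createIndex code: the helper name sketch_index and the two zip file names are made up for this example, and the columns are modelled loosely on the res/var/per arguments of selectDWD.

```r
# Illustrative only: NOT rdwd's createIndex implementation.
# The zip file names are invented; paths are given without a leading slash.
paths <- c(
  "daily/kl/recent/tageswerte_KL_03987_akt.zip",
  "hourly/wind/recent/stundenwerte_FF_03987_akt.zip",
  "daily/kl/recent/KL_Tageswerte_Beschreibung_Stationen.txt"
)

# hypothetical helper: split each path into its folder components
sketch_index <- function(paths) {
  parts <- strsplit(paths, "/", fixed = TRUE)
  file  <- vapply(parts, function(p) p[4], character(1))
  # station id: the 5-digit block in zip file names, NA for other files
  id <- ifelse(grepl("\\.zip$", file),
               sub("^.*_([0-9]{5})_.*$", "\\1", file), NA_character_)
  data.frame(
    res  = vapply(parts, function(p) p[1], character(1)),
    var  = vapply(parts, function(p) p[2], character(1)),
    per  = vapply(parts, function(p) p[3], character(1)),
    id   = as.integer(id),
    path = paths,
    stringsAsFactors = FALSE
  )
}

idx <- sketch_index(paths)
idx[, c("res", "var", "per", "id")]
```

With the paths split into columns like this, selecting files for a station and folder becomes ordinary data.frame subsetting.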

selectDWD helps to query the fileIndex.

The DWD often (but irregularly) updates or expands datasets, at which point the filenames in historical folders change.
I check for this with checkUpdates, most frequently around April-May, when the historical files of active stations are updated with last year’s data.
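
The effect of such a renaming on an index can be sketched in base R by comparing an old file listing with a new one. This only illustrates the idea and is not necessarily how checkUpdates works; the file names below are made up for the example.

```r
# Made-up file names for illustration; not actual DWD listings.
old_index <- c(
  "hourly/sun/historical/stundenwerte_SD_00003_19500401_20110331_hist.zip",
  "hourly/sun/historical/stundenwerte_SD_00044_20070401_20221231_hist.zip"
)
new_index <- c(
  "hourly/sun/historical/stundenwerte_SD_00003_19500401_20110331_hist.zip",
  "hourly/sun/historical/stundenwerte_SD_00044_20070401_20231231_hist.zip"
)

stale <- setdiff(old_index, new_index) # in the old index, gone from the server
fresh <- setdiff(new_index, old_index) # on the server, missing in the old index
stale
fresh
```

Any entry in stale would produce a "cannot open URL" error when downloaded through the outdated index.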

indexFTP can also access other servers:

funet <- indexFTP(base="ftp.funet.fi/pub/standards/w3/TR/xhtml11/", folder="")
p <- RCurl::getURL("ftp.funet.fi/pub/standards/w3/TR/xhtml11/",
                   verbose=TRUE, ftp.use.epsv=TRUE, dirlistonly=TRUE)

6.2 metaIndex

selectDWD also uses a complete data.frame with meta information, metaIndex (derived from the “Beschreibung” files in fileIndex).

# All metadata at all folders:
data(metaIndex)
str(metaIndex, vec.len=2)
## 'data.frame':    160869 obs. of  13 variables:
##  $ Stations_id  : int  1 1 1 1 1 ...
##  $ von_datum    : Date, format: "1891-01-01" "1891-01-01" ...
##  $ bis_datum    : Date, format: "1986-06-30" "1986-06-30" ...
##  $ Stationshoehe: int  478 478 478 478 478 ...
##  $ geoBreite    : num  47.8 47.8 ...
##  $ geoLaenge    : num  8.85 8.85 ...
##  $ Stationsname : chr  "Aach" "Aach" ...
##  $ Bundesland   : chr  "Baden-Wuerttemberg" "Baden-Wuerttemberg" ...
##  $ Abgabe       : chr  "Frei" "Frei" ...
##  $ res          : chr  "annual" "annual" ...
##  $ var          : chr  "more_precip" "more_precip" ...
##  $ per          : chr  "historical" "recent" ...
##  $ hasfile      : logi  TRUE FALSE TRUE ...
View(data.frame(sort(unique(rdwd:::metaIndex$Stationsname)))) # ca 6k entries
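
Because metaIndex is a plain data.frame, it can be filtered with ordinary subsetting. The sketch below uses a small toy stand-in (toy_meta, a hypothetical name) so that it runs without the package. The first two rows mirror the str output above; the third row is invented.

```r
# Toy stand-in for metaIndex with a subset of its columns.
# Rows 1-2 follow the str output above, row 3 is invented for the example.
toy_meta <- data.frame(
  Stations_id  = c(1L, 1L, 1L),
  Stationsname = "Aach",
  res     = c("annual", "annual", "daily"),
  var     = "more_precip",
  per     = c("historical", "recent", "historical"),
  hasfile = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# keep only entries for which a data file is actually available
available <- toy_meta[toy_meta$hasfile, ]
# count available datasets per resolution
table(available$res)
```

The same subsetting should work on the real metaIndex, assuming the column names shown in the str output.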

readDWD can correctly read such a data.frame from any folder on the FTP server:

# file with station metadata for a given path:
m_link <- selectDWD(res="monthly", var="more_precip", per="hist", meta=TRUE)
m_link <- grep("\\.txt$", m_link, value=TRUE) # keep only the .txt description file
print_short(m_link) # (Monatswerte = monthly values, Beschreibung = description)
## [1] "---/monthly/more_precip/historical/RR_Monatswerte_Beschreibung_Stationen.txt"
meta_monthly_rain <- dataDWD(m_link)
str(meta_monthly_rain)
## 'data.frame':    6470 obs. of  9 variables:
##  $ Stations_id  : int  1 2 3 4 6 7 8 9 10 12 ...
##  $ von_datum    : int  18910101 19140101 18440101 18980101 19821101 19360101 19090101 19920601 19380101 19310101 ...
##  $ bis_datum    : int  19860630 20061231 20110331 19791031 20250430 19960131 19911231 20101231 20050831 20061231 ...
##  $ Stationshoehe: int  478 138 202 243 455 473 307 200 378 61 ...
##  $ geoBreite    : num  47.8 50.8 50.8 50.8 48.8 ...
##  $ geoLaenge    : num  8.85 6.1 6.09 6.12 10.06 ...
##  $ Stationsname : chr  "Aach" "Aachen (Kläranlage)" "Aachen" "Aachen-Brand" ...
##  $ Bundesland   : chr  "Baden-Württemberg" "Nordrhein-Westfalen" "Nordrhein-Westfalen" "Nordrhein-Westfalen" ...
##  $ Abgabe       : chr  "Frei" "Frei" "Frei" "Frei" ...

Meta files may list stations for which there are actually no files. These refer to nonpublic datasets (the DWD cannot publish all datasets because of copyright restrictions). To request those, please contact the DWD.

The from and to dates do not always reflect the real time period with available data. They are read from the DWD _Beschreibung_Stationen.txt files, e.g. this one, see also issue 32.
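
In the raw description file, von_datum and bis_datum are integers of the form yyyymmdd, whereas metaIndex stores them as Date (compare the two str outputs above). A minimal base-R sketch of that conversion follows; to_date is a hypothetical helper written for this example, not an rdwd function.

```r
# Values copied from the von_datum / bis_datum columns shown above
von_datum <- c(18910101L, 19140101L, 18440101L)
bis_datum <- c(19860630L, 20061231L, 20110331L)

# hypothetical helper: integer yyyymmdd -> Date
to_date <- function(x) as.Date(as.character(x), format = "%Y%m%d")
from <- to_date(von_datum)
to   <- to_date(bis_datum)

# nominal record length in years (may overstate the real data coverage)
round(as.numeric(to - from) / 365.25, 1)
```

The resulting spans are only nominal: as noted above, the actual data in a file may cover a shorter period.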

For up-to-date metaIndexes, check for updates in the development version (rdwd::updateRdwd()), prompt me to update it, or use your own version:

ll <- selectDWD("", c("hourly","daily"), c("wind","kl"), "r", meta=TRUE)
ll <- grep("\\.txt$", ll, value=TRUE) # keep only the .txt description files
ll <- ll[!grepl("mn4",ll)]
ll <- sub(dwdbase, "", ll)
ll
## [1] "/daily/kl/recent/KL_Tageswerte_Beschreibung_Stationen.txt"     
## [2] "/hourly/wind/recent/FF_Stundenwerte_Beschreibung_Stationen.txt"
ind <- createIndex(ll, dir=tempdir(), meta=TRUE, checkwarn=FALSE)
ind$metaIndex$hasfile <- TRUE
metaInfo(3987, mindex=ind$metaIndex)
## rdwd station id 3987 with 2 files.
## Name: Potsdam, State: Brandenburg
## For up-to-date info, see https://bookdown.org/brry/rdwd/fileindex.html#metaindex
##      res  var    per hasfile       from         to     lat    long ele
## 1  daily   kl recent    TRUE 1893-01-01 2025-05-22 52.3812 13.0622  81
## 2 hourly wind recent    TRUE 1893-01-01 2025-05-22 52.3812 13.0622  81