6 fileIndex
indexFTP recursively lists all the files on an FTP server (using RCurl::getURL).
From those paths, createIndex generates fileIndex and gridIndex.
It also downloads all (ca. 84) description files with metadata and creates metaIndex and geoIndex.
All indexes are updated irregularly with the internal function updateIndexes.
selectDWD helps to query the fileIndex.
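To make the path-to-index step more concrete, here is a minimal sketch of how FTP file paths can be turned into a table with one column per folder level. The helper path_to_index is hypothetical and is not rdwd's createIndex code; the example paths are only assumed to follow the DWD naming style, and the rule that a station id is a 5-digit block between underscores is an assumption as well.

```r
# Hypothetical helper, NOT part of rdwd: turn FTP file paths into a small
# index table with one column per folder level, similar in spirit to fileIndex.
path_to_index <- function(paths) {
  # drop a leading "/" and split each path into its folder levels
  parts <- strsplit(sub("^/", "", paths), "/", fixed = TRUE)
  level <- function(i) vapply(parts, function(p) p[i], character(1))
  file <- vapply(parts, function(p) p[length(p)], character(1))
  # assumption: a station id is a 5-digit block between underscores
  has_id <- grepl("_[0-9]{5}_", file)
  id <- rep(NA_integer_, length(file))
  id[has_id] <- as.integer(sub(".*_([0-9]{5})_.*", "\\1", file[has_id]))
  data.frame(res = level(1), var = level(2), per = level(3), id = id,
             path = paths, stringsAsFactors = FALSE)
}

# example paths, assumed to resemble entries on the DWD FTP server
paths <- c("/hourly/sun/recent/stundenwerte_SD_03987_akt.zip",
           "/hourly/sun/recent/SD_Stundenwerte_Beschreibung_Stationen.txt")
path_to_index(paths)
```

The description file gets id NA because its name contains no station number, which is one reason the real index keeps data files and metadata files apart.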
The DWD often (but irregularly) updates or expands datasets, at which point the filenames in historical folders change.
If you find the index to be outdated (“download.file errors: […] cannot open URL”),
check the latest commits to see whether the GitHub development version already has an updated index (install it with rdwd::updateRdwd()).
If not, please let me know and I will update it.
Meanwhile, either use current=TRUE in selectDWD:
# all files at a given path, with current file index (RCurl required):
links <- selectDWD(res="monthly", var="more_precip", per="hist", current=TRUE)
or, for repetitive usage, create your own file index (for a certain subfolder):
# recursively list files on the FTP-server:
files <- indexFTP("hourly/sun") # use dir="some_path" to save the output elsewhere
berryFunctions::headtail(files, 5, na=TRUE)
# create and use a personal file index:
cursun <- createIndex(files)
head(cursun)
sunlink <- selectDWD("Potsdam", res="hourly", var="sun", per="r", findex=cursun)
# with other FTP servers, this should also work...
funet <- indexFTP(base="ftp.funet.fi/pub/standards/w3/TR/xhtml11/", folder="")
# list the same folder directly with RCurl (top level only):
p <- RCurl::getURL("ftp.funet.fi/pub/standards/w3/TR/xhtml11/",
                   verbose=TRUE, ftp.use.epsv=TRUE, dirlistonly=TRUE)
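Querying a personal index with selectDWD essentially means filtering a table by res, var and per. The sketch below imitates only that filtering step on a toy table; query_index and the toy paths are hypothetical, and resolving abbreviations such as per="r" or per="hist" by prefix matching is an assumption, not a description of selectDWD's internals.

```r
# Hypothetical stand-in for the filtering step of selectDWD (not rdwd code).
query_index <- function(index, res, var, per) {
  # per may be abbreviated ("r", "hist"), so match it as a prefix
  keep <- index$res == res & index$var == var & startsWith(index$per, per)
  index$path[keep]
}

# toy index with made-up file names
toy <- data.frame(
  res  = c("hourly", "hourly", "daily"),
  var  = c("sun", "sun", "kl"),
  per  = c("recent", "historical", "recent"),
  path = c("/hourly/sun/recent/a.zip",
           "/hourly/sun/historical/b.zip",
           "/daily/kl/recent/c.zip"),
  stringsAsFactors = FALSE
)
query_index(toy, "hourly", "sun", "r")
```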
6.1 metaIndex
selectDWD also uses a complete data.frame with meta information, metaIndex
(derived from the “Beschreibung” files in fileIndex).
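Conceptually, such a table stacks the station lists of many description files and tags each block with the res, var and per of the folder it came from. The following is a rough sketch of that idea only; tag_meta is a hypothetical helper and not how createIndex is actually implemented. The two paths and the Potsdam station id are taken from the example further below.

```r
# Hypothetical sketch, NOT rdwd's createIndex code: add res/var/per columns
# (taken from the folder levels of the path) to a station table.
tag_meta <- function(stations, path) {
  parts <- strsplit(sub("^/", "", path), "/", fixed = TRUE)[[1]]
  stations$res <- parts[1]
  stations$var <- parts[2]
  stations$per <- parts[3]
  stations
}

st <- data.frame(Stations_id = 3987L, Stationsname = "Potsdam",
                 stringsAsFactors = FALSE)
# stack the tagged station tables from two description files
combined <- rbind(
  tag_meta(st, "/daily/kl/recent/KL_Tageswerte_Beschreibung_Stationen.txt"),
  tag_meta(st, "/hourly/wind/recent/FF_Stundenwerte_Beschreibung_Stationen.txt")
)
combined
```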
## 'data.frame': 149238 obs. of 12 variables:
## $ Stations_id : int 1 1 1 1 1 ...
## $ von_datum : Date, format: "1891-01-01" "1891-01-01" ...
## $ bis_datum : Date, format: "1986-06-30" "1986-06-30" ...
## $ Stationshoehe: int 478 478 478 478 478 ...
## $ geoBreite : num 47.8 47.8 ...
## $ geoLaenge : num 8.85 8.85 ...
## $ Stationsname : chr "Aach" "Aach" ...
## $ Bundesland : chr "Baden-Wuerttemberg" "Baden-Wuerttemberg" ...
## $ res : chr "annual" "annual" ...
## $ var : chr "more_precip" "more_precip" ...
## $ per : chr "historical" "recent" ...
## $ hasfile : logi TRUE FALSE TRUE ...
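The hasfile column can be used to keep only entries for which data can actually be downloaded. Below is a small base R sketch on two toy rows built from the values in the str() output above; it assumes that hasfile marks whether a matching data file is listed in fileIndex.

```r
# Toy rows shaped like metaIndex (values taken from the str() output above)
meta <- data.frame(
  Stations_id  = c(1L, 1L),
  von_datum    = as.Date(c("1891-01-01", "1891-01-01")),
  bis_datum    = as.Date(c("1986-06-30", "1986-06-30")),
  Stationsname = c("Aach", "Aach"),
  res          = c("annual", "annual"),
  var          = c("more_precip", "more_precip"),
  per          = c("historical", "recent"),
  hasfile      = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)
# keep only entries of station 1 that have a data file
available <- subset(meta, Stations_id == 1 & hasfile)
available[, c("res", "var", "per", "von_datum", "bis_datum")]
```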
readDWD can correctly read such a data.frame from any folder on the FTP server:
# file with station metadata for a given path:
m_link <- selectDWD(res="monthly", var="more_precip", per="hist", meta=TRUE)
m_link <- grep("\\.txt$", m_link, value=TRUE) # keep only .txt files
print_short(m_link) # (Monatswerte = monthly values, Beschreibung = description)
## [1] "---/monthly/more_precip/historical/RR_Monatswerte_Beschreibung_Stationen.txt"
## 'data.frame': 5689 obs. of 8 variables:
## $ Stations_id : int 1 2 3 4 6 7 8 9 10 12 ...
## $ von_datum : int 18910101 19140101 18440101 18980101 19821101 19360101 19090101 19920601 19380101 19310101 ...
## $ bis_datum : int 19860630 20061231 20110331 19791031 20230331 19960131 19911231 20101231 20050831 20061231 ...
## $ Stationshoehe: int 478 138 202 243 455 473 307 200 378 61 ...
## $ geoBreite : num 47.8 50.8 50.8 50.8 48.8 ...
## $ geoLaenge : num 8.85 6.1 6.09 6.12 10.06 ...
## $ Stationsname : chr "Aach" "Aachen (Kläranlage)" "Aachen" "Aachen-Brand" ...
## $ Bundesland : chr "Baden-Württemberg" "Nordrhein-Westfalen" "Nordrhein-Westfalen" "Nordrhein-Westfalen" ...
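In this output, von_datum and bis_datum are integers in YYYYMMDD form, while metaIndex stores them as Date. If you need Dates from such a data.frame, a plain base R conversion along these lines should work (this is not an rdwd function):

```r
# integer dates in YYYYMMDD form, as in the str() output above
von_datum <- c(18910101L, 19140101L, 18440101L)
# convert to character first, then parse with an explicit format string
von_date <- as.Date(as.character(von_datum), format = "%Y%m%d")
von_date
```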
Meta files may list stations for which there are actually no files. These refer to nonpublic datasets (the DWD cannot publish all datasets because of copyright restrictions). To request those, please contact cdc.daten@dwd.de or klima.vertrieb@dwd.de.
The from and to dates do not always reflect the actual period of available data.
They are read from the DWD _Beschreibung_Stationen.txt files (e.g. this one; see also issue 32).
For up-to-date metaIndexes, check for updates in the development version (rdwd::updateRdwd()),
prompt me to update it, or use your own version:
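# meta file links for recent hourly wind and daily kl data:
ll <- selectDWD("", c("hourly","daily"), c("wind","kl"), "r", meta=TRUE)
ll <- grep("\\.txt$", ll, value=TRUE) # keep only .txt description files
ll <- ll[!grepl("mn4",ll)]            # drop files containing "mn4"
ll <- sub(dwdbase, "", ll)            # strip the FTP base URL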
ll
## [1] "/daily/kl/recent/KL_Tageswerte_Beschreibung_Stationen.txt"
## [2] "/hourly/wind/recent/FF_Stundenwerte_Beschreibung_Stationen.txt"
ind <- createIndex(ll, dir=tempdir(), meta=TRUE, checkwarn=FALSE)
ind$metaIndex$hasfile <- TRUE
metaInfo(3987, mindex=ind$metaIndex)
## rdwd station id 3987 with 2 files.
## Name: Potsdam, State: Brandenburg
## For up-to-date info, see https://bookdown.org/brry/rdwd/fileindex.html#metaindex
## res var per hasfile from to lat long ele
## 1 daily kl recent TRUE 1893-01-01 2024-07-25 52.3812 13.0622 81
## 2 hourly wind recent TRUE 1893-01-01 2024-07-25 52.3812 13.0622 81