timeAverage.Rd
Function to flexibly aggregate or expand data frames by different time periods, calculating vector-averaged wind direction where appropriate. The averaged periods can also take account of data capture rates.
timeAverage( mydata, avg.time = "day", data.thresh = 0, statistic = "mean", type = "default", percentile = NA, start.date = NA, end.date = NA, interval = NA, vector.ws = FALSE, fill = FALSE, ... )
mydata  A data frame containing a 

avg.time  This defines the time period to average to. Can be
“sec”, “min”, “hour”, “day”,
“DSTday”, “week”, “month”, “quarter”
or “year”. For much increased flexibility a number can
precede these options followed by a space. For example, a
timeAverage of 2 months would be Note that 
data.thresh  The data capture threshold to use (%). A value
of zero means that all available data will be used in a
particular period regardless of the number of values
available. Conversely, a value of 100 will mean that all data
will need to be present for the average to be calculated, else
it is recorded as 
statistic  The statistic to apply when aggregating the data;
default is the mean. Can be one of “mean”, “max”,
“min”, “median”, “frequency”, “sd”,
“percentile”. Note that “sd” is the standard
deviation, “frequency” is the number (frequency) of valid
records in the period and “data.cap” is the percentage
data capture. “percentile” is the percentile level (%)
between 0 and 100, which can be set using the “percentile”
option (see below). Not used if 
type 

percentile  The percentile level in % used when

start.date  A string giving a start date to use. This is
sometimes useful if a time series starts between obvious
intervals. For example, for a 1-minute time series that starts
“2009-11-29 12:07:00” that needs to be averaged up to
15-minute means, the intervals would be “2009-11-29
12:07:00”, “2009-11-29 12:22:00” etc. Often, however, it
is better to round down to a more obvious start point e.g.
“2009-11-29 12:00:00” such that the sequence is then
“2009-11-29 12:00:00”, “2009-11-29 12:15:00”
... 
end.date  A string giving an end date to use. This is
sometimes useful to make sure a time series extends to a known
end point and is useful when 
interval  The This option can sometimes be useful with 
vector.ws  Should vector averaging be carried out on wind
speed if available? The default is 
fill  When time series are expanded i.e. when a time
interval is less than the original time series, data are
‘padded out’ with 
...  Additional arguments for other functions calling

Returns a data frame with date in class POSIXct.
This function calculates time averages for a data frame. It also treats wind direction correctly through vector-averaging. For example, the average of 350 degrees and 10 degrees is either 0 or 360, not 180. The calculations therefore average the wind components.
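The 350/10 degree example can be checked with a few lines of base R. This is a minimal sketch of vector averaging in general, not the code used inside timeAverage: vector_mean_wd is a hypothetical helper, and it treats every observation as a unit vector, whereas a real implementation may weight the components by wind speed.

```r
# Vector-average wind directions given in degrees.
# vector_mean_wd is a hypothetical helper, not part of any package.
vector_mean_wd <- function(wd) {
  rad <- wd * pi / 180
  u <- mean(sin(rad), na.rm = TRUE)  # mean east-west component
  v <- mean(cos(rad), na.rm = TRUE)  # mean north-south component
  (atan2(u, v) * 180 / pi) %% 360    # convert back to degrees, wrapped to 0-360
}

wd <- c(350, 10)
mean(wd)            # arithmetic mean is 180, pointing the wrong way
vector_mean_wd(wd)  # very close to 0 (equivalently 360)
```

Because of floating-point rounding the result may come out as a value marginally above 0 or marginally below 360; both describe the same direction.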
When a data capture threshold is set through data.thresh, it is necessary for timeAverage to know what the original time interval of the input time series is. The function will try to calculate this interval based on the most common time gap (and will print the assumed time gap to the screen). This works fine most of the time, but there are occasions where it may not, e.g. when very few data exist in a data frame or the data are monthly (i.e. non-regular time interval between months). In this case the user can explicitly specify the interval through interval, in the same format as avg.time, e.g. interval = "month". It may also be useful to set start.date and end.date if the time series does not span the entire period of interest. For example, if a time series ended in October and annual means are required, setting end.date to the end of the year will ensure that the whole period is covered and that data.thresh is correctly calculated. The same also goes for a time series that starts later in the year, where start.date should be set to the beginning of the year.
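The idea of inferring the interval from the most common time gap, and then using it to judge data capture, can be sketched in base R. The code below is illustrative only and makes no claim about how timeAverage does this internally; the synthetic dates, the placeholder values and the names interval_secs, capture and data_thresh are all assumptions made for the example.

```r
# Hourly series for one day with four hours missing (synthetic data).
dates <- seq(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
             by = "hour", length.out = 24)
dates <- dates[-c(5, 6, 7, 8)]

# Most common gap between consecutive timestamps, in seconds.
gaps <- as.numeric(diff(dates), units = "secs")
interval_secs <- as.numeric(names(which.max(table(gaps))))

# Data capture for the day: observed records as a percentage of the
# number of records expected at that interval.
expected <- 86400 / interval_secs
capture <- 100 * length(dates) / expected

# Keep the daily mean only if capture meets the threshold (here 75 %).
data_thresh <- 75
values <- seq_along(dates)  # placeholder measurements
daily_mean <- if (capture >= data_thresh) mean(values) else NA
```

With 20 of 24 hours present the capture is about 83 %, so the mean is kept at a 75 % threshold. The same arithmetic shows why the period boundaries matter: if the expected count is based on a shorter span than the period of interest, the capture rate is overstated.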
timeAverage should be useful in many circumstances where it is necessary to work with different time average data. For example, hourly air pollution data and 15-minute meteorological data. To merge the two data sets, timeAverage can be used to make the meteorological data 1-hour means first. Alternatively, timeAverage can be used to expand the hourly data to 15-minute data (see example below).
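What expanding an hourly series to 15-minute resolution involves can be shown with base R alone. This is a sketch under stated assumptions rather than the method timeAverage uses: the no2 column and the synthetic values are hypothetical, and the two variants (leaving new slots as NA, or repeating the hourly value) are assumed to correspond roughly to the choice the fill argument controls.

```r
# Synthetic hourly series covering three hours.
hourly <- data.frame(
  date = seq(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
             by = "hour", length.out = 3),
  no2 = c(10, 20, 30)
)

# Regular 15-minute grid covering the same three hours.
grid <- data.frame(
  date = seq(min(hourly$date), max(hourly$date) + 45 * 60, by = "15 min")
)

# Left-join the hourly values onto the grid; unmatched slots become NA.
padded <- merge(grid, hourly, by = "date", all.x = TRUE)
padded <- padded[order(padded$date), ]

# Alternative: repeat each hourly value across its four 15-minute slots.
idx <- findInterval(as.numeric(padded$date), as.numeric(hourly$date))
filled <- padded
filled$no2 <- hourly$no2[idx]
```

Once both data sets share the same time base, they can be combined with an ordinary merge on the date column.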
For the research community, timeAverage should be useful for dealing with outputs from instruments where a range of time periods is used.
It is also very useful for plotting data using timePlot. Often the data are too dense to see patterns, and setting different averaging periods helps with interpretation.
See timePlot, which plots time series data and uses timeAverage to aggregate data where necessary.
## daily average values
daily <- timeAverage(mydata, avg.time = "day")

## daily average values ensuring at least 75 % data capture
## i.e. at least 18 valid hours
if (FALSE) daily <- timeAverage(mydata, avg.time = "day", data.thresh = 75)

## 2-weekly averages
if (FALSE) fortnight <- timeAverage(mydata, avg.time = "2 week")

## make a 15-minute time series from an hourly one
if (FALSE) {
  min15 <- timeAverage(mydata, avg.time = "15 min", fill = TRUE)
}

# average by grouping variable
if (FALSE) {
  dat <- importAURN(c("kc1", "my1"), year = 2011:2013)
  timeAverage(dat, avg.time = "year", type = "site")

  # can also retain site code
  timeAverage(dat, avg.time = "year", type = c("site", "code"))

  # or just average all the data, dropping site/code
  timeAverage(dat, avg.time = "year")
}