Experience Metric Space Module (part of the Interaction History Architecture).
Data Structures
| Class | Description |
| --- | --- |
| iCub::iha::BinWindowMaxEntropy | Calculate the bin number of a data value using bin boundaries that maximize the entropy over the previous window of data values. |
| iCub::iha::DataFrame | Class to hold one frame of data from the sensors. |
| iCub::iha::DistanceSpaceClass | Distance Space Class holds a single distance space and local processing. |
| iCub::iha::Experience | A single Experience. |
| iCub::iha::ExperienceProcessor | Experience Processor. |
| iCub::iha::Limits | Sensor Limits for calculating the normalized value of a sensor. |
| iCub::iha::WindowIDCalc | Calculate the Information Distance between two data streams over a given horizon. |
The metric space is constructed continuously as the robot experiences its environment.
An experience is created every Granularity timesteps (parameter --granularity), and consists of Horizon timesteps (parameter --horizon) counting back from the current timestep.
If Horizon is greater than Granularity, the experiences will overlap.
The sensor data in each experience is quantized into a fixed number of evenly-sized bins (parameter --num_bins).
A quality value is assigned to the quantized experience, determined by factors such as environmental reward/punishment, internal drive and affective state.
Finally, the last action executed during the experience is also noted and stored with the quantized experience.
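The quantization into evenly-sized bins can be illustrated with a minimal sketch. It assumes that each sensor value is first normalized with a low/high limit (as the LIMIT_LO and LIMIT_HI entries of the configuration file suggest) and then mapped to a bin index. The names normalize, binIndex and quantizeWindow are hypothetical and are not the module's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: map a raw sensor value into [0, 1] using its limits.
double normalize(double value, double lo, double hi) {
    assert(hi > lo);
    double n = (value - lo) / (hi - lo);
    return std::min(1.0, std::max(0.0, n));  // clamp out-of-range readings
}

// Hypothetical helper: index of the evenly-sized bin holding a normalized value.
int binIndex(double normalized, int numBins) {
    assert(numBins > 0);
    int bin = static_cast<int>(normalized * numBins);
    return std::min(bin, numBins - 1);  // the value 1.0 falls in the last bin
}

// Quantize the last `horizon` samples of one sensor stream
// (fewer if the stream is still shorter than the horizon).
std::vector<int> quantizeWindow(const std::vector<double>& stream,
                                std::size_t horizon,
                                double lo, double hi, int numBins) {
    std::size_t start = stream.size() > horizon ? stream.size() - horizon : 0;
    std::vector<int> bins;
    for (std::size_t i = start; i < stream.size(); ++i)
        bins.push_back(binIndex(normalize(stream[i], lo, hi), numBins));
    return bins;
}
```

With limits -40 and 30 and five bins, a reading of -5 normalizes to 0.5 and lands in bin 2. The adaptive (entropy-maximizing) binning provided by iCub::iha::BinWindowMaxEntropy is not covered by this sketch.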
Thus the metric space of experience in the Interaction History Architecture, the interaction history space, can be described by the tuple $(E, D, Q, A)$, where $E$ is a collection of quantized "experiences", $D$ is a matrix of distances between elements of $E$, $Q$ is a vector of quality values and $A$ a vector of actions.
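As an illustration only, the four components of the tuple (collection of experiences, distance matrix, quality vector and action vector) could be held together as below. The type InteractionHistorySpace and its add method are hypothetical, and the sketch assumes a symmetric distance with zero self-distance; the module's own classes (for example iCub::iha::DistanceSpaceClass) may organize the data differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container mirroring the four components of the tuple.
struct InteractionHistorySpace {
    std::vector<std::vector<int>> experiences;   // quantized experiences
    std::vector<std::vector<double>> distances;  // pairwise distance matrix
    std::vector<double> quality;                 // one quality value per experience
    std::vector<int> actions;                    // last action of each experience

    // Append an experience, growing the distance matrix by one row and column.
    // distToExisting[i] is the distance from the new experience to experience i.
    std::size_t add(const std::vector<int>& exp, double q, int action,
                    const std::vector<double>& distToExisting) {
        assert(distToExisting.size() == experiences.size());
        const std::size_t id = experiences.size();
        for (std::size_t i = 0; i < id; ++i)
            distances[i].push_back(distToExisting[i]);  // new column (symmetry assumed)
        std::vector<double> row = distToExisting;
        row.push_back(0.0);                             // distance to itself
        distances.push_back(row);                       // new row
        experiences.push_back(exp);
        quality.push_back(q);
        actions.push_back(action);
        return id;
    }
};
```

Keeping the four containers index-aligned means that an experience number can be used directly to look up its quality value, its action and its row of distances.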
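Say one is interested in finding all nearest neighbours of an experience $e$ within a "ball" of radius $r$; then the triangle inequality can be employed to reduce the number of distances that need to be measured. Specifically: if an experience $e'$ is known that is distance $d$ from $e$, then any neighbours of $e'$ that are further away from $e'$ than $d + r$ are not within distance $r$ of $e$. This follows because, writing $D(\cdot,\cdot)$ for the distance, $D(e, x) \ge D(e', x) - D(e, e') > r$ for any such neighbour $x$ (for proof see Mirza:Thesis).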
This fact can be used to discard experiences from consideration when finding nearest neighbours within a specified radius. Of course, this requires first finding an experience within radius $r$ of the new experience. One approach to this problem is simply to sample the experience space at random until one is found. Other strategies exist; for example, the continuous nature of the environment can be used to start the search for near experiences (in terms of information distance) with those experiences that are near in terms of time.
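The use of the triangle inequality can be sketched as follows. This is not the module's TREE or TREENEIGHBOUR heuristic, only a minimal illustration of the pruning idea, and the function neighboursWithin and its parameters are hypothetical. It assumes the distances between stored experiences are already known, and that one stored experience (the anchor) has been chosen as the starting point.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Return the indices of stored experiences within `radius` of a new experience.
// distToNew(i) measures the (expensive) distance from the new experience to
// stored experience i; stored[a][b] holds the already-known pairwise distances.
// `measured`, if given, receives the number of distances actually measured.
std::vector<std::size_t> neighboursWithin(
        const std::vector<std::vector<double>>& stored,
        const std::function<double(std::size_t)>& distToNew,
        std::size_t anchor, double radius, std::size_t* measured = nullptr) {
    assert(anchor < stored.size());
    const double d = distToNew(anchor);  // distance from new experience to anchor
    std::size_t count = 1;
    std::vector<std::size_t> result;
    if (d <= radius) result.push_back(anchor);
    for (std::size_t i = 0; i < stored.size(); ++i) {
        if (i == anchor) continue;
        // Triangle inequality: dist(new, i) >= stored[anchor][i] - d, so i
        // cannot lie inside the ball when stored[anchor][i] > d + radius.
        if (stored[anchor][i] > d + radius) continue;
        ++count;
        if (distToNew(i) <= radius) result.push_back(i);
    }
    if (measured) *measured = count;
    return result;
}
```

The closer the anchor is to the new experience, the smaller d + radius becomes and the more candidates are skipped without measuring them, which is why a good starting experience matters.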
The heuristic used is selected with the --metricSpaceHeuristic switch and associated parameters.
The merging strategy in the Interaction History Architecture is to merge any two experiences closer than a threshold distance (parameter --mergeThreshold).
When two experiences are merged, the resulting experience takes the experience number of the older experience and that experience's data frame (i.e. in terms of sensory data and comparison to other experiences it is identical to that experience). Its "mass" is the sum of the two experience masses, and the action frequencies are also merged.
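The merge rule can be sketched as below. The record type Exp and the function mergeExperiences are hypothetical. The sketch assumes that a lower experience number means an older experience, and that merging action frequencies means adding per-action counts; neither detail is confirmed by the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record for a stored experience.
struct Exp {
    int id;                         // experience number (assumed: lower = older)
    std::vector<int> frame;         // quantized data frame
    double mass;                    // how many experiences this one stands for
    std::vector<int> actionCounts;  // how often each action was associated
};

// Merge two experiences: keep the older number and data frame,
// add the masses and combine the action frequencies.
Exp mergeExperiences(const Exp& a, const Exp& b) {
    assert(a.actionCounts.size() == b.actionCounts.size());
    const Exp& older = (a.id < b.id) ? a : b;
    const Exp& newer = (a.id < b.id) ? b : a;
    Exp merged = older;                     // number and data frame of the older one
    merged.mass = older.mass + newer.mass;  // masses add
    for (std::size_t i = 0; i < merged.actionCounts.size(); ++i)
        merged.actionCounts[i] += newer.actionCounts[i];  // assumed: counts add
    return merged;
}
```

Because the merged experience keeps the older data frame unchanged, its distances to all other experiences stay valid and no distance needs to be recomputed.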
The threshold can be fixed or can be adapted. This is controlled by the --mergeAdaptType switch and the --mergeIncrement parameter.
--mergeAdaptType NONE : No adaptation of threshold: always merge experiences closer than --mergeThreshold.
--mergeAdaptType CYCLE_TIME : Increase/decrease the merge threshold to maintain a given cycle time, set in --mergeCycleTimeThreshold.
--mergeAdaptType NUM_COMPARISONS : Increase/decrease the merge threshold to maintain a given number of comparisons, set in --mergeExpThreshold.
--onlyMergeSameActions will restrict merging to experiences having the same action.
Experiences may also be deleted, that is, forgotten. This serves two particular purposes in the present architecture.
The deletion is controlled using the two parameters --purgeExpSwitch and --purgeExpThreshold.
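A minimal sketch of value-based forgetting follows, assuming (as the option description "Purge experiences with reward value less than this" suggests) that an experience is deleted when its value is strictly below the threshold. The type StoredExp and the function purge are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal record: experience number and its value.
struct StoredExp {
    int id;
    double value;
};

// Delete experiences whose value is below `threshold` when purging is enabled.
// Returns the number of experiences removed.
std::size_t purge(std::vector<StoredExp>& exps, bool purgeSwitch, double threshold) {
    assert(threshold >= 0.0);  // assumption: values and threshold are non-negative
    if (!purgeSwitch) return 0;
    const std::size_t before = exps.size();
    exps.erase(std::remove_if(exps.begin(), exps.end(),
                              [threshold](const StoredExp& e) {
                                  return e.value < threshold;
                              }),
               exps.end());
    return before - exps.size();
}
```

A full implementation would also have to drop the corresponding rows and columns of the distance matrix; that bookkeeping is omitted here.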
Each experience in the interaction history space is associated with a quality value. This value has bearing on the selection of the experience, and in turn on the action-selection process. The quality value is intended to reflect how useful the experience is in terms of positive or negative environmental feedback, and is derived directly from the internal reward function or an external reward measured by the robot's sensors.
In the simplest case, the immediate (instantaneous) reward received from the environment is associated with the current experience. An alternative scheme is for the quality associated with an experience to be dependent not only on the current reward, but also on the future reward.
The future reward for an experience created at time $t$, for some given horizon $h$ (set by the --futureHorizon parameter), is a function $f$ on all reward values received for $h$ timesteps after time $t$. Of course, this value cannot be known completely until at least $h$ timesteps have passed, but it is estimated until that point.
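The idea of estimating the future reward before the horizon has fully elapsed can be sketched as follows, taking the function to be the maximum over the horizon. The function futureRewardMax is hypothetical, and including the immediate reward at the experience's own timestep in the maximum is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Estimate the future reward of the experience created at timestep t as the
// maximum reward over the `horizon` timesteps after t. rewards[k] is the
// reward received at timestep k; only timesteps that have already happened
// are available, so the result is an estimate until rewards.size() > t + horizon.
double futureRewardMax(const std::vector<double>& rewards, std::size_t t,
                       std::size_t horizon, bool* complete = nullptr) {
    assert(t < rewards.size());
    const std::size_t last = std::min(t + horizon, rewards.size() - 1);
    if (complete) *complete = (rewards.size() > t + horizon);
    double best = rewards[t];  // assumption: start from the immediate reward
    for (std::size_t k = t + 1; k <= last; ++k)
        best = std::max(best, rewards[k]);
    return best;
}
```

Calling the function again each time a new reward arrives refines the estimate, and once the `complete` flag is set the value no longer changes.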
Two functions have been used in the implementation (controlled by the --futureValueUpdateType switch):
--futureValueUpdateType=MAX simply returns the maximum reward over the horizon.
--futureValueUpdateType=BIASED returns the most proximal maximum or minimum reward.
---------------------------------------------------------------------------
--dbg [INT] : debug printing level
--name [STR] : process name for ports
--file [STR] : config file
--save [STR] : Save data to file at end of process
--load [STR] : Load complete data store from file before starting
---------------------------------------------------------------------------
--dsnumber [INT] : Number of this data store
--horizon [INT] : Horizon length
--num_bins [INT] : number of bins for quantizing data
--granularity [INT] : number of timesteps between experiences
---------------------------------------------------------------------------
--connect_to_sensors [STR] : connect to specified port for sensors
---------------------------------------------------------------------------
--experience_action_gap [INT] : delay to associated action
--regular_experiences [T/F] : create exp at regular timestep
--action_experiences [T/F] : create exp when action changes
--value_experiences [T/F] : create exp when feedback reward value changes
--numActions [INT] : number of actions defined
--writeCurrDistToPortFlag [T/F] : Write current distance to port
--writeMaxDSPNeighbours [INT] : Max number of neighbours to write
--writeMaxDSPRadius [FLT] : Max radius of experiences that are written in neighbour list
--neighbourRadius [FLT] : Radius for experience to be considered as a neighbour
--mergeAdaptType [STR] : Type of adaptation: NONE, CYCLE_TIME or NUM_COMPARISONS
--mergeThreshold [FLT] : merge experiences closer than this threshold distance
--mergeIncrement [FLT] : the amount that the threshold is inc/dec when adapting
--mergeExpThreshold [INT] : for mergeAdaptType=NUM_COMPARISONS this is threshold
--mergeCycleTimeThreshold [INT] : for mergeAdaptType=CYCLE_TIME Merge experiences if cycle time exceeds this
--onlyMergeSameActions [T/F] : Only merge experiences associated with the same actions
--purgeExpSwitch [T/F] : Carry out purging
--purgeExpThreshold [FLT] : Purge experiences with reward value less than this
--adaptiveBinning [T/F] : Use Adaptive Binning
--adaptiveBinningWindowSize [INT] : Adaptive Binning: Window size
--histogramResolution [INT] : Adaptive Binning: Histogram resolution
--futureHorizon [INT] : Horizon over which Reward feeds back
--futureValueUpdateType [STR] : MAX (feedback Max horizon) or BIASED (Feedback Lowest or Highest)
--metricSpaceHeuristic [STR] : Heuristic used to speed up. NONE, TREE or TREENEIGHBOUR
--verifyHeuristic [T/F] : Output check data for Heuristic
--heuristicStartThreshold [INT] : Parameter for TREE Heuristic
--heuristicTreeRadius [FLT] : Parameter for TREE Heuristic
---------------------------------------------------------------------------
conf/ihaExperienceMetricSpace.ini
Sample INI file:
# The data store number
dsnumber 1

############################################################
# Actions
#
num_actions 21
#
############################################################

############################################################
# section has the names of the sensors and the hi/lo range
# to allow binning calculations
# remember to put reward and action as last two items
SENSORS HEAD_PITCH HEAD_YAW HEAD_PAN EYES_UD EYES_RL EYES_CD LSH_ROT LSH_ELV LSH_TWST LELB_FLX LELB_TWST LWR_ABD LWR_FLX LDIG_1 LDIG_2 LDIG_3 LDIG_4 LDIG_5 LDIG_6 LDIG_7 LDIG_8 LDIG_9 RSH_ROT RSH_ELV RSH_TWST RELB_FLX RELB_TWST RWR_ABD RWR_FLX RDIG_1 RDIG_2 RDIG_3 RDIG_4 RDIG_5 RDIG_6 RDIG_7 RDIG_8 RDIG_9 FACE SOUNDS ACTION REWARD
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
LIMIT_HI 30 60 55 15 52 45 90 161 100 106 90 10 40 30 105 90 90 90 90 90 90 120 90 161 100 106 90 10 40 30 105 90 90 90 90 90 90 120 1 1 21 1
LIMIT_LO -40 -70 -55 -35 -50 -1 -96 -5 -39 -2 -90 -90 -20 -10 -15 -2 -2 -2 -2 -2 -2 -2 -96 -5 -39 -2 -90 -90 -20 -10 -15 -2 -2 -2 -2 -2 -2 -2 0 0 0 0
reward_index 41
action_index 40
num_image_sensors_x 4
num_image_sensors_y 4
#
###########################################################

############################################################
# Main configuration settings for experience creation
numBins 5
granularity 2
HORIZONS 20
experience_action_gap 2
#
############################################################

############################################################
# Updating Future Value
# can be MAX which always gives max value
# or BIASED which gives 0 or MAX
future_value_update_type MAX
future_horizon 40
#
############################################################

############################################################
# Writing of metric space or neighbour list
# just send current neighbours
write_curr_dist_to_port TRUE
# send the whole space (I have never used this it is silly)
write_ms_to_port FALSE
# These set how many neighbours are sent in a neighbour list
# You can choose to limit number directly or the maximum
# radius or both
write_max_dsp_neighbours 6
write_max_dsp_radius 3.0
# This setting gives a max radius for the neighbour list for
# the purposes of working out relative probabilities in the
# experience selection
neighbour_radius 3.0
#
############################################################

############################################################
# TEMPERATURE
# start temp
#temperature 3.0
temperature 4.0
# cooling (use instead of adapt_temp)
# 0 = no cooling
temp_dec 0.01
# set to adapt temp according to reward
adapt_temp FALSE
# adaptation rates and limits
# so far only used in the pioneer code
adapt_rate_inc 0.4
adapt_rate_dec 0.2
# if there is a low reward, the minimum temperature is this
min_temp_lo_reward 1.0
# if there is a high reward, the maximum temperature is this
max_temp_hi_reward 0.2
max_temperature 3.0
#
############################################################

###########################################################
# Merging
#
merge_threshold 0.20
only_merge_same_actions FALSE
#merge_threshold 1.60
#only_merge_same_actions FALSE
# NONE, CYCLE_TIME, NUM_COMPARISONS
merge_adapt_type NONE
merge_increment 0.01
merge_exp_threshold 300
merge_cycle_time_threshold 600
#
###########################################################

###########################################################
# Forgetting
#
# purge experiences based on value
purge_experiences TRUE
purge_threshold 0.8

###########################################################
# Adaptive Binning
#
adaptive_binning FALSE
adaptive_binning_window_size 96
histogram_resolution 256
#
###########################################################

###########################################################
# Metric Space calculation speed-up
#
# Types available : NONE TREE TREENEIGHBOUR
metric_space_heuristic NONE
heuristic_start_threshold 20
heuristic_tree_radius 1.5
verify_heuristic FALSE
#
###########################################################

############################################################
# Experience selection feedback
# controls modification of experience space depending on
# reward feedback as a result of action taken from the
# selected experience
#
# Currently not implemented due to delayed reward updating
# Instead merging plays a role in this
#
experience_feedback TRUE
ef_value_increment 0.1
ef_value_decrement 0.1
#
############################################################
Linux
ihaExperienceMetricSpace --name /iha/ds --file conf/ihaExperienceMetricSpace.ini --connect_to_sensors /iha/sm/sensor:out --dbg 30
See script $ICUB_ROOT/app/iha_manual/iha_datastore.sh
Copyright (C) 2008 RobotCub Consortium
CopyPolicy: Released under the terms of the GNU GPL v2.0.
This file can be edited at src/interactionHistory/experience_metric_space/src/ExperienceMetricSpaceModule.cpp.