Experience Metric Space Module (IHA)

Experience Metric Space Module (part of the Interaction History Architecture).

Data Structures

class  iCub::iha::BinWindowMaxEntropy
 Calculate the bin number of a data value using bin boundaries that maximize the entropy over the previous window of data values.
class  iCub::iha::DataFrame
 Class to hold one frame of data from the sensors.
class  iCub::iha::DistanceSpaceClass
 Distance Space Class holds a single distance space and local processing.
class  iCub::iha::Experience
 A single Experience.
class  iCub::iha::ExperienceProcessor
 Experience Processor.
class  iCub::iha::Limits
 Sensor Limits for calculating the normalized value of a sensor.
class  iCub::iha::WindowIDCalc
 Calculate the Information Distance between two data streams over a given horizon.
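
The information distance named in the WindowIDCalc description can be illustrated with a minimal sketch. It assumes the standard definition $d(X,Y) = H(X|Y) + H(Y|X) = 2H(X,Y) - H(X) - H(Y)$, estimated from frequency counts of already-binned values over one horizon. The function name informationDistance and its signature are hypothetical and are not the WindowIDCalc interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the WindowIDCalc API.
// x and y hold bin numbers in [0, numBins) for the same window of timesteps.
// Returns H(X|Y) + H(Y|X) in bits, estimated from frequency counts.
double informationDistance(const std::vector<int>& x,
                           const std::vector<int>& y,
                           int numBins) {
    assert(x.size() == y.size() && !x.empty() && numBins > 0);
    const std::size_t n = x.size();
    const std::size_t b = static_cast<std::size_t>(numBins);
    std::vector<double> px(b, 0.0), py(b, 0.0), pxy(b * b, 0.0);
    for (std::size_t t = 0; t < n; ++t) {
        assert(x[t] >= 0 && x[t] < numBins && y[t] >= 0 && y[t] < numBins);
        const std::size_t xi = static_cast<std::size_t>(x[t]);
        const std::size_t yi = static_cast<std::size_t>(y[t]);
        px[xi] += 1.0 / n;
        py[yi] += 1.0 / n;
        pxy[xi * b + yi] += 1.0 / n;
    }
    // Shannon entropy in bits; empty cells contribute nothing.
    auto entropy = [](const std::vector<double>& p) {
        double h = 0.0;
        for (double v : p) {
            if (v > 0.0) h -= v * std::log2(v);
        }
        return h;
    };
    // H(X|Y) + H(Y|X) = 2 H(X,Y) - H(X) - H(Y)
    return 2.0 * entropy(pxy) - entropy(px) - entropy(py);
}
```

Two identical streams give a distance of zero, while two independent binary streams give the maximum of two bits.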

Detailed Description

Experience Metric Space Module (part of the Interaction History Architecture).

Metric Spaces of Experience

The metric space is constructed continuously as the robot experiences its environment.

A quality value is assigned to the quantized experience, determined by factors such as environmental reward/punishment, internal drive and affective state.

Finally, the last action executed during the experience is also noted and stored with the quantized experience.

Thus the metric space of experience in the Interaction History Architecture, the interaction history space, can be described by the tuple $(\epsilon, D, q, a)$, where $\epsilon$ is a collection of quantized "experiences", $D$ is a matrix of distances between elements of $\epsilon$, $q$ is a vector of quality values and $a$ is a vector of actions.
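
The tuple can be pictured as a simple container. The sketch below is an illustration only: the type InteractionHistorySpace, its members and the add() method are hypothetical names, and the real storage is split across the classes listed under Data Structures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the tuple (epsilon, D, q, a); all names are illustrative.
struct InteractionHistorySpace {
    std::vector<std::vector<int>> experiences;   // epsilon: quantized experiences (bin numbers)
    std::vector<std::vector<double>> distances;  // D: symmetric matrix of pairwise distances
    std::vector<double> quality;                 // q: one quality value per experience
    std::vector<int> action;                     // a: last action executed during each experience

    // Append a new experience and return its index.
    // distToExisting[i] is the distance from the new experience to experience i.
    std::size_t add(const std::vector<int>& quantized,
                    const std::vector<double>& distToExisting,
                    double q, int lastAction) {
        assert(distToExisting.size() == experiences.size());
        const std::size_t id = experiences.size();
        // Extend every existing row with the new column to keep D symmetric.
        for (std::size_t i = 0; i < id; ++i) {
            distances[i].push_back(distToExisting[i]);
        }
        std::vector<double> row(distToExisting);
        row.push_back(0.0);  // distance of the new experience to itself
        distances.push_back(row);
        experiences.push_back(quantized);
        quality.push_back(q);
        action.push_back(lastAction);
        return id;
    }
};
```

Growing $D$ by one row and one column per experience reflects the statement that the space is constructed continuously.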

Nearest Neighbours

Suppose one is interested in finding all nearest neighbours of an experience $E^{new}$ within a "ball" of radius $r$. The triangle inequality can then be employed to reduce the number of distances that need to be measured. Specifically:

(for proof see Mirza:Thesis)

This fact can be used to discard experiences from consideration when finding nearest neighbours within a specified radius. Of course, this requires first finding an experience within radius $r$ of the new experience. One approach to this problem is simply to sample the experience space randomly until one is found. Other strategies exist; for example, the continuous nature of the environment can be used to start the search for near experiences (in terms of information distance) with those experiences that are near in time.
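
The pruning idea can be sketched as follows. The sketch relies on the general bound $d(E^{new}, E_i) \geq |d(E^{new}, E_p) - d(E_p, E_i)|$, which follows from the triangle inequality for any already-compared experience $E_p$; it is not necessarily the exact statement used in the thesis. The function neighboursWithin and its parameters are hypothetical, and how $E_p$ is chosen (random sampling, recency) is left to the caller.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper, not part of the IHA classes.
// D[i][j]     : stored distance between experiences i and j
// pivot       : index of a stored experience already compared with the new one
// dNewPivot   : measured distance d(E_new, E_pivot)
// measure(i)  : computes the (expensive) distance d(E_new, E_i)
// measuredOut : optional counter of how many distances had to be measured
std::vector<std::size_t> neighboursWithin(
        const std::vector<std::vector<double>>& D,
        std::size_t pivot, double dNewPivot, double r,
        const std::function<double(std::size_t)>& measure,
        std::size_t* measuredOut = nullptr) {
    assert(pivot < D.size() && r >= 0.0);
    std::vector<std::size_t> result;
    std::size_t measured = 0;
    for (std::size_t i = 0; i < D.size(); ++i) {
        if (i == pivot) {
            if (dNewPivot <= r) result.push_back(i);
            continue;
        }
        // Lower bound on d(E_new, E_i) from the triangle inequality.
        if (std::fabs(dNewPivot - D[pivot][i]) > r) continue;  // cannot lie inside the ball
        ++measured;
        if (measure(i) <= r) result.push_back(i);
    }
    if (measuredOut) *measuredOut = measured;
    return result;
}
```

The closer the pivot is to the new experience, the tighter the bound and the more candidates are discarded without measuring a distance.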

Clustering Experiences

The merging strategy in the Interaction History Architecture is to merge any two experiences closer than a threshold $T_{merge}$ (parameter --mergeThreshold).

When two experiences are merged, the resulting experience takes the experience number of the older experience and that experience's data frame (i.e. in terms of sensory data and comparison to other experiences it is identical to that experience). Its "mass" is the sum of the two experience masses and the action frequencies are also merged.
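
The merge rule can be sketched as below. The struct ExperienceSketch and both functions are hypothetical and do not mirror iCub::iha::Experience. Two assumptions are made: a smaller experience number means an older experience, and merging action frequencies means adding the per-action counts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an experience; not the iCub::iha::Experience class.
struct ExperienceSketch {
    int id;                         // experience number (assumed to grow with creation time)
    std::vector<int> frame;         // quantized data frame
    double mass;                    // number of experiences merged into this one
    std::vector<int> actionCounts;  // how often each action was associated with it
};

// Merge test: closer than the threshold and, optionally, same action
// (the second condition corresponds to --onlyMergeSameActions).
bool shouldMerge(double distance, double mergeThreshold,
                 bool onlyMergeSameActions, int actionA, int actionB) {
    return distance < mergeThreshold &&
           (!onlyMergeSameActions || actionA == actionB);
}

// The result keeps the number and data frame of the older experience,
// sums the masses and adds the action frequencies.
ExperienceSketch mergeExperiences(const ExperienceSketch& a,
                                  const ExperienceSketch& b) {
    assert(a.actionCounts.size() == b.actionCounts.size());
    const ExperienceSketch& older = (a.id <= b.id) ? a : b;
    ExperienceSketch merged = older;
    merged.mass = a.mass + b.mass;
    for (std::size_t i = 0; i < merged.actionCounts.size(); ++i) {
        merged.actionCounts[i] = a.actionCounts[i] + b.actionCounts[i];
    }
    return merged;
}
```

Because the merged experience keeps the older data frame, its stored distances to all other experiences remain valid and need not be recomputed.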

Forgetting Experiences

Experiences may also be deleted, that is, forgotten. This serves two particular purposes in the present architecture.

The deletion is controlled using the two parameters --purgeExpSwitch and --purgeExpThreshold.
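
A minimal sketch of value-based purging, following the parameter description "Purge experiences with reward value less than this". The struct StoredExperience and the function purgeExperiences are hypothetical; when the module actually runs the purge, and whether any experience is exempt from it, is not stated here and is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record; only the fields needed for purging are shown.
struct StoredExperience {
    int id;
    double value;  // quality / reward value associated with the experience
};

// Removes every experience whose value is below the threshold.
// Returns the number of experiences that were forgotten.
std::size_t purgeExperiences(std::vector<StoredExperience>& store,
                             bool purgeExpSwitch, double purgeExpThreshold) {
    if (!purgeExpSwitch) return 0;
    const std::size_t before = store.size();
    store.erase(std::remove_if(store.begin(), store.end(),
                               [purgeExpThreshold](const StoredExperience& e) {
                                   return e.value < purgeExpThreshold;
                               }),
                store.end());
    assert(store.size() <= before);
    return before - store.size();
}
```

An experience whose value equals the threshold is kept, since the comparison is strict.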

Update of Environmental Reward

Each experience in the interaction history space is associated with a quality value $q$. This value has bearing on the selection of the experience, and in turn on the action-selection process. The quality value is intended to reflect how useful the experience is in terms of positive or negative environmental feedback, and is derived directly from the internal reward function or an external reward measured by the robot's sensors.

In the simplest case, the immediate (instantaneous) reward received from the environment is associated with the current experience. An alternative scheme is for the quality associated with an experience to be dependent not only on the current reward, but also on the future reward.

The future reward for an experience $E_{t,h}$ for some given horizon $h_{future}$ (set by the --futureHorizon parameter) is a function ${\mathcal F}()$ on all reward values received for $h_{future}$ timesteps after time $t$. Of course, this value cannot be known completely until at least $h_{future}$ timesteps have passed, but it is estimated until that point.
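
As an illustration, the sketch below takes ${\mathcal F}()$ to be the maximum, one plausible reading of the MAX update type. The function futureValueMax is hypothetical. It assumes that the reward at time $t$ itself is included, and that before $h_{future}$ timesteps have passed the estimate is simply the maximum over the rewards seen so far.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper. rewards[k] is the reward received at timestep k.
// Returns the maximum reward over timesteps t .. t + hFuture that have been
// observed so far. *complete (if given) is set to true once all hFuture
// future timesteps are available, i.e. the value is no longer an estimate.
double futureValueMax(const std::vector<double>& rewards,
                      std::size_t t, std::size_t hFuture,
                      bool* complete = nullptr) {
    assert(t < rewards.size());
    const std::size_t newest = rewards.size() - 1;
    const std::size_t last = std::min(t + hFuture, newest);
    double best = rewards[t];  // assumption: the immediate reward is included
    for (std::size_t k = t + 1; k <= last; ++k) {
        best = std::max(best, rewards[k]);
    }
    if (complete) *complete = (newest >= t + hFuture);
    return best;
}
```

Calling the function again each time a new reward arrives refines the estimate until the horizon has fully elapsed.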

Two functions have been used in the implementation, selected by the --futureValueUpdateType switch: MAX and BIASED.

Dependencies

Parameters

---------------------------------------------------------------------------
--dbg [INT]   : debug printing level
--name [STR]  : process name for ports
--file [STR]  : config file
--save [STR]  : Save data to file at end of process
--load [STR]  : Load complete data store from file before starting
---------------------------------------------------------------------------
--dsnumber [INT]  : Number of this data store
--horizon [INT]  : Horizon length
--num_bins [INT]  : number of bins for quantizing data
--granularity [INT]  : number of timesteps between experiences
---------------------------------------------------------------------------
--connect_to_sensors [STR]       : connect to specified port for sensors
---------------------------------------------------------------------------
--experience_action_gap [INT]       : delay to associated action
--regular_experiences [T/F]         : create exp at regular timestep
--action_experiences [T/F]          : create exp when action changes
--value_experiences [T/F]           : create exp when feedback reward value changes
--numActions [INT]                  : number of actions defined

--writeCurrDistToPortFlag [T/F]     : Write current distance to port
--writeMaxDSPNeighbours [INT]       : Max number of neighbours to write
--writeMaxDSPRadius [FLT]           : Max radius of experiences that are written in neighbour list
--neighbourRadius [FLT]             : Radius for experience to be considered as a neighbour

--mergeAdaptType [STR]              : Type of adaptation: NONE, CYCLE_TIME or NUM_COMPARISONS
--mergeThreshold [FLT]              : merge experiences closer than this threshold distance
--mergeIncrement [FLT]              : the amount that the threshold is inc/dec when adapting
--mergeExpThreshold [INT]           : for mergeAdaptType=NUM_COMPARISONS this is threshold
--mergeCycleTimeThreshold [INT]     : for mergeAdaptType=CYCLE_TIME Merge experiences if cycle time exceeds this
--onlyMergeSameActions [T/F]        : Only merge experiences associated with the same actions

--purgeExpSwitch [T/F]              : Carry out purging
--purgeExpThreshold [FLT]           : Purge experiences with reward value less than this

--adaptiveBinning [T/F]             : Use Adaptive Binning
--adaptiveBinningWindowSize [INT]   : Adaptive Binning: Window size
--histogramResolution [INT]         : Adaptive Binning: Histogram resolution

--futureHorizon [INT]               : Horizon over which Reward feeds back
--futureValueUpdateType [STR]       : MAX (feedback Max horizon) or BIASED (Feedback Lowest or Highest)

--metricSpaceHeuristic [STR]        : Heuristic used to speed up.  NONE, TREE or TREENEIGHBOUR
--verifyHeuristic [T/F]             : Output check data for Heuristic
--heuristicStartThreshold [INT]     : Parameter for TREE Heuristic
--heuristicTreeRadius [FLT]         : Parameter for TREE Heuristic
--------------------------------------------------------------------------
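
The adaptive binning options relate to bin boundaries that maximize entropy over a window of previous values. Entropy over the bins is greatest when every bin is equally populated, so one simple way to approximate this is to place the boundaries at the quantiles of the window. The sketch below does that with a sort. The function names are hypothetical, and this is not necessarily how BinWindowMaxEntropy works; the --histogramResolution parameter suggests a histogram-based approach.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns numBins - 1 boundaries taken at the quantiles
// of the window, so that the bins are (approximately) equally populated.
std::vector<double> maxEntropyBoundaries(std::vector<double> window, int numBins) {
    assert(numBins > 0 && !window.empty());
    std::sort(window.begin(), window.end());
    std::vector<double> bounds;
    const std::size_t bins = static_cast<std::size_t>(numBins);
    for (std::size_t k = 1; k < bins; ++k) {
        bounds.push_back(window[k * window.size() / bins]);
    }
    return bounds;
}

// Bin number of a value: the count of boundaries that are <= value.
int binOf(double value, const std::vector<double>& bounds) {
    return static_cast<int>(
        std::upper_bound(bounds.begin(), bounds.end(), value) - bounds.begin());
}
```

With many repeated values in the window the bins cannot be made exactly equal, so the result is only an approximation of the maximum-entropy partition.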

Ports Accessed

Ports Created

Configuration Files

conf/ihaExperienceMetricSpace.ini

Sample INI file:

# The data store number
dsnumber 1

############################################################
# Actions
#

num_actions 21

#
############################################################

############################################################
# section has the names of the sensors and the hi/lo range
# to allow binning calculations
# remember to put reward and action as last two items

SENSORS HEAD_PITCH HEAD_YAW HEAD_PAN EYES_UD EYES_RL EYES_CD LSH_ROT LSH_ELV LSH_TWST LELB_FLX LELB_TWST LWR_ABD LWR_FLX LDIG_1 LDIG_2 LDIG_3 LDIG_4 LDIG_5 LDIG_6 LDIG_7 LDIG_8 LDIG_9 RSH_ROT RSH_ELV RSH_TWST RELB_FLX RELB_TWST RWR_ABD RWR_FLX RDIG_1 RDIG_2 RDIG_3 RDIG_4 RDIG_5 RDIG_6 RDIG_7 RDIG_8 RDIG_9 FACE SOUNDS ACTION REWARD

#          0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41

LIMIT_HI  30  60  55  15  52  45  90 161 100 106  90  10  40  30 105  90  90  90  90  90  90 120  90 161 100 106  90  10  40  30 105  90  90  90  90  90  90 120  1   1  21   1
LIMIT_LO -40 -70 -55 -35 -50  -1 -96  -5 -39  -2 -90 -90 -20 -10 -15  -2  -2  -2  -2  -2  -2  -2 -96  -5 -39  -2 -90 -90 -20 -10 -15  -2  -2  -2  -2  -2  -2  -2  0   0   0   0

reward_index 41
action_index 40

num_image_sensors_x 4
num_image_sensors_y 4

#
###########################################################

############################################################
# Main configuration settings for experience creation
numBins 5
granularity 2
HORIZONS 20
experience_action_gap 2
#
############################################################

############################################################
# Updating Future Value
# can be MAX which always gives max value
# or BIASED which gives 0 or MAX

future_value_update_type MAX
future_horizon 40
#
############################################################

############################################################
# Writing of metric space or neighbour list

# just send current neighbours
write_curr_dist_to_port TRUE
# send the whole space (I have never used this it is silly)
write_ms_to_port FALSE

# These set how many neighbours are sent in a neighbour list
# You can choose to limit number directly or the maximum 
# radius or both
write_max_dsp_neighbours 6
write_max_dsp_radius 3.0

# This setting gives a max radius for the neighbour list for
# the purposes of working out relative probabilities in the
# experience selection
neighbour_radius 3.0
#
############################################################

############################################################
# TEMPERATURE
# start temp
#temperature 3.0
temperature 4.0
# cooling (use instead of adapt_temp)
# 0 = no cooling
temp_dec 0.01

# set to adapt temp according to reward
adapt_temp FALSE

# adaptation rates and limits
# so far only used in the pioneer code
adapt_rate_inc 0.4
adapt_rate_dec 0.2
# if there is a low reward, the minimum temperature is this
min_temp_lo_reward 1.0
# if there is a high reward, the maximum temperature is this
max_temp_hi_reward 0.2
max_temperature 3.0
#
############################################################

###########################################################
# Merging
#
merge_threshold 0.20
only_merge_same_actions FALSE
#merge_threshold 1.60
#only_merge_same_actions FALSE

# NONE, CYCLE_TIME, NUM_COMPARISONS
merge_adapt_type NONE

merge_increment 0.01
merge_exp_threshold 300
merge_cycle_time_threshold 600
#
###########################################################

###########################################################
# Forgetting
#
# purge experiences based on value
purge_experiences TRUE
purge_threshold 0.8


###########################################################
# Adaptive Binning
#
adaptive_binning FALSE
adaptive_binning_window_size 96

histogram_resolution 256
#
###########################################################

###########################################################
# Metric Space calculation speed-up
#
# Types available : NONE TREE TREENEIGHBOUR

metric_space_heuristic NONE
heuristic_start_threshold 20
heuristic_tree_radius 1.5
verify_heuristic FALSE
#
###########################################################

############################################################
# Experience selection feedback
# controls modification of experience space depending on
# reward feedback as a result of action taken from the
# selected experience
#
# Currently not implemented due to delayed reward updating
# Instead merging plays a role in this
#
experience_feedback TRUE
ef_value_increment 0.1
ef_value_decrement 0.1
#
############################################################
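
The LIMIT_HI and LIMIT_LO rows in the sample file give each sensor's range "to allow binning calculations". A minimal sketch of how such limits could be used with numBins is shown below. The functions normalizeSensor and uniformBin are hypothetical, and the sketch assumes equal-width bins and clamping of out-of-range values; the behaviour of iCub::iha::Limits may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: map a raw sensor value into [0, 1] using its limits.
// Values outside [lo, hi] are clamped (an assumption of this sketch).
double normalizeSensor(double value, double lo, double hi) {
    assert(hi > lo);
    const double n = (value - lo) / (hi - lo);
    return std::min(1.0, std::max(0.0, n));
}

// Hypothetical helper: equal-width bin number in [0, numBins - 1].
int uniformBin(double value, double lo, double hi, int numBins) {
    assert(numBins > 0);
    const int bin = static_cast<int>(
        std::floor(normalizeSensor(value, lo, hi) * numBins));
    return std::min(bin, numBins - 1);  // the upper limit falls in the last bin
}
```

For example, with the HEAD_PITCH limits from the sample file (LIMIT_LO -40, LIMIT_HI 30) and numBins 5, a reading of -5 normalizes to 0.5 and falls in bin 2.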

Tested OS

Linux

Example Instantiation of the Module

ihaExperienceMetricSpace --name /iha/ds --file conf/ihaExperienceMetricSpace.ini --connect_to_sensors /iha/sm/sensor:out --dbg 30

See script $ICUB_ROOT/app/iha_manual/iha_datastore.sh

See also:
iCub::iha::ExperienceProcessor
iCub::iha::DistanceSpaceClass
iCub::iha::DataFrame
iCub::iha::Experience
iCub::iha::Limits
iCub::iha::WindowIDCalc
iCub::iha::BinWindowMaxEntropy
iCub::contrib::ExperienceMetricSpaceModule

Author:
Assif Mirza

Copyright (C) 2008 RobotCub Consortium

CopyPolicy: Released under the terms of the GNU GPL v2.0.

This file can be edited at src/interactionHistory/experience_metric_space/src/ExperienceMetricSpaceModule.cpp.
