Differences

This shows you the differences between two versions of the page.

--- pepper-imitation:pepper-imitation [2018/06/28 17:55]
s4pepper2018 created
+++ pepper-imitation:pepper-imitation [2019/04/25 14:08] (current)
@@ Line 1: / Line 1: @@
-====== Pepper Game ======
+====== S4 project 2018 : A reinforcement learning approach based on imitation using kinect and Pepper. ======
 {{tag>pepper}}
-===== Principle =====
+===== Summary =====
-This approach extends Gaussian mixture models to Riemannian manifolds, where rotation data such as quaternions are naturally represented. More theoretical background can be found in the article:
+This project is a first step toward a more general learn-by-imitation approach for autistic children. It was developed in collaboration with CHU Brest, with Dr. Nathalie Collot as the main contact.
-    * M J A Zeestraten, I Havoutis, J Silvério, S Calinon and D G Caldwell. An Approach for Imitation Learning on Riemannian Manifolds. IEEE Robotics and Automation Letters (RA-L) 2(3):1240–1247, June 2017
-In our case, the Riemannian manifold studied is the combination of the Cartesian (position) and Riemannian (orientation) spaces of each joint of the human skeleton, which together form the human pose space. More information can be found in the article:
-    * Maxime Devanne, Sao Mai Nguyen. Multi-level Motion Analysis for Physical Exercises Assessment in Kinaesthetic Rehabilitation. IEEE-RAS International Conference on Humanoid Robots 2017, Birmingham, UK.
-===== Code =====
+The main goal was to reproduce a well-known nursery rhyme that Dr. Collot's team uses to interact with the patients. The song is divided into several sections; in each one, the entertainer assumes a predefined posture that the children should try to imitate.
-The following code is in MATLAB. A Python version of the extension of GMMs to Riemannian manifolds can be found here: https://gitlab.martijnzeestraten.nl/martijn/riepybdlib
-There are two main files: one for learning a model from demonstrations, the other for evaluating a sequence against a learned model.
+The gestures currently used are shown in the image below:
-    * 1. Learning
+{{:pepper-imitation:gestures.jpg?200|}}
-The script 'mainLearning.m' learns a model from demonstrations. The first lines define the parameters, which are explained in the code. The main parameters that change with the amount of training data are:
-<code matlab>
-%% Parameters
-nbData = 300; %Number of datapoints
-nbSamples =2; %Number of demonstrations
-trainName={'data/Assis3Maxime/'}; % folders names from where to load data
-nspp=2; %number of skeleton sequence per folder
-</code>
+The project therefore rests on two main pillars:
-//nbData// is the number of frames to which each sequence will be resampled.
+  * **Body pose detection**: a Microsoft Kinect v1 was the sensor chosen for this task.
-//nbSamples// is the number of sequences that will be loaded and used for learning.
+  * **Robot control and synchronization**.
-//trainName// is the name of each folder from which to load the data.
-//nspp// is the number of skeleton sequences to load from each folder. If the value is 2, sequences 1 and 2 are loaded from each folder.
+Body pose detection was done entirely with publicly available software. The main focus of our work was therefore to become familiar with the robot, synchronize all the tasks, and fine-tune the system for the nature of the project and its target subjects.
-First, the training skeleton data are loaded.
+The chosen framework was ROS (Kinetic is the only version that was tested). The code is written mainly in C++, with some parts in Python.
-<code matlab>
-%% Data processing
-trainName={'data/Assis3Maxime/'};
-[model,xIn,uIn,xOut,uOut] = processTrainingData(model,trainName,nspp,registration,fastDP,filt,est,rem,ws,nbData);
-% data projected on tangent spaces of the human pose space
-u = [uIn; uOut{1}.data; uOut{2}.data; uOut{3}.data; uOut{4}.data; uOut{5}.data; uOut{6}.data; uOut{7}.data; uOut{8}.data; uOut{9}.data; uOut{10}.data; uOut{11}.data; uOut{12}.data; uOut{13}.data; uOut{14}.data; uOut{15}.data];
-% original data x (positions 3D and quaternions) in the human pose space
-x = [xIn; xOut{1}.data; xOut{2}.data; xOut{3}.data; xOut{4}.data; xOut{5}.data; xOut{6}.data; xOut{7}.data; xOut{8}.data; xOut{9}.data; xOut{10}.data; xOut{11}.data; xOut{12}.data; xOut{13}.data; xOut{14}.data; xOut{15}.data];
-model.x=x;
-</code>
-The skeleton data were captured using the [[sensors:kinect_library|Kinect library]]. The format is a matrix in which each row is a frame and the columns hold the 3D positions and orientations (quaternions) of each joint, in the order of the skeleton hierarchy. In our case, only the upper-body data are considered. If the parameter //registration=1//, temporal alignment between the training sequences is performed, which is useful for learning an ideal model.
-In addition, to split the sequence into temporal segments corresponding to individual unit movements, and to use those segments during evaluation, the functions 'segmentSequence' and 'segmentSequenceKeyPose' are used. The first function segments the movement by taking the cut points to be the transition points between two unit movements. The second function additionally treats held-pose phases as temporal segments. More details on the segmentation method can be found in the second article cited above.
+===== Installation and Set Up =====
-Once the data are loaded, //xIn// and //uIn// hold the time data (frames), //xOut// is the set of data for each joint separately in the pose space, and //uOut// is the set of data for each joint separately, projected onto the corresponding tangent spaces. These temporal and spatial data are then concatenated and passed as input to the learning step.
+The code is fully available [[https://github.com/Thordreck/pepper-imitation-pkg|here]]. The repo should be cloned directly into the src folder of a catkin workspace.
+The repository above contains two ROS packages at the root folder level:
+  * **openni_tracker**: skeleton tracker for use with the Kinect. It is included in the repository because no Kinetic release is available in the official repositories.
+  * **pepper_imitation**: the system developed by the S4 team.
-<code matlab>
+Dependencies (and ROS itself, if it is not already installed) can be installed by running the script pepper_imitation/installDependencies.sh.
-%% GMM learning
-[ model ] = learnGMMmodel(model,u,xIn,xOut,nbSamples,nbIterEM,nbIter,nbData);
-save('modelExo3','model');
-</code>
-The GMM is learned from the training data. The number of Gaussians in the model is set by the parameter //model.nbStates// at the start of the code. The parameter //nbIter// sets the maximum number of learning iterations.
-The learned model is then saved.
-    * 2. Evaluation
+===== Running the game =====
-The script 'mainEvaluation.m' evaluates a sequence against a learned model. In addition to the parameters shared with learning, the parameter //Seuil// sets the threshold used to compute percentage scores. It is a negative threshold: the closer it is to zero, the stricter the percentage score computation.
-The first step is to load the data (the learned model, one training sequence, and the test sequence to evaluate). The training sequence is needed for temporal alignment.
+The game sequence can be started by running the roslaunch file pepper_imitation_game.launch.
-<code matlab>
+The skeleton tracker and the Kinect driver must be started separately with the file pepper_imitation_kinect.launch.
-%% load data
-% model
+Body pose checks can be disabled in pepper_imitation_node.launch by setting the "skip_pose_checks" parameter to true. When the checks are disabled, the game treats every check as passed.
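The effect of "skip_pose_checks" can be pictured with a minimal Python sketch. It is not taken from the repository: the function name, the representation of a pose as a dictionary of joint angles, and the tolerance value are all assumptions made for illustration.

```python
def pose_matches(detected, target, tolerance_deg=20.0, skip_pose_checks=False):
    """Return True if every target joint angle is matched within tolerance.

    detected / target: dicts mapping a joint name to an angle in degrees
    (hypothetical representation). detected may be None when no skeleton
    is currently tracked.
    """
    if skip_pose_checks:
        # Checks disabled: every check is treated as passed.
        return True
    if detected is None:
        return False
    for joint, target_angle in target.items():
        if joint not in detected:
            return False
        # Smallest absolute difference between the two angles, in [0, 180].
        diff = abs((detected[joint] - target_angle + 180.0) % 360.0 - 180.0)
        if diff > tolerance_deg:
            return False
    return True
```

With the flag set, the function returns True even when no skeleton is available, which mirrors the behaviour described above.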
-load modelExo3
-% data train for temporal alignment
+===== Architecture =====
-dirTrain='data/Assis3Maxime/';
-fnameTrain='SkeletonSequence1.txt';
-[oriMatTrain,posMatTrain,dataTrain] = loadData(dirTrain,fnameTrain,filt,est,rem,ws,nbData);
-% data test
+The system is made up of the following nodes:
-dirTest='data/Assis1Maxime/';
+  * **pepper_audio_player_node**: offers an interface for loading, starting and stopping an audio file stored on Pepper's internal computer. It also reports the current playback time.
-fnameTest='SkeletonSequence3.txt';
+  * **pepper_face_tracker_node**: enables or disables Pepper's built-in random face tracking.
-[oriMatTest_,posMatTest_,dataTest_] = loadData(dirTest,fnameTest,filt,est,rem,ws,nbData);
+  * **pepper_tts_node**: interface to Pepper's TTS engine. It also supports emotional speech (e.g. //style=joyful//).
-dataTest{1}=dataTest_;oriMatTestLong{1}=oriMatTest_;posMatTestLong{1}=posMatTest_;
+  * **pepper_tablet_node**: pops up an input box where the user types a name before starting the game.
-</code>
+  * **pepper_imitation_node**: commands the different gestures and checks whether the detected person's pose, if any, is similar.
+  * **pepper_teleop_joy_node**: allows the robot's movement and rotation to be controlled with a joystick. Keep in mind that the default joystick values defined in the node match those of a wired Xbox controller.
-The sequence can then be evaluated:
+All these nodes can be run separately and commanded by publishing to their respective topics, which allows Wizard-of-Oz-style control.
-<code matlab>
+The game logic and synchronization are implemented as a state machine, defined in the file "pepper_imitation_game_node.py". SMACH was the library chosen for this.
-%% Evaluate sequence
-for rep=1:length(dataTest)
-    % temporal alignment
-    if registration==1
-        [dataTestAligned,r,allPoses,poses,motion,distFI] = temporalAlignmentEval(model, dataTrain,dataTest{rep},fastDP);
-        posMatTest=posMatTestLong{rep}(:,r);
-    else
-        dataTestAligned=dataTest{rep};
-    end
-    % compute likelihoods
+===== Game Flow =====
-    [Lglobal,Lbodypart,Ljoints] = computeLikelihoods(model,dataTestAligned);
-    % get scores
+The expected behaviour is as follows:
-    seuils=[seuil seuil seuil seuil seuil seuil];minseuils=[-500 -500 -500 -500 -500 -500]; %default values
+  * Pepper says hi, welcomes the users to IMT Atlantique, and invites them to click on the tablet to enter a name.
-    [Sglobal,Sbodypart,Sjoints] = computeScores(model,Lglobal,Lbodypart,Ljoints,seuils,minseuils);
+  * When the user clicks, an input dialog will appear after a few seconds.
-    scoreLA=[Sbodypart{1}.global.global Sbodypart{1}.global.perSegment];
+  * Once the user has entered a name, the music starts playing and the game enters its main loop.
-    scoreRA=[Sbodypart{2}.global.global Sbodypart{2}.global.perSegment];
+  * Pepper prompts the user to copy it and assumes a given pose in sync with the music.
-    scoreCol=[Sbodypart{3}.global.global Sbodypart{3}.global.perSegment];
+  * One of three scenarios then follows:
-    % For each score, the first value corresponds to global score for the
+     * The player adopts a sufficiently similar pose. Pepper then encourages the player to keep going, and the song continues.
-    % whole sequence, and then for each temporal segment
+     * The player does not make a similar pose, or the body pose is not detected properly. In this case, Pepper will ask the player to focus and try again, and the music will go back to the previous part.
-end
+     * No skeleton is found at all. Pepper tells the user that it cannot find them. Two sub-scenarios are possible here:
-</code>
+         * The user is detected again after a while. Pepper informs the player and the game continues.
-The score for each body part is stored in //scoreLA//, //scoreRA// and //scoreCol// for the left arm, the right arm and the spine, respectively. Each vector contains first the score computed for the whole sequence, then the score for each temporal segment.
+         * If no skeleton is found after a while, the game is stopped.
+  * When all the gestures are done and the music ends, Pepper thanks the user and the game goes back to its first state. The user can then click on the tablet again to restart the game without having to enter a name again.
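The flow above can be sketched as a small state machine. The following is plain Python, not SMACH, and is not the code of the game node: the state names, event names and the "max_lost_checks" limit are hypothetical, and pose-check results are passed in as strings ("ok", "wrong" or "lost") instead of coming from the tracker.

```python
def run_game(num_gestures, checks, max_lost_checks=2):
    """Simulate the game flow and return (final_state, events).

    checks: iterable of pose-check results, each "ok", "wrong" or "lost".
    When the iterable runs out, the user is considered lost.
    """
    events = ["greet", "ask_name", "start_music"]
    checks = iter(checks)
    state, gesture, lost = "SHOW_GESTURE", 0, 0
    while state not in ("FINISHED", "STOPPED"):
        if state == "SHOW_GESTURE":
            events.append("gesture_%d" % gesture)
            state = "CHECK_POSE"
        else:  # state == "CHECK_POSE"
            result = next(checks, "lost")
            if result == "lost":
                lost += 1
                if lost == 1:
                    events.append("cannot_find_user")
                # Give up after too many consecutive failed detections.
                state = "STOPPED" if lost >= max_lost_checks else "CHECK_POSE"
                continue
            if lost:
                events.append("user_found")
            lost = 0
            if result == "ok":
                events.append("encourage")
                gesture += 1
                state = "FINISHED" if gesture >= num_gestures else "SHOW_GESTURE"
            else:
                # Wrong pose: ask to try again; the same gesture is replayed.
                events.append("try_again")
                state = "SHOW_GESTURE"
    events.append("thank_user" if state == "FINISHED" else "stop_game")
    return state, events
```

For example, run_game(2, ["ok", "wrong", "ok"]) replays the second gesture once before finishing, while two consecutive "lost" results stop the game.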
+===== Known Issues and Limitations =====
+  * The synchronization process is hardcoded. As such, correct synchronization will only be achieved if the same audio file is used (it is available in the robot's internal memory in /home/pepper/resources/audio). The state machine waits until the file has played up to a given time (usually the time at which a new gesture starts).
+  * The way openni_tracker publishes the skeleton information (using TF frames that follow the convention <body_part>_<id>, for example torso_1) causes problems when people are lost and new ids are assigned, because old TF frames are still listed by the ROS API even after a while. An easy solution would be to query the TF frames but keep only those with the highest id value.
+  * As of now, only one person is tracked. However, adding pose verification for several people should be fairly straightforward.
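The TF workaround suggested above (keep only the frames with the highest id) could look like the following Python sketch. It works on a plain list of frame names; how that list is obtained from TF is left out, and the function name is hypothetical.

```python
import re

# Frames published by the tracker follow <body_part>_<id>, e.g. "torso_1".
FRAME_PATTERN = re.compile(r"^(?P<part>.+)_(?P<id>\d+)$")


def latest_user_frames(frame_names):
    """Return the frames that belong to the highest user id, sorted by name.

    Names that do not end in "_<number>" (e.g. "base_link") are ignored.
    """
    parsed = []
    for name in frame_names:
        match = FRAME_PATTERN.match(name)
        if match:
            parsed.append((int(match.group("id")), name))
    if not parsed:
        return []
    newest = max(user_id for user_id, _ in parsed)
    return sorted(name for user_id, name in parsed if user_id == newest)
```

Stale frames such as torso_1 are dropped as soon as torso_2 appears. Note that this assumes ids only ever increase, which is how the issue is described above.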
+===== Future Work =====
+  * The detection provided by the Kinect and the OpenNI libraries is not sufficient. The calibration phase is a limiting factor when interacting with the subjects. A possible solution would be to adopt body pose detection based on RGB data alone, for example [[https://github.com/CMU-Perceptual-Computing-Lab/openpose|OpenPose]]. This particular game could be reproduced entirely with joint angles computed from 2D detections, which would even make it possible to drop the external Kinect and use the robot's built-in cameras. More complex scenarios and gestures may need visual and depth data fusion. For a real-time application, a GPU should be used.
+  * The concept of this game can be generalized a bit, using some simple config files. These files could define the audio file to be used and the time when each posture should be adopted. Postures may be defined here as well.
+  * Furthermore, these files could be generated using a user-friendly GUI, where the user could set an audio file and set up the sync times and the robot poses.
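As an illustration of the config-file idea, here is a minimal Python sketch. Nothing in it exists in the repository: the dictionary layout, the audio file name and the posture names are all hypothetical. It only shows how an audio file and a list of (time, posture) entries could drive the gesture selection.

```python
import bisect

# Hypothetical game description: one audio file plus the playback time
# (in seconds) at which each posture should be adopted.
GAME_CONFIG = {
    "audio_file": "song.wav",  # placeholder name
    "postures": [
        {"time": 5.0, "name": "hands_up"},
        {"time": 12.0, "name": "hands_on_head"},
        {"time": 20.0, "name": "arms_open"},
    ],
}


def posture_at(config, playback_time):
    """Return the posture expected at playback_time, or None before the first one."""
    postures = sorted(config["postures"], key=lambda p: p["time"])
    times = [p["time"] for p in postures]
    # Index of the last posture whose start time is <= playback_time.
    index = bisect.bisect_right(times, playback_time) - 1
    return None if index < 0 else postures[index]["name"]
```

A state machine could poll the audio player's reported playback time and call posture_at to decide which gesture to command, instead of relying on hardcoded times.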
+===== Contact =====
+Álvaro Páez Guerra
+paezguerraalvaro@gmail.com