SESIUNEA DE COMUNICĂRI ȘTIINȚIFICE STUDENȚEȘTI, 2016
Multimodal Interface
Alexandru-Florin Gavril
University POLITEHNICA of Bucharest
Faculty of Automatic Control and Computers, Computer Science Department
Email: alexandru.gavril@cti.pub.ro
Keywords: conversational agents, ambient intelligence, ambient assisted living, speech recognition, natural language understanding
1. Introduction
Multimodal interaction provides the user with numerous ways of interacting with a system. It can usually be integrated naturally into systems for ambient intelligence, ambient assisted living, personal assistants and many others, while maintaining a good user experience. Interactions with the system include the natural modes of communication such as handwriting, speech, body gestures and graphics, as described in [1], as well as the classical graphical user interface and the command line interface.
Ambient Intelligence (AmI) systems use data acquisition devices and actuators to offer an environment which is sensitive and reactive to the user's presence. Ambient Assisted Living (AAL) aims to use AmI technologies to provide better living conditions for older adults and people unable to sustain an independent way of living.
This paper proposes a multimodal interface for such systems, which allows interaction through hand control, spoken language and a visual interface.
2. Related Work
Spoken language interactions are usually built using five main modules:
- Automatic Speech Recognition (ASR)
- Spoken Language Understanding (SLU)
- Dialog Management (DM)
- Natural Language Generation (NLG)
- Text-to-Speech synthesis (TTS)
The automatic speech recognition module has the goal of providing a recognition hypothesis of the user's speech. Many approaches have been investigated to solve the problem of speech recognition. The most commonly used today is the stochastic method, which is based on acoustic and language models corresponding to a given language. Hidden Markov Models are usually used to represent the acoustic models, while the language models are typically generated automatically by processing large corpora of data. Another approach to ASR is N-best recognition, which provides additional hypotheses; it is commonly used by spoken dialog systems because it enables the system to choose the utterance that fits best, as described in []. The ASR module then outputs the utterance to the SLU module. The goal of the SLU module is to obtain the user's intent and key entities from the utterance. Another goal of the SLU module is to correct errors made by the ASR, for which multiple techniques can be used, such as relaxing the grammars, focusing on key entities and employing statistical approaches [2]. The simplest way of extracting the intent and entities is using regex expressions.
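The regex approach mentioned above can be sketched as follows. This is a minimal illustration, not the paper's system: the intent names and patterns are hypothetical, and named groups double as the extracted entities.

```python
import re

# Hypothetical intent patterns for an AAL assistant; each intent is a regex
# whose named groups are returned as the extracted entities.
INTENT_PATTERNS = {
    "environment_interaction": re.compile(
        r"turn (?P<action>on|off) the (?P<object>lights|air conditioning)"),
    "reminder": re.compile(
        r"remind me to (?P<task>.+) at (?P<time>\w+)"),
}

def parse(utterance):
    """Return (intent, entities) for the first matching pattern, else (None, {})."""
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.search(utterance.lower())
        if match:
            return intent, match.groupdict()
    return None, {}
```

The limitation described later in the paper is visible here: any phrasing outside the fixed patterns (e.g. "switch off the lights") is simply not matched.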
Dialog management receives the output from the SLU and decides what the system should do next in response to the user's input. It usually queries a local database and keeps a dialog history. Numerous approaches can be found in the literature for carrying out dialogue management, such as rule-based systems [3], plan-based systems, or systems based on statistical reinforcement learning [4].
Natural language generators were considered the easiest part of a spoken dialog system in the fifties, since a generator based on user input was easy to build using sentence templates. The complexity of this task comes from providing the user with varied generated output. An example of a system which produces stylistically appropriate texts from a single story representation, under various settings that model pragmatic circumstances, is PAULINE [7].
Hand tracking using the Microsoft Kinect device is facilitated by the skeleton data acquisition. Research on tracking fingertips and the palm center has been done by putting a threshold on the depth of the hand points and applying a large circular filter on the resulting image [9].
3. System architecture
The proposed interface is composed of three interaction modules: speech, hand tracking and a graphical user interface. The speech and hand tracking modules are intended to be used together most of the time; however, they are designed to function independently of each other.
The speech module is composed of five main parts: audio preprocessing, automated speech recognition, natural language understanding (NLU), dialog manager and text-to-speech. Hand tracking is done using a Microsoft Kinect device, which provides RGB data, depth imaging and user tracking in the form of a 3D skeleton position at key points such as the arms, legs, head and body.
Figure 1 - System Architecture
The system runs as a Node JS server, which provides the webpage in which the user can interact with the system. The Node JS server is responsible for all Kinect interactions, such as reading the position of the hand, transforming it to a 2D coordinate and broadcasting it, in real time, to all connected clients.
The Kinect device is integrated using node-openni1, which provides the position of the right hand from the skeleton data acquired using OpenNI. The position is then processed to yield a more stable coordinate and broadcast to all connected clients. Each client processes the coordinate to show the cursor at the same position relative to its own screen size.
If the Kinect is used for speech recognition, a custom-made driver must be used for installing the microphone array.
The system integrates two similar APIs for speech recognition: Wit.ai2 and Microsoft Bing Speech Recognition / Luis.ai3. Both use REST API requests for speech recognition and provide natural language processing of the submitted utterance. Speech recognition is done on the client side, so that the system can be used on portable devices with an internet connection. The processed sentence is then sent to the Microsoft Bot Framework, a matching system that processes the query, finds the necessary information, then outputs the answer to the user request as text and speech.
The system also integrates two text-to-speech systems: MaryTTS, which runs offline, and the Microsoft Text to Speech API for online queries.
4. Spoken Language Interaction
The ability to interact using spoken language requires less effort for communication and improves the user experience. Although speech recognition is a complex task, the natural language understanding and the dialog manager both play a big part in enabling interactions with the system.
Figure 2 - Spoken Language Architecture
4.1. Automatic Speech Recognition (ASR)
Speech interaction is very common nowadays and widely used in smartphones, call steering, user authentication and many others. A speech recognition system used in Ambient Intelligence should not be limited to a domain-specific knowledge base, even though the interaction with the system might be restricted to simple tasks like turning on the lights, raising the drapes and so on. Using a limited grammar-based speech recognition system leads to a worse user experience. An example of a limited-grammar speech recognition system is CMU Sphinx, an open-source speech recognition system first developed by Carnegie Mellon University. It has been tested with different configurations, and increasing the grammar to provide a better user experience decreased the recognition success. The response time, however, is very good, since CMU Sphinx was developed for low-resource
1 https://github.com/pgte/node-openni
  https://git.ao.it/kinect-audio-setup.git
2 https://www.wit.ai/
3 https://www.luis.ai/
platforms. Good results were reported using CMU Sphinx by training the acoustic model of the speech recognition system, yet this was outside the scope of this work. Speech recognition APIs are now broadly used. Some examples of such APIs are the Google Speech API, Wit.ai and the Microsoft Bing Speech Recognition API. They all offer good response time while maintaining a high recognition success rate. Most of them offer a query-limited free-to-use subscription and SDKs for speech-ending detection.
ASR Evaluation
Tests have been done on a set of 99 user utterances from 8 different users on the Wit.ai API. The testing procedure uses the fuzzywuzzy4 Python library, which uses the Levenshtein distance to calculate the difference between the speech recognition result and the reference phrase text. The results of the testing can be seen in Appendix A. Wit.ai had an average successful recognition score of 82.79%.
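The scoring step can be reproduced without the fuzzywuzzy dependency. The sketch below implements the same idea in pure Python: an edit distance turned into a similarity percentage. It is a stand-in for illustration, not the library's exact scoring formula.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Similarity score in [0, 100]: 100 means identical strings."""
    if not a and not b:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / max(len(a), len(b)))
```

Comparing the ASR hypothesis against the reference phrase with such a score gives the per-utterance values listed in Appendix A.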
The results show that long sentences such as "please put a reminder to take my pills at eight this evening" yielded a low average detection score. Some words such as "taken" and "worked" were sometimes interpreted as phonetically similar words ("take", "work") but did not influence the correct interpretation of the utterance. The best batch of detections (94% correct ASR detections) provided an 88.8% accuracy in the interpretation of the user's intent and 82.9% accuracy in the entity extraction, while the batch with the lowest ASR rate provided a 73.7% accuracy in intent interpretation and 43% correct entity extraction.
4.2. Spoken Language Understanding (SLU)
Natural language processing is necessary for speech interaction, as it offers language understanding support. The system receives a plain-text utterance, determines the intent of the user, extracts the relevant entities, and then outputs them to the Dialog System. The simplest way of processing the phrase is using regex expressions to extract the relevant information. This incurs a limitation in user expressivity, an overhead in the matching system and low flexibility to changes in the training data. Ensuring a better user experience requires more complex natural language processing of the utterance. Two natural language processing APIs, Wit.ai and Luis.ai, were trained with AAL and AmI domain-specific queries. The intents and entities are described in Appendix B. The training is done by labeling each sentence with the user intent, and each word relevant to the understanding of the utterance with the specific entity that defines it. The dialog system will be triggered by the intent of the utterance and will match the entities so that it can correctly process the phrase.
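The labeling scheme described above can be illustrated with a toy training sample. The field names and values below are hypothetical, chosen only to mirror the intent/entity annotation style; they are not taken from the paper's training set.

```python
# One labeled training utterance: the sentence as a whole is labeled with an
# intent, and each relevant word span is labeled with the entity defining it.
labeled_utterance = {
    "text": "remind me to take my pills at nine this evening",
    "intent": "reminder",
    "entities": [
        {"value": "take my pills", "entity": "action"},
        {"value": "nine this evening", "entity": "datetime"},
    ],
}

def entity_values(sample, entity_type):
    """Collect the labeled values of one entity type from a training sample."""
    return [e["value"] for e in sample["entities"] if e["entity"] == entity_type]
```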
SLU Evaluation
Tests have been done on the set in Appendix A. The utterance recognized by the ASR was processed by the Wit.ai SLU system and compared to the results received for the original text query. The average percentage of successful intent recognition was 72.87%, while the average percentage of successful entity recognition was 66.4%.
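The comparison described above, running the SLU on the ASR hypothesis and on the original text and counting agreements, can be sketched as below. The tuples in the usage are illustrative, not the paper's data.

```python
def agreement(pairs):
    """pairs: list of ((intent_asr, entities_asr), (intent_ref, entities_ref)).
    Returns (% matching intents, % matching entity sets) over all pairs."""
    if not pairs:
        return 0.0, 0.0
    intent_hits = sum(asr[0] == ref[0] for asr, ref in pairs)
    entity_hits = sum(set(asr[1]) == set(ref[1]) for asr, ref in pairs)
    n = len(pairs)
    return 100.0 * intent_hits / n, 100.0 * entity_hits / n
```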
4 https://pypi.python.org/pypi/fuzzywuzzy
The dialog manager first matches the user intent, then matches the entities. This implies that a low accuracy in intent recognition means a wrong interpretation of the whole utterance, while a missed entity extraction will simply make the manager ask for the missing entities. The results show that the accuracy of the ASR system is proportional to the accuracy of intent interpretation and entity extraction, as can be seen in Figures 3, 4, 5 and 6.
Figure 3 - Average Levenshtein Score and % Entities Matched - First Batch
Figure 4 - Average Levenshtein Score and % Intent Detected - First Batch
Figure 5 - Average Levenshtein Score and % Entities Matched - Second Batch
Figure 6 - Average Levenshtein Score and % Intent Detected - Second Batch
4.3. Dialog System
The Dialog System is the core of the speech interaction. It is responsible for extracting the relevant data after matching the processed utterance and offering a response to the user. Once the intent is determined by the natural language processing module, the system should be able to differentiate between a query, for which it should gather the necessary data and provide a text output to the user and/or to the Text-To-Speech Synthesis module; a command, for which it should trigger the AmI System Interface; and a data insertion, for which the system should save the relevant data about the user. Three main Dialog Systems were evaluated: RavenClaw, OpenDial and the Microsoft Bot Framework.
4.3.1. General description
RavenClaw - Carnegie Mellon University 5
RavenClaw is a dialogue manager built by CMU and is part of the Olympus dialogue system. It manages the dialog using two data structures: the task tree and the agenda. The task tree is a plan for performing specified tasks in a given domain, and it should include each of the activities that humans may reasonably want to undertake using the system. RavenClaw also permits the user to dynamically modify the task tree, which is usually constructed by the developer, but this is still a limited way of planning.
In the complete Olympus framework for spoken dialog systems, RavenClaw sits above CMU Sphinx for speech recognition and Phoenix for natural language understanding, which uses CFG grammars for parsing the user utterance.
OpenDial - Language Technology Group, University of Oslo 6
OpenDial is a dialogue manager that uses Bayesian Networks to represent the dialogue states. It combines the benefits of logical and statistical approaches to dialogue modelling and relies on probabilistic rules to represent the domain models.
Microsoft Bot Framework 7
The Microsoft Bot Framework is a newly launched platform for building interactive chat bots. It easily integrates powerful AI frameworks like LUIS or Wit.ai for speech recognition and natural language preprocessing.
4.3.2. Context matching and spoken language understanding (SLU)
In RavenClaw, the mapping of the context is done using grammar mappings. A grammar mapping is a list of one or more grammar mapping elements, each with a single grammar slot name, and optional scope and binding filters or normalizations. This is done by the Phoenix framework from the Olympus system.
5 http://wiki.speech.cs.cmu.edu/olympus/index.php/RavenClaw
6 http://www.opendial-toolkit.net/
7 https://bots.botframework.com/
OpenDial uses a more advanced pattern matching on the user utterance for the system to correctly understand it. Each state contains a set of utterances that can be matched with different probabilities. It identifies entities in the phrase by matching them to variables in the text; for example, in the phrase "Take the {OBJ}" the system can match the OBJ word to anything and pass it to the next state in the system.
The bots built with the Bot Framework are stateless, which helps them scale easily. The context mapping is done by Wit.ai / LUIS using model training. Both Wit.ai and LUIS are trained using labeled user utterances. Each utterance is categorized into an intent and labeled with the key entities that the developer wants the system to understand (e.g. "Did I take my medications today?" could be interpreted as a Query intent, with entities like Object=Medications, DateTime=Today, Action=Take). After training, the bot will receive a labeled user utterance with Intent, Entities and Score in the form of a JSON object that will be matched against the predefined rules. The benefit of this approach is that the system needs a limited number of training utterances, unlike a POMDP, yet it is limited by the number of defined matching rules.
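The rule-matching step can be sketched as below: the SLU result arrives as a JSON structure with Intent, Entities and Score, and is matched against predefined rules. The rule names, reply strings and score threshold are hypothetical assumptions for illustration, not the Bot Framework's API.

```python
import json

# Hypothetical rule table: intent name -> handler building the reply.
RULES = {
    "greeting": lambda entities: "Hello!",
    "query_medication": lambda entities: (
        "Checking your medication log for " + entities.get("datetime", "today") + "."),
}

def handle(slu_json, threshold=0.7):
    """Match a labeled SLU result (Intent, Entities, Score as JSON) against
    the predefined rules; fall back when the score is too low or no rule fits."""
    result = json.loads(slu_json)
    if result["score"] < threshold or result["intent"] not in RULES:
        return "Sorry, could you rephrase that?"
    entities = {e["entity"]: e["value"] for e in result["entities"]}
    return RULES[result["intent"]](entities)
```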
4.3.3. Transitions between the dialog states
In RavenClaw, transitions between the dialog phases are done by the agents in the agenda. The agenda is a list of agents used to dispatch inputs to the appropriate agents in the task tree. The agenda is recomputed on every turn, placing the current agent (the focus of the conversation) at the top of the list. When a match occurs, the corresponding agent is activated.
Figure 7 - RavenClaw MyBus Task Tree [7]
In OpenDial, the transitions between states are done using a Bayesian Network. Each state has a limited number of utterances that the user can tell the system. After the system correctly matches the utterance, it will go to the next state.
The Bot Framework is stateless. Each utterance will be matched against a predefined rule regardless of the state, yet it can be extended with states for a continuous dialog by limiting the types of rules that can be matched in a given state:
<< User starts dialog
Hello
<< System response - the system will now wait for a name of the user, forcing the user to only use one type of utterance that can be matched as a Name entity
Hi! What is your name?
Jon
<< If the user had not used a valid name (matched by the NLP processor), the system would have asked again
Hello Jon!
Example 1 - Microsoft Bot Framework Dialog Example
The bot has three different types of states. The simplest one is the Closure state, in which the system answers the same way regardless of what the user said. This is useful for ending the dialog, or as a separate state for answering user phrases like "Glad I could be useful!", "Goodbye" etc. The second one is the Waterfall state, in which the system processes a certain phrase as a continuous dialog where the answer of the user is the input to the next rule (Example 1). The third and most powerful one is the Dialog Object state, which uses the Language Processing to match the phrase. This can be combined with the Waterfall state for a more complex Dialog System.
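The three state types can be sketched as a small dispatcher. The class names mirror the Closure / Waterfall / Dialog Object terminology above; everything else (method names, callbacks, replies) is an illustrative assumption, not the Bot Framework's actual API.

```python
class Closure:
    """Answers the same way regardless of the user input."""
    def __init__(self, reply):
        self.reply = reply
    def respond(self, utterance):
        return self.reply

class Waterfall:
    """Feeds each user answer into the next step of a fixed sequence."""
    def __init__(self, steps):
        self.steps = steps          # list of functions: answer -> prompt
        self.index = 0
    def respond(self, utterance):
        step = self.steps[min(self.index, len(self.steps) - 1)]
        self.index += 1
        return step(utterance)

class DialogObject:
    """Uses a language-processing callback to pick the matching rule."""
    def __init__(self, nlp, rules):
        self.nlp = nlp              # callback: utterance -> intent name
        self.rules = rules          # intent name -> reply
    def respond(self, utterance):
        return self.rules.get(self.nlp(utterance), "Can you rephrase?")
```

The name-asking dialog of Example 1 maps directly onto a two-step Waterfall.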
4.3.4. Adaptability to context changes
RavenClaw uses the agenda as an ordered list of agents, yet it is limited to the way the task tree was created. If the user does not answer the question asked by the system and suddenly changes context, the system will not be able to adapt itself.
OpenDial cannot adapt to sudden context changes. The user has to finish the started dialog before starting a new one, because the system is limited to the defined utterances it can understand in the given state (the actions it can take in that state of the Bayesian Network).
Because the Bot Framework is stateless, even if the user suddenly changes context, the system can process the new query independently. If the user is in a Waterfall state, the system will choose between answering the query as a newly matched query or giving the default answer from the Waterfall state.
4.3.5. Integrating user feedback
RavenClaw is part of a larger framework and can hardly be adapted beyond the way it was built. Integrating user feedback can only be used as a way of reordering the agenda: if the user is not satisfied with an answer, the agenda could bring the next agent to the front to react to the user phrase.
In OpenDial, if the user is not satisfied with the answer of the system, the system will get stuck in a state that it cannot recover from.
User feedback is not part of the Bot Framework, yet the framework can be adapted to match some phrases with a higher probability.
4.3.6. Error handling
In RavenClaw, if the system has answered in a wrong way, it cannot recover until the task tree has been fully processed. However, if, for example, the system is in a state where it should get the Origin and the Destination of the user (the GetQuerySpecs state in Figure 7), the user can choose to answer by telling the Origin, the Destination or both, and the agent will adapt itself and ask the user for the rest.
OpenDial has a default action that the system always chooses when it cannot match the utterance. This is usually a state in which the system asks the user to repeat, yet, like the RavenClaw framework, it can process the information by filling in the details one by one.
In the Bot Framework, if the system cannot understand the phrase of the user, it will simply move to a default state (a Closure state) that asks the user to rephrase, or it will choose a default action based on what it could understand from the phrase.
5. Hand tracking interactions
Hand tracking interaction is done using a Kinect device, which provides the 3D coordinate of the right hand using the node-openni library. The coordinate is then transformed to a 2D coordinate using classical viewport transformations and affine transformations. The 2D coordinate is then broadcast to all connected clients. Each client scales the received position to its screen dimensions and provides a cursor to interact with the UI.
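The per-client scaling step can be sketched as a simple affine viewport mapping. The source image-plane dimensions below are an assumption for illustration (the Kinect v1 color/depth image is commonly 640x480), not a value stated in the paper.

```python
def to_viewport(x, y, src_w, src_h, screen_w, screen_h):
    """Affine viewport mapping: scale a point from the source image plane
    (src_w x src_h) to this client's screen (screen_w x screen_h)."""
    return (x * screen_w / src_w, y * screen_h / src_h)
```

Because each client applies the mapping with its own screen size, every connected client shows the cursor at the same relative position.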
Figure 8 - Hand Tracking Interactions
5.1. Initial coordinate smoothing
Skeleton tracking algorithms provide positions of important joints on the human body. The Microsoft Kinect for Windows v1 device can track up to 20 joints. Studies [C] show that the error in skeleton movement for hand tracking can go up to 13 cm. Using the hand as a cursor needs a way of smoothing the movements, especially in a still position, for clicking and interacting with the UI.
Figure 9 - Kinect still hand coordinate on X axis
Figure 10 - Kinect still hand coordinate on Y axis
Smoothing the coordinate is done using a buffer of the last 10 coordinates. As a baseline, the average of the coordinates in the buffer was computed. The results can be seen in Figure 11 - Kinect X coordinate and average computed coordinate.
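The baseline smoother above, a buffer of the last 10 coordinates whose mean becomes the cursor position, can be sketched as:

```python
from collections import deque

class MovingAverage:
    """Keeps the last n coordinates and returns their mean on each update."""
    def __init__(self, n=10):
        self.buffer = deque(maxlen=n)   # oldest points are evicted automatically

    def update(self, point):
        self.buffer.append(point)
        xs, ys = zip(*self.buffer)
        return (sum(xs) / len(xs), sum(ys) / len(ys))
```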
Figure 11 - Kinect X coordinate and average computed coordinate
Then a Kalman filter on the X and Y axes was used to smooth the coordinates. This level of smoothing was still not enough for a proper user experience. Karel Slavicek, Vladimir Schindler, Martin Sykora, and Otto Dostal described a method of using a Smoothed Kalman Filter [10] for improving gyroscopic mouse movements, in which the usual Kalman filter is modified by using an average of five consecutive values for the predicted value. Starting from the method they described, I used the Chaikin Smoothing Algorithm [11] on the results of the Kalman filter. Chaikin's method uses the corner-cutting paradigm for generating curves directly from a set of control points by cutting the corners in the initial sequence.
The algorithm was applied twice, once on the initial point sequence and then on the resulting points. From the resulting set of points, the middle point was used as the final point. Chaikin's corner-cutting algorithm can be computed easily and can be used for the real-time coordinates received from the Kinect. The final results can be seen in Figure 13 - Kalman filter with Chaikin smoothing on X axis and Figure 14 - Kalman filter with Chaikin smoothing on Y axis.
Figure 12 - Chaikin smoothing algorithm example [12]
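The corner-cutting step described above, where each segment PQ is replaced by the points at 1/4 and 3/4 of its length, can be sketched as follows; applying it twice and taking the middle point mirrors the procedure in the text. This is an illustrative sketch of the general algorithm, not the paper's exact implementation.

```python
def chaikin(points):
    """One Chaikin corner-cutting pass: for each segment PQ of the polyline,
    emit the points at 1/4 and 3/4 along it."""
    out = []
    for (px, py), (qx, qy) in zip(points, points[1:]):
        out.append((0.75 * px + 0.25 * qx, 0.75 * py + 0.25 * qy))
        out.append((0.25 * px + 0.75 * qx, 0.25 * py + 0.75 * qy))
    return out

def smooth_point(points):
    """Apply Chaikin twice and return the middle point of the result,
    as done here for the real-time Kinect coordinates."""
    twice = chaikin(chaikin(points))
    return twice[len(twice) // 2]
```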
Figure 13 - Kalman filter with Chaikin smoothing on X axis
Figure 14 - Kalman filter with Chaikin smoothing on Y axis
5.2. Viewport transformations
6. Conclusions
The paper presented a multimodal interface, which uses spoken language, hand gestures and a visual interface to provide the user with an improved experience.
The spoken language interactions are done using ASR, NLU, a dialog management module and a TTS module. The results showed that REST APIs can successfully accomplish ASR and NLU, while multiple dialog management techniques can be used to match the user sentence and provide a valid answer.
Hand tracking is done using a Kinect device, which provides 3D coordinates of the hand position in real time. The 3D coordinate is then converted to a 2D coordinate and works as a cursor device for interacting with the UI.
7. Future work
To provide a better spoken dialog module, the dialog manager should be able to match advanced queries to the system.
The hand tracking interaction needs to be extended with a hand calibration module for stabilizing the hand position.
The interface can be integrated as part of the AmI-Lab at the University POLITEHNICA of Bucharest, Faculty of Automatic Control and Computers, Computer Science Department. The laboratory acts as a monitoring AAL system for elderly people.
Appendix A - Speech Testing Results
Utterance | Average Lev. Score | % Entities Matched | % Intent Detected
how are you today | 91.00 | 42.86 | 85.71
am i free at nine pm today | 81.71 | 71.43 | 71.43
this is stupid | 98.86 | 92.86 | 100.00
do not remind me anything | 68.00 | 21.43 | 28.57
turn off the air conditioning at eight today | 79.00 | 57.14 | 85.71
remind me to take my pills after i work out | 80.86 | 71.43 | 85.71
do you know what was my pulse yesterday | 83.00 | 35.71 | 57.14
say it again | 73.29 | 85.71 | 71.43
switch off the lights at nine this evening | 78.14 | 57.14 | 85.71
no i did not | 84.00 | 85.71 | 85.71
yes i did | 84.71 | 85.71 | 85.71
affirmative | 51.57 | 85.71 | 57.14
can you please turn off the air conditioning | 86.57 | 64.29 | 85.71
remind me to take my pills at nine | 88.00 | 66.67 | 85.71
no thank you | 100.00 | 100.00 | 100.00
close the blinds | 98.14 | 95.24 | 100.00
open the blinds at nine am today | 76.14 | 90.48 | 85.71
please remind me to exercise this evening | 88.43 | 90.48 | 71.43
see you later | 99.43 | 92.86 | 100.00
please switch on the lights | 93.00 | 100.00 | 100.00
move the reminder from six am to nine this evening | 73.86 | 57.14 | 71.43
what was my pulse two days ago | 86.57 | 57.14 | 100.00
i've started exercising | 77.00 | 85.71 | 71.43
my pulse is seventy eight | 83.00 | 50.00 | 71.43
please repeat that | 92.43 | 100.00 | 100.00
do i have anything set for eight pm today | 84.14 | 85.71 | 57.14
please start the air conditioning | 74.00 | 57.14 | 100.00
have i received any calls today | 79.71 | 28.57 | 14.29
what was my glycemia last week | 70.43 | 21.43 | 28.57
please say that again | 82.14 | 92.86 | 100.00
when did i last work out | 87.86 | 14.29 | 42.86
did i forget to take my pills today | 89.29 | 71.43 | 85.71
remove all the reminders | 65.57 | 92.86 | 28.57
i've just finished exercising | 86.86 | 78.57 | 85.71
thank you | 80.86 | 71.43 | 85.71
what was my mean blood sugar level this week | 93.14 | 66.67 | 71.43
do not turn on the air conditioning at seven | 81.00 | 57.14 | 71.43
i didn't ask that | 98.00 | 92.86 | 100.00
no i forgot | 68.14 | 85.71 | 85.71
please put a reminder to take my pills at eight this evening | 72.00 | 28.57 | 28.57
hello | 100.00 | 100.00 | 100.00
what was my average pulse this week | 86.00 | 52.38 | 85.71
can you please turn off the lights in the room | 86.86 | 69.05 | 85.71
please turn on the lights | 89.00 | 85.71 | 85.71
did i take my medications today | 85.14 | 57.14 | 85.71
certainly | 41.57 | 14.29 | 57.14
turn on the air conditioning | 91.57 | 85.71 | 85.71
did i finish my exercising yesterday | 82.43 | 35.71 | 28.57
turn on the air conditioning at seven | 74.71 | 52.38 | 71.43
close the blinds at nine this evening | 93.43 | 78.57 | 100.00
what was my last measured blood pressure | 79.86 | 42.86 | 71.43
please stop the air conditioning | 69.43 | 64.29 | 85.71
raise the blinds | 94.29 | 92.86 | 100.00
have i taken my pills today | 73.57 | 61.90 | 57.14
do i have anything to do today | 96.71 | 100.00 | 100.00
what was my heart rate yesterday | 91.57 | 78.57 | 85.71
please open the blinds | 83.43 | 85.71 | 85.71
i've finished taking pills today | 76.86 | 33.33 | 14.29
i took my pills just now | 75.43 | 33.33 | 42.86
my blood pressure is sixteen and nine | 88.43 | 71.43 | 85.71
i mean working out | 86.71 | 64.29 | 100.00
my glycemia is one hundred | 64.00 | 28.57 | 42.86
please put a reminder to exercise at seven | 73.14 | 78.57 | 42.86
turn off the lights | 95.29 | 92.86 | 100.00
remind me to measure my glycemia after i work out | 75.57 | 88.10 | 100.00
how many pills do i have to take today | 71.57 | 57.14 | 71.43
have i exercised at all this week | 80.00 | 47.62 | 71.43
yes of course i did | 95.86 | 78.57 | 100.00
lower the blinds | 91.43 | 85.71 | 85.71
stop reminding me everything | 68.29 | 85.71 | 71.43
what did you say | 100.00 | 100.00 | 100.00
yes | 92.29 | 85.71 | 85.71
did i receive any calls today | 76.57 | 42.86 | 42.86
i've just taken my pills | 89.43 | 71.43 | 71.43
i didn't understand | 87.71 | 14.29 | 14.29
at what time should i next take my pills | 77.43 | 26.19 | 57.14
turn off the air | 75.71 | 50.00 | 85.71
![Page 17: LucrareSesiuneComunicari Final](https://reader030.vdocumente.com/reader030/viewer/2022021222/577c78331a28abe0548f0f87/html5/thumbnails/17.jpg)
7/25/2019 LucrareSesiuneComunicari Final
http://slidepdf.com/reader/full/lucraresesiunecomunicari-final 17/19
SESIUNEA DE COMUNICĂRI ȘTIIN IFICEȚ STUDEN E TIȚ Ș , 2016
-a( you te// %e &' &e ta.e( %y )&//+ e&ht day+
ao ""#$ 666$ "#$1oodye 9*"6 6!29 10000
%y /ood +uar &+ /ow 90$1 "#$1 "#$1
& d&d(t et that 911! $1!* $1!*
&e eer-&+ed today 'or th&rty %&(ute+ $"1! !#2! "#$1
whe( +hou/d & ta.e %y )&//+ (et "9!* 6!29 10000
re%&(d %e to %ea+ure %y /ood )re++ure 6$#$ "#$1 "#$1
(eat&e $*1! "#$1 #$1!
& wo(t e eer-&+&( th&+ ee(&( $*$1 21!* 2"#$
do & hae a(y -a//+ )/a((ed 'or today $6$1 21!* 1!29
hae & eer-&+ed ye+terday #""6 1!29 1!29%y -urre(t )u/+e &+ (&(ety $"29 1!29 2"#$
re%oe the re%&(der & had at e&ht $$!* !2"6 #$1!
o' -our+e "2$1 $1!* $1!*
do & hae a(yth&( to do at (&(e )% to%orrow 91#$ 90!" 10000
-ha(e the re%&(der 'ro% +ee( to (&(e $#"6 $1!* $1!*
& do(t (eed to ta.e %y )&//+ at (&(e a(y%ore 901! $1!* 10000
-a( you re)eat that "##$ "09# $1!*
-a( you -he-. &' &e ta.e( %y )&//+ /a+t wee. "$29 $1!* 1!29
%y heart rate &+ +ee(ty (&(e 91#$ $1!* 2"#$
(o 90$1 "#$1 "#$1
%y /ood +uar &+ h&h 91"0 9!$" $1!*
Average percentages 82.79 66.4 72.87
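As a sanity check, the "Average percentages" row can be recomputed from the per-utterance rows. A minimal sketch in Python (the three-row sample and the helper name `column_averages` are our own illustration; the full table would be processed the same way):

```python
# Recompute per-column average percentages from rows of
# (utterance, score1, score2, score3). The sample rows are taken
# from the evaluation table; the helper is illustrative only.
def column_averages(rows):
    """Return the mean of each of the three score columns, rounded to 2 decimals."""
    n = len(rows)
    sums = [0.0, 0.0, 0.0]
    for _, s1, s2, s3 in rows:
        sums[0] += s1
        sums[1] += s2
        sums[2] += s3
    return [round(s / n, 2) for s in sums]

sample = [
    ("hello", 100.00, 100.00, 100.00),
    ("turn off the lights", 95.29, 92.86, 100.00),
    ("i didn't understand", 87.71, 14.29, 14.29),
]
print(column_averages(sample))  # → [94.33, 69.05, 71.43]
```

Running the same computation over all rows of the table yields the averages reported in its final row.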
Appendix

| Intent | Description | Examples |
|---|---|---|
| Environment Interactions | This intent is used for AmI interactions. | switching on/off the lights, turning on/off the air conditioning, raising/lowering the blinds, and others |
| Environment Queries | Used for environmental information retrieval. | room temperature, room humidity, room luminosity, and others |
| Medical Facts Information | Used for adding new information regarding medical facts. | heart rate, blood sugar level, blood pressure |
| Medical Facts Queries | Used for queries regarding medical facts. | |
| Medication Information | Used for adding new information about medical treatments. | taking medication on time, forgetting to take medications |
| Medication Queries | Used for queries regarding the medical treatment. | |
| Physical Exercising Information | Used for adding new information about physical exercising. | time and duration of the workout |
| Physical Exercising Queries | Used for receiving information about physical exercising. | |
| Repeat | Used for asking the system to repeat something. | |
| Social Interactions | Used for understanding general social interactions. | hello, goodbye, see you later |
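To illustrate how user utterances map onto the intent inventory above, here is a minimal keyword-rule sketch (the keyword lists, the `classify` function, and the rule-based approach itself are our own illustration; the system's actual SLU module is not claimed to work this way):

```python
# Hypothetical keyword-rule intent matcher over the intent inventory
# from the appendix. The keyword lists are illustrative only.
INTENT_RULES = {
    "Environment Interactions": ["turn on", "turn off", "raise",
                                 "lower", "the blinds"],
    "Environment Queries": ["temperature", "humidity", "luminosity"],
    "Medical Facts Information": ["my heart rate is", "my blood sugar is",
                                  "my blood pressure is"],
    "Medication Queries": ["take my pills", "my medications"],
    "Social Interactions": ["hello", "goodbye", "see you later"],
}

def classify(utterance):
    """Return the first intent whose keyword occurs in the utterance."""
    text = utterance.lower()
    for intent, keywords in INTENT_RULES.items():
        if any(k in text for k in keywords):
            return intent
    return "Unknown"

print(classify("please turn on the lights"))  # Environment Interactions
print(classify("my blood sugar is high"))     # Medical Facts Information
print(classify("hello"))                      # Social Interactions
```

A real deployment would replace the keyword rules with a trained language-understanding model, since many of the test utterances (e.g. confirmations like "yes of course i did") carry no intent-specific keywords at all.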
References

[1] M.-L. Bourguet, "Designing and Prototyping Multimodal Commands," Proceedings of Human-Computer Interaction, Queen Mary, University of London, Mile End Road, London E1 4NS, UK, 2003.

[2] R. López-Cózar, Z. Callejas, D. Griol and J. F. Quesada, "Review of spoken dialogue systems," Loquens, vol. 1, no. 2, 2014.

[3] O. Lemon and O. Pietquin, Data-Driven Methods for Adaptive Spoken Dialogue Systems, London: Springer New York, 2012.

[4] N. Webb, Rule-Based Dialogue Management Systems, Sheffield, UK: University of Sheffield, 2000.

[5] M. Frampton and O. Lemon, "Recent research advances in reinforcement learning in spoken dialogue systems," in The Knowledge Engineering Review, Cambridge: Cambridge University Press, 2009, pp. 375-408.

[6] E. H. Hovy, "Pragmatics and Natural Language Generation," Artificial Intelligence, Southern California: Elsevier B.V., 1989, pp. 153-197.

[7] J. Raheja, A. Chaudhary and K. Singal, "Tracking of Fingertips and Centre of Palm using KINECT," 3rd IEEE International Conference on Computational Intelligence, Modelling and Simulation, Malaysia, 20-22 Sep. 2011.

[8] http://wiki.speech.cs.cmu.edu/.../Tutorial_1. [Online].

[9] A. Cosgun, M. Bunger and H. Christensen, "Accuracy Analysis of Skeleton Trackers for Safety in HRI," Georgia Tech, Atlanta, GA, USA, 2013.

[10] S. Karel, V. Schindler, M. Sykora and O. Dostal, "Kalman Filter Improvement for Gyroscopic," Int'l Journal of Computing, Communications & Instrumentation Engg. (IJCCIE), vol. 1, no. 1, pp. 96-97, 2014.

[11] G. M. Chaikin, "An algorithm for high-speed curve generation," in Computer Graphics and Image Processing, vol. 3, New York, 1974, pp. 346-349.

[12] K. I. Joy, "Chaikin's Algorithms for Curves," Department of Computer Science, University of California, Davis, 2012.