SESIUNEA DE COMUNICĂRI ȘTIINȚIFICE STUDENȚEȘTI, 2016

Multimodal Interface

Alexandru-Florin Gavril

University POLITEHNICA of Bucharest 

Faculty of Automatic Control and Computers, Computer Science Department

Email: alexandru.gavril@cti.pub.ro

Keywords: conversational agents, ambient intelligence, ambient assisted living, speech recognition, natural language understanding

!" Introduction

Multimodal interaction provides the user with numerous ways of interacting with a system. It can usually be integrated naturally with systems for ambient intelligence, ambient assisted living, personal assistants and many others while maintaining a good user experience. Interactions with the system include the natural modes of communication like handwriting, speech, body gestures and graphics, as described in [1], as well as the classical graphical user interface or command line interface.

Ambient Intelligence (AmI) systems use data acquisition devices and actuators to offer an environment which is sensitive and reactive to the user's presence. Ambient Assisted Living (AAL) aims to make use of AmI technologies to provide better living conditions for older adults and people unable to sustain an independent way of living.

&his paper proposes a multimodal interface for such systems, which allows interaction

through hand control, spoken language and a visual interface.

#" $elated %or&  

Spoken language interactions are usually built using five main modules:
- Automatic Speech Recognition (ASR)
- Spoken Language Understanding (SLU)
- Dialog Management (DM)
- Natural Language Generation (NLG)
- Text-to-Speech synthesis (TTS)

The automatic speech recognition module has the goal of providing a recognition hypothesis of the user's speech. Many approaches have been investigated to solve the problem of speech recognition. The most commonly used today is the stochastic method, which is based on acoustic and language models corresponding to a given language. Hidden Markov Models are usually used to represent the acoustic models, while the language models are usually generated automatically by processing large corpora of data. Another approach to ASR is N-best recognition, which provides additional hypotheses. This method is commonly used by speech dialog systems because it enables the system to choose the utterance that fits best, as described in [2]. The ASR module then outputs the utterance to the SLU module. The goal of the SLU module is to obtain the user's intent and key entities from the utterance. Another goal of the SLU module is to correct errors made by the ASR, for which multiple techniques can be used, like relaxing the grammars, focusing on key entities and employing statistical approaches [3]. The simplest way of intent and entity extraction is using regex expressions.
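As a minimal illustration of this regex baseline, the sketch below (in Python; the patterns and intent names are hypothetical, not taken from the actual system grammar) maps an utterance to an intent and captures the key entities as named groups:

    import re

    # Hypothetical intent patterns: each regex maps an utterance to one
    # intent and captures its key entities as named groups.
    PATTERNS = [
        ("environment_interaction",
         re.compile(r"turn (?P<action>on|off) the (?P<object>lights|air conditioning)"
                    r"(?: at (?P<time>\w+))?")),
        ("medication_query",
         re.compile(r"did i take my (?P<object>pills|medications)"
                    r"(?: (?P<time>today|yesterday))?")),
    ]

    def extract(utterance):
        """Return (intent, entities) for the first matching pattern."""
        text = utterance.lower().strip()
        for intent, pattern in PATTERNS:
            match = pattern.search(text)
            if match:
                # Keep only the entities actually present in the utterance.
                entities = {k: v for k, v in match.groupdict().items() if v}
                return intent, entities
        return None, {}

    print(extract("Turn off the lights at nine"))
    # ('environment_interaction', {'action': 'off', 'object': 'lights', 'time': 'nine'})

As Section 4.2 discusses, every phrasing must be anticipated by such a pattern, which limits user expressivity.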


Dialog management receives the output from the SLU and decides what the system should do next in response to the user's input. It usually queries a local database and keeps a dialog history. Numerous approaches can be found in the literature for carrying out dialogue management, such as rule-based systems [4], plan-based systems or systems based on statistical reinforcement learning [5].

Natural language generators were considered the easiest part of a spoken dialog system in the fifties, since a generator based on user input was easy to build using sentence templates. The complexity of this task comes from providing the user with varied generated output. An example of a system which produces stylistically appropriate texts from a single story representation, under various settings that model pragmatic circumstances, is PAULINE [6].

Hand tracking using the Microsoft Kinect device is facilitated by the skeleton data acquisition. Research in tracking fingertips and the palm centre has been done by putting a threshold on the depth of the hand points and applying a big circular filter on the resulting image [7].

'" (ystem arcitecture

&he proposed interface is composed of three interaction modules( speech, hand tracking and a

graphical user interface. &he speech and hand tracking modules are intended to be used

together, most of the time, however, they are designed to function independently of each

other.

&he speech module is composed of five main parts( audio preprocessing, automated speech

recognition, natural language understanding !-%+", dialog manager and text to speech.

0and tracking is done using a Microsoft 8inect device which provides *: data, depthimaging and user tracking in the form of a 2 skeleton position in key points like arms, legs,

head and body.

Figure 1 – System Architecture

&he system runs as a -ode ;' server, which provides the webpage in which the user can

interact with the system. &he -ode ;' servers is responsible with all 8inect interactions such

as reading the position of the hand, transforming it to <oordinate and broadcasting it, in

real)time, to all connected clients.


&he 8inect device is integrated using -ode)=pen-I1 which provides the position of the right

hand from the skeleton data ac#uired using =pen-I. &he position is then processed to confera more stable coordinate and broadcasted to all connected. 6ach client will process the

coordinate to show the same position of the cursor related to its screen si>e.
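A minimal sketch of that client-side scaling step (assuming the broadcast coordinate is expressed in the Kinect's 640x480 frame; the names are illustrative):

    KINECT_WIDTH, KINECT_HEIGHT = 640, 480  # Kinect v1 depth/RGB frame size

    def to_screen(hand_x, hand_y, screen_width, screen_height):
        """Map a hand coordinate from Kinect frame space to this client's
        screen, so every client shows the cursor at the same relative spot
        regardless of its own resolution."""
        norm_x = hand_x / KINECT_WIDTH
        norm_y = hand_y / KINECT_HEIGHT
        return int(norm_x * screen_width), int(norm_y * screen_height)

    # The same broadcast coordinate lands at the same relative position:
    print(to_screen(320, 240, 1920, 1080))  # (960, 540)
    print(to_screen(320, 240, 1366, 768))   # (683, 384)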

If the Kinect is used for speech recognition, a custom-made driver^2 must be used for installing the Microphone Array.

&he system integrates two similar 5Is for 'peech *ecognition( ?it.ai2 and Microsoft :ing

'peech *ecognition @ %uis.ai3. :oth use *6'& 5Is re#uests for speech recognition and

 provide a natural language processing of the provided utterance. 'peech recognition is done

on the client side such that the system can be used on portable devices with internet

connection. &he processed sentence is then sent to the Microsoft :ot Aramework, which is a

matching system that will process the #uery, finds the necessary information, then output the

answer to the user re#uest as text and speech.

&he system also integrates two text to speech systems( Mary&&' which runs offline and

Microsoft &ext to 'peech 5I for online #ueries.

)" (po&en *anguage Interaction

&he ability of interacting using spoken language re#uires less effort for communication and

improves the user experience. lthough speech recognition is a complex task, the natural

language understanding and the dialog manager both play a big part in enabling interactions

with the system.

Figure 2 – Spoken Language Architecture

)"!" Automatic (peec $ecognition +A($

Speech interaction is very common nowadays and widely used in smartphones, call steering, user authentication and many others. A speech recognition system used in Ambient Intelligence should not be limited only to a domain-specific knowledge base, even though the interaction with the system might be restricted to simple tasks like turning on the lights, raising the drapes and so on. Using a limited grammar-based speech recognition system will lead to a lower user experience. An example of a limited grammar speech recognition system is CMU Sphinx, an open-source speech recognition system first developed by Carnegie Mellon University in 2000. It has been tested with different configurations, and increasing the grammar to provide a better user experience was decreasing the success of the recognition. The response time, however, is very good, since CMU Sphinx was developed for low-resource platforms.

^1 https://github.com/pgte/node-openni
^2 https://git.ao2.it/kinect-audio-setup.git
^3 https://www.wit.ai/
^4 https://www.luis.ai/


Good results were reported using CMU Sphinx by training the acoustic model of the speech recognition system, yet this was outside the scope of this work.

Speech recognition APIs are now broadly used. Some examples of such APIs are the Google Speech API, Wit.ai and the Microsoft Bing Speech Recognition API. They all offer good response time while maintaining a high success rate of recognition. Most of them offer a query-limited free-to-use subscription and SDKs for speech ending detection.

ASR Evaluation

&ests have been done on a set of CC user utterances from D different users on the ?it.ai 5I.

&he testing procedure is done using the fu>>ywu>>y 4 python library which uses %evenshtein

istance to calculate the differences between the speech recognition result and the phrase

text. &he results of the testing can be seen in ppendix . ?it.ai had a successful recognition

 percentage of D.
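A minimal sketch of that scoring step (assuming plain fuzz.ratio was the measure used; the utterance pair is illustrative):

    from fuzzywuzzy import fuzz  # pip install fuzzywuzzy

    reference  = "please put a reminder to take my pills at eight this evening"
    recognized = "please put a reminder to take my peels at 8 this evening"

    # fuzz.ratio is a Levenshtein-based similarity in [0, 100];
    # 100 means the ASR output matches the reference text exactly.
    print(fuzz.ratio(reference, recognized))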

&he results show that long sentences such as Eplease put a reminder to take my pills at eight

this eveningF provided a low average detection score. 'ome words such as FtakenF, FworkedF

were sometimes interpreted as similar phonetic words EtakeF, EworkF but did not influence

the correct interpretation of the utterance. &he best batch of detections !C4G correct '*

detections" provided an DD.DG accuracy in the interpretation of the user/s intent and D2.9G

accuracy in the entity extraction, while the lowest rate of '* !9G" provided a 73.7G

accuracy in intent interpretation and a 43.G in correct entity extraction.

)"#" (po&en *anguage .nderstanding +(*.

Natural language processing is necessary for the speech interaction, as it offers language understanding support. The system receives a plain text utterance, determines the intent of the user, extracts the relevant entities, then outputs them to the Dialog System. The simplest way of processing the phrase is by using regex expressions to extract the relevant information. This incurs a limitation in user expressivity, an overhead in the matching system and low flexibility to changes in the training data. Ensuring a better user experience requires a more complex natural language processing of the utterance. Two natural language processing APIs, Wit.ai and Luis.ai, were trained with AAL and AmI domain-specific queries. The intents and entities are described in Appendix B. The training is done by labeling each sentence with the user intent, and each word relevant for the understanding of the utterance with the specific entity that defines it. The dialog system will be triggered by the intent of the utterance and will match the entities so that it can correctly process the phrase.
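For illustration, one labeled training utterance of the kind described above could look as follows (the field names are hypothetical; Wit.ai and Luis.ai each use their own schema):

    # The whole sentence is labeled with an intent; the words relevant to
    # understanding it are labeled with the entity that defines them.
    labeled_utterance = {
        "text": "remind me to take my pills at nine this evening",
        "intent": "medication_information",
        "entities": [
            {"entity": "action",   "value": "take"},
            {"entity": "object",   "value": "pills"},
            {"entity": "datetime", "value": "nine this evening"},
        ],
    }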

SLU Evaluation

&ests have been done on the set in ppendix . &he utterance recogni>ed by the '* was

 processed by the ?it.ai '%+ system and compared to the results received with the original

text #uery. &he average percentage of successful intent recognition was, 9G while the

 percentage of successful entity recognition was 77G.

^5 https://pypi.python.org/pypi/fuzzywuzzy


&he dialog manager first matches the user intent, then matches the entities. &his implies that a

low detection accuracy in intent recognition will mean wrong interpretation of the wholeutterance, while missing entity extraction will simply make the manager ask for the missed

entities. &he results show that the accuracy of the '* system is proportional with the

accuracy of intent interpretation and entity extraction as can be seen in Aigures [2,3,4,7].


Figure 3 – Average Levenshtein Score and % Entities Matched – First Batch

Figure 4 – Average Levenshtein Score and % Intent Detected – First Batch

Figure 5 – Average Levenshtein Score and % Entities Matched – Second Batch

Figure 6 – Average Levenshtein Score and % Intent Detected – Second Batch


)"'" /ialog (ystem

&he ialog 'ystem is the core of the speech interaction. It is responsible with extracting the

relevant data after matching the processed utterance and offer a response to the user. =nce the

intent is determined by the natural language processing module, the system should be able to

differentiate between a #uery, in which it should gather the necessary data and provide a text

output to the user and@or to the &ext)&o)'peech 'ynthesis module, a command in which it

should trigger the MI 'ystem Interface and data insertion in which the system should save

the relevant data about the user. &hree main ialog 'ystems were evaluated( *aven<law,

=penial and Microsoft :ot Aramework.

)"'"!" General description

RavenClaw – Carnegie Mellon University^6

*aven<law is a dialogue manager build by <M+ and it is part of the =lympus dialogue

system. It manages the dialog using two data structures, the task tree and the agenda. &he task 

tree is a plan for performing specified tasks in a given domain. &he task tree should include

each of the activities that humans may reasonably want to undertake using the system. It also

 permits the user to dynamically modify the task tree that is usually constructed by the

developer but it still is a limited way of planning.

*aven<law is placed above <M+ 'phinx for speech recognition and 5hoenix for natural

language understanding, that uses <A grammars for parsing the user utterance, in the

=lympus complete framework for spoken dialog systems.

OpenDial – Language Technology Group, University of Oslo^7

OpenDial is a dialogue manager that uses Bayesian Networks to represent the dialogue states. It combines the benefits of logical and statistical approaches to dialogue modelling. It relies on probabilistic rules to represent the domain models.

Microsoft Bot Framework^8

The Microsoft Bot Framework is a newly launched platform for building interactive chat bots. It easily integrates powerful AI frameworks like LUIS or Wit.ai for speech recognition and natural language preprocessing.

)"'"#" 0ontext matcing and spo&en language understanding +(*.

In *aven<law the mapping of the context is done by using grammar mappings. grammar

mapping is a list of one or more grammar mapping element, each with a single grammar slot

name, and optional scope and binding filters or normali>ations. &his is done by 5hoenix

framework from the =lympus system.

^6 http://wiki.speech.cs.cmu.edu/olympus/index.php/RavenClaw
^7 http://www.opendial-toolkit.net/
^8 https://bots.botframework.com/


OpenDial uses a more advanced pattern matching on the user utterance for the system to correctly understand it. Each state contains a set of utterances that can be matched with different probabilities. It identifies entities in the phrase by matching them to variables in the text; for example, in the phrase "Take the {OBJ}" the system can match the OBJ variable to anything and pass it to the next state in the system.

&he bots build with the bot framework are stateless which helps them easily scale. &he

context mapping is done by ?it.ai @ %+I' by using the Model &raining. :oth ?it.ai and %+I'

are trained using labeled user utterances. 6ach utterance is categori>ed into Intents and

labeled with key entities that the developer wants the system to understand !e.g. Eid I take

my medications todayJF could be interpreted by a Kuery Intent, with entities like

=bLectMedications, ate&ime &oday, ction&ake. fter training the bot will receive a

labeled user utterance with Intent , Entities and "core in the form of a ;'=- that will be

matched against the predefined rules. &he benefits of this approach is that the system need a

limited number of training utterances, unlike a 5=M5, yet is limited by the number of

defined rules of matching.
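A minimal sketch of matching such a labeled result against predefined rules (the JSON shape and the rules are illustrative, not the Bot Framework's actual API):

    # Example NLU output, as returned by a Wit.ai / LUIS-style service.
    nlu_result = {
        "intent": "medication_query",
        "score": 0.91,
        "entities": {"object": "medications", "datetime": "today"},
    }

    # Predefined rules: for each intent, the entities its handler needs.
    RULES = {
        "medication_query": ["object", "datetime"],
        "environment_interaction": ["action", "object"],
    }

    def dispatch(result, threshold=0.7):
        required = RULES.get(result["intent"])
        if required is None or result["score"] < threshold:
            return "Sorry, can you rephrase that?"
        missing = [e for e in required if e not in result["entities"]]
        if missing:
            # A missing entity triggers a follow-up question, not a failure.
            return "Could you tell me the {}?".format(missing[0])
        return "Handling {} with {}".format(result["intent"], result["entities"])

    print(dispatch(nlu_result))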

)"'"'" 1ransition between te dialog states

In *aven<law transitions between the dialog phases are done by the agents in the agenda. &he

agenda is a list of agents that is used to dispatch inputs to appropriate agents in the task tree.

&he agenda is recomputed on every turn, placing the current agent !the focus of the

conversation" on the top of the list. ?hen a match occurs the corresponding agent is activated.

Figure 7 – RavenClaw MyBus Task Tree [8]

In OpenDial the transitions between states are done using a Bayesian Network. Each state has a limited number of utterances that the user can tell the system. After the system correctly matches the utterance, it will go to the next state.

&he :ot Aramework is stateless. 6ach utterance will be matched against a predefined rule

regardless the state, yet it can be extended to add states for having a continuous dialog by

limiting the types of rules to be matched in a given state(


 << User "tarts #ialo' 

2ello

 << "ystem resonse + the system 6ill no6 6ait for a name of the user! forcin' the user to only

use one tye of utterance that can )e matched as a Name entity

2i3 %at is your name4

5on

 << If the user 6ould have not used a valid name =matched )y the NLP rocessor> the system

6ould have as-ed a'ain

2ello 5on3

 E%amle * + 0icrosoft Bot Frame6or- #ialo' E%amle

&he bot has three different types of states. &he simplest one is the Closure state in which the

system will answer the same way regardless of what the user said. &his is useful for ending

the dialog, or a different state for answering the user phrases like ?7lad I@ve could )e

useful&! ?7ood)ye etc. &he second one is the aterfall  state in which the system processes

a certain phrase as a continuous dialog in which the answer of the user will be the input to the

next rule !Aig.". &he third and most powerful one is the #ialo' O)ect  state that is using the

%anguage 5rocessing to match the phrase. &his could be combined with the aterfall  state for 

a more complex ialog 'ystem.
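A minimal sketch of the Waterfall idea (illustrative Python rather than the Bot Framework SDK: each step consumes the user's previous answer and produces the next prompt):

    # Each waterfall step receives the user's last answer and returns the
    # next prompt; the conversation advances one step per user turn.
    def ask_name(answer):
        return "Hi! What is your name?"

    def greet(answer):
        return "Hello {}!".format(answer)

    WATERFALL = [ask_name, greet]

    def run(turns):
        step = 0
        for user_says in turns:
            print("user:", user_says)
            print("bot: ", WATERFALL[step](user_says))
            step = min(step + 1, len(WATERFALL) - 1)

    run(["Hello", "Jon"])  # reproduces Example 1 above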

)"'")" Adaptability to context canges

*aven<law uses the genda as a list of ordered agents, yet it is limited to the way the task

tree was created. If the user is not answering the #uestion addressed by the system and

suddenly changes context, the system will not be able to adapt itself.

OpenDial cannot adapt to sudden context changes. The user has to finish the started dialog before starting a new one, because the system is limited to the defined utterances it can understand in the given state (the actions it can take in that state of the Bayesian Network).

Because the Bot Framework is stateless, even if the user suddenly changes context, the system can process the new query independently. If the user is in a Waterfall state, the system will choose between answering the query as a newly matched query or giving the default answer from the Waterfall state.

)"'"6" Integrating user feedbac& 

*aven<law is a part of a larger framework and can be hardly adapted to any changes to the

way it was build. Integrating user feedback can only be used as a way of reordering the

genda. If the user is not satisfied with an answer, the agenda could bring the next agent in

the front to react to the user phrase.

In OpenDial, if the user is not satisfied with the answer of the system, the system will get stuck in a state that it cannot recover from.

User feedback is not part of the Bot Framework either, yet the framework can be adapted to match some phrases with a higher probability.


)"'"7" rror andling

In *aven<law, if the system has answered in a wrong way, the system cannot recover until the

task tree was fully processed, but if for example the system is in a state where it should get the

=rigin and the estination of the user !Aigure etKuery'pecs 'tate", the user can choose to

answer by telling the =rigin, estination or both and the agent will adapt itself and ask the

user the rest.

OpenDial has a default action that the system always chooses when it cannot match the utterance. This is usually a state in which the system asks the user to repeat; yet, like RavenClaw, it can process partial answers by filling in the information one piece at a time.

In the Bot Framework, if the system cannot understand the phrase of the user, it will simply move to a default state (a Closure state) that asks the user to rephrase, or it will choose a default action based on what it could understand from the phrase.

6" 2and trac&ing interactions

Hand tracking interaction is done using a Kinect device, which provides the 3D coordinate of the right hand through the Node-OpenNI library. The coordinate is then transformed to a 2D coordinate using classical viewport transformations and affine transformations. The 2D coordinate is then broadcast to all connected clients. Each client scales the received position to its screen dimensions and provides a cursor to interact with the UI.
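A minimal sketch of that 3D-to-2D step (assuming OpenNI-style camera-space coordinates in millimetres and a pinhole projection; the focal length value is illustrative, not the system's actual calibration):

    def world_to_screen(x_mm, y_mm, z_mm, width=640, height=480, focal=525.0):
        """Project a 3D hand position (camera space, mm) to 2D pixels:
        a perspective divide by depth, then an affine viewport transform
        that recentres the origin and flips the Y axis."""
        px = (x_mm * focal) / z_mm   # projection onto the image plane
        py = (y_mm * focal) / z_mm
        sx = width / 2 + px          # origin moves to the top-left corner
        sy = height / 2 - py         # screen Y grows downwards
        return sx, sy

    # A hand 2 m away, 30 cm right of and 10 cm above the sensor axis:
    print(world_to_screen(300, 100, 2000))  # (398.75, 213.75)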

Figure 8 – Hand Tracking Interactions

6"!" Initial coordinate smooting

Skeleton tracking algorithms provide the positions of important joints on the human body. The Microsoft Kinect for Windows 1 device can track up to 20 joints. Studies [9] show that the error in skeleton movement for hand tracking can go up to 14.2 cm. Using the hand as a cursor needs a way of smoothing the movements, especially in a still position, for clicking and interacting with the UI.


Figure 9 – Kinect still hand coordinate on X axis

Figure 10 – Kinect still hand coordinate on Y axis

Smoothing the coordinate is done using a buffer of the last 10 coordinates. As a baseline, the average of the coordinates in the buffer was computed. The results can be seen in Figure 11 (Kinect X coordinate and average computed coordinate).
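A minimal sketch of that baseline (a fixed-size buffer; the length of 10 is the one stated above):

    from collections import deque

    class MovingAverage:
        """Smooth a noisy coordinate stream by averaging the last n samples."""

        def __init__(self, n=10):
            self.buffer = deque(maxlen=n)  # old samples fall out automatically

        def update(self, value):
            self.buffer.append(value)
            return sum(self.buffer) / len(self.buffer)

    smooth_x = MovingAverage()
    for raw_x in [104, 110, 96, 108, 101]:  # jittery still-hand readings
        print(round(smooth_x.update(raw_x), 1))
    # 104.0, 107.0, 103.3, 104.5, 103.8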


Figure 11 – Kinect X coordinate and average computed coordinate

&hen a 8alman filter on N and O axis was used to smooth the coordinates. &his level of

smoothing was still not enough for a proper user experience. 8arel 'lavicek, Pladimir

'chindler, Martin 'ykora, and =tto ostal described a method of using a 'moothed 8alman

Ailter [1B] for improving the gyroscopic mouse movements. &he usual 8alman filter is

modified by using an average of five consecutive values for the predicted value. 'tarting from

the method they described, I used the Chai-in "moothin' Al'orithm [11] for smoothing theresults of the 8alman filter. <haikin/s method uses the corner)cutting paradigm for generating

curves directly from a set of control points by cutting the corners in the initial se#uence.

&he algorithm was applied twice, once on the initial point

se#uence and then on the resulting points. Arom the

resulted set of points, the middle point was used as a final

 point. <haikin/s corner cutting algorithm can be easily

computed and can be used for the real)time coordinates

received from the 8inect. &he final results can be seen in

Aigure 12) 8alman filter with <haikin smoothing and

Aigure 13 ) 8alman filter with <haikin smoothing on O

axis. Fi'ure *, 2 Chai-in smoothin' al'orithm e%amle :*,;
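A minimal sketch of the corner-cutting step used here (each Chaikin pass replaces every segment with points at 1/4 and 3/4 of its length; applying it twice and keeping the middle point follows the description above):

    def chaikin(points):
        """One corner-cutting pass: each segment (p, q) is replaced by the
        points 1/4 and 3/4 of the way along it, rounding sharp corners."""
        out = []
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            out.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            out.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        return out

    def smooth_point(recent_points):
        """Apply Chaikin twice to a short window of Kalman-filtered points
        and return the middle point as the final cursor position."""
        refined = chaikin(chaikin(recent_points))
        return refined[len(refined) // 2]

    window = [(100, 100), (112, 96), (105, 108), (110, 104)]
    print(smooth_point(window))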


Figure 13 – Kalman filter with Chaikin smoothing on X axis

Figure 14 – Kalman filter with Chaikin smoothing on Y axis

6"#" 8iewport transformations

7" 0onclusions

&he paper presented a multimodal interface, which uses spoken language, hand gestures and a

visual interface to provide the user an improved experience.

&he spoken language interactions are done using '*, -%+, a dialog management module

and a &&' module. &he results shown that *6'& 5Is can successfully accomplish '* and

 -%+, while multiple dialog management techni#ues can be used for matching the user

sentence and provide a valid answer.


Hand tracking is done using a Kinect device, which provides 3D coordinates for the hand position in real time. The 3D coordinate is then converted to a 2D coordinate and acts as a cursor device for interacting with the UI.

9" Future wor&  

&o provide a better spoken dialog module the dialog manager should be able to match

advanced #ueries to the system.

&he hand tracking interaction needs to be extended with a hand calibration module for 

stabili>ing the hand position.

&he interface can be integrated as part of the MI)%ab in +niversity 5=%I&60-I< of:ucharest Aaculty of utomatic <ontrol and <omputers, <omputer 'cience epartment. &he

laboratory acts like a monitoring % system for elderly people.


Appendix A – Speech Testing Results

Utterance | Average Lev. Score | % Entities Matched | % Intent Detected
how are you today | 91.00 | 42.86 | 85.71
am I free at nine pm today | 81.71 | 71.43 | 71.43
this is stupid | 98.86 | 92.86 | 100.00
do not remind me anything | 68.00 | 21.43 | 28.57
turn off the air conditioning at eight today | 79.00 | 57.14 | 85.71
remind me to take my pills after I work out | 80.86 | 71.43 | 85.71
do you know what was my pulse yesterday | 83.00 | 35.71 | 57.14
say it again | 73.29 | 85.71 | 71.43
switch off the lights at nine this evening | 78.14 | 57.14 | 85.71
no I did not | 84.00 | 85.71 | 85.71
yes I did | 84.71 | 85.71 | 85.71
affirmative | 51.57 | 85.71 | 57.14
can you please turn off the air conditioning | 86.57 | 64.29 | 85.71
remind me to take my pills at nine | 88.00 | 66.67 | 85.71
no thank you | 100.00 | 100.00 | 100.00
close the blinds | 98.14 | 95.24 | 100.00
open the blinds at nine am today | 76.14 | 90.48 | 85.71
please remind me to exercise this evening | 88.43 | 90.48 | 71.43
see you later | 99.43 | 92.86 | 100.00
please switch on the lights | 93.00 | 100.00 | 100.00
move the reminder from six am to nine this evening | 73.86 | 57.14 | 71.43
what was my pulse two days ago | 86.57 | 57.14 | 100.00
I've started exercising | 77.00 | 85.71 | 71.43
my pulse is seventy eight | 83.00 | 50.00 | 71.43
please repeat that | 92.43 | 100.00 | 100.00
do I have anything set for eight pm today | 84.14 | 85.71 | 57.14
please start the air conditioning | 74.00 | 57.14 | 100.00
have I received any calls today | 79.71 | 28.57 | 14.29
what was my glycemia last week | 70.43 | 21.43 | 28.57
please say that again | 82.14 | 92.86 | 100.00
when did I last work out | 87.86 | 14.29 | 42.86
did I forget to take my pills today | 89.29 | 71.43 | 85.71
remove all the reminders | 65.57 | 92.86 | 28.57
I've just finished exercising | 86.86 | 78.57 | 85.71
thank you | 80.86 | 71.43 | 85.71
what was my mean blood sugar level this week | 93.14 | 66.67 | 71.43
do not turn on the air conditioning at seven | 81.00 | 57.14 | 71.43
I didn't ask that | 98.00 | 92.86 | 100.00
no I forgot | 68.14 | 85.71 | 85.71
please put a reminder to take my pills at eight this evening | 72.00 | 28.57 | 28.57
hello | 100.00 | 100.00 | 100.00
what was my average pulse this week | 86.00 | 52.38 | 85.71
can you please turn off the lights in the room | 86.86 | 69.05 | 85.71
please turn on the lights | 89.00 | 85.71 | 85.71
did I take my medications today | 85.14 | 57.14 | 85.71
certainly | 41.57 | 14.29 | 57.14
turn on the air conditioning | 91.57 | 85.71 | 85.71
did I finish my exercising yesterday | 82.43 | 35.71 | 28.57
turn on the air conditioning at seven | 74.71 | 52.38 | 71.43
close the blinds at nine this evening | 93.43 | 78.57 | 100.00
what was my last measured blood pressure | 79.86 | 42.86 | 71.43
please stop the air conditioning | 69.43 | 64.29 | 85.71
raise the blinds | 94.29 | 92.86 | 100.00
have I taken my pills today | 73.57 | 61.90 | 57.14
do I have anything to do today | 96.71 | 100.00 | 100.00
what was my heart rate yesterday | 91.57 | 78.57 | 85.71
please open the blinds | 83.43 | 85.71 | 85.71
I've finished taking pills today | 76.86 | 33.33 | 14.29
I took my pills just now | 75.43 | 33.33 | 42.86
my blood pressure is sixteen and nine | 88.43 | 71.43 | 85.71
I mean working out | 86.71 | 64.29 | 100.00
my glycemia is one hundred | 64.00 | 28.57 | 42.86
please put a reminder to exercise at seven | 73.14 | 78.57 | 42.86
turn off the lights | 95.29 | 92.86 | 100.00
remind me to measure my glycemia after I work out | 75.57 | 88.10 | 100.00
how many pills do I have to take today | 71.57 | 57.14 | 71.43
have I exercised at all this week | 80.00 | 47.62 | 71.43
yes of course I did | 95.86 | 78.57 | 100.00
lower the blinds | 91.43 | 85.71 | 85.71
stop reminding me everything | 68.29 | 85.71 | 71.43
what did you say | 100.00 | 100.00 | 100.00
yes | 92.29 | 85.71 | 85.71
did I receive any calls today | 76.57 | 42.86 | 42.86
I've just taken my pills | 89.43 | 71.43 | 71.43
I didn't understand | 87.71 | 14.29 | 14.29
at what time should I next take my pills | 77.43 | 26.19 | 57.14
turn off the air | 75.71 | 50.00 | 85.71
can you tell me if I've taken my pills eight days ago | 88.57 | 66.67 | 85.71
goodbye | 93.86 | 64.29 | 100.00
my blood sugar is low | 90.71 | 85.71 | 85.71
I didn't get that | 91.14 | 71.43 | 71.43
I've exercised today for thirty minutes | 78.14 | 45.24 | 85.71
when should I take my pills next | 89.43 | 64.29 | 100.00
remind me to measure my blood pressure | 67.57 | 85.71 | 85.71
negative | 73.14 | 85.71 | 57.14
I won't be exercising this evening | 73.71 | 21.43 | 28.57
do I have any calls planned for today | 76.71 | 21.43 | 14.29
have I exercised yesterday | 58.86 | 14.29 | 14.29
my current pulse is ninety | 78.29 | 14.29 | 28.57
remove the reminder I had at eight | 77.43 | 42.86 | 57.14
of course | 82.71 | 71.43 | 71.43
do I have anything to do at nine pm tomorrow | 91.57 | 90.48 | 100.00
change the reminder from seven to nine | 75.86 | 71.43 | 71.43
I don't need to take my pills at nine anymore | 90.14 | 71.43 | 100.00
can you repeat that | 85.57 | 80.95 | 71.43
can you check if I've taken my pills last week | 87.29 | 71.43 | 14.29
my heart rate is seventy nine | 91.57 | 71.43 | 28.57
no | 90.71 | 85.71 | 85.71
my blood sugar is high | 91.80 | 94.78 | 71.43
Average percentages | 82.79 | 66.4 | 72.87


Appendix B

Intent | Description | Examples
Environment Interactions | This intent is used for AmI interactions. | switching on/off the lights, turning on/off the air conditioning, raising/lowering the blinds and others
Environment Queries | Used for environmental information retrieval. | room temperature, room humidity, room luminosity and others
Medical Facts Information | Used for adding new information regarding medical facts. | heart rate, blood sugar level, blood pressure
Medical Facts Queries | Used for queries regarding medical facts. | (same as above)
Medication Information | Used for adding new information about medical treatments. | taking medication on time, forgetting to take medications
Medication Queries | Used for queries regarding the medical treatment. | (same as above)
Physical Exercising Information | Used for adding new information about physical exercising. | time and duration of the work out
Physical Exercising Queries | Used for receiving information about physical exercising. | (same as above)
Repeat | Used for asking the system to repeat something. | -
Social Interactions | Used for understanding general social interactions. | hello, goodbye, see you later


References

[1] M.-L. Bourguet, "Designing and Prototyping Multimodal Commands," in Proceedings of Human-Computer Interaction, Queen Mary, University of London, Mile End Road, London E1 4NS, UK, London, 2003.
[2] R. López-Cózar, Z. Callejas, D. Griol and J. F. Quesada, "Review of spoken dialogue systems," Loquens, vol. 1, no. 2, 2014.
[3] O. Lemon and O. Pietquin, Data-Driven Methods for Adaptive Spoken Dialogue Systems, London: Springer New York, 2012.
[4] N. Webb, Rule-Based Dialogue Management Systems, Sheffield, UK: University of Sheffield, 2000.
[5] M. Frampton and O. Lemon, "Recent research advances in reinforcement learning in spoken dialogue systems," in The Knowledge Engineering Review, Cambridge, Cambridge University Press, 2009, pp. 375-408.
[6] E. H. Hovy, "Pragmatics and Natural Language Generation," in Artificial Intelligence, Southern California, Elsevier B.V., 1989, pp. 153-197.
[7] J. Raheja, A. Chaudhary and K. Singal, "Tracking of Fingertips and Centre of Palm using KINECT," 3rd IEEE International Conference on Computational Intelligence, Modelling and Simulation, Malaysia, 20-22 Sep. 2011.
[8] "http://wiki.speech.cs.cmu.edu/olympus/index.php/Tutorial_1" [Online].
[9] A. Cosgun, M. Bunger and H. Christensen, "Accuracy Analysis of Skeleton Trackers for Safety in HRI," Georgia Tech, Atlanta, GA, USA, 2013.
[10] K. Slavicek, V. Schindler, M. Sykora and O. Dostal, "Kalman Filter Improvement for Gyroscopic Mouse," Int'l Journal of Computing, Communications & Instrumentation Engg. (IJCCIE), vol. 1, no. 1, pp. 96-97, 2014.
[11] G. M. Chaikin, "An algorithm for high-speed curve generation," in Computer Graphics and Image Processing, vol. 3, New York, 1974, pp. 346-349.
[12] K. I. Joy, "Chaikin's Algorithms for Curves," Department of Computer Science, University of California, Davis, 2012.