APVV-0437-11
The presented project is closely linked
with other nationwide and, more notably, European projects. The research
team will draw on the experience and knowledge gained in several EU research
projects (e.g. FP7 HBB-NEXT 2011-14, DAAD Audio-Speech Interface for Mobile
Devices (ASIMD)) as well as in the 7 running nationwide projects. In these
projects, the research team solves partial problems whose results will be
utilized in the presented project. Conversely, the project's outcomes will be
used directly in the research team's other EU projects (e.g. LdV IMProVET
2011-14 and EU projects under preparation: FP7 NEWTON, CIP Europeana for
Education E4E, InterReg FIDICE, etc.) to pursue the goals of those projects.
Various types of hybrid
networks are currently used within the converged networks. Hybrid communication
networks arise by interconnecting various network types which use metallic,
optical and radio transmission media. Each network is characterised by its
data transfer parameters (reflecting the quality of the transmission medium).
In contrast with fixed networks, wireless networks use a transmission channel
with significantly worse properties (packet loss, packet delay). This strongly
influences the performance and efficiency of data transfer, and ultimately the
quality of the services provided. The rise in network capabilities and
capacities brings new multimedia services [T-1] which, in turn, require
communication networks to deliver services at a specific level of quality.
Meeting these expectations relies on the effective operation of protocols on
all layers of the protocol stack and on their ability to cope with the
different transfer parameters of the communication technologies used. The two
basic transport protocols of the current Internet, TCP and UDP, are among
these protocols.
The TCP protocol offers
connection-oriented transfer, reliable data delivery, and flow and congestion
control. New types of networks and network technologies (heterogeneous
networks, hybrid networks, asymmetric networks, wireless and satellite
networks, optical transport networks and their interconnections) lower the
performance of this rather old protocol. Thus, new solutions that improve its
robustness against these factors and maintain the required transfer quality
need to be investigated [T-2].
The first aim of the project
in the field of effective transfer on the transport layer is to identify the
main problems of the TCP protocol in various types of hybrid networks and to
analyse the capabilities and drawbacks of the latest modifications of the TCP
protocol, e.g. TCP Veno, Westwood+, Adaptive Reno [T-3]. Concurrently with the
first aim, a new simulation environment will be created, capable of building
and simulating various hybrid network scenarios and different modifications of
the TCP protocol (or, possibly, other transport protocols). Once the drawbacks
of the latest TCP modifications are identified, an innovative modification
compensating for these limitations will be proposed. The modification will be
implemented in the simulation environment and tested over scenarios of hybrid
networks and provided services.
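The sensitivity of classic TCP congestion control to random packet loss, which
motivates the aims above, can be illustrated with a very small simulation. The
following Python sketch is not the project's simulation environment. It is a
minimal model that assumes Reno-style additive increase and multiplicative
decrease (AIMD) in congestion avoidance only, independent random segment loss,
and no slow start, timeouts or delay variation. The function name
`simulate_aimd` and all parameter values are illustrative assumptions.

```python
import random


def simulate_aimd(loss_rate, rounds=10000, max_cwnd=1000.0, seed=0):
    """Return the mean congestion window (in segments) over `rounds` RTTs.

    Assumed toy model: one window of segments is sent per round trip; each
    segment is lost independently with probability `loss_rate`.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    cwnd = 1.0
    total = 0.0
    for _ in range(rounds):
        total += cwnd
        # Probability that at least one segment of this window is lost.
        p_loss_event = 1.0 - (1.0 - loss_rate) ** cwnd
        if rng.random() < p_loss_event:
            cwnd = max(1.0, cwnd / 2.0)       # multiplicative decrease
        else:
            cwnd = min(max_cwnd, cwnd + 1.0)  # additive increase
    return total / rounds


if __name__ == "__main__":
    # Hypothetical loss rates: an ideal link and two lossy wireless-like links.
    for p in (0.0, 0.001, 0.05):
        print(f"loss rate {p}: mean cwnd = {simulate_aimd(p):.1f} segments")
```

Under these assumptions the mean window shrinks as the loss rate grows, even
though the losses are unrelated to congestion. This is the kind of behaviour
that modifications such as TCP Veno or Westwood+ are designed to mitigate.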
The basic method for achieving the
outlined objectives is simulation: the proposed modification is implemented
and tested in a simulated environment. This method is widely used by
researchers worldwide. Its advantage is affordability, since physical
realisation would require access to various network types and end devices as
well as complex firmware alteration.
Currently, a web interface is
a standard interface provided by various NGN network end devices. Integrating
various types of HCI (e.g. audio and video) into this interface considerably
extends the options for interaction between services/applications and the
user.
The converged networks of the
future are among the most topical subjects in research on the Future Internet
and new concepts of network architecture. The European Commission attaches
great importance to solving the challenge of future networks. It is also
expected to be one of the key topics for research support within the FP7
projects (see Call8 Topics), and it is part of the Digital Agenda for Europe
within the ICT research in the horizon 2020. Several standardization
institutions address the issue, ITU-T and ETSI to mention a few.
The field of multimedia services
over converged future networks aims at fast, comfortable and secure sharing of
multimedia content over these networks. Apart from multimedia content sharing,
users of these services expect modern communication services (e.g. voice,
video and instant messaging) and reliable, secure and trustworthy access to
personal information (e-mail, e-bank, e-store, etc.). These needs form the
basis for requirements focused on:
- Security of the multimedia content provided
- User identification
- Trust of the service user
Identity management is an
essential part of every network platform. It is also a cardinal security
element of a network, helping to build users' confidence in the platform.
Existing hybrid networks currently require several copies (or versions) of
user service profiles, especially when the user employs multiple HBB end
devices or several services on different service servers. One of the project's
objectives is to propose and develop a framework which will allow service
providers to create and deploy new interactive applications.
The field of multimodal
interfaces can be divided into image and speech parts. The goal of the project
is to analyse the possibilities of a multimodal interface for natural
communication with end devices. Attention will be paid to user identification,
device control, and feedback from the device to the user.
Face recognition
is an active research topic [1], [2], [3], [4]. Static face-image recognition
in a controlled environment is successfully used in biometric applications.
Many problems remain to be solved when identifying faces in an uncontrolled
environment (changes in lighting, pose, and facial expression), recognising
non-frontal face images, etc.
Methods based on face
recognition have high potential because they can be applied at greater
distances and with lower image resolution of the face and its biometric
features.
During the last ten years, the telecommunication
environment has changed. The convergence of computer and telecommunication
networks brings a demand for new flexible multimedia services which allow fast
and comfortable access to the information needed. Modern trends in this domain
show that natural human speech will become the preferred medium of new
multimedia and multimodal telecommunication services in a relatively short
time. A modality is related to sensory perception (sight, hearing, touch,
taste, smell); it is the communication channel between the electronic device
and the human user. A multimodal interface is then a combination of several
forms of sensory perception. The contribution of this project will be the
elaboration and solution of human-machine communication problems for Slovak
speech. The scientific level of the project corresponds to the state of the
art of world research in this domain.
Systems for automatic speech recognition are growing
more popular, and their direct employment in commercial applications is
becoming more realistic. This research domain is very challenging due to its
complexity, and many renowned research centres and laboratories are involved
in it. At present, systems based on continuous hidden Markov models (HMMs)
applied to isolated word recognition with vocabularies of several thousand
words are well studied (especially for English) and achieve the best results.
There are several modifications of HMMs (discrete, continuous,
semi-continuous, two- or multistage HMMs, HMMs with explicit time modeling,
ML, MAP and MMI learning rules, hybrid solutions with NN, SVM, etc.), but the
most widespread and successful ones are CD-HMMs with tied context-dependent
phoneme models. Nowadays one of the most challenging tasks is the construction
of recognition systems designed to cope with continuous speech in real
conditions. Despite the huge effort made over the last several decades, many
issues remain open. More details can be found in [1].
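The principle of HMM-based isolated word recognition mentioned above can be
sketched in a few lines: each vocabulary word has its own model, and the
recogniser picks the word whose model assigns the highest likelihood to the
observed sequence. The Python sketch below uses the forward algorithm on
discrete HMMs for brevity, whereas the systems described in the text use
continuous-density models. The two word models, their probabilities and the
two-symbol observation alphabet are invented purely for illustration.

```python
def forward_likelihood(obs, start_p, trans_p, emit_p):
    """Likelihood P(obs | model) of a discrete HMM via the forward algorithm."""
    n_states = len(start_p)
    # Initialisation: start in state s and emit the first symbol.
    alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n_states)]
    # Induction: sum over all predecessor states, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            sum(alpha[r] * trans_p[r][s] for r in range(n_states)) * emit_p[s][o]
            for s in range(n_states)
        ]
    # Termination: sum over all possible final states.
    return sum(alpha)


def recognise(obs, models):
    """Return the word whose model gives the highest likelihood for `obs`."""
    return max(models, key=lambda word: forward_likelihood(obs, *models[word]))


# Hypothetical left-to-right word models: (start, transition, emission).
START = [1.0, 0.0]
TRANS = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "yes": (START, TRANS, [[0.9, 0.1], [0.2, 0.8]]),
    "no": (START, TRANS, [[0.1, 0.9], [0.8, 0.2]]),
}

if __name__ == "__main__":
    print(recognise([0, 1], MODELS))
    print(recognise([1, 0], MODELS))
```

A practical recogniser would work with log-probabilities or scaling to avoid
numerical underflow on long sequences, and would use feature vectors (for
example MFCC) with Gaussian mixture emissions instead of discrete symbols.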
The general problem of speaker recognition can be
divided into several categories: speaker identification (closed group) and
speaker verification (open group), text-dependent and text-independent.
Systems are also classified according to the information they use: acoustic
information, prosody, and linguistic information (dialect, social background,
etc.). The most frequently used speech features are: MFCC,
In the domain of speech synthesis, the goal is to
interconnect high-level synthesis with low-level synthesis [K-4]. The next
goals are to adapt and optimize the compression method [K-5] for corpus-based
speech synthesis and to propose a method for prosodic modification of Slovak
speech and for prosody prediction [K-3]. The following step is to design a
method for speech synthesis with emotions. For natural and intelligible
speech, it is also necessary to incorporate a system for expanding
abbreviations, acronyms and numerals. With regard to effectiveness and the
particularities of the Slovak language, new functional methods need to be
sought to implement this part of the project. The proposed goals serve as the
basis for a multimodal communication interface.
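The handling of abbreviations, acronyms and numerals mentioned above is
commonly implemented as a text normalisation step that rewrites such tokens
into speakable words before synthesis. The Python sketch below shows the idea
on English input only. The abbreviation table, the function names and the
restriction to numbers from 0 to 99 are assumptions made for illustration; a
Slovak system would additionally have to handle inflection of numerals, which
is not attempted here.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical abbreviation table; a real system would use a larger lexicon.
ABBREVIATIONS = {"Dr.": "doctor", "e.g.": "for example", "etc.": "et cetera"}


def number_to_words(n):
    """Spell out an integer in the range 0..99 in English."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalise(text):
    """Expand known abbreviations and one- or two-digit numbers, token by token."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]{1,2}", token):
            out.append(number_to_words(int(token)))
        else:
            out.append(token)  # leave everything else unchanged
    return " ".join(out)


if __name__ == "__main__":
    print(normalise("Dr. Novak has 42 patients"))
```

Tokens outside the table and numbers with more than two digits pass through
unchanged, so the sketch fails safely rather than producing wrong words.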
Yet another interface topic is control of the device by
the brain (brain-computer interface, BCI). At present, BCI represents a new
and promising interface that extends the possibilities of multimodal end
devices, e.g. intelligent TVs, which may be used as end devices for NGN
networks. Integrating BCI into HCI significantly extends HCI possibilities in
the scope of user-service and user-application interaction.
Publication references
[1] Rabia Jafri, Hamid R. Arabnia: A Survey of Face Recognition
Techniques, Journal of Information Processing Systems, vol. 5, no. 2, pp.
41-68, 2009
[2] Delac, K.; Grgic, M. & Bartlett, M., S. (Eds.) (2008).
Recent Advances in Face Recognition, IN-TECH, Vienna, Retrieved from
http://intechweb.org/book.php?id=101
[3] Li, S. Z. & Jain, A. K. (Eds.) (2005). Handbook of Face
Recognition, Springer, ISBN 0-387-40595-X, New York
[4] Oravec, M.; Mazanec, J.; Pavlovicova, J.; Eiben, P. &
Lehocki, F. (2010). Face Recognition in Ideal and Noisy Conditions Using
Support Vector Machines, PCA and LDA, In: Face Recognition, Milos Oravec (Ed.),
ISBN: 978-953-307-060-5, INTECH, available from: http://sciyo.com/articles/show/title/face-recognition-in-ideal-and-noisy-conditions-using-support-vector-machines-pca-and-lda
[V-1] Jacko, A., J., (editor), Proceedings of 12th International
Conference, HCI International 2007, Beijing, China, July 22-27, 2007, ISBN-10
3-540-73108-3, Springer
[V-2] Corralejo, R., Hornero, R., Álvarez, D., A Domotic Control System
Using Brain-Computer Interface (BCI), Advances in Computational Intelligence,
Lecture Notes in Computer Science, 2011, Volume 6691/2011, 345-352
[V-3] Li, C., S., Wang, H., Wavelet transform for on-off switching BCI
device, 7th Asian-Pacific Conference on Medical and Biological Engineering,
IFMBE Proceedings, 2008, Volume 19, Part 9, 363-365
[V-4] Trejo, L., J., Rosipal, R., Matthews, B., Brain-computer
interfaces for 1-D and 2-D cursor control: designs using volitional control of
the EEG spectrum or steady-state visual evoked potentials, IEEE Trans Neural
Syst Rehabil Eng. 2006 Jun;14(2):225-9.
[T-1] Sanjoy, P., Digital Video Distribution in Broadband, Television,
Mobile and Converged Networks: Trends, Challenges and Solutions, Wiley, 384 pp.,
2010, ISBN 978-0470746288
[T-2] Barsocchi, P., G. Oligeri, and F. Potortì., Packet Loss in
TCP Hybrid Wireless Networks. Proceedings of Advanced Satellite Mobile Systems
Conference (ASMS '06), Herrsching, 2006
[T-3] Marcondes, C., Sanadidi, M.Y., Gerla, M., Shimonishi, H., TCP
Adaptive Westwood - Combining TCP Westwood and Adaptive Reno: A Safe Congestion
Control Proposal, IEEE International Conference on Communications, ICC '08,
Beijing, 2008
[K-1] J. Benesty, M. Sondhi, Y. Huang, Handbook of speech Processing,
Springer, 2008
[K-2] F. Bimbot, J-F Bonastre, C. Fredouille, G. Gravier, I. Magrin-
Changolleau, S. Meignier, T. Merlin, J.
[K-3] Huang, X., A. Acero, et al. (2001). Spoken language processing : a
guide to theory, algorithm, and system development. Upper Saddle River, N.J.,
Prentice Hall PTR.
[K-4] Psutka, J. Komunikace s počítačem mluvenou řečí. 1st ed., 1995.
ISBN 80-200-0203-0
[K-5] Spanias, A. S. Speech Coding: A Tutorial Review. Proc. IEEE, 1994,
vol. 82, no. 10, pp. 1541-1582.
[K-6] Stylianou, Y.: Applying the Harmonic plus Noise Model in
Concatenative Speech Synthesis, IEEE Trans. Acoustics, Speech and Signal
Processing, vol. 9, no. 1, pp. 21-29, Jan. 2001.
[K-7] Yi, J. (2003). Corpus-Based Unit Selection for Natural-Sounding
Speech Synthesis. MIT Department of Electrical Engineering and Computer
Science.
Assoc. Prof. Ing. Gregor Rozinaj, PhD.
Prof. Ing. Pavol Podhradský, PhD.
Assoc. Prof. Ing. Jarmila Pavlovičová, PhD.
Assoc. Prof. Dr. Ing. Miloš Oravec
Assoc. Prof. Ing. Ivan Kotuliak, PhD.
Ing. Eugen Mikóczy, PhD.
Ing. Juraj Kačur, PhD.
Ing. Peter Trúchly, PhD.
Ing. Martin Turi Nagy, PhD.
Ing. Radoslav Vargic, PhD.