MultiNet

Multimedia Services in the Converged Future Networks Environment

APVV-0437-11

The project is closely linked to other national and, more notably, European projects. The research team will draw on experience and knowledge gained in several EU research projects (e.g. FP7 HBB-NEXT 2011-14 and DAAD Audio-Speech Interface for Mobile Devices (ASIMD)) as well as in the 7 running national projects. In those projects the team solves partial problems whose results will be utilized here. Conversely, the outcomes of this project will be used directly in the team's other EU projects (e.g. LdV IMProVET 2011-14 and EU projects under preparation: FP7 NEWTON, CIP Europeana for Education E4E, InterReg FIDICE, etc.) to pursue their goals.

Various types of hybrid networks are currently used within converged networks. Hybrid communication networks arise by interconnecting network types that use metallic, optical and radio transmission media. Each network is characterised by its data transfer parameters, which reflect the quality of the transmission medium. In contrast to fixed networks, wireless networks use a transfer channel with significantly worse properties (higher packet loss and packet delay). This strongly influences the performance and efficiency of data transfer and, ultimately, the quality of the services provided. The growth in network capabilities and capacities brings new multimedia services [T-1], which in turn require communication networks to deliver a specific level of quality. Meeting these expectations relies on the effective operation of protocols on all layers of the protocol stack and on their ability to cope with the different transfer parameters of the communication technologies used. These protocols include the two basic transport protocols of the current Internet, TCP and UDP.
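To make the influence of packet loss and delay on transfer performance more concrete, the following is a minimal sketch based on the widely cited Mathis approximation for the steady-state throughput of a Reno-like TCP flow, rate ≈ (MSS / RTT) · sqrt(3/2) / sqrt(p). The function name and the link parameters are illustrative assumptions and do not come from the project.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput in bit/s (Mathis model)."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0.0 < loss_rate < 1.0:
        raise ValueError("loss_rate must be in (0, 1)")
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)


# Hypothetical link parameters, chosen only for illustration.
fixed = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.020, loss_rate=0.0001)
wireless = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.080, loss_rate=0.01)

print(f"fixed link:    {fixed / 1e6:.1f} Mbit/s")
print(f"wireless link: {wireless / 1e6:.1f} Mbit/s")
```

Under these assumed values the model predicts a much lower achievable rate on the lossy, high-delay link, which is the effect the paragraph above describes qualitatively.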

TCP offers connection-oriented transfer, reliable data delivery, and flow and congestion control. New types of networks and network technologies (heterogeneous, hybrid, asymmetric, wireless and satellite networks, optical transport networks and their interconnections) lower the performance of this rather old protocol. New solutions that improve its robustness against these factors and maintain the required transfer quality therefore need to be investigated [T-2].

The first aim of the project in the field of effective transfer on the transport layer is to identify the main problems of the TCP protocol in various types of hybrid networks and to analyse the capabilities and drawbacks of the latest TCP modifications, such as TCP Veno, Westwood+ and Adaptive Reno [T-3]. In parallel, a new simulation environment will be created that is capable of building and simulating various hybrid network scenarios and different modifications of TCP (or, eventually, other transport protocols). Once the drawbacks of the latest TCP modifications have been identified, an innovative modification compensating for these limitations will be proposed. The modification will be implemented in the simulation environment and tested over hybrid network and service scenarios.
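As a rough illustration of why TCP modifications behave differently on lossy links, the sketch below contrasts a Reno-style reaction to a loss (halving the congestion window) with a Westwood-style reaction (shrinking the window to the estimated bandwidth-delay product). It is a simplified sketch, not the modification proposed by the project: the function names and numbers are hypothetical, and the bandwidth estimate is passed in directly, whereas a real Westwood-type sender derives it from the rate of returning acknowledgements.

```python
def reno_after_loss(cwnd_segments: float) -> float:
    """Reno-style reaction: halve the window (never below 2 segments)."""
    return max(cwnd_segments / 2.0, 2.0)


def westwood_after_loss(cwnd_segments: float, bwe_bps: float,
                        rtt_min_s: float, mss_bytes: int = 1460) -> float:
    """Westwood-style reaction: shrink to the estimated bandwidth-delay product."""
    bdp_segments = bwe_bps * rtt_min_s / (mss_bytes * 8)
    ssthresh = max(bdp_segments, 2.0)
    # The window is never increased by a loss event.
    return min(cwnd_segments, ssthresh)


# Hypothetical state just before a loss caused by radio errors, not congestion.
cwnd = 40.0
reno_cwnd = reno_after_loss(cwnd)
westwood_cwnd = westwood_after_loss(cwnd, bwe_bps=4_000_000, rtt_min_s=0.080)

print(f"Reno: {reno_cwnd:.1f} segments, Westwood-style: {westwood_cwnd:.1f} segments")
```

With these assumed values the Westwood-style sender keeps a larger window after a non-congestion loss, which is the kind of behaviour a simulation environment can compare across hybrid network scenarios.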

The basic method for achieving these objectives is to implement the proposed modification in a simulation and test it there. This method is widely used by researchers worldwide. Its advantage is that it is an affordable way to reach the project's aims, since a physical realisation would require access to various network types and end devices as well as complex firmware alterations.

Currently, a web interface is the standard interface provided by various NGN end devices. Integrating various types of HCI (e.g. audio and video) into this interface considerably extends the options for interaction between services/applications and the user.

Converged networks of the future are among the most topical subjects in research on the Future Internet and new concepts of network architecture. The European Commission attaches great importance to the challenge of future networks. The topic is expected to be one of the key topics for research support within FP7 projects (see Call8 Topics) and is also part of the Digital Agenda for Europe within ICT research in Horizon 2020. Several standardization bodies address the issue, among them ITU-T and ETSI.

Multimedia services over converged future networks aim at fast, comfortable and secure sharing of multimedia content. Apart from content sharing, users of these services expect modern communication services (e.g. voice, video and instant messaging) and reliable, secure and trustworthy access to personal information (e-mail, e-bank, e-store, etc.). These needs form the basis for requirements focused on:

- Security of the multimedia content provided

- User identification

- Trust of the service user

Identity management is an essential part of every network platform. It is also a key security element of a network, helping to build users' confidence in the platform. Existing hybrid networks currently require several copies (or versions) of a user's service profile, especially when the user employs multiple HBB end devices or several services hosted on different service servers. One of the project's objectives is to propose and develop a framework that will allow service providers to create and deploy new interactive applications.

The field of multimodal interfaces can be divided into an image part and a speech part. The goal of the project is to analyse the possibilities of a multimodal interface for natural communication with end devices. Attention will be paid to user identification, device control, and feedback from the device to the user.

Face recognition is an active research topic [1], [2], [3], [4]. Recognition of static face images in a controlled environment is used successfully in biometric applications. Many problems remain, however, when identifying faces in an uncontrolled environment (changes in lighting, pose and facial expression), recognising non-frontal face images, etc.

Methods based on face recognition have high potential because they can be applied at greater distances and at lower image resolution of the face and its biometric features.

During the last ten years the telecommunication environment has changed. The convergence of computer and telecommunication networks brings a demand for new, flexible multimedia services that allow fast and comfortable access to the information needed. Current trends in this domain suggest that natural human speech will become the preferred medium of new multimedia and multimodal telecommunication services in a relatively short time. A modality is related to sensory perception (sight, hearing, touch, taste, smell) and forms the communication channel between the electronic device and the human user. A multimodal interface is then a combination of several forms of sensory perception. A contribution of this project will be the elaboration and solution of human-machine communication problems for Slovak speech. The scientific level of the project corresponds to the state of the art of world research in this domain.

Systems for automatic speech recognition are growing more popular, so their direct use in commercial applications is becoming more realistic. The research domain is very challenging because of its complexity, and many well-known research centres and laboratories are involved in it. At present, systems based on continuous hidden Markov models (HMMs) applied to isolated word recognition with vocabularies of several thousand words are well studied (especially for English) and achieve the best results. There are several variants of HMMs (discrete, continuous, semi-continuous, two-stage or multistage HMMs, HMMs with explicit time modelling, ML, MAP and MMI learning rules, hybrid solutions with NN, SVM, etc.), but the most widespread and successful are CD-HMMs with tied context-dependent phoneme models. One of the most challenging tasks today is the construction of recognition systems designed to cope with continuous speech in real conditions. Despite the huge effort made over the last several decades, many issues remain open. More details can be found in [1].
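The principle of HMM-based isolated word recognition can be sketched as follows: each word has its own model, the forward algorithm computes the likelihood of the observed sequence under every model, and the word with the highest likelihood is chosen. For brevity the sketch uses tiny discrete HMMs with made-up parameters rather than the continuous-density models discussed above; the two-word vocabulary, the symbol alphabet and all probabilities are assumptions for illustration only.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    # Initialisation with the first observed symbol.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    # Induction over the remaining symbols.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


def recognise(obs, models):
    """Return the word whose model gives the observation the highest likelihood."""
    return max(models, key=lambda word: forward_likelihood(obs, *models[word]))


# Hypothetical two-state left-to-right models over the symbol alphabet {0, 1}.
left_to_right = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "yes": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.2, 0.8]]),
    "no": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.8, 0.2]]),
}

print(recognise([0, 0, 1, 1], models))
print(recognise([1, 1, 0, 0], models))
```

For long observation sequences a practical implementation would work with scaled or logarithmic probabilities to avoid numerical underflow.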

The general problem of speaker recognition can be divided into several categories: speaker identification (closed group) versus speaker verification (open group), and text-dependent versus text-independent recognition. Systems are also classified according to the information they use: acoustic information, prosody, and linguistic information (dialect, social background, etc.). The most frequently used speech features are MFCC, PLP and the pitch period, although further experimental and auxiliary features exist. The most common classification methods are GMM, KNN, NN and decision trees. Accuracy above 90% is usual in text-independent scenarios for a group of 100 speakers. It is important to choose a set of features and classification techniques that match each other and reflect the particular application. More details can be found e.g. in [2].
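The difference between closed-set identification and verification can be sketched in a few lines. The example below models each speaker with a single diagonal-covariance Gaussian, which is a deliberate simplification of a GMM (one mixture component), and uses two-dimensional toy vectors as stand-ins for MFCC frames. The speaker names, the data, the variance floor and the decision threshold are all assumptions made for illustration.

```python
import math


def fit_gaussian(frames, var_floor=1e-3):
    """Estimate per-dimension mean and variance from feature frames."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dim)
    ]
    return means, variances


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def identify(frames, models):
    """Closed-set identification: pick the best-scoring enrolled speaker."""
    return max(models, key=lambda name: avg_log_likelihood(frames, models[name]))


def verify(frames, claimed, background, threshold=0.0):
    """Verification: accept if the log-likelihood ratio exceeds the threshold."""
    ratio = avg_log_likelihood(frames, claimed) - avg_log_likelihood(frames, background)
    return ratio > threshold


# Hypothetical enrolment data (toy 2-D features instead of real MFCC vectors).
alice_train = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [1.1, 2.2]]
bob_train = [[3.0, 0.5], [3.2, 0.4], [2.9, 0.6], [3.1, 0.7]]
models = {"alice": fit_gaussian(alice_train), "bob": fit_gaussian(bob_train)}
background = fit_gaussian(alice_train + bob_train)

test_frames = [[1.05, 2.0], [1.0, 1.9]]
print(identify(test_frames, models))
print(verify(test_frames, models["alice"], background))
print(verify(test_frames, models["bob"], background))
```

Identification always returns one of the enrolled speakers, whereas verification can reject a claimed identity, which is why the latter is described as an open-group problem.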

In the domain of speech synthesis, the goal is to interconnect high-level synthesis with low-level synthesis [K-4]. The next goals are to adapt and optimize the compression method [K-5] for corpus-based speech synthesis and to propose a method for prosodic modification of Slovak speech and for prosody prediction [K-3]. A further step is to design a method for speech synthesis with emotions. For natural and intelligible speech it is also necessary to incorporate a system for expanding abbreviations, acronyms and numerals. With regard to effectiveness and the particularities of the Slovak language, new functional methods need to be sought to implement this part of the project. The proposed goals serve as the basis for the multimodal communication interface.

Yet another interface topic is control of a device by the brain (brain-computer interface, BCI). At present BCI represents a new and promising interface that extends the possibilities of multimodal end devices, e.g. intelligent TVs, which may serve as end devices for NGN networks. Integrating BCI into HCI significantly extends the possibilities of HCI for user-service and user-application interaction.

Publication references

[1] Rabia Jafri, Hamid R. Arabnia: A Survey of Face Recognition Techniques, Journal of Information Processing Systems, vol. 5, no. 2, pp. 41-68, 2009

[2] Delac, K.; Grgic, M. & Bartlett, M. S. (Eds.) (2008). Recent Advances in Face Recognition, IN-TECH, Vienna, Retrieved from http://intechweb.org/book.php?id=101

[3] Li, S. Z. & Jain, A. K. (Eds.) (2005). Handbook of Face Recognition, Springer, ISBN 0-387-40595-X, New York

[4] Oravec, M.; Mazanec, J.; Pavlovicova, J.; Eiben, P. & Lehocki, F. (2010). Face Recognition in Ideal and Noisy Conditions Using Support Vector Machines, PCA and LDA, In: Face Recognition, Milos Oravec (Ed.), ISBN: 978-953-307-060-5, INTECH, available from: http://sciyo.com/articles/show/title/face-recognition-in-ideal-and-noisy-conditions-using-support-vector-machines-pca-and-lda

[V-1] Jacko, J. A. (Ed.), Proceedings of 12th International Conference, HCI International 2007, Beijing, China, July 22-27, 2007, ISBN-10 3-540-73108-3, Springer

[V-2] Corralejo, R., Hornero, R., Álvarez, D., A Domotic Control System Using Brain-Computer Interface (BCI), Advances in Computational Intelligence, Lecture Notes in Computer Science, 2011, Volume 6691/2011, 345-352

[V-3] Li, C. S., Wang, H., Wavelet transform for on-off switching BCI device, 7th Asian-Pacific Conference on Medical and Biological Engineering, IFMBE Proceedings, 2008, Volume 19, Part 9, 363-365

[V-4] Trejo, L. J., Rosipal, R., Matthews, B., Brain-computer interfaces for 1-D and 2-D cursor control: designs using volitional control of the EEG spectrum or steady-state visual evoked potentials, IEEE Trans Neural Syst Rehabil Eng. 2006 Jun;14(2):225-9.

[T-1] Paul, S., Digital Video Distribution in Broadband, Television, Mobile and Converged Networks: Trends, Challenges and Solutions, Wiley, 384 pp., 2010, ISBN 978-0470746288

[T-2] Barsocchi, P., Oligeri, G., Potortì, F., Packet Loss in TCP Hybrid Wireless Networks, Proceedings of Advanced Satellite Mobile Systems Conference (ASMS '06), Herrsching, 2006

[T-3] Marcondes, C., Sanadidi, M.Y., Gerla, M., Shimonishi, H., TCP Adaptive Westwood - Combining TCP Westwood and Adaptive Reno: A Safe Congestion Control Proposal, IEEE International Conference on Communications, ICC '08, Beijing, 2008

[K-1] J. Benesty, M. Sondhi, Y. Huang, Handbook of Speech Processing, Springer, 2008

[K-2] F. Bimbot, J-F Bonastre, C. Fredouille, G. Gravier, I. Magrin- Changolleau, S. Meignier, T. Merlin, J.

[K-3] Huang, X., A. Acero, et al. (2001). Spoken language processing : a guide to theory, algorithm, and system development. Upper Saddle River, N.J., Prentice Hall PTR.

[K-4] Psutka, J. Komunikace s počítačem mluvenou řečí. 1. vyd.: 1995. ISBN 80-200-0203-0

[K-5] Spanias, A. S. Speech Coding: A Tutorial Review. Proc. IEEE, 1994, vol. 82, no. 10, pp. 1541-1582.

[K-6] Stylianou, Y.: Applying the Harmonic plus Noise Model in Concatenative Speech Synthesis, IEEE Trans. Acoustics, Speech and Signal Processing, vol. 9, no. 1, pp. 21-29, Jan. 2001.

[K-7] Yi, J. (2003). Corpus-Based Unit Selection for Natural-Sounding Speech Synthesis. MIT Department of Electrical Engineering and Computer Science.

Research team

Assoc. Prof. Ing. Gregor Rozinaj, PhD.

Prof. Ing. Pavol Podhradský, PhD.

Assoc. Prof. Ing. Jarmila Pavlovičová, PhD.

Assoc. Prof. Dr. Ing. Miloš Oravec

Assoc. Prof. Ing. Ivan Kotuliak, PhD.

Ing. Eugen Mikóczy, PhD.

Ing. Juraj Kačur, PhD.

Ing. Peter Trúchly, PhD.

Ing. Martin Turi Nagy, PhD.

Ing. Radoslav Vargic, PhD.