Authors
Aggoun, Amar
Tsekleves, Emmanuel
Swash, M.R.
Zarpalas, D.
Dimou, A.
Daras, P.
Nunes, P.
Soares, L.D.
Issue Date
2013-02
Subjects
3D imaging
3D video
glassless 3D video
holoscopic video
multimedia
image reconstruction
three-dimensional television
video coding
Abstract
We demonstrated a 3D holoscopic video system for 3DTV applications. We showed that using a field lens and a square aperture significantly reduces the vignetting problem associated with a relay system and achieves a fill factor of over 95 percent. The main problem for such a relay system is the nonlinear distortion introduced during 3D image capture, which can seriously affect the reconstruction process for a 3D display. This distortion mainly comprises lens radial distortion (intrinsic) and microlens array perspective distortion (extrinsic); correcting it is a task for future work. Our results also show that the self-similarity (SS) coding approach performs better than the standard HEVC scheme. Furthermore, we show that search and retrieval performance relies on the quality of the depth map and that multimodal fusion boosts retrieval performance.
Citation
Aggoun, A., Tsekleves, E., Swash, M.R., Zarpalas, D., Dimou, A., Daras, P., Nunes, P. and Soares, L.D. (2013) 'Immersive 3D Holoscopic Video System', IEEE MultiMedia, 20(1), pp. 28-37.
Publisher
IEEE
Journal
IEEE MultiMedia
Type
Article
Language
en
ISSN
1070-986X
Sponsors
We acknowledge the support of the European Commission under the Seventh Framework Programme (FP7) project 3D Vivant (Live Immerse Video-Audio Interactive Multimedia).
DOI
10.1109/MMUL.2012.42
Related items
Showing items related by title, author, creator and subject.
-
User-action-driven view and rate scalable multiview video coding
Chakareski, Jacob; Velisavljević, Vladan; Stankovic, Vladimir (IEEE, 2013-09)
We derive an optimization framework for joint view and rate scalable coding of multi-view video content represented in the texture plus depth format. The optimization enables the sender to select the subset of coded views and their encoding rates such that the aggregate distortion over a continuum of synthesized views is minimized. We construct the view and rate embedded bitstream such that it delivers optimal performance simultaneously over a discrete set of transmission rates. In conjunction, we develop a user interaction model that characterizes the view selection actions of the client as a Markov chain over a discrete state-space. We exploit the model within the context of our optimization to compute user-action-driven coding strategies that aim at enhancing the client's performance in terms of latency and video quality. Our optimization outperforms the state-of-the-art H.264 SVC codec as well as a multi-view wavelet-based coder equipped with a uniform rate allocation strategy, across all scenarios studied in our experiments. Equally important, we can achieve an arbitrarily fine granularity of encoding bit rates, while providing a novel functionality of view embedded encoding, unlike the other encoding methods that we examined. Finally, we observe that the interactivity-aware coding delivers superior performance over conventional allocation techniques that do not anticipate the client's view selection actions in their operation.
-
Embedded FIR filter design for real-time refocusing using a standard plenoptic video camera
Hahne, Christopher; Aggoun, Amar; University of Bedfordshire (SPIE - the international society for optics and photonics, 2014-02-03)
A novel and low-cost embedded hardware architecture for real-time refocusing based on a standard plenoptic camera is presented in this study. The proposed layout design synthesizes refocusing slices directly from micro images by omitting the process for the commonly used sub-aperture extraction. Therefore, intellectual property cores, containing switch-controlled Finite Impulse Response (FIR) filters, are developed and applied to the Field Programmable Gate Array (FPGA) XC6SLX45 from Xilinx. To enable the hardware design to work economically, the FIR filters are composed of stored-product as well as upsampling and interpolation techniques in order to achieve an ideal relation between image resolution, delay time, power consumption and the demand for logic gates. The video output is transmitted via High-Definition Multimedia Interface (HDMI) with a resolution of 720p at a frame rate of 60 fps, conforming to the HD ready standard. Examples of the synthesized refocusing slices are presented.
-
An investigation into double-marking methods: comparing live, audio and video rating of performance on the IELTS Speaking Test
Nakatsuhara, Fumiyo; Inoue, Chihiro; Taylor, Lynda (The IELTS Partners: British Council, IDP: IELTS Australia and Cambridge English Language Assessment, 2017-03-01)
This study compared IELTS examiners' scores when they assessed test-takers' spoken performance under live and two non-live rating conditions using audio and video recordings. It also explored examiners' perceptions towards test-takers' performance in the two non-live rating modes. This was a mixed-methods study that involved both existing and newly collected datasets. A total of six trained IELTS examiners assessed 36 test-takers' performance under the live, audio and video rating conditions. Their scores in the three modes of rating were calibrated using the multifaceted Rasch model analysis. In all modes of rating, the examiners were asked to make notes on why they awarded the scores that they did on each analytical category. The comments were quantitatively analysed in terms of the volume of positive and negative features of test-takers' performance that examiners reported noticing when awarding scores under the three rating conditions. Using selected test-takers' audio and video recordings, examiners' verbal reports were also collected to gain insights into their perceptions towards test-takers' performance under the two non-live conditions. The results showed that audio ratings were significantly lower than live and video ratings for all rating categories. Examiners noticed more negative performance features of test-takers under the two non-live rating conditions than under the live rating condition.
The verbal report data demonstrated how having visual information in the video-rating mode helped examiners to understand test-takers’ utterances, to see what was happening beyond what the test-takers were saying and to understand with more confidence the source of test-takers’ hesitation, pauses and awkwardness in their performance. The results of this study have, therefore, offered a better understanding of the three modes of rating, and a recommendation was made regarding enhanced double-marking methods that could be introduced to the IELTS Speaking Test.
