Text, Speech, and Dialogue
Edited by Elmar Nöth, Aleš Horák, Petr Sojka. Language: English. Paperback – Sep 2024
The 50 revised full papers presented in these proceedings were carefully reviewed and selected from 103 submissions.
The papers are organized in the following topical sections:
Part I: Text
Part II: Speech, Dialogue
| All formats and editions | Price | Express |
|---|---|---|
| Paperback (2) | 377.08 lei, 6-8 weeks | |
| Springer – Sep 2024 | 432.78 lei, 3-5 weeks | |
| Springer – Sep 2024 | 377.08 lei, 6-8 weeks | |
Price: 432.78 lei
Old price: 540.97 lei
-20%
Express points: 649
Estimated price in foreign currency:
76.53€ • 89.81$ • 66.38£
Book available
Economy delivery: 16 February-02 March
Specifications
ISBN-13: 9783031705656
ISBN-10: 3031705653
Pages: 344
Illustrations: XX, 312 p.
Dimensions: 155 x 235 x 19 mm
Weight: 0.52 kg
Edition: 2024
Publisher: Springer
Place of publication: Cham, Switzerland
Contents
.- Speech.
.- Retrieval Augmented Spoken Language Generation for Transport Domain.
.- Adapting Audiovisual Speech Synthesis to Estonian.
.- Dysphonia Diagnosis Using Self-Supervised Speech Models in Mono- and Cross-Lingual Settings.
.- Sentences vs Phrases in Neural Speech Synthesis.
.- Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model.
.- Deep Speaker Embeddings for Speaker Verification of Children.
.- Improved Alignment for Score Combination of RNN-T and CTC Decoder for Online Decoding.
.- Attention to Phonetics: A Visually Informed Explanation of Speech Transformers.
.- Effects of Training Strategies and the Amount of Speech Data on the Quality of Speech Synthesis.
.- Stream-Based Active Learning for Speech Emotion Recognition via Hybrid Data Selection and Continuous Learning.
.- Data Alignment and Duration Modelling in VITS.
.- Multiword Expressions Resources for Italian: Presenting a Manually Annotated Spoken Corpus.
.- Generating High-Quality F0 Embeddings Using the Vector-Quantized Variational Autoencoder.
.- Anonymizing Dysarthric Speech: Investigating the Effects of Voice Conversion on Pathological Information Preservation.
.- X-vector-based Speaker Diarization Using Bi-LSTM and Interim Voting-driven Post-processing.
.- A Paradigm for Interpreting Metrics and Measuring Error Severity in Automatic Speech Recognition.
.- Enhancing Speech Emotion Recognition Using Transfer Learning From Speaker Embeddings.
.- Dialogue.
.- Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets.
.- PiCo-VITS: Leveraging Pitch Contours for Fine-grained Emotional Speech Synthesis.
.- Improving and Understanding Clarifying Question Generation in Conversational Search.
.- Explainable Multimodal Fusion for Dementia Detection From Text and Speech.
.- Robust Classification of Parkinson’s Speech: an Approximation to a Scenario With Non-controlled Acoustic Conditions.
.- Leveraging Conceptual Similarities to Enhance Modeling of Factors Affecting Adolescents’ Well-Being.
.- Joint-Average Mean and Variance Feature Matching (JAMVFM) Semi-supervised GAN with Additional-Objective Training Function for Intent Detection.
.- Capturing Task-Related Information for Text-Based Grasp Classification Using Fine-Tuned Embeddings.
.- StepDP: A Step Towards Expressive and Pervasive Dialogue Platforms.
.- Automatic Classification of Parkinson’s Disease Using Wav2vec Embeddings at Phoneme, Syllable, and Word Levels.