============================================================================
EMNLP 2016 Reviews for Submission #170
============================================================================

Title: Joint Transition-based Dependency Parsing and Disfluency Detection for Automatic Speech Recognition Texts

Authors: Masashi Yoshikawa, Hiroyuki Shindo and Yuji Matsumoto

============================================================================
REVIEWER #1
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Appropriateness: 5
Clarity: 3
Originality: 3
Soundness / Correctness: 4
Meaningful Comparison: 3
Substance: 3
Impact of Ideas / Results: 3
Impact of Accompanying Software: 1
Impact of Accompanying Dataset / Resource: 3
Recommendation: 3
Reviewer Confidence: 4

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

This paper describes an approach to handling disfluencies and ASR errors in dependency parsing. The main contribution is the addition of three new actions (Edit, LeftArcError and RightArcError) to the parser.

In general, the paper is well motivated. However, there are elements of the work that could benefit from more detail in the description. I believe the work is sound, but in places it is missing useful details for comparison.

1. It would be useful to clearly enumerate the contributions of this work. Where does it differ from previous approaches to this task?

2. Specifically, what features are used? In both places where features are described, hedges are used: "The feature set is _mainly_ based on (Honnibal and Johnson, 2014)" and "We use WCN-based features _such as_ the mean and standard deviation of slot arc posteriors".
For reproducibility and for understanding this work, the features used should be concretely described.

3. What data was the ASR system trained on? I believe it's Fisher. But if it's SWB, some care would need to be taken when assessing its performance on this data. (Regardless, its training material should be described in this paper.)

4. Throughout the paper, this approach is implicitly and explicitly compared to Honnibal and Johnson (2014). Given that, it is rather surprising that the ArcEager parser is used as the point of comparison. A more informative comparison would be to the Honnibal and Johnson approach. Without such a comparison, it is very difficult to assess the impact of this approach.

The combination of a vague description of what was done and a limited comparison to other approaches to this task makes it difficult to be enthusiastic about this paper, despite thinking that there is (probably) good work underlying it.

============================================================================
REVIEWER #2
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Appropriateness: 5
Clarity: 4
Originality: 4
Soundness / Correctness: 5
Meaningful Comparison: 5
Substance: 4
Impact of Ideas / Results: 4
Impact of Accompanying Software: 1
Impact of Accompanying Dataset / Resource: 4
Recommendation: 4
Reviewer Confidence: 4

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

This paper presents a new transition-based dependency parsing approach to address text with disfluencies. The authors specifically focus on noisy text that is generated from an ASR system.
The paper offers two main contributions: a new model, which incorporates additional action types designed specifically to identify disfluency "errors," and a dataset generated from ASR output of the disfluent data.

The paper is relatively easy to understand, leaving open only a couple of questions about the approach.

Regarding training, are the backoff scores used during training as well as at inference, or is this done only at inference? If the models are trained with a typical on-line learning algorithm (MIRA, structured perceptron, AROW, etc.), you can apply the score during training as well.

The description of the WCN features is very brief. They offer a good bit of improvement for disfluency prediction (but actually hurt dependency accuracy). The lack of detail about these features makes it difficult to understand exactly what was used in these models.

References are cited incorrectly in this paper, specifically in Section 3 and Section 4.2. You cannot cite a paper by Zhang and Nivre as Zhang. The entry is clear in the bibliography, but the references throughout the text are incorrect and must be fixed.

============================================================================
REVIEWER #3
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Appropriateness: 5
Clarity: 5
Originality: 4
Soundness / Correctness: 4
Meaningful Comparison: 3
Substance: 4
Impact of Ideas / Results: 4
Impact of Accompanying Software: 1
Impact of Accompanying Dataset / Resource: 3
Recommendation: 4
Reviewer Confidence: 3

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

This paper presents an approach for joint dependency parsing with disfluency detection.
The methodology is clearly described, and the results based on different train-test settings are informative.

The main deficiency of the paper is the lack of connection with previous disfluency detection research that uses ASR output. In the Introduction, the authors state the following: "However, the authors assume that the input texts to parse are transcribed by human annotators, which, in practice, is unrealistic." This applies to the references that they provided, and is generally true for researchers who have focused on the end task of dependency parsing. However, many speech researchers have conducted disfluency detection research based on ASR output using a combination of lexical and signal-based features. This body of research is omitted by the study and should at least be mentioned; see, for example, Liu, Shriberg & Stolcke (2003).

For readers who aren't familiar with the annotated Switchboard corpus, it would be helpful to briefly mention the types of disfluencies that are contained in the corpus (filled pauses, repetitions, false starts).

I was not able to find a description of the columns contained in the attached data sets, which limited their usefulness.

References

Liu, Y., Shriberg, E. & Stolcke, A. (2003). Automatic Disfluency Identification in Conversational Speech Using Multiple Knowledge Sources. In Proceedings of Interspeech, pp. 957-960.