============================================================================
ACL 2017 Reviews for Submission #440
============================================================================

Title: A* CCG Parsing with a Supertag and Dependency Factored Model

Authors: Masashi Yoshikawa, Hiroshi Noji and Yuji Matsumoto

============================================================================
REVIEWER #1
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

APPROPRIATENESS: 5
CLARITY: 4
ORIGINALITY: 4
EMPIRICAL SOUNDNESS / CORRECTNESS: 4
THEORETICAL SOUNDNESS / CORRECTNESS: 4
MEANINGFUL COMPARISON: 4
SUBSTANCE: 4
IMPACT OF IDEAS OR RESULTS: 4
IMPACT OF ACCOMPANYING SOFTWARE: 3
IMPACT OF ACCOMPANYING DATASET: 1
RECOMMENDATION: 4
REVIEW DATASET: Yes

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

This paper describes a state-of-the-art CCG parsing model that decomposes into tagging and dependency scores and has an efficient A* decoding algorithm. Interestingly, the paper slightly outperforms Lee et al. (2016)'s more expressive global parsing model, presumably because this factorization makes learning easier. It's great that they also report results on another language, showing large improvements over existing work on Japanese CCG parsing. One surprising original result is that modeling the first word of a constituent as the head substantially outperforms linguistically motivated head rules.

Overall this is a good paper that makes a nice contribution. I only have a few suggestions:

- I liked the way that the dependency and supertagging models interact, but it would be good to include baseline results for simpler variations (e.g. not conditioning the tag on the head dependency).
- The paper achieves new state-of-the-art results on Japanese by a large margin. However, there has been a lot less work on this data - would it also be possible to train the Lee et al. parser on this data for comparison?
- Lewis, He and Zettlemoyer (2015) explore combined dependency and supertagging models for CCG and SRL, and may be worth citing.

============================================================================
REVIEWER #2
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

APPROPRIATENESS: 5
CLARITY: 4
ORIGINALITY: 3
EMPIRICAL SOUNDNESS / CORRECTNESS: 4
THEORETICAL SOUNDNESS / CORRECTNESS: 5
MEANINGFUL COMPARISON: 5
SUBSTANCE: 4
IMPACT OF IDEAS OR RESULTS: 4
IMPACT OF ACCOMPANYING SOFTWARE: 4
IMPACT OF ACCOMPANYING DATASET: 1
RECOMMENDATION: 4
REVIEW DATASET: Yes

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

- Strengths: This paper presents an extension to A* CCG parsing to include dependency information. Achieving this while maintaining speed and tractability is a very impressive feature of this approach. The ability to precompute attachments is a nice trick.
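To make this concrete, here is a minimal sketch of the factored score and the precomputed A* estimate (illustrative only, not the authors' code; the tag_logp/dep_logp arrays and the span convention are my own assumptions):

```python
import numpy as np

# Sketch of a supertag- and dependency-factored parse score (assumed array
# names, not the authors' implementation). tag_logp[i, c] is the
# log-probability of supertag c for word i; dep_logp[i, h] is the
# log-probability that word i's head is word h.

def parse_log_prob(tags, heads, tag_logp, dep_logp):
    """Score of a full parse: per-word supertag terms plus per-word head
    terms, with no interaction across words."""
    return sum(tag_logp[i, c] for i, c in enumerate(tags)) + \
           sum(dep_logp[i, h] for i, h in enumerate(heads))

def make_outside_heuristic(tag_logp, dep_logp):
    """Precompute per-word maxima once per sentence; the estimate for a
    chart item covering words [i, j) is the best score the words outside
    the span could possibly achieve, so it never underestimates the true
    outside score (an admissible heuristic for A* search)."""
    best = tag_logp.max(axis=1) + dep_logp.max(axis=1)
    prefix = np.concatenate([[0.0], np.cumsum(best)])
    total = prefix[-1]
    return lambda i, j: total - (prefix[j] - prefix[i])
```

Because the per-word maxima depend only on the tagger outputs, the whole estimate is fixed before parsing starts, which is presumably what keeps the search fast.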
I also really appreciated the evaluation of the effect of the head rules on normal-form violations and would love to see more details on the remaining cases.

- Weaknesses: I'd like to see more analysis of certain dependency structures. I'm particularly interested in how coordination and relative clauses are handled when the predicate-argument structure of CCG is at odds with the dependency structures normally used by other dependency parsers.

- General Discussion: I'm very happy with this work and feel it's a very nice contribution to the literature. The only thing missing for me is a more in-depth analysis of the types of constructions which saw the most improvement (English and Japanese) and a discussion (mentioned above) reconciling the predicate-argument dependencies with those of other parsers.

============================================================================
REVIEWER #3
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

APPROPRIATENESS: 5
CLARITY: 4
ORIGINALITY: 2
EMPIRICAL SOUNDNESS / CORRECTNESS: 3
THEORETICAL SOUNDNESS / CORRECTNESS: 4
MEANINGFUL COMPARISON: 5
SUBSTANCE: 4
IMPACT OF IDEAS OR RESULTS: 3
IMPACT OF ACCOMPANYING SOFTWARE: 1
IMPACT OF ACCOMPANYING DATASET: 1
RECOMMENDATION: 4
REVIEW DATASET: No

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

This paper proposes a CCG parsing model that brings together recent work on supertagging-based CCG parsing and dependency parsing. Combining both approaches in a single model achieves a modest gain over the existing state of the art for English. Much larger gains are observed for Japanese with a simple adaptation for converting CCG to dependency trees. While the model is a straightforward adaptation of previous work, achieving good performance with a simplified dependency conversion rule and the exclusion of normal-form constraints is a surprising and interesting result. Generalizing recent parsing improvements to other languages is also an important contribution.

There are a number of missing experiments that would verify the modeling contribution of this paper, which is the multitask dependency/supertagging setup and the use of the most probable head to predict the supertag. These contributions warrant additional ablations and comparisons (a sketch contrasting (1) with the paper's head-conditioned setup follows these comments):

(1) What if a simple MLP over the hidden vector is used to predict the supertag instead?
(2) What if the gold head is used at training time instead of the most probable head?
(3) What if the two tasks were trained separately? Does the parameter sharing matter?
(4) What if the architecture is reversed, and the most probable tag is used to predict the head?

Another natural question that is perhaps out of scope is the impact of modeling tree-structured dependencies rather than the graph-structured CCG dependencies proposed in Lewis et al. (2016).
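To make (1) concrete relative to the paper's setup, here is a rough sketch (purely illustrative; the use of PyTorch, the layer sizes, and the concatenation of head vectors are my assumptions, not the authors' code) contrasting a plain per-word MLP supertagger with one that also conditions on the most probable head:

```python
import torch
import torch.nn as nn

# Illustrative comparison only; module names and dimensions are assumed.

class PlainSupertagger(nn.Module):
    """Ablation (1): predict each word's supertag from its hidden vector alone."""
    def __init__(self, hidden_dim, n_tags):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_tags))

    def forward(self, h):                      # h: (n_words, hidden_dim)
        return self.mlp(h)                     # (n_words, n_tags) tag scores


class HeadConditionedSupertagger(nn.Module):
    """Head-conditioned variant: concatenate each word's vector with the
    vector of its most probable head before predicting the supertag (the
    concatenation + MLP choice is a guess at how the conditioning is done)."""
    def __init__(self, hidden_dim, n_tags):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_tags))

    def forward(self, h, head_scores):         # head_scores: (n_words, n_words)
        best_head = head_scores.argmax(dim=1)  # most probable head per word
        return self.mlp(torch.cat([h, h[best_head]], dim=1))
```

Comparing these two directly would isolate how much the head conditioning contributes beyond the shared encoder features.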