VoC-DL: Revisiting Voice Of Customer Using Deep Learning
Paper Tracks	Emerging Applications or Methodologies (maximum 6 pages), AAAI 18

Reviews
Review 1
Significance	Excellent customer service can make or break a company’s reputation no matter how good the product is, so it is a problem of paramount importance. This paper discusses techniques that can classify user feedback into several feedback categories and goes beyond the standard positive/negative sentiment analysis to make customer service more effective. The techniques used to accomplish this (word embeddings, deep learning classifiers) are state of the art and have also been successfully applied to problems in other domains.
AI Technology	The paper makes a good case for the AI tools and techniques it uses by citing relevant related work, as well as similar work on purchase intent (on which this work builds), and the impact these techniques have had in industry and academia. It also gives a concise overview of the techniques, giving the reader the basic-to-intermediate understanding needed to clearly comprehend the paper’s contributions.
Innovation	While it may seem like everyone in the AI/machine learning space has jumped on the word embedding and deep learning bandwagon, these techniques have yet to be applied in many practical problem domains. Although the word2vec+LSTM combination was recently used for the similar problem of purchase intent (and some recent works have tried this combination for various other problems), the paper builds on this state-of-the-art approach by extending it to a multi-class classification problem.
Content	This is a well-written paper that describes the problem, motivates the need for a word2vec+deep learning based solution, cites relevant related work, gives a good overview of the technical details of the techniques used, and evaluates the proposed approach against challenging baselines. There is no discussion of development cost or business metrics, but all the bases are covered for an emerging track paper, and it leaves the reader looking forward to seeing how this approach would perform in a production-scale environment, in the wild.
Technical Quality	The paper gives a good overview of the techniques (word2vec, deep learning models) without getting bogged down in unnecessary details, and it clearly motivates their use. The experimental results are thorough and illustrate the advantage of the approach, although I would have liked to see an error analysis: it is possible that hand-crafted features combined with the existing generic model approach would give even more accurate results, but this can only be judged from an error analysis.
Clarity	The paper is easy to read, well organized, and makes good use of figures and tables to illustrate key concepts and explain experimental results. The second paragraph of the Introduction section is missing a source for Garner. It is not clear why the final paragraph of the Conclusion section focuses on describing CNNs when they are not a component of the main solution. There is a typo (“that Than”) in the paragraph that introduces the Korpusik 2016 reference. Figure 5 could be made somewhat larger so that it is easier to read.
Evaluation (Emerging Track Papers Only)	The paper tackles an important industry problem using state-of-the-art AI/machine learning techniques. Both the problem and the proposed solutions are explained in good detail, and the experimental results are promising. Going forward, it would be good to see how the solution performs when deployed in a production system.
Comments to the Authors	This paper discusses techniques that can classify user feedback into several feedback categories and goes beyond the standard positive/negative sentiment analysis to make customer service more effective. The techniques used (word embeddings, deep learning classifiers) to accomplish this are state of the art and have also been successfully applied to problems in other domains.


The experimental results are thorough and illustrate the advantage of the approach, although I would have liked to see an error analysis: it is possible that hand-crafted features combined with the existing generic model approach would give even more accurate results, but this can only be judged from an error analysis.

The second paragraph of the Introduction section is missing a source for Garner. It is not clear why the final paragraph of the Conclusion section focuses on describing CNNs when they are not a component of the main solution. There is a typo (“that Than”) in the paragraph that introduces the Korpusik 2016 reference. Figure 5 could be made somewhat larger so that it is easier to read.
Review 2
Significance	This paper proposes using classification to detect user intent in online communication (voice-of-customer). Detecting user intent in short communications is becoming increasingly important with the advent of intelligent assistants.
AI Technology	The paper evaluates the use of classification for the task of user intent detection.
Innovation	The paper uses known techniques (text classification) but builds a new dataset and provides an evaluation on it.
Content	Good description of the system.
Technical Quality	Decent technical quality.
Clarity	Clearly written, but has some typos:
- Abstract: “can be used in a variety of use cases where text mining is involved with ease”. What does this mean?
- Table 1 caption: “example text and it’s correct classification” should read “its”.
Evaluation (Emerging Track Papers Only)	An evaluation of four different classifiers is provided. Even though 10-fold cross-validation was performed, no standard deviation is reported.
Also, an error analysis would have been beneficial for understanding the shortcomings of each of these models.

The proposed classifiers were not evaluated on any other standard dataset, so there is no way to compare them on a different classification task.
Comments to the Authors	The paper proposes a classification approach to user intent detection from short online messages. It uses well-known techniques (classifiers and word embeddings) for this. The novelty is that it builds a new dataset in the voice-of-customer domain. Will this dataset be made available for use as a benchmark?
I would suggest extending the evaluation to include other standard datasets, as well as an error analysis.
Also, a further development would be to extract more information from each of these intents, such as the product name or the issue being reported.