Wrapping Up the Survey of SVS 2022

Introduction

Last November, I released a survey with the goal of making it a great survey for the field of singing voice synthesis, like Stack Overflow’s developer survey. Until this January, I promoted it in several communities such as Discord, Reddit, and some forum websites. Along the way, I collected countless opinions, for example, whether the project is reasonable, what marketing factors I should consider when planning a business model, and what the fandom’s characteristics are.

The survey was my first attempt at planning a large questionnaire for an unspecified number of people, rather than a small questionnaire handed out to friends and family for a school assessment. So, there were not only many mistakes, but also many items that I should revise.

For that reason, I decided to write a retrospective about the survey.

Feedback from the survey

First, as the objective part, I analyzed the feedback on the survey.

The items below summarize the feedback.

  • Some words are too hard, e.g., cross-lingual feature.
  • The survey is not structured in detail enough.
  • Some questions are inappropriate.
  • Some sentences are ambiguous.

Even though I had already asked many people for advice and tried to cross-check the questions, many respondents mentioned the wording issues.

Of course, it was caused by my lack of skill as a researcher. I have just finished my Bachelor’s degree, and I do not have any experience in how to conduct research or analyze the results.

In addition, I failed to consider some factors, such as the knowledge level of the respondents. I expected that many people already knew the term “cross-lingual feature” because Dreamtonics released that function a year ago. (While writing this, I looked the term up again and found that the Japanese translation was different. The official Twitter account said “多言語歌声合成”, multilingual vocal synthesis.) But most of the respondents did not know the term, and some respondents had no interest in these technologies either. Hence, they were quite confused while taking the survey.

Additionally, the most surprising feedback for me personally was that the wording was ambiguous, since I had already asked more than one person to review the survey when I sought advice. But that is probably my fault, because I prepared the survey in such a hurry that I had too little time to review it.

As a result, most of the feedback was related to language. To address it, I will need to study more English.

Self-feedback

While reading the feedback from the survey and developing the results report web page, I took the time to reflect on myself, too. At that point, I could understand why the respondents said what they said.

The issues of questionnaire structure

The purpose of the questionnaire was broadly threefold: to provide an overall analysis of the fandom of Vocaloid (or similar software), to create an introductory guide by analyzing the experts’ experiences, and to justify my research project. Moreover, I wanted to avoid including more than 30 questions to reduce the burden on the respondents.

But well… Was I too greedy…

I thought that 30 questions were too many for a public survey, but at the same time, that number was not enough to structure the questions for each purpose. So, I could not select enough of the required questions for each section.

Many questions did not connect with other questions, and some questions were too blatant about their intentions. By its very nature, a questionnaire should not directly reveal its intent; the intent should only appear in the report. However, I think some questions were too blatant or even too straightforward.

It would have been more useful for both respondents and analysts if I had built separate questionnaires for each important component, each with a more detailed structure.

User-input fields

The problem I found exists not only in the questions, but also in the answers. It was the last option, the user-input item. This gave the respondents complete freedom in answering, but it also made the data difficult to analyze.

It is one of the main reasons why the results report took so long to publish. Such responses had to be discarded or merged with others. Of course, some feedback I received suggested adding more user-input or multiple-choice questions to allow for more freedom of response, but I disagree.

Too many near-duplicate answers reduce the clarity of the data and scatter the responses. In the case of multiple-choice questions, the trend can be preserved even if the responses are scattered, so it is possible to infer the appropriate direction from them. User-input answers, however, can easily be discarded in the analysis, because each one is added as a new item.

If I were a competent data analyst, I could properly organize the responses and process them into meaningful data, but since most of the user input is completely independent, I decided it would be impossible to keep them.

An example of response dispersion

This is a question about the inconvenience with the software you use.

Twelve items were added via the user-input field, which is about 12.5% of the responses. I grouped them as “Others” for data cleaning when I analyzed this. It is hard to organize them manually, and grouping them also made those responses useless.
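The grouping step described here can be sketched as follows. This is only a minimal illustration, assuming the answers to one question are available as a list of strings; the option names and answers are invented and are not the real survey items.

```python
from collections import Counter

# Hypothetical predefined choices; the real survey options are not reproduced here.
PREDEFINED = {"Too expensive", "Hard to learn", "Unnatural voice"}

def group_others(answers, predefined=PREDEFINED, label="Others"):
    """Count answers, collapsing anything outside the predefined choices into one label."""
    cleaned = [a.strip() for a in answers]
    return Counter(a if a in predefined else label for a in cleaned)

answers = [
    "Too expensive",
    "Hard to learn",
    "Too expensive",
    "crashes on my laptop",   # free-text answer typed by a respondent
    "No Linux version ",      # another free-text answer
]
counts = group_others(answers)
others_share = counts["Others"] / len(answers)
print(counts)
print(f"Others: {others_share:.1%}")
```

The trade-off is the one noted above: the chart stays readable, but whatever the respondents actually typed is lost inside a single “Others” bar.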

An example of multiple responses

This is a question about their community group.

When I set a question as single-choice, I expected a single, unambiguous response rather than a range of responses. When respondents wanted to select several answers, I expected them to select the closest one.

But, contrary to my expectations, respondents sometimes used the user-input field to manually enter all the items that applied to them.

Perhaps an explicit “other” option instead of user-input fields would avoid this problem as much as possible.

To be honest, I picked this example because it seemed extreme, and it can also be seen as a case caused by a wording issue, because the English translation differs from the other languages.
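If such typed-in lists had to be kept rather than discarded, one option is to split them back into individual items before counting. The sketch below assumes respondents separated items with commas, slashes, or semicolons; the community names and the separator rule are assumptions for illustration only.

```python
import re
from collections import Counter

def split_multi_answer(text):
    """Split one free-text answer into individual items on commas, slashes, or semicolons."""
    parts = re.split(r"[,/;]", text)
    return [p.strip() for p in parts if p.strip()]

responses = [
    "Discord",
    "Reddit",
    "Discord, Reddit, Twitter",   # typed into the user-input field
    "Twitter / Discord",
]
counts = Counter(item for r in responses for item in split_multi_answer(r))
print(counts)
```

Note that this quietly turns a single-choice question into a multi-select one, so the percentages no longer add up to 100% per respondent.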

An issue of figuring out the respondents’ knowledge level

In the previous section, I mentioned the knowledge level. Before talking about it, I want to explain what it means. It does not mean how smart you are. As I already said, I am not so different from you; I am also an ordinary person with many shortcomings. So, the knowledge level means how deep your interest is.

Actually, I have been interested in Vocaloid for about 11 years, and I am also one of its fans, like you. I love my favorite character, Kagamine Len (鏡音レン), and occasionally I enjoy my favorite musician’s songs while hugging the character doll, too. At the same time, I have also been interested in the technical aspects. Whenever new vocal synthesis software was released, I tried to understand how it worked. Recently, I learned the principles of voice synthesis. Moreover, I read and studied the code of some open-source projects such as NNSVS (a neural network-based singing voice synthesis toolkit) and DiffSVC (a diffusion probabilistic model for singing voice conversion). The code of world4utau was especially inspiring. For me, this technology is more fun than the knowledge I learned at school or during my internship; it is enough to make me excited. However, other people were different from me. They were not interested in the technical, mathematical, linguistic, or acoustic aspects of speech synthesis technology.

That is the knowledge level I mean.

Actually, I want to make reasonable progress for all of us through technical discussion, but most people are not concerned with that. They seemed to be interested only in the result. I think that may be one of the reasons why most software companies do not conduct surveys that include technical elements.

Through this, I learned the importance of understanding the survey’s target audience.

Other

Fortunately, a lecturer who has researched a similar field, NLP (Natural Language Processing), was assigned to supervise my graduation assignment. In my conversations with him, we discussed my idea and tried to fulfill what the school wanted. From this process, two questions have stayed in my memory: “So, what exactly is cross-lingual?” and “How do you explain singing voice synthesis?”

For my supervisor, it was an unfamiliar subject. He also confused voice synthesis with voice conversion, because deepfakes were widely known in the early stage of generative models. Moreover, about the cross-lingual feature, he asked me whether it works not only to speak other languages but also to translate the meaning. The school officer who visited the class to check the students’ progress misunderstood it in the same way. At the time, I thought that the trending voice synthesis technology was not the field I was studying, but the technology they mentioned. In fact, while I was writing this article, Microsoft announced a model called VALL-E X, which is a cross-lingual speech synthesis model. As far as I understand, it is not a phoneme-based method like mine, but one that uses a BERT language model.

Recently, I received project offers about voice conversion from acquaintances, and I have also become interested in the technology. In a way, it might be better to focus more on research topics that are of interest to the public, rather than stubbornly insisting on singing voice synthesis. Since these two technologies are not as different as you might think, I do not think it would be impossible to study both.

Solution

Based on these self-reflections, I derived some solutions that reflect the feedback.

Distinguish the respondents clearly

This survey targeted a wide range of people, such as fans of Vocaloid, artists who use Vocaloid, and technicians who are interested in Vocaloid. Contrary to my expectations, this caused several problems. First, because I structured questions for all of these groups in one questionnaire, I could not include many detailed questions, since there were already plenty of questions. I also could not group related questions. For that reason, I failed to produce better data as well. As a result, I could not achieve professionalism, versatility, or usability because of my greed.

Therefore, next time, these themes will be clearly separated so that each questionnaire can gather more useful information.

Form a team for the survey

In the process of constructing and analyzing the survey, I felt many deficiencies in my ability to construct questions. This can be solved by studying more, but if I have the opportunity, I would like to discuss the questions with many collaborators. I believe there are many benefits to making the survey together instead of working solo. For example, I would not miss good ideas, and it would be easier to translate the survey into a variety of languages. Working with people I do not know can be seen as risky, but I believe that it can help me grow.

Do not use Google Forms

Google Forms is a good tool for writing a survey. Actually, whenever I have had to make a survey, I have used it. In addition, it runs in the cloud through a web browser, which was reason enough to choose it over Microsoft Sway and SurveyMonkey. However, I found several critical issues.

1. No support for multi-language content

I translated the survey into three languages, English, Korean, and Japanese, so that people from as many countries as possible could easily access it. Of course, the translation also has many problems, but the point here is readability. Because Google Forms does not have multi-language tools, I had to decide whether to make separate surveys or put the three languages in the same survey. I chose the latter because I was concerned that splitting it into multiple files would make it difficult to analyze the data later. As a result, the overall readability decreased, and the text was cut off on the results page, making it difficult to analyze intuitively.

2. CSV data issue

The most critical issue is the exported CSV (Comma-Separated Values) file. From time to time, I exported CSV files for secondary processing of the charts for my report. But the CSV file’s structure was different from what I expected. The answers were not categorized numbers, but just string data. The user-input items were also a big problem, because each one was added as a new item. If I had created separate surveys for the multi-language content, the analysis would have been very difficult.

An example of the exported CSV file.
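To illustrate the kind of secondary processing this required, here is a minimal sketch that maps the string answers of one column to category numbers using only the Python standard library. The column name, the answers, and the code book are all hypothetical; the real export is not reproduced here.

```python
import csv
import io

# Hypothetical export: the column name and answers are invented for illustration.
raw = io.StringIO(
    "Timestamp,How often do you use the software?\n"
    "2022/11/01,Every day\n"
    "2022/11/02,Sometimes\n"
    "2022/11/03,Every day\n"
    "2022/11/04,only at events\n"
)

# Fixed code book: every known choice gets a number, anything else becomes 0 (Others).
CODE_BOOK = {"Every day": 1, "Sometimes": 2, "Never": 3}
OTHER_CODE = 0

def encode_column(rows, column, code_book, other_code=OTHER_CODE):
    """Map the string answers of one column to integer category codes."""
    return [code_book.get(row[column].strip(), other_code) for row in rows]

rows = list(csv.DictReader(raw))
codes = encode_column(rows, "How often do you use the software?", CODE_BOOK)
print(codes)  # [1, 2, 1, 0]
```

A code book like this has to be written by hand for every question, and with three languages packed into each option the keys would be the full multi-language strings, which is part of why the analysis was so tedious.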

In addition, for a special survey such as an MOS (Mean Opinion Score) test, it was difficult to use Google Forms for a private MOS survey, since it was not possible to attach an audio file to a question.

To solve these problems, I am considering developing a new survey service or building a WordPress page.

Wrap up

Contrary to my plan, there were numerous events during the research period, so the project was greatly delayed. Most schools give enough time for the graduation assessment, but my school not only did not give enough time, but also piled another module on top of it. Since the overlapping module was itself a hard one intended for senior students, it was not easy. To make matters worse, I had to return to my country while wrapping up my life as an international student, and I have had many meetings since then. So, I still have not been able to rest.

Furthermore, the results were not good either.

There were several hindrances, including hardware issues, and from structuring the survey to analyzing the results, I could not achieve much because I do not have enough training or practice in the research process. During this process, I sometimes lamented my lack of skills. I even regretted choosing this research project.

Nevertheless, I believe that this process was not only very hard and painful work, but also practice for introducing myself as a researcher and a test of my growth. With more practice and study, I will be able to become a better and more influential researcher.

Finally, I would like to acknowledge the 96 respondents who participated in the survey, the 7 friends who reviewed the questionnaire, and the readers who are reading this article.

Thank you.

