After several days of intensive development, I have completed a preliminary pass of speaker labeling for the game strings. The current scripts and raw data are in the output/results/ directory (see all_chapters_speaker_results.csv and all_chapters_dump_aligned_results.csv).
The automated labeling is currently about 50%–60% accurate. To refine the dataset further, I would like us to set up a community-driven maintenance workflow. This would let contributors verify existing labels, correct misidentified speakers, or submit their own labeled datasets to improve the overall precision of the model.
https://drive.google.com/file/d/1WZeFz4ouzA3zf1D3Y4TKI8ni1AziPaKE/view?usp=drive_link
(I don't know why I cannot attach files)
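As a rough illustration of how contributor corrections could be folded back into the automated labels, here is a minimal sketch. It assumes each CSV has a string_id column and a speaker column; those column names, the verified flag, and the sample rows are hypothetical and are not taken from the actual result files.

```python
import csv
import io


def load_rows(text):
    """Parse CSV text into a list of dicts, one per labeled string."""
    return list(csv.DictReader(io.StringIO(text)))


def apply_corrections(auto_rows, corrections):
    """Overlay reviewed labels on automated ones, keyed by string_id.

    Returns the merged rows plus the share of reviewed strings whose
    automated label already matched the reviewer's label.
    """
    reviewed = {row["string_id"]: row["speaker"] for row in corrections}
    merged, checked, agreed = [], 0, 0
    for row in auto_rows:
        new_row = dict(row)
        fixed = reviewed.get(row["string_id"])
        if fixed is not None:
            checked += 1
            agreed += fixed == row["speaker"]
            new_row["speaker"] = fixed      # reviewer's label wins
            new_row["verified"] = "yes"
        else:
            new_row["verified"] = "no"      # still only machine-labeled
        merged.append(new_row)
    agreement = agreed / checked if checked else None
    return merged, agreement


# Hypothetical sample data standing in for the real CSV files.
AUTO = "string_id,speaker\n1,Alice\n2,Bob\n3,Alice\n"
FIXES = "string_id,speaker\n2,Carol\n3,Alice\n"

merged, agreement = apply_corrections(load_rows(AUTO), load_rows(FIXES))
```

Keeping a verified flag on every row would let reviewers see which strings still need a human check, and the agreement figure gives a running estimate of how accurate the automated labels are on the reviewed subset.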