Well, that’s a discussion for another time. For now, we should all accept that human voices differ in both content and timbre, which means they shouldn’t be easy to recreate from scratch. In some cases, that holds true. We’ve all encountered audio that sounds exactly like a robot, or a woman’s voice with a distinctive timbre. When those voices are placed into a recorded audio file, the result is different and completely unnatural.
You might think that some people would love to hear the voices of other people, but there’s a catch. There isn’t a simple, straightforward way to create those voices from scratch. Each voice requires a certain amount of time and concentration to get right, and there is also the problem that no two people have exactly the same voice and timbre, so every voice will be different.

There are at least two other approaches to getting our voices as close to human as possible. One is to have an artificial voice synthesis system put an artificial voice inside a human voice, so that the human’s voice mimics the natural one, making us sound like a machine. The drawback is that an artificial voice may not sound natural at all, so the result will still sound unnatural. When you ask your listeners how it sounds when you imitate their voice, they’ll reply that it sounds like they’re talking “to themselves.”

A technique that tries to avoid this problem is to use a computer with a synthetic version of our human voice inside it; the computer itself then synthesizes a voice that mimics it. This may work for some kinds of content, but not others. (The only exception may be for music sounds; I’ll discuss that in my review of the upcoming album by a very talented producer, Alex G.) The problem with using machines to build voice replacements is that those machines are extremely complex and expensive, which means that someone has to make the artificial voice; in other words, it might not be a person. (As I mentioned in the introduction, there is a possibility that artificial voice synthesis could someday lead to the creation of new forms of artificial intelligence.)

This leaves one of the other approaches to voice synthesis: using humans as the source of the voice. The human voice will always be the easiest answer for voice synthesis, because it’s simpler to work with than the other two methods. But as I mentioned at the beginning, there are some problems with human voices too. For example, if we want to try and use a

