The trouble with VoiceXML (part 1)

Following up on the previous entry, I thought I'd go into more technical detail about how we're designing our radio platform at the Web Foundation.

In general, voice applications share the same architecture as standard websites. Just replace “browser” with “voice browser” and “HTML” with “VoiceXML” (the most widespread language for voice applications). Also, the voice browser doesn't run on the user's computer but somewhere on the network, usually not on the same machine as the application server, since it's often provided by a third party such as a telco.

Voice apps vs Web apps

Because VoiceXML is the HTML of Interactive Voice Response applications, you can work just as you would in a standard web application and generate the served files with PHP.
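Here's a minimal sketch of what that looks like in plain PHP. `promptDocument()` is a hypothetical helper that returns a one-prompt VoiceXML 2.1 document, sent with the `application/voicexml+xml` media type. The habit worth copying is escaping dynamic text before it goes into the markup, since a stray `&` or `<` makes the whole file unparseable for the voice browser.

```php
<?php
// Hypothetical helper: build a minimal VoiceXML 2.1 document that speaks one prompt.
function promptDocument(string $text, string $xmlLang = 'en-US'): string
{
    // Escape dynamic values so characters such as & or < cannot break the XML.
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return '<?xml version="1.0" encoding="utf-8"?>' . "\n"
        . '<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">' . "\n"
        . "  <form>\n"
        . "    <block>\n"
        . '      <prompt xml:lang="' . $esc($xmlLang) . '">' . $esc($text) . "</prompt>\n"
        . "    </block>\n"
        . "  </form>\n"
        . "</vxml>\n";
}

// Under a web server, announce the VoiceXML media type before any output.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: application/voicexml+xml; charset=utf-8');
}
echo promptDocument('Welcome to Radio & News');
```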

Here’s a basic (simplified) VoiceXML file:

    <field name="year">
      <prompt>Please say the year you were born</prompt>
      <grammar src="year.srgs"/>
      <noinput>You did not say anything</noinput>
      <nomatch>I did not understand</nomatch>
      <filled>
        <if cond="year &gt; 1980">
          <submit next="junior.vxml.php" namelist="year"/>
        <else/>
          <submit next="senior.vxml.php" namelist="year"/>
        </if>
      </filled>
    </field>

Unsurprisingly, and unlike standard HTML, there is some logic in the file. In fact, a large portion of the VoiceXML specification describes the Form Interpretation Algorithm, which goes far beyond simple <if> statements and covers error recovery, events and exceptions. These features are barely visible in the language's syntax, but they are rather complex. Barely visible, that is, when you're writing simple examples. In a real application things become quite complex, and the resulting VoiceXML files can be hard to read (a bit like XSLT).
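To give a feel for what the interpreter does behind those few tags, here is a toy PHP model of two pieces of the Form Interpretation Algorithm. The first is the select phase, which visits the first form item whose variable is still undefined and whose optional `cond` holds. The second is how a handler such as `<nomatch>` is chosen when several exist: handlers whose `count` exceeds the event counter are dropped, and the first one with the highest remaining count wins (a handler without `count`, like the one in the example, counts as 1). The array layout and function names are hypothetical, and the model leaves out most of the real algorithm.

```php
<?php
// Toy model of two pieces of the Form Interpretation Algorithm (FIA).
// Array layout and function names are hypothetical; a real interpreter
// also handles prompt selection, <initial>, <subdialog>, transitions, etc.

// Select phase: the first form item whose variable is still undefined
// (null here) and whose optional cond holds is the one to visit next.
function selectFormItem(array $items, array $vars): ?string
{
    foreach ($items as $item) {
        $undefined = !isset($vars[$item['name']]);
        $condOk = !isset($item['cond']) || $item['cond']($vars);
        if ($undefined && $condOk) {
            return $item['name'];
        }
    }
    return null; // every item is filled: the form is complete
}

// Catch selection for one event type (e.g. nomatch): drop handlers whose
// count exceeds the event counter, then take the first with the highest count.
function selectHandler(array $handlers, int $eventCount): ?string
{
    $best = null;
    foreach ($handlers as $handler) {
        if ($handler['count'] <= $eventCount
            && ($best === null || $handler['count'] > $best['count'])) {
            $best = $handler;
        }
    }
    return $best['text'] ?? null;
}

$items = [
    ['name' => 'year'],
    ['name' => 'confirm', 'cond' => fn(array $v): bool => ($v['year'] ?? 0) > 1980],
];
$nomatch = [
    ['count' => 1, 'text' => 'I did not understand'],
    ['count' => 3, 'text' => 'Please key in the four digits of your birth year'],
];

echo selectFormItem($items, []), "\n";                                  // year
echo selectFormItem($items, ['year' => 1975]) ?? 'form complete', "\n"; // form complete
echo selectHandler($nomatch, 2), "\n";                                  // I did not understand
echo selectHandler($nomatch, 3), "\n";                                  // Please key in the four digits...
```

Even this stripped-down version shows why the behaviour is hard to read off the markup: which prompt the caller hears depends on counters and conditions that never appear as explicit control flow in the file.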

Add to that the complexity of PHP, because server-side logic is mandatory. A VoiceXML application is essentially a set of forms, and each one has to <submit> its contents back to the server, which then generates and serves the next VoiceXML file.
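A minimal sketch of one step of that round trip, under assumed names: the script below stands in for a `<submit>` target that receives `year` and `sessionId` as request parameters. Because nothing survives between documents, it has to copy `sessionId` into the next document it generates so that the following `<submit>` can send it again. Passing the value through `json_encode()` (then escaping it for the attribute) yields a valid ECMAScript string literal for `expr`, which is a little safer than wrapping it in single quotes by hand.

```php
<?php
// Hypothetical <submit> target: the voice browser sends the field values
// named in namelist (here year and sessionId) as request parameters.
function nextDocument(array $request): string
{
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    // Validate input instead of trusting whatever the browser sent.
    $year = filter_var($request['year'] ?? '', FILTER_VALIDATE_INT);
    $message = ($year === false)
        ? 'Sorry, I did not get your year of birth.'
        : "You were born in $year.";

    // No state survives between documents, so sessionId is copied into the
    // new one; json_encode() turns it into an ECMAScript string literal.
    $sessionExpr = $esc(json_encode((string) ($request['sessionId'] ?? '')));

    return '<?xml version="1.0" encoding="utf-8"?>' . "\n"
        . '<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">' . "\n"
        . '  <var name="sessionId" expr="' . $sessionExpr . '"/>' . "\n"
        . "  <form>\n"
        . "    <block>\n"
        . '      <prompt>' . $esc($message) . "</prompt>\n"
        . '      <submit next="main-menu.vxml.php" namelist="sessionId"/>' . "\n"
        . "    </block>\n"
        . "  </form>\n"
        . "</vxml>\n";
}

if (PHP_SAPI !== 'cli') {
    header('Content-Type: application/voicexml+xml; charset=utf-8');
    echo nextDocument($_REQUEST);
}
```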

Little by little, you end up with code like the example at the end of this post. What was originally a simple VoiceXML file has become a horrible mix of two languages. Despite the ugliness, it's still code that looks familiar to many PHP developers. But again, this isn't just PHP generating HTML; this is PHP generating VoiceXML, which is itself a programming language. (Yes, HTML can also contain JavaScript. Guess what, so can VoiceXML.)

I'm not the first to notice this. In 2007 the W3C's Voice Browser Working Group released VoiceXML 2.1, which adds a small number of features that help here: the <data> element, which lets you make XMLHttpRequest-style fetches, and <foreach>, which loops over an array. <data> is great because, instead of submitting a form back to the server and receiving another VoiceXML file, you can exchange data with the server and stay in the same file. <foreach> also removes some of the dependency on server-side logic. However, seven years after the specification was released, I know of no VoiceXML browser that implements it completely, and that includes the one I'm stuck with (Voice Glue).
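On the server side, <data> changes what PHP has to produce: plain XML data instead of a complete VoiceXML document. The sketch below is a hypothetical endpoint (the file name, element names and sample data are all made up) that a tag such as `<data name="radios" src="radios.xml.php"/>` could fetch; the voice browser parses the response and exposes it to the document as a read-only DOM.

```php
<?php
// Hypothetical endpoint (radios.xml.php) for a VoiceXML 2.1 element such as
//   <data name="radios" src="radios.xml.php"/>
// It returns plain XML data, not a VoiceXML document.
function radiosXml(array $radios): string
{
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    $xml = '<?xml version="1.0" encoding="utf-8"?>' . "\n<radios>\n";
    foreach ($radios as $id => $radio) {
        $xml .= '  <radio id="' . $esc((string) $id) . '" audio="' . $esc($radio['audio']) . "\"/>\n";
    }
    return $xml . "</radios>\n";
}

if (PHP_SAPI !== 'cli') {
    header('Content-Type: application/xml; charset=utf-8');
    // Made-up sample data; the real list would come from the platform,
    // e.g. RadioPlatform::getRadios() in the code at the end of this post.
    echo radiosXml(['r1' => ['audio' => 'jingles/r1.wav']]);
}
```

The upside is a cleaner split: PHP only serves data, and the dialog logic stays in one VoiceXML file. The catch, as noted above, is that this only helps if your voice browser actually implements <data>.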

Are things going to improve? Are implementations going to catch up, especially FOSS ones? Unlikely, for the simple reason that VoiceXML is dying. I'll write about that, and about the present and future of voice applications, in another entry.

And now the ugly code (which is actually not too bad, but you can see how quickly it could get much uglier). It's nothing but code-generating code; imagine debugging it, especially when the only error report you get from the VoiceXML interpreter is a message on the phone saying “A serious error has occurred. Exiting.”

    <?php
    // authorization: get callerId, try and match it against the user list
    // if it checks, go ahead. If it doesn't, create a new user
    // input variables: callerId, sessionId

    Log::write("starting auth-callerId");

    if (isset($_REQUEST['callerId'])) {
      $callerId = $_REQUEST['callerId'];
    } else {
      $callerId = 'unknown';
    }

    $sessionId = $_REQUEST['sessionId'];

    // fetch user list
    $users = RadioPlatform::getUsers();

    // search user with correct callerId
    $userFound = false;
    foreach ($users as $user) {
      if (phoneNumbersMatch($user['phone'], $callerId)) {
        $userFound = $user;
        $userId = $user['id'];
        $userRadioId = $userFound['radios'][0];
      }
    }

    if ($userFound) {
      $userLang = $userFound['lang'][0];
      Log::write("User: $userId");
    } else {
      Log::write("No user found.");
    }

    header('Content-Type: application/voicexml+xml; charset=utf-8');
    print('<?xml version="1.0" encoding="utf-8"?>');
    ?>
    <vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
      <property name="inputmodes" value="dtmf"/>
      <var name="sessionId" expr="'<?php echo $sessionId ?>'"/>
    <?php
    if ($userFound) {
      $radios = RadioPlatform::getRadios();
    ?>
      <var name="userId" expr="'<?php echo $userId ?>'"/>
      <var name="userRadioId" expr="'<?php echo $userRadioId ?>'"/>
      <var name="userLang" expr="'<?php echo $userLang ?>'"/>
      <form>
        <block>
    <?php prompt($userLang, 'welcome') ?>
          <audio src="<?php echo $radios[$userRadioId]['audio'] ?>"/>
          <submit next="main-menu.vxml.php" method="get" namelist="userLang userId userRadioId sessionId"/>
        </block>
      </form>
    <?php } else { // No user found through callerID. Create new user. ?>
      <form>
        <block>
          <?php prompt('bam', 'welcome'); ?>
          <?php prompt('fr', 'welcome'); ?>
        </block>
        <field name="userLang">
          <?php prompt('bam', 'select_bam_1'); ?>
          <?php prompt('fr', 'select_fr_2'); ?>
          <option dtmf="1" value="bam">Bambara</option>
          <option dtmf="2" value="fr">French</option>
          <filled>
            <var name="callerId" expr="'<?php echo $callerId ?>'"/>
            <submit next="auth-new.vxml.php" namelist="userLang callerId sessionId"/>
          </filled>
        </field>
      </form>
    <?php } ?>
    </vxml>
    <?php

    // tries to fix bad callerIds, removing leading whitespace, '+' or '0'
    function clean_phone_id($caller_id) {
      return ltrim($caller_id, " \t\n\r+0");
    }

    // returns true if both numbers match
    function phoneNumbersMatch($n1, $n2) {
      if ($n1 === $n2) return true;
      return clean_phone_id($n1) === clean_phone_id($n2);
    }

    // prints a <prompt> in the given language, using the platform's translations
    function prompt($lang, $msg) {
      $xmllang = IvrPlatform::xmllang($lang);
      echo "<prompt xml:lang='$xmllang'>" . I18N::say($lang, $msg) . "</prompt>\n";
    }
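Given how little the interpreter reports, one cheap safeguard, not part of the code above, is to check that each generated page is at least well-formed XML before sending it, and to log the parser's messages on the server. The sketch below assumes PHP's standard DOM extension; `xmlErrors()` is a hypothetical helper name. Well-formedness is only a first filter: it won't catch VoiceXML-level mistakes such as an element used in the wrong place.

```php
<?php
// Hypothetical helper: return libxml's messages for a generated page,
// or an empty array if the page is well-formed XML.
// Assumes PHP's DOM extension (ext-dom), which most builds include.
function xmlErrors(string $xml): array
{
    if ($xml === '') {
        return ['empty document']; // DOMDocument::loadXML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // collect errors instead of printing warnings
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $messages;
}

// Usage: buffer the page instead of streaming it, then check it.
ob_start();
echo '<vxml version="2.1"><form></vxml>'; // stands in for the real template
$page = ob_get_clean();
foreach (xmlErrors($page) as $message) {
    error_log("auth-callerId: $message"); // the code above would use Log::write()
}
// ...then send $page, or a fallback document, to the voice browser.
```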
This entry was posted in General.