Tabale, a meeting organiser with an IVR interface.

One of the deliverables of the VOICES project is a meeting organiser web application. With, unsurprisingly, database tables for users, meetings, participants, time slots, etc; nothing a competent hacker couldn’t write in a few days using any web app framework.

Except for one thing: the system has to be accessible to meeting participants through an interactive voice response (IVR) system: they can call, or be called, to hear about the meetings and say whether they’re going to attend. And it has to work in multiple languages: French, Bambara and other languages from Mali, where the system is currently deployed for Sahel Eco, a local NGO that works to improve the lives of people living in rural areas. That instance of the application has users who speak different languages, and meetings and announcements that can also be in various languages. Everything has to work so that users (and administrators) can use the system in their favourite language, or at least one they understand.

The application is called Tabale (a popular musical instrument, shown left). Although it uses a lot of different technologies on the web-facing and server sides (I especially liked using RedbeanPHP and Handlebars), the interesting part is the IVR, written in VoiceXML and running on the VoiceGlue+Asterisk setup offered by Orange’s Emerginov Platform. Phone calls may be linear and have low information bandwidth (compared to graphical or textual interfaces), but that doesn’t make writing IVR applications easy, especially for an audience that has never encountered speech applications before and only uses their phone to talk to humans. For instance, the application developer has to keep in mind that timing is essential: the user will be switching between holding the phone to their ear, to listen or speak, and holding it in front of their eyes, to type dial tones. They will have to know when to speak to record a message, and how to end the recording. And there are many other rules that, if not followed, will make your IVR very irritating: menus (like “press 1 for listening to the meeting announcements, 2 for leaving a message, …”) shouldn’t have more than a few items. Nor should your system say “press 1 for …, 2 for …”; it should instead say “for …, press 1. For …, press 2”. Etc.
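The two menu rules above (few items, action first and key last) can be sketched in a few lines. This is Python rather than the PHP that Tabale uses, and the function name and the limit of four items are illustrative assumptions, not taken from the Tabale source.

```python
# Hypothetical helper, not part of Tabale: builds the spoken text of a keypad menu.
MAX_MENU_ITEMS = 4  # assumed limit; the post only says "a few items"


def menu_prompt(actions):
    """Return the menu text in 'For <action>, press <key>.' order."""
    if not actions:
        raise ValueError("a menu needs at least one item")
    if len(actions) > MAX_MENU_ITEMS:
        raise ValueError("too many items for a voice menu")
    # Say the action first and the key last, so the caller only has to
    # remember a key once they know what it does.
    parts = [
        f"For {action}, press {key}."
        for key, action in enumerate(actions, start=1)
    ]
    return " ".join(parts)


if __name__ == "__main__":
    print(menu_prompt(["meeting announcements", "leaving a message"]))
```

In a real IVR each sentence would be a recorded audio file in the caller's language rather than a string, but the ordering and the size limit apply in the same way.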

Language support has to be done right, too. The first time a user calls, they should be asked what language they want to use, and that choice should be remembered for subsequent calls. A meeting announcement should be played in that language, and if it’s not available there should be a fallback language that’s likely to be understood by most. And by far the hardest challenge was debugging the IVR. VoiceGlue’s support for VoiceXML is erratic and not very well specified, and the Emerginov platform doesn’t provide call logs. Writing unit tests is also very hard, if you want to go beyond simply testing if your PHP-generated VoiceXML files are valid.
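The language fallback described above boils down to a short lookup. The sketch below is in Python and is only an illustration: the function name, the shape of the data and the choice of French as the fallback are assumptions, since the post does not say which language Tabale falls back to.

```python
# Assumption: French is the widely understood fallback; the post does not name one.
FALLBACK_LANGUAGES = ("fr",)


def pick_recording(recordings, user_lang, fallbacks=FALLBACK_LANGUAGES):
    """Choose which recording of an announcement to play.

    `recordings` maps a language code to an audio file name (hypothetical shape).
    Returns a (language, file) pair, or None if nothing suitable exists.
    """
    for lang in (user_lang, *fallbacks):
        if lang in recordings:
            return lang, recordings[lang]
    return None


if __name__ == "__main__":
    announcement = {"fr": "meeting-fr.wav"}
    # A Bambara speaker hears the French version when no Bambara one exists.
    print(pick_recording(announcement, "bam"))
```

Returning the language along with the file lets the caller wrap the audio in a prompt tagged with the right language.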

As the VOICES project comes to an end and Tabale is pretty much complete, I’m releasing the source on github. I might deploy a test instance somewhere using Evolution for the telephony features, although I wouldn’t want people to use the system to spam others with automated phone calls. But I’m more than happy to help anybody interested in deploying it for themselves or their own organisation.

Posted in General | Leave a comment

Simple street maps in SVG

OpenStreetMap Brussels

Simplified

As part of some data visualisation I’m currently working on (more on that soon), I needed a way to show simple street maps in the background of the data to be visualised. I looked around for existing SVG map renderers, but didn’t find much that was easy enough (to be honest I didn’t look that much, as I thought it would be a fun thing to do myself, and I’d already done something very similar with Maporizer).

The map data come from OpenStreetMap (using their custom XML format), and I’ve used XSLT to transform them into SVG.

Everything is available on github. The README file explains what needs to be done to obtain the maps from OpenStreetMap and run the converter.

Notes:

  • It’s not very hard. XSLT makes XML-to-XML transformations easier to write than any other language.
  • What’s tricky is that, for large areas, the .osm files can be huge. The map of Brussels shown in the image above is a 110MB file, for instance. That’s why there are two XSLT transforms: one to select the elements that should be shown on the SVG, and another to transform those elements into SVG markup.
  • The params.xml file shows how to select the map elements to render. The styling of the SVG is done using a CSS stylesheet whose classes correspond to the types of elements shown on the SVG. Both are straightforward, although you have to know a little about how OpenStreetMap names map elements (like ‘primary highway’).
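To make the select-then-render idea concrete without the XSLT, here is a rough Python equivalent of the two passes. It is a sketch under assumptions: the sample data, the `WANTED` set (standing in for params.xml) and the `highway-primary` class naming are made up for illustration, and no projection or scaling is attempted.

```python
import xml.etree.ElementTree as ET

# Tiny made-up extract in the OpenStreetMap XML format (nodes, ways, tags).
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="50.8460" lon="4.3510"/>
  <node id="2" lat="50.8470" lon="4.3530"/>
  <node id="3" lat="50.8480" lon="4.3520"/>
  <way id="10">
    <nd ref="1"/><nd ref="2"/>
    <tag k="highway" v="primary"/>
  </way>
  <way id="11">
    <nd ref="2"/><nd ref="3"/>
    <tag k="building" v="yes"/>
  </way>
</osm>"""

# Stands in for the selection made in params.xml (hypothetical format).
WANTED = {("highway", "primary")}


def select_ways(osm_root, wanted):
    """Pass 1: keep only the ways that carry one of the wanted tags."""
    selected = []
    for way in osm_root.iter("way"):
        tags = {(t.get("k"), t.get("v")) for t in way.findall("tag")}
        match = sorted(tags & wanted)
        if match:
            key, value = match[0]
            refs = [nd.get("ref") for nd in way.findall("nd")]
            selected.append((f"{key}-{value}", refs))
    return selected


def to_svg(osm_root, ways):
    """Pass 2: turn the selected ways into SVG polylines with a CSS class each."""
    coords = {
        n.get("id"): (float(n.get("lon")), float(n.get("lat")))
        for n in osm_root.iter("node")
    }
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg")
    for css_class, refs in ways:
        # Negate the latitude so that north ends up at the top of the image.
        points = " ".join(
            f"{coords[r][0]},{-coords[r][1]}" for r in refs if r in coords
        )
        ET.SubElement(svg, "polyline", {"class": css_class, "points": points})
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_OSM)
    print(to_svg(root, select_ways(root, WANTED)))
```

Parsing the whole file in memory like this would not cope well with a 110MB extract; a streaming parser would be needed for the first pass, which is the same concern that led to splitting the XSLT in two.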

For now I don’t need anything much more complicated, but I believe that without much effort it’s possible to render more map elements, like buildings, street names, etc. One would just need to select the element to be added in params.xml and its appearance in style.css.


Camera Capture Works Wonders

I’m pondering whether to write a textorizer app. (I know I should stop beating that dead horse, but I still get feedback about it. It’s a good testbed, too). It would be nice to be able to have it on your mobile device, take a picture, see the results, share them, make money, etc.

However I remain more interested in the possibility of continuing to run it as an HTML5 app. But right now, the missing elements are:

  • support for the File API in some of the mobile browsers
  • offline: I need to investigate that. Saving a page “to read later” in the Android browser doesn’t save the JavaScript code, but it may be possible otherwise.
  • camera capture. This one’s no longer missing, see below.
  • make the page itself mobile-friendly (not a missing feature, just something I need to do).
  • no easy way to sell the app

Indeed, if you take your Android device, go to textorizer, and click on “select file”, it will now offer to take a picture (or select one from the device). I just tried it on my phone and it was jolly good fun. It worked with the default Android browser, Chrome and Firefox (shown below).

The only change required was to add “capture=camera” on the file upload control:

<input id="file_selector" type="file" accept="image/*;capture=camera" class="image_selector"/>

I didn’t expect it would work when you don’t actually upload a file but instead retrieve it in JavaScript, but in fact it does work, and we’re one step closer to webapps everywhere. I know it’s not yet a WHATW3CG standard, but it’s still a step forward.


The trouble with VoiceXML (part 2)

(continued from part 1)

VoiceXML remains to this day one of the most successful standards to come out of W3C. Even though the Voice Browser Working Group, the committee that designed it, never got the same visibility as the ones that designed CSS or XML, it continues to be one of the biggest working groups at W3C. Most of the IVR industry is a member, and it has managed to produce a true industry standard. The working group is currently designing VoiceXML 3.0. However, VoiceXML will probably never enjoy the success it once had.

With online access becoming ubiquitous, people don’t call IVRs so much anymore. It has become far easier to order pizza or book a flight online than to run through the lengthier process of doing it on the phone. In fact, there is not that much innovation in the area of IVR applications any more (maybe better speech recognition allowing mixed initiative dialogue, but that’s rarely found), and the most popular applications, like voicemail or simple menu-based information services, are sufficiently stable and familiar to users that there is no need for real innovation (this is not true everywhere, though; see below). You don’t find so many job ads for IVR designers these days, and there hasn’t been a book published on VoiceXML since 2002.

That does not mean that voice applications are dead, though. Open source telephony software is going strong with Asterisk and FreeSWITCH, originally PABXs, but which also integrate IVR functionality through proprietary scripting languages. Moreover, companies like Voxeo and Twilio offer cloud services for developing speech and SMS apps, and are enjoying a certain amount of success. Interestingly, Voxeo offers two ways of designing applications: using VoiceXML, or with an API for various standard languages (PHP, Python, etc.). The latter is much more used, because there are many more hackers familiar with those languages than there are who prefer VoiceXML. The programmatic approach especially makes it easy to integrate voice in other Web applications, make mashups and everything Web 2.0. Even though VoiceXML 2.1 extended the language specifically for that purpose, it hardly changed things. Another recent evolution of speech applications is that they are now found on mobile devices: iOS’s Siri and Android’s Voice Search are the best known examples. Those applications aren’t written in VoiceXML either, partly because they aren’t based on voice alone but combine it with visual and tactile interaction.

VoiceXML is catching up, though. Version 3 will handle multimodal interaction, for instance. And the working group is working on complementary standards, such as SCXML, which can declaratively describe interactive applications independently of whether the application is visual, audio, or a combination of both. But until those standards are finished and implementations are available, the procedural approach offered by Tropo, Android, or Microsoft is gaining in popularity. Moreover, standardisation is happening in that area too: the W3C’s HTML Speech Incubator Group and Speech API community group are very active in defining a speech API for web pages, some of which is already implemented in browsers.

All in all, it will take three things to make VoiceXML return: the working group finishing version 3, implementations being released publicly (breaking the monopoly of mobile operators and their walled gardens), and VoiceXML supporters convincing developers that designing elaborate speech applications leads to extremely complex code, which can only be avoided by using declarative markup.

 

Note: the above currently doesn’t apply to developing countries, by the way. Voice-based web access in Africa is at the heart of 5 projects that the Web Foundation (my employer) is running, and all the IVRs we design use VoiceXML. For various reasons (illiteracy, cost of smartphones, culture) the voice channel, as opposed to SMS or internet access, remains one of the most important, for human-to-human communication but also human-to-machine. That means there is a great potential for voice-only online applications that’s yet to be tapped into. Many projects coming out of the Foundation’s “mobile entrepreneurship labs” are signs of that potential. Yet there is little doubt that, with the rise of the mobile web in Africa, the voice channel will eventually follow the trend described above.


Typographic maps using OpenStreetMap, XSLT, JavaScript and SVG

Recently I came across Typographic Maps, and I thought I’d try something similar. Not by hand, like they do. Instead, grabbing some map data off OpenStreetMap and automatically transforming it into SVG, as shown below. Not as nice as the original, but I didn’t want to spend hours making it look better. (It would probably never look as nice as when done by hand, although the ability to choose any place is a plus.) Instead I learned a few interesting things along the way.

Brussels Centre (SVG version)

How it works:

A rectangular map is selected. I used GetLatLon to find the exact latitude and longitude of the test maps.

The OpenStreetMap data is retrieved (here, Hyde Park Corner):

http://api.openstreetmap.org/api/0.6/map?bbox=-0.1593017578125,51.49805708407405,-0.14591217041015625,51.50687269909403

This returns nice and well-formed XML.
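The request above follows a fixed pattern: the bbox parameter lists the west, south, east and north edges of the rectangle, in that order. A small helper for building such a URL could look like the sketch below; it is written in Python, the function name is an illustrative assumption, and the coordinates in the usage line are arbitrary.

```python
# Endpoint as used in the request above.
OSM_MAP_ENDPOINT = "http://api.openstreetmap.org/api/0.6/map"


def map_url(min_lon, min_lat, max_lon, max_lat):
    """Build the map request URL for a bounding box.

    The bbox order is west, south, east, north (longitude before latitude).
    """
    if not (min_lon < max_lon and min_lat < max_lat):
        raise ValueError("bounding box corners are in the wrong order")
    bbox = ",".join(str(v) for v in (min_lon, min_lat, max_lon, max_lat))
    return f"{OSM_MAP_ENDPOINT}?bbox={bbox}"


if __name__ == "__main__":
    # Arbitrary small rectangle, for illustration only.
    print(map_url(4.34, 50.84, 4.36, 50.85))
```

Mixing up latitude and longitude is the usual mistake here, which is why the helper rejects boxes whose corners are out of order.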

The format is easy to understand, and so it was also easy to write a basic XSLT stylesheet to select features of interest (roads, parks, etc) and turn them into SVG. If someone ever wanted to make it into a generic tool, there should be a GUI that lets you choose your map, and the tool would automatically adjust to the scale and filter out details that are too small and make the output a mess, as they do now.

Here is the rendering of Hyde Park Corner (and in SVG):

Hyde Park Corner

And that’s it.

Now what’s really interesting is that while looking for the latest version of Saxon, I discovered SaxonCE, a JavaScript version of Saxon. It’s not even a wrapper for the Java version, it’s the actual Saxon code cross-compiled to JS (using GWT) and running in the browser. Very impressive. And so, you can now bypass your browser’s old and buggy XSLT engine and instead add to an HTML page something like:

<script type="text/javascript"
        language="javascript" 
        src="../Saxonce/Saxonce.nocache.js"></script>

(You can also use the old XML Stylesheet Processing Instruction, and there’s a JS API, too). Here’s an example using the <script> construct above, where the XSLT transform mentioned above inserts the SVG map into an HTML page, where the <script> is. It takes a while to run, but it works, and in all browsers I’ve tried. When I think about the fact that the XSLT transform itself contains some JavaScript to add text nodes to the SVG DOM, my head starts spinning.

All the code and examples are at github/maxf/maporizer


iOS Celtic

Earlier this month, jwz announced he’d finally got Apple to approve the iOS version of XScreenSaver. Koalie was very kind to send me these screenshots of my “celtic” hack running on her iPhone.

After Witali Aswolinskiy’s iTextorizer, that’s the second of my graphics hacks ported to iOS. Saves me doing it! A big thank you to their respective authors.


The trouble with VoiceXML (part 1)

Following up on the previous entry, I thought I’d talk about more technical details on how, at the Web Foundation, we’re designing our radio-platform.

In general, voice applications share the same architecture as standard websites. Just replace “browser” with “voice browser” and “HTML” with “VoiceXML” (the most widespread language for voice applications). Also, don’t put the browser on the user’s computer but on the web, usually not where the application server is, since it’s often provided by a third party, like a telco.

Voice apps vs Web apps

Because VoiceXML is the HTML of Interactive Voice Response applications you can do just as you would in a standard web application and generate the files served using PHP.

Here’s a basic (simplified) VoiceXML file:

<vxml>
  <form>
    <field name="year">
      <prompt>Please say the year you were born</prompt>
      <grammar src="year.srgs"/>
      <noinput>You did not say anything</noinput>
      <nomatch>I did not understand</nomatch>
      <filled>
        <if cond="year &gt; 1980">
          <submit next="junior.vxml.php" namelist="year"/>
        <else/>
          <submit next="senior.vxml.php" namelist="year"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>

Unsurprisingly there is, unlike standard HTML, some logic in the application. In fact a large portion of the VoiceXML specification describes the Form Interpretation Algorithm, which goes far beyond simple <if> statements, but includes features like error recovery, events and exceptions. Things that are barely visible in the language’s syntax, but are rather complex. Barely visible, that is, when you’re writing simple examples. But in a real application, things become quite complex and the resulting VoiceXML files can be hard to read (a bit like XSLT).

And you can add to that the complexity of PHP, because server-side logic is mandatory. Indeed, a VoiceXML application being just a set of forms, each one has to <submit> its contents back to the server, which then generates and serves the next VoiceXML file.

And little by little you end up with code like what I put at the end of this post. What was originally a simple VoiceXML file has become a horrible mix of two languages. Despite the ugliness it’s still code that looks familiar to many PHP developers. But again, this isn’t just PHP generating HTML, this is PHP generating VoiceXML, itself a programming language. (Yes, HTML can also contain JavaScript. Guess what, so can VoiceXML).

I’m not the first to notice it. In 2007 the W3C’s Voice Browser Working Group released VoiceXML 2.1, which adds a small number of features that can help us: the <data> tag, which lets you do XMLHttpRequest stuff, and <foreach>, to loop over a variable. <data> is great, because instead of having to submit a form back to the server and receive another VoiceXML file, you can send the data over but remain in the same file. And <foreach> also removes some dependency on server-side logic. However, I know of no VoiceXML browser that implements the specification completely, including the one I’m stuck with (VoiceGlue). Seven years after the release of the specification.

Are things going to improve? Are implementations going to catch up, especially FOSS ones? Unlikely. For the reason that VoiceXML is dying. I’ll write about it, and the present and future of voice applications, in another entry.


And now the ugly code (which is not too bad, actually, but you can see how it quickly gets much uglier). Nothing but code-generating code; imagine the debugging, especially when all the error reporting you have from the VoiceXML interpreter is a message on the phone saying “A serious error has occurred. Exiting.”

<?php
// authorization: get callerId, try and match it against the user list
// if it checks, go ahead. If it doesn't, create a new user
// input variables: callerId

require_once('log.php');
require_once('i18n.php');
require_once('radio-platform.php');
require_once("ivr-platform.php");

Log::write("starting auth-callerId");
Log::write($_SERVER['REQUEST_URI']);

if (isset($_REQUEST['callerId'])) {
  $callerId = $_REQUEST['callerId'];
} else {
  $callerId = 'unknown';
}

$sessionId = $_REQUEST['sessionId'];

// fetch user list
$users = RadioPlatform::getUsers();

// search user with correct callerId
$userFound = false;
foreach ($users as $user) {
  if (phoneNumbersMatch($user['phone'], $callerId)) {
    $userFound = $user;
    $userId = $user['id'];
    $userRadioId = $userFound['radios'][0];
    break;
  }
}

if ($userFound) {
  $userLang = $userFound['lang'][0];
  Log::write("User: $userId");
} else {
  Log::write("No user found.");
}

header('Content-Type: application/voicexml+xml; charset=utf-8');
print('<?xml version="1.0" encoding="utf-8"?>');
?>

<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <property name="inputmodes" value="dtmf"/>
  <var name="sessionId" expr="'<?php echo $sessionId ?>'"/>

<?php
if($userFound) {
  $radios = RadioPlatform::getRadios();
?>
<form>
  <var name="userId" expr="'<?php echo $userId ?>'"/>
  <var name="userRadioId" expr="'<?php echo $userRadioId ?>'"/>
  <var name="userLang" expr="'<?php echo $userLang ?>'"/>
  <block>
<?php prompt($userLang, 'welcome') ?>
    <audio src="<?php echo $radios[$userRadioId]['audio']?>"/>
    <submit next="main-menu.vxml.php" method="get" namelist="userLang userId userRadioId sessionId"/>
  </block>
</form>

<?php
} else { // No user found through callerID. Create new user.
?>

<form>
  <block>
    <?php prompt('bam','welcome'); ?>
    <?php prompt('fr','welcome'); ?>
  </block>
  <field name="userLang">
    <?php prompt('bam','select_bam_1'); ?>
    <?php prompt('fr','select_fr_2'); ?>
    <option dtmf="1" value="bam">Bambara</option>
    <option dtmf="2" value="fr">French</option>
    <noinput><reprompt/></noinput>
    <nomatch><reprompt/></nomatch>
    <filled>
      <var name="callerId" expr="'<?php echo $callerId ?>'"/>
      <submit next="auth-new.vxml.php" namelist="userLang callerId sessionId"/>
    </filled>
  </field>
</form>

<?php } ?>
</vxml>

<?php
// tries to fix bad callerIds: strips surrounding whitespace,
// then a leading '+', then any leading zeros
function clean_phone_id($caller_id) {
  $ph = trim($caller_id);
  $ph = preg_replace('/^\+/', '', $ph);
  $ph = preg_replace('/^0*/', '', $ph);
  return $ph;
}
// returns true if both numbers match
function phoneNumbersMatch($n1, $n2) {
  if ($n1 === $n2) return true;
  return clean_phone_id($n1) === clean_phone_id($n2);
}
function prompt($lang,$msg) {
  $xmllang = IvrPlatform::xmllang($lang);
  echo "<prompt xml:lang='$xmllang'>".I18N::say($lang,$msg)."</prompt>\n";
}
?>

Foroba Blon and our Radio Platform for Citizen Journalists

Radio Moutian in Tominian, Mali

This is a re-blog from an entry I wrote for the Web Foundation, my current employer. Not something I usually do, but I really care about the project and I’d like it to be better known. Not that this blog makes a difference, but who knows.

This week, a group of us are back in Mali, on a field trip for the Foroba Blon project, where we’ll be deploying the first prototype of our radio platform, testing it, and gathering feedback for the continuation of the project.

When I describe the platform and the project I often get the same questions, so I thought I’d write an FAQ.



On-the-fly jslint validation in emacs

In the hope that someone will come across this trying not to waste time figuring things out, here’s how I managed to have emacs run jslint in my JavaScript buffers in real time. It’s not perfect, but it works where no other method I found online did.

We’ll use the jslint package from node.js (see note 1):

1. Install node.js. I use the homebrew package manager on OSX, so I just had to type brew install node.

2. Install the jslint module of node.js: npm install jslint

After this you should have a jslint command which you can test in a terminal (eg, jslint --terse test.js) to see something that looks like:

test.js
i18n.js(2):Expected 'en' at column 9, not column 3.
i18n.js(3):Expected 'record' at column 13, not column 5.
...

By default (without --terse) the output is more verbose, but having one line per error is easier in emacs.

3. Add the following to your .emacs file:

(when (load "flymake" t)
  (defun flymake-jslint-init ()
    (let* ((temp-file (flymake-init-create-temp-buffer-copy
		       'flymake-create-temp-inplace))
           (local-file (file-relative-name
                        temp-file
                        (file-name-directory buffer-file-name))))
      (list "jslint" (list "--terse" local-file))))

  (setq flymake-err-line-patterns
	(cons '("^\\(.*\\)(\\([[:digit:]]+\\)):\\(.*\\)$"
		1 2 nil 3)
	      flymake-err-line-patterns))

  (add-to-list 'flymake-allowed-file-name-masks
               '("\\.js\\'" flymake-jslint-init))

  (require 'flymake-cursor)
)

(add-hook 'js2-mode-hook
          (lambda ()
            (flymake-mode 1)
            (define-key js2-mode-map "\C-c\C-n" 'flymake-goto-next-error)))

flymake-cursor replaces the default behaviour of popping up a menu with the error message by just putting it in the minibuffer.

Emacs should of course be set up so that it knows where jslint is, in order to invoke it on the JavaScript buffers (see note 3).
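If flymake doesn’t highlight anything, it can help to check the error pattern outside emacs. The sketch below is a Python translation of the regular expression added to flymake-err-line-patterns in step 3, applied to one line of --terse output; the function name is an illustrative assumption.

```python
import re

# Same shape as the pattern from step 3: file name, line number in
# parentheses, a colon, then the message.
TERSE_LINE = re.compile(r"^(.*)\(([0-9]+)\):(.*)$")


def parse_terse_line(line):
    """Split one line of `jslint --terse` output into (file, line number, message).

    Returns None for lines that are not error reports.
    """
    match = TERSE_LINE.match(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)


if __name__ == "__main__":
    print(parse_terse_line("i18n.js(2):Expected 'en' at column 9, not column 3."))
```

The three groups correspond to the `1 2 nil 3` that follows the pattern in the lisp code: file, line, no column, message.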

Notes:

1. I also tried jslint4java but it turned out to be too slow for real-time invocation with flymake. Other Java-based methods (using Rhino) are, unsurprisingly, reported to be equally slow, and I failed to get spidermonkey installed.

3. In OS X, applications get their environment variables (like PATH, here) from the environment.plist in ~/.MacOSX. See this emacswiki page for instructions.

[updates 9 Oct 2012]:

  • Step 2 is no longer necessary, as npm is now included in homebrew’s node package
  • At step 3, removed -g
  • With the --terse option, it is no longer necessary to modify files inside the jslint module.
  • The lisp code has been updated to include --terse and match the new jslint error syntax

Thanks to the people who commented, leading to the update.


Efficient Calendars Redux: Gosper curve

One Mr Scott posted a comment on a previous entry on calendars and plane-filling curves, where he suggests using the Gosper curve as the calendar path. I was intrigued and adapted the code in order to give it a try. But I can’t say it produced great results:

While it looks pretty and shows much compactness (properly analysed in the 2008 article by Herman Haverkort and Freek van Walderveen previously mentioned), it is perhaps too compact in the sense that there’s not much space to scribble anything in the cells or around (not that I expect anybody to use it for that purpose). More importantly, the way the L-system rendering is implemented in SVG doesn’t lend itself to hexagonal cells, as this close-up shows:

The issue lies in the way the calendar is rendered: a single thick polyline traces the space-filling curve and is used as a mask through which the cells show. While it masks square cells pretty well, it sort of fails for hexagonal ones, as it doesn’t follow the edges of the cells but instead cuts right through them.
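For readers who haven’t met it, the Gosper curve is usually described by a two-symbol L-system with 60-degree turns. The Python sketch below expands that standard L-system and traces it into the kind of single polyline discussed above; it is not the calendar code, and the function names are illustrative assumptions.

```python
import math

# Standard L-system for the Gosper curve: A and B both draw one segment,
# '+' and '-' turn by 60 degrees in opposite directions.
AXIOM = "A"
RULES = {"A": "A-B--B+A++AA+B-", "B": "+A-BB--B-A++A+B"}
TURN = math.radians(60)


def expand(axiom, rules, iterations):
    """Apply the rewriting rules to every symbol, `iterations` times."""
    commands = axiom
    for _ in range(iterations):
        commands = "".join(rules.get(symbol, symbol) for symbol in commands)
    return commands


def trace(commands, step=1.0):
    """Walk the command string and return the list of points visited."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for symbol in commands:
        if symbol in "AB":
            x += step * math.cos(heading)
            y += step * math.sin(heading)
            points.append((x, y))
        elif symbol == "+":
            heading += TURN
        elif symbol == "-":
            heading -= TURN
    return points


def svg_polyline(points):
    """Serialise the points as one SVG polyline element."""
    coords = " ".join(f"{x:.3f},{y:.3f}" for x, y in points)
    return f'<polyline points="{coords}" fill="none" stroke="black"/>'


if __name__ == "__main__":
    print(svg_polyline(trace(expand(AXIOM, RULES, 2))))
```

Each iteration multiplies the number of segments by seven, and every vertex sits at the centre of a hexagonal cell, which is why a thick stroke along this path cuts across the cell edges instead of following them.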

It’s my own fault, and the solution would be to use a custom type of polyline, which means that I’d have to write my own curve-rendering code. While this would have the extra advantage that I could then port the code to HTML5, I’m not really bothered.

I’m not bothered publishing the Gosper code either, but I’ll do it if anybody asks.
