
Interview with Sjoerd Simons of Empathy

Gnome Multimedia

This is the third in a series of interviews about open source multimedia; the previous interviews were about Jokosher and Totem. For this interview we check in with Sjoerd Simons, who works on Empathy, an application that combines instant messaging, video conferencing and voice over IP. Sjoerd will talk to us about the current status of Empathy and where it is going.

Could you give us an introduction to who you are and what projects you have been working on over the years?

I'm a Dutch guy working for Collabora Ltd. in Cambridge, UK. I started my involvement with open source/free software as a Debian Maintainer. I'm one of the people maintaining GNOME for Debian, but most of my work has been on the stuff below the desktop (Avahi, D-Bus, HAL, etc.), both for Debian and, to a lesser extent, upstream. The last few years I've been working for Collabora on various bits and pieces of the Telepathy ecosystem. My biggest project has been telepathy-salut (the link-local XMPP connection manager), of which I'm the main author.

For those not familiar with Empathy, what type of application is it and what are its features?

Empathy is an instant messaging client built on top of Telepathy. Currently it supports presence, chatting (both peer-to-peer and chatrooms), and voice and video calling for a variety of protocols, including but not limited to XMPP, link-local XMPP, MSN, SIP, Yahoo and ICQ.

Who are the current contributors to Empathy apart from yourself?

Xavier Claessens is the main maintainer. Guillaume Desmottes has been polishing file transfer support recently and works on MUC (multi-user chat) support from time to time. Jonny Lamb works on various things; some of his recent work was the importer for Pidgin accounts and the initial support for file transfers. Cosimo Cecchi recently joined us here at Collabora and is mostly working on user interface things; he's currently integrating libcanberra for sound events and adding support for notifications.

Empathy uses GStreamer, Farsight and Telepathy to support VoIP and video calling. Could you please explain what each of those components provides to Empathy?

Empathy uses components called Telepathy connection managers (or CMs for short) as the backend for the various protocols it speaks. A CM is responsible for actually connecting to the server (if applicable) and sending your chat messages over the wire. So Empathy itself doesn't speak any of the protocols directly; it just tells the CM what it wants to do.
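The division of labour between the client and its connection managers can be pictured with a small Python sketch. The class and method names below (`ConnectionManager`, `XmppCM`, `ChatClient`, `send_message`) are hypothetical and are not the Telepathy D-Bus API; the sketch only shows the idea that the client states an intent and a per-protocol backend does the wire work.

```python
from abc import ABC, abstractmethod


class ConnectionManager(ABC):
    """Hypothetical stand-in for a Telepathy connection manager (CM)."""

    protocol: str

    @abstractmethod
    def send_message(self, contact: str, text: str) -> str:
        """Encode a chat message in this CM's wire protocol and return it."""


class XmppCM(ConnectionManager):
    """Toy XMPP backend; a real CM such as telepathy-gabble does far more."""

    protocol = "jabber"

    def send_message(self, contact: str, text: str) -> str:
        # Simplified: no XML escaping, no stream or session handling.
        return f"<message to='{contact}'><body>{text}</body></message>"


class ChatClient:
    """The client only states what it wants; it never speaks a protocol itself."""

    def __init__(self, managers: list) -> None:
        self._managers = {cm.protocol: cm for cm in managers}

    def send(self, protocol: str, contact: str, text: str) -> str:
        # Delegate to whichever CM handles this protocol.
        return self._managers[protocol].send_message(contact, text)
```

Adding support for another protocol then means registering another CM, not changing the client. This mirrors why Empathy gains protocols through backends like the ones mentioned later in the interview.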

Now, to make a VoIP or video call you actually need two things. The first part is the signalling: you tell the other side you want to call them, which audio/video codecs you support, etc. This part is protocol-specific and is done by the Telepathy CM. The second part is the actual streaming of the audio/video data over the network and recording/displaying it. Most of that work is basically the same for all protocols and is what Farsight does for us.
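Part of the signalling step is agreeing on a codec both sides can handle. The sketch below assumes a simple rule in which the answerer keeps the caller's preference order and drops anything it cannot decode. The `Codec` type and the `negotiate` function are hypothetical illustrations, not the actual Telepathy or Farsight negotiation logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    """A codec entry as it might appear in a call offer (hypothetical shape)."""

    name: str
    clock_rate: int


def negotiate(offer: list, supported: set) -> list:
    """Keep the caller's preference order, dropping codecs we cannot handle."""
    return [codec for codec in offer if codec in supported]
```

For example, if the caller offers H264, Theora and Speex, but the answerer only has free codecs (Theora and Speex at 16 kHz), the call falls back to those two. An empty result would mean the call cannot be set up at all.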

Farsight itself is built on top of GStreamer and uses it for all the multimedia handling.

As someone developing a non-playback application using GStreamer, what is your opinion of GStreamer for this kind of application?

With the 0.10 series GStreamer finally became mature enough for this kind of application. As always when using a piece of software in an entirely new way, you are bound to hit bugs, but the basic design is very capable. The only thing I'd like to see improved is dynamically changing the pipeline while it's playing. It's currently doable (and we actually do it), but it's quite tricky, and you need to know a fair bit about how GStreamer works internally to get it right.

Empathy's native protocol is XMPP, which is the same protocol used by Jabber and Google Talk. Could you please explain what XMPP is and what advantages it brings?

I wouldn't say XMPP is our native protocol, but it's definitely the one that is best supported and tested the most. XMPP is a completely open standard, which is a big advantage over, say, MSN, where free software always has to play catch-up whenever the network changes something. Also, the standards process for XMPP is very open, which has allowed us to participate in it and help write new extensions for things like video support.

Another big advantage is that it's decentralized: everyone can run their own XMPP server if they want to, and all these servers communicate amongst themselves. This allows me to talk to people on Google Talk even when I'm using my own XMPP server.
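Decentralization works because every XMPP address (JID) carries its home server's domain, of the form `localpart@domainpart/resource`. A server delivers locally if the domain is its own and otherwise hands the stanza to the remote domain's server. The helper names `domain_of` and `route` below are hypothetical, and the sketch leaves out the DNS lookup and the server-to-server connection that a real server performs.

```python
def domain_of(jid: str) -> str:
    """Return the domainpart of a JID shaped like [localpart@]domainpart[/resource]."""
    bare = jid.split("/", 1)[0]        # strip the resource, which may itself contain '@' or '/'
    return bare.split("@")[-1].lower()  # the localpart cannot contain '@'; domains compare case-insensitively


def route(local_domain: str, recipient: str) -> str:
    """Decide whether a server delivers a stanza itself or forwards it to another server."""
    domain = domain_of(recipient)
    if domain == local_domain.lower():
        return "local"
    return f"s2s:{domain}"  # hand off to the recipient's own server
```

A user on a self-hosted `jabber.example.org` writing to `alice@gmail.com/Talk` would therefore be routed to the `gmail.com` server. No central authority is involved.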

Google recently added video calling to their online Google Mail service. Will Empathy be able to interact with this video calling service?

With their new service, Google decided not to use XMPP's Jingle extension for voice/video, but instead to extend their own protocol, the one they also use in the Google Talk client. Luckily we already supported the older version of that protocol, and with some small adjustments in telepathy-gabble (our XMPP backend) we were able to support the new version. Some users have reported issues when trying to use it with Empathy, though, and fixing those is quite high on my to-do list.

Unfortunately, Google decided to use H264-SVC as the video codec, for which there are currently no open source implementations available. So for now we're only able to support audio calls. Luckily, Google seems to be planning an update with support for more conventional codecs like H264-AVC. Once that happens we should be able to support video calls without many problems.

The biggest problem of video calling on Linux at the moment is that, apart from Theora and Dirac, all modern video codecs are heavily patented, so no distribution can ship encoders for them. It remains to be seen whether Google will introduce support for Theora; if not, users will still have to get the necessary codecs themselves to make things work :(

So, based on your comment regarding codecs and free software, I assume the best option for at least getting Linux-to-Linux video calls going is the Speex and Theora codecs. How well suited for video conferencing are these two codecs, and how well supported are they in Empathy?

Speex is one of the best codecs around for voice calls. It's well supported in Empathy and in various other clients.

Theora is a very nice codec as well, but not on par with the currently leading codecs unfortunately. According to the people behind Theora this is mostly due to limitations of the encoder, not the codec itself. Work is being done to improve the situation.

With that said, the current performance (both in terms of speed and quality) of Theora is perfectly adequate on recent machines. And being able to have video conferencing just work out of the box is definitely a huge win. Theora support in Empathy is currently not great, but it will be fully supported when we switch to using Farsight2 in the next couple of weeks.

Most Linux users probably use a wide suite of applications for their communication needs today, like Pidgin, X-Chat and Ekiga. How do you see Empathy in comparison with those today, and what are your long-term plans?

In the last GNOME release Empathy was accepted as part of the Desktop. Various distributions decided to keep Pidgin as the default for now, though, as Empathy was still missing various features. For GNOME 2.26 we hope to have most of these issues addressed. In the long term we're planning to make more and more parts of GNOME Telepathy-aware. Some examples are sending files from Nautilus, sharing links from Epiphany, changing your status to away when watching a movie in Totem, and publishing the currently playing song from Rhythmbox.

Both X-Chat and Ekiga are more specialised for specific tasks (IRC and SIP calling). While Empathy might be able to replace them for the casual user, I expect users with more specific requirements to stick with them.

Empathy supports a wide variety of protocols in addition to XMPP/Jabber, like MSN, Yahoo and ICQ. I guess a lot of time goes into staying interoperable with these proprietary services. Is there a lot of collaboration between the various open source efforts when it comes to figuring out what broke after new versions of these services are rolled out?

For support of proprietary protocols we mostly depend on the work being done by other open source efforts. For MSN there is Telepathy Butterfly, which uses the pymsn library. Most other proprietary protocols are supported by Telepathy Haze, which uses libpurple (the library also used by Pidgin) internally.

Empathy supports this thing called Salut. I am sure most people haven't heard about it before, so could you please explain to us what it is and what it does?

Salut is an implementation of XMPP's Serverless Messaging extension. It's basically the protocol that iChat calls Bonjour, and it is available in some other clients under that name as well. Salut uses Avahi to detect people on the local network so you can easily chat and, in recent versions, share files with them.
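In serverless messaging, each participant announces itself with DNS-SD (the `_presence._tcp` service type in XEP-0174), and its presence details travel as `key=value` TXT record strings. The sketch below assumes the TXT keys `1st`, `last`, `nick` and `status` from that extension. The functions `parse_txt` and `describe_peer` are hypothetical helpers, not Salut or Avahi code. In practice Avahi performs the actual network discovery and hands the application these strings.

```python
from typing import Optional, Tuple

SERVICE_TYPE = "_presence._tcp"  # DNS-SD service type used for serverless XMPP (XEP-0174)


def parse_txt(records: list) -> dict:
    """Turn DNS-SD TXT strings of the form key=value into a dict."""
    fields = {}
    for record in records:
        key, sep, value = record.partition("=")
        key = key.lower()  # DNS-SD keys are case-insensitive
        if key and key not in fields:  # only the first occurrence of a key counts
            fields[key] = value if sep else ""  # a bare key acts as a boolean flag
    return fields


def describe_peer(records: list) -> Tuple[str, Optional[str]]:
    """Return (display name, status) for a peer discovered on the local network."""
    fields = parse_txt(records)
    full_name = " ".join(p for p in (fields.get("1st", ""), fields.get("last", "")) if p)
    return fields.get("nick") or full_name, fields.get("status")
```

Because every peer publishes these records itself, no server is needed. Anyone on the same network segment browsing for `_presence._tcp` sees the others.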

What about VoIP and videoconferencing support in Salut?

The long-term plan is to merge Gabble and Salut into a common codebase, which means we'll be able to share the videoconferencing code (among other things) between both projects.

Farsight2 has recently come out. What are the plans for its use in Empathy and what new features does this library make possible for Empathy?

I'm currently working on porting Empathy from telepathy-stream-engine to telepathy-farsight. Telepathy-farsight is a library that provides the basic glue between Telepathy and Farsight2. So we should see Farsight2 support in Empathy in one of the next releases.

Farsight2 supports full lip-sync in a video call, which is obviously a good thing to have. Furthermore, it allows us to properly support Theora as a video codec, which we couldn't really do up to now. This means we will finally be able to support video conferencing on Linux systems out of the box, using completely free codecs, with Empathy.

Another huge benefit is that it gives Empathy direct control over the media pipeline, which will make it much more flexible and resilient with respect to inputs and outputs. For example, it should be possible for a user to plug in their USB camera and start using it at any point in a VoIP call, or to switch the audio output from the speakers to a headset on the fly. Another idea that has been going round is to use Clutter for the video widget to add some nice bling.

Telepathy-farsight isn't just a great step for Empathy, though. We've recently added Python bindings to it, making all those advantages available to programs written in Python too. Specifically, we know the Elisa people have been planning to add video calling using Telepathy for some time. The problem for them was that they need to control the outputs for proper integration, which just wasn't possible before.

What is the development roadmap for Empathy going forward?

In the short term we plan to add quite a lot of user interface polish and some features we are still missing compared to other clients. File transfer, a feature a lot of people were asking for, was recently merged into Empathy. For now it's only supported in Salut, but we're actively working on supporting it for other protocols as well.

In the long term, as I mentioned earlier, we'd like to see more integration with the GNOME desktop. We'd also like to see more applications use Telepathy for collaboration. On several occasions we've shown how applications can collaborate using Telepathy, but unfortunately some of the basic framework needed for this on the desktop is still immature. In the last few months we added a lot of things to the Telepathy specification to improve this situation. Those improvements are currently being integrated into the stack.

The jungle telegraph is buzzing about something called Mingle. What can you reveal about this new technology and its use in Empathy?

Before I explain Mingle, I think I need to start with Jingle. Jingle is an XMPP extension that is currently being standardised, which enables clients to make audio/video calls (among other things). Mingle builds on top of Jingle to provide audio/video conferences with more than two people. We've recently announced the first draft of the protocol and a small Python client to demonstrate it. One of the big advantages of Mingle is that you won't need any special servers or infrastructure support to run a small conference (say, 4 to 5 people). For bigger conferences some mixing infrastructure is probably needed, as you might start hitting CPU and/or bandwidth limits, but we have some ideas on how to do this in a mostly transparent manner. For more information about Mingle, see its website. :)
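Why small conferences work without servers but bigger ones hit limits can be shown with a little arithmetic. This assumes a full-mesh topology in which every participant sends its own stream directly to every other participant. The interview implies this by saying no special servers are needed, but it does not spell out Mingle's actual topology. The per-stream bitrate is a hypothetical input, and `mesh_load` and `total_streams` are illustrative helpers.

```python
def mesh_load(participants: int, stream_kbps: float) -> tuple:
    """Outgoing streams and upload bandwidth for one peer in a full-mesh call."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    outgoing = participants - 1  # one copy of our stream to every other peer
    return outgoing, outgoing * stream_kbps


def total_streams(participants: int) -> int:
    """Number of one-way media streams across the whole mesh."""
    return participants * (participants - 1)
```

With 5 people and a hypothetical 300 kbps video stream, each peer encodes and uploads 4 streams (1200 kbps) and decodes 4 incoming ones, while the mesh carries 20 streams in total. Both per-peer load and total traffic grow with the number of participants, which is why a mixer becomes attractive for larger calls.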

In the next few months we'll start adding support for Mingle to Telepathy and Empathy, allowing people to easily set up a small conference with their friends :)

Which other multimedia applications do you use?

I use Totem for video playback and Rhythmbox for music playback. We also have a box in the house, running Elisa, which acts as a media center.

In terms of the big picture, what would you like to see happening in regards to multimedia in Linux and GNOME?

In general, I'd like things to just work when it comes to the various inputs and outputs. GNOME is currently quite static in this regard: you configure your webcam, speakers and microphone once and that's it. But in reality people want to be able to plug in USB headsets and webcams, use Bluetooth headsets, etc., and just use them in whatever they were doing. PulseAudio is a big step in the right direction to make this happen for audio, but it needs more integration in the desktop.

To learn more about Empathy visit live.gnome.org/Empathy.

Interview done by Christian F.K. Schaller.
