Posted in: English-articles, Guest bloggers, Web services, NRK & Radio

NRK P3 Spotify Playlists – The beauty of open access to data


Using Spotify's new Metadata API, we can now build an up-to-date page with Spotify links for NRK P3's A, B and C lists.

Empire State of Mind by Jay-Z and Alicia Keys is on P3's A-list this week.

Two days ago Spotify announced a public Spotify Metadata API. This API makes it possible to look up track names and get a reference to the Spotify track. I knew that NRK P3 exposes its playlists on an HTML page, so I decided to create a mashup: a page showing the NRK P3 playlists, where each item is a link that starts playing the song in Spotify.

The result is available at spotify.erlang.no

This is a story of how good things may happen when multiple sources make data available for easy access to the public.

The rest of the article is the full source code of the site above, with explanations.

Grabbing the NRK P3 playlists

The first part is actually the difficult part. The NRK P3 playlists are exposed on a web page whose XHTML does not validate. In general it is not a good idea to put a web page straight through an XML parser: it works as long as the page is well-formed, but a single minor error in the markup leaves the parser unable to extract anything.

The script is written in PHP and is split in two:

  • cron.php grabs the playlists, looks up each track using the Spotify Metadata API, and stores the result in a JSON cache file. This script is meant to run every night to keep the list updated as the Spotify library and the playlists change. It is important not to look up the full list on every request, so we cache the result for a full day.
  • index.php reads the JSON cache file and presents it in a simple XHTML page with lists of links.

Letting a script act like a web browser and then extract content from web pages is known as web scraping. Through years of making handy web tools, I've ended up with a personal WebScraper library that puts any web page through libTidy and ends up with an XML document. On top of that I have some helper functions to extract content using XPath syntax.

Let's start off by including my web scraping utility:

// Include a web scraping library with helper functions to extract content from HTML pages.
require('wscraper.php');

Now, let us create a function that grabs the A, B and C playlists from the P3 web page.

function getPlaylist() {
    $scraper = new WScraper();
    $htmlpage = $scraper->getURL('http://nrkp3.no/spillelister/');

    $playlists = array();

We iterate through all three lists on the page:

    for($i = 1; $i <= 3; $i++) {
        $playlists[] = $htmlpage->textMulti("//div[@class='postarea']/ul[" . $i . "]/li", TRUE, TRUE, TRUE);

We use XPath. The expression //div[@class='postarea']/ul[1]/li means: find a <div> element with a class attribute set to postarea, take its first <ul> child, and return that list's <li> children. The textMulti method converts the content of each <li> to plain text and collects the results in a PHP array.

    }
    return $playlists;
}

Now, we have a function that returns an array of three playlists, each being an array of track names.
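
The wscraper.php helper itself is not shown in this article. As a rough illustration of the same idea, and not the library used here, the sketch below relies on PHP's bundled DOM extension, whose HTML parser tolerates invalid markup, together with the XPath expression from above. The function name extractListItems and the sample markup are made up for the example.

```php
<?php
// Sketch only: extractListItems() is a hypothetical stand-in for the
// textMulti() helper, built on PHP's bundled DOM extension.
function extractListItems($html, $listNumber) {
    // Collect parser warnings instead of printing them; real pages are rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Same XPath expression as in getPlaylist(), with the list number filled in.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query("//div[@class='postarea']/ul[" . (int)$listNumber . "]/li");

    $items = array();
    foreach ($nodes as $node) {
        // textContent drops nested markup and keeps only the text.
        $items[] = trim($node->textContent);
    }
    return $items;
}

// Made-up sample page with unclosed <li> elements and a missing </div>.
$sample = '<html><body><div class="postarea">'
    . '<ul><li>Artist A - Song One<li>Artist B - <b>Song Two</b></ul>'
    . '<ul><li>Artist C - Song Three (ny)</ul>'
    . '</body></html>';

print_r(extractListItems($sample, 1));
```

The sample text is plain ASCII on purpose: loadHTML() guesses the character encoding when the page does not declare one, so a real scraper would also need to handle the page's charset.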

Preparing the search phrase

The next problem we need to solve is that the text in the NRK P3 playlists includes extra information that makes the Spotify API fail to find the track. Here is an example of an item on the playlist:

  • Black Eyed Peas – I Gotta Feeling (ned fra A)

First, the ‘–’ (dash) separator causes problems for the Spotify search engine. Next, all text in parentheses on the playlists seems to be meta-text that is not relevant to the track name itself. So we create a cleaning function that produces a search phrase that is more likely to be found.

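function cleanSearch($in) {
    // Replace the ' – ' separator between artist and title with a space.
    $out = preg_replace('/ – /', ' ', $in);
    // Drop parenthesised notes such as '(ned fra A)'.
    $out = preg_replace('/\(.*?\)/', '', $out);
    // Drop 'feat.' or 'featuring' as a whole word, so words such as 'defeated' are left alone.
    $out = preg_replace('/\bfeat(?:uring|\.)?(?=\s|$)/i', '', $out);
    // Trim the whitespace left behind by the removals.
    return trim($out);
}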

We use regular expressions to remove:

  • The dash
  • Everything in parentheses
  • The word ‘feat.’, because Spotify seldom uses the word ‘featuring’ and instead uses a ‘+’ sign or similar.

The output for the example above will be:

  • Black Eyed Peas I Gotta Feeling

This does in fact result in a perfect match in Spotify’s search engine.
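
One of the comments below points out that a title containing &amp; is not found. As a hedged sketch of how the cleaning step could also cover that case, the variant below decodes HTML entities before applying the same removals, and accepts both a hyphen and an en dash as the separator. The name cleanSearchDecoded is hypothetical, and whether the scraped text really contains entities depends on the scraper, which is not shown here.

```php
<?php
// Sketch only: cleanSearchDecoded() is a hypothetical variant of cleanSearch().
// It assumes the input is valid UTF-8 (required by the /u modifier).
function cleanSearchDecoded($in) {
    // Turn entities such as &amp; back into plain characters.
    $out = html_entity_decode($in, ENT_QUOTES, 'UTF-8');
    // Accept both a hyphen and an en dash (U+2013) as the artist/title separator.
    $out = preg_replace('/\s+(?:-|\x{2013})\s+/u', ' ', $out);
    // Drop parenthesised notes such as "(ned fra A)".
    $out = preg_replace('/\(.*?\)/u', '', $out);
    // Drop "feat." or "featuring" as a whole word.
    $out = preg_replace('/\bfeat(?:uring|\.)?(?=\s|$)/iu', '', $out);
    // Collapse the runs of whitespace left behind and trim the ends.
    return trim(preg_replace('/\s+/u', ' ', $out));
}

echo cleanSearchDecoded('Black Eyed Peas – I Gotta Feeling (ned fra A)'), "\n";
echo cleanSearchDecoded('Jaa9 &amp; OnklP - Blog'), "\n";
```

Hyphens inside a name, as in Jay-Z, are left alone because the separator pattern requires whitespace on both sides.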

Using the Spotify Metadata API

Using the Spotify Metadata API is really simple. The query is a URL with a query string parameter, and the result is returned as an XML document with a lot of information.

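function getSpotifyLink($search) {
    $clean = cleanSearch($search);
    // Query the search endpoint; a failed request or unparsable reply is treated like an empty result.
    $xml = @file_get_contents('http://ws.spotify.com/search/1/track?q=' . urlencode($clean));
    $res = ($xml === FALSE) ? FALSE : simplexml_load_string($xml);
    if ($res === FALSE) return array('search' => $search, 'clean' => $clean);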

We now have the result of the lookup as an XML document. If it contains no tracks, we return just the search phrase without any Spotify link:

    if (count($res->track) < 1) return array('search' => $search,   'clean' => $clean);

If we find one or more hits, we return more information about the first one, including:

  • The artist name
  • The track name
  • The Spotify link
  • The search phrase before and after cleaning

Everything is collected in an associative array and returned:

    return array(
        'artist' => (string)$res->track[0]->artist->name,
        'track' => (string)$res->track[0]->name,
        'link' => (string)$res->track[0]['href'],
        'search' => $search,
        'clean' => $clean,
    );
}
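
To make the element and attribute access above easier to follow, here is a small self-contained sketch that parses a made-up, heavily simplified response. The sample XML, the placeholder track URI and the function name firstTrack are assumptions for illustration; the real service's response may carry namespaces and many more fields.

```php
<?php
// Hypothetical, simplified response used only for this sketch.
$sampleXml = <<<XML
<tracks>
  <track href="spotify:track:EXAMPLE1">
    <name>I Gotta Feeling</name>
    <artist href="spotify:artist:EXAMPLE2">
      <name>Black Eyed Peas</name>
    </artist>
  </track>
</tracks>
XML;

// Sketch only: firstTrack() is a hypothetical helper, not part of the site's code.
function firstTrack($xml) {
    // Keep libxml quiet when the input is not well-formed XML.
    $previous = libxml_use_internal_errors(true);
    $res = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Strict comparison: an empty but valid element is "falsy" in SimpleXML.
    if ($res === false || count($res->track) < 1) {
        return null;
    }
    $track = $res->track[0];
    return array(
        'artist' => (string)$track->artist->name, // child elements via ->
        'track'  => (string)$track->name,
        'link'   => (string)$track['href'],       // attributes via []
    );
}

print_r(firstTrack($sampleXml));
```

The (string) casts matter: without them the array would hold SimpleXMLElement objects, which json_encode() would not store as plain strings.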

Putting it together and storing the result

Now we put all of the above together. We start by grabbing the playlists, and then populate a new result array with Spotify links for every track on each playlist:

$playlists = getPlaylist();
$spotify = array();

foreach($playlists AS $playlist) {
    $newSpotifylist = array();
    foreach($playlist AS $track) {

Here we look up the Spotify link for an individual track using the Spotify Metadata API:

        $newSpotifylist[] = getSpotifyLink($track);
    }
    $spotify[] = $newSpotifylist;
}

When we are done, we store the resulting array in a file, encoded as JSON.

file_put_contents('cache.json', json_encode($spotify));
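
Since index.php may read the cache while cron.php is still writing it, one possible refinement is to write to a temporary file first and then rename it into place. This is a minimal sketch under that assumption; the helper name writeCacheAtomically and the .tmp suffix are made up, and the swap is only atomic where rename() is, which is typically the case within one POSIX filesystem.

```php
<?php
// Sketch only: writeCacheAtomically() is a hypothetical helper.
function writeCacheAtomically($path, array $data) {
    $json = json_encode($data);
    if ($json === false) {
        // Encoding can fail, for example on invalid UTF-8 in a track name.
        return false;
    }
    // Write next to the target so that rename() stays on one filesystem.
    $tmp = $path . '.tmp';
    if (file_put_contents($tmp, $json, LOCK_EX) === false) {
        return false;
    }
    // Swap the finished file into place in one step.
    return rename($tmp, $path);
}
```

In cron.php this would be called as writeCacheAtomically('cache.json', $spotify), keeping the previous cache file untouched if anything fails along the way.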

Presenting the result

We now need to create index.php, a script that reads the cached JSON file and presents it in a simple XHTML page.

We start by reading the cached file:

$spotify = json_decode(file_get_contents('cache.json'), TRUE);
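
The line above assumes that cache.json exists and holds valid JSON. As a hedged sketch of a more forgiving read, the hypothetical helper readCache below falls back to an empty list, so the page still renders (without playlists) before the first cron run or after a broken write.

```php
<?php
// Sketch only: readCache() is a hypothetical helper.
function readCache($path) {
    if (!is_readable($path)) {
        // No cache yet, for example before cron.php has run for the first time.
        return array();
    }
    $data = json_decode(file_get_contents($path), true);
    // json_decode() returns null when the content is not valid JSON.
    return is_array($data) ? $data : array();
}
```

With this in place the first line of index.php could read $spotify = readCache('cache.json');, and the loops further down simply produce nothing when the array is empty.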

We define the human-readable names of the playlists (in Norwegian):

$listnames = array('A-lista', 'B-lista', 'C-lista');

We send an appropriate HTTP header:

header('Content-type: text/html; charset=utf-8');

Then we output the start of the page (somewhat truncated):

echo '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title>NRK P3 Spotify Playlists</title>
    <style>[..snipp...]
    </style>
</head>
<body><h1>NRK P3 Spotify Playlists</h1>';

With the page header in place, we iterate through the lists and output them:

foreach($spotify AS $k => $spotlist) {

We create a list header, such as ‘A-lista’:

    echo '<h2>' . $listnames[$k] . '</h2>';
    echo '<ul>';

And we iterate through the tracks on each playlist:

    foreach($spotlist AS $spotitem) {

If a Spotify link was found, we create an HTML link that opens the track in Spotify:

        if (isset($spotitem['link'])) {
            echo '<li><a href="' . htmlspecialchars($spotitem['link']) . '">' .
                htmlspecialchars($spotitem['artist'] . ' - ' . $spotitem['track']) . '<br /></a><span style="color: #999; font-size: 80%">' . $spotitem['clean'] . '</span></li>';
        } else {

If not, we just output the track name in grey:

            echo '<li><span style="color: #888">' . $spotitem['search'] . '</span></li>';
        }
    }
    echo '</ul>';
}
echo '</body></html>';

The author

Andreas Åkre Solberg works as a scientist at UNINETT and has an above-average interest in open access to data. From time to time he makes useful web services and open source software, and he likes companies and data owners that make their data available to the public.

31 comments

  1. Very nice!

    I see the track «Blog» by Jaa9 & OnklP isn’t included in the result, even if it is in the Spotify library. I think this is because the XHTML entity &amp; is being sent to the search instead of a simple &. It should probably be replaced in the cleanSearch() function?

  2. Cool! Hopefully Spotify will publish an open api for creating playlists soon. That would make it possible to create dynamic spotify playlists based on playlists published on the web. Mixing that ability with something like last.fm’s api would be extremely cool.

  3. Jarle Hammen Knudsen

    Imagine if one could publish podcasts through Spotify. Then NRK could make them available with the music intact, since it is streaming, and I could listen to them on an iPhone without being online.

  4. What about collecting all the tracks in one and the same playlist

    I very much agree with what Endre H. says. It would be really cool to have automatically updating A, B and C playlists!

    @Endre H: thanks for the link to wearehunted, by the way. Seems like a good site :)

  5. It is not possible to create a finished playlist with the Metadata API. Maybe it is possible with libspotify? Otherwise one would have to do something manually, and I don't like that :P

    By the way, I have made some updates. I now fetch the playlists from NRK P3, Mp3, Radio1 and VG-lista. I have put them in tabs, and collected all the songs (with duplicates removed) under All tracks.

    http://spotify.erlang.no

  6. A-ye, Andreas. This thing is brilliant!
    So, next time you are heading out, drop me a /msg on good old IRC and I will buy you a beer or three..

    - On behalf of a generation of music and sound addicts who like to stay up to date on radio tracks: Thanks! :-)

  7. [...] NRKbeta has already tested this with its P3 lists, and going forward I will try to find other useful lists. I have made a separate page for this, and would like suggestions for lists other than the ones I have. You will find the page under Spotifylister at the top of lervaag.net. I have done my first own test of this, fetching the 20 most played songs on last.fm. [...]
