By using Spotify's new Metadata API, we can now create an up-to-date page with Spotify links for NRK P3's A, B and C lists.
Empire State of Mind by Jay-Z and Alicia Keys is on P3's A-list this week.
The rest of the article is in English.
Two days ago Spotify announced a public Spotify Metadata API. This API makes it possible to look up track names and get a reference to the Spotify track. I know that NRK P3 exposes its playlists on an HTML page, so I decided to create a mashup: a page showing the NRK P3 playlists, where each item is a link that starts playing the song in Spotify.
The result is available at spotify.erlang.no
This is a story of how good things may happen when multiple sources make data available for easy access to the public.
The rest of the article is the full source code of the site above, with explanations.
Grabbing the NRK P3 playlists
The first part is actually the difficult part. The NRK P3 playlists are exposed on a web page with XHTML that does not validate. In general it is not a good idea to put a web page through an XML parser: even if it works while the page is well-formed, a single minor error in the page will leave the parser unable to extract anything.
The script is written in PHP, and is split in two:
- cron.php grabs the playlists, looks up each track using the Spotify Metadata API, and stores the result in a JSON cache file. This script is supposed to run every night to keep the list updated as the Spotify library and the playlists change. It is important not to look up the full list on every request, so we cache the result for a full day.
- index.php reads the JSON cache file and presents it in a simple XHTML page with lists of links.
Letting a script act like a web browser and then extract content from the web pages is known as web scraping. Through years of making handy web tools, I've ended up with a personal WebScraper library that puts any web page through libTidy and ends up with an XML document. I then have some helper functions to extract content using XPath syntax.
Let's start off by including my WebScraping utility:
// Include a WebScraping library with helper functions to extract content from HTML pages.
require('wscraper.php');
Now, let us create a function that grabs the A-, B-, and C-playlist from the P3 web page.
function getPlaylist() {
$scraper = new WScraper();
$htmlpage = $scraper->getURL('http://nrkp3.no/spillelister/');
$playlists = array();
We iterate through all three lists on the page:
for($i = 1; $i <= 3; $i++) {
$playlists[] = $htmlpage->textMulti("//div[@class='postarea']/ul[" . $i . "]/li", TRUE, TRUE, TRUE);
We use XPath, and the expression //div[@class='postarea']/ul[1]/li means: find a <div> element with a class attribute set to postarea, then get its first <ul> child and that list's <li> list-item children. The textMulti method converts the content of each <li> to plain text and creates a PHP array out of it.
}
return $playlists;
}
Now, we have a function that returns an array of three playlists, each being an array of track names.
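Since wscraper.php is a personal library, here is a minimal sketch of the same XPath extraction using only PHP's bundled DOM extension. The function name extractListItems and the sample markup are made up for illustration, and this is not how WScraper itself is implemented (it does not run the page through libTidy, for one).

```php
<?php
// Hypothetical stand-in for the WScraper helper, built on DOMDocument and DOMXPath.
function extractListItems(string $html, int $listIndex): array {
    // Collect parser warnings internally instead of printing them;
    // loadHTML is tolerant of markup that is not well-formed.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Same expression as in the article: the Nth <ul> child of the postarea <div>.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query("//div[@class='postarea']/ul[" . $listIndex . "]/li");

    $items = array();
    foreach ($nodes as $node) {
        $items[] = trim($node->textContent);
    }
    return $items;
}

// Made-up sample markup, only loosely modelled on the structure described above.
$sample = '<html><body><div class="postarea">'
    . '<ul><li>Artist One - Song One</li><li>Artist Two - Song Two</li></ul>'
    . '<ul><li>Artist Three - Song Three</li></ul>'
    . '</div></body></html>';

print_r(extractListItems($sample, 1));
```

Note that loadHTML assumes ISO-8859-1 unless the page declares another charset, so a real page with Norwegian characters may need extra care.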
Preparing the search phrase
The next problem we need to solve is that the text in the NRK P3 playlists includes some information that makes the Spotify API fail to find the content. Here is an example of an item on the playlist:
Black Eyed Peas – I Gotta Feeling (ned fra A)
First, the ‘-’ (dash) separator causes problems for the Spotify search engine. Next, all text in parentheses on the playlists seems to be meta-text that is not relevant to the track name itself. So we create a cleaning function that produces a search phrase which is more likely to be found.
function cleanSearch($in) {
$out = preg_replace('/ – /', ' ', $in);
$out = preg_replace('/\(.*?\)/', '', $out);
$out = preg_replace('/feat[^\s]+/', '', $out);
return $out;
}
We use regular expressions to remove:
- The dash
- Everything in parentheses
- The word ‘feat.’, because Spotify seldom uses the word ‘featuring’ and instead uses a ‘+’ sign or similar.
The output for the example above will be:
Black Eyed Peas I Gotta Feeling
This does in fact result in a perfect match using Spotify's search engine.
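To see the three replacements in action without touching the network, here is a small self-contained sketch. It is a variation on cleanSearch(), not the original: the name cleanSearchSketch is made up, and it additionally collapses repeated whitespace and trims the result, since removing a parenthesis or a ‘feat.’ leaves stray spaces behind. The second sample string is invented.

```php
<?php
// Hypothetical variant of cleanSearch() that also tidies up leftover whitespace.
function cleanSearchSketch(string $in): string {
    $out = preg_replace('/ – /u', ' ', $in);          // the dash separator
    $out = preg_replace('/\(.*?\)/u', '', $out);      // everything in parentheses
    $out = preg_replace('/feat[^\s]+/u', '', $out);   // 'feat.' and similar
    $out = preg_replace('/\s+/u', ' ', $out);         // collapse runs of whitespace
    return trim($out);
}

$samples = array(
    'Black Eyed Peas – I Gotta Feeling (ned fra A)',
    'Artist A feat. Artist B – Some Song (ny)',       // made-up item
);

foreach ($samples as $sample) {
    echo cleanSearchSketch($sample) . "\n";
}
```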
Using the Spotify Metadata API
Using the Spotify Metadata API is really simple. The query is made using a URL with a query string parameter, and the result is returned as a complex XML document with a lot of information.
function getSpotifyLink($search) {
$clean = cleanSearch($search);
$res = simplexml_load_string(
file_get_contents('http://ws.spotify.com/search/1/track?q=' . urlencode($clean))
);
We now have the result of the lookup as an XML document, and we check whether it contains any tracks. If it contains none, we just return the search phrase without any Spotify link:
if (count($res->track) < 1) return array('search' => $search, 'clean' => $clean);
If we find one or more hits, we return more information about the first one, including:
- The artist name
- The track name
- The Spotify link
- The search phrase before and after cleaning
Everything is collected in an associative array and returned:
return array(
'artist' => (string)$res->track[0]->artist->name,
'track' => (string)$res->track[0]->name,
'link' => (string)$res->track[0]['href'], // attribute access; the curly-brace offset syntax no longer works in PHP 8
'search' => $search,
'clean' => $clean,
);
}
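The SimpleXML part of getSpotifyLink() can be tried out offline. The sketch below parses a made-up, heavily simplified response and picks out the same three fields; the function name firstTrackFromXml, the sample XML and its placeholder URIs are assumptions for illustration, and a real response carries much more information.

```php
<?php
// Hypothetical helper: extract artist, track and link from the first <track> element.
function firstTrackFromXml(string $xml): ?array {
    // Keep parse errors quiet; simplexml_load_string returns false on invalid XML.
    libxml_use_internal_errors(true);
    $res = simplexml_load_string($xml);
    libxml_clear_errors();

    if ($res === false || count($res->track) < 1) {
        return null; // no hits, or not parseable
    }

    $first = $res->track[0];
    return array(
        'artist' => (string)$first->artist->name,
        'track'  => (string)$first->name,
        'link'   => (string)$first['href'], // attributes are read with array syntax
    );
}

// Made-up sample; the element names follow the code above, the values are placeholders.
$sampleXml = '<tracks>'
    . '<track href="spotify:track:EXAMPLE">'
    . '<name>Example Song</name>'
    . '<artist href="spotify:artist:EXAMPLE"><name>Example Artist</name></artist>'
    . '</track>'
    . '</tracks>';

print_r(firstTrackFromXml($sampleXml));
```

Returning null for both "no hits" and "unparseable response" keeps the caller simple, at the cost of not distinguishing the two cases.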
Putting it together and storing the result
Now we put all of the above together. We start off by grabbing the playlists, and then populate a new result array with Spotify links for every track on each of them:
$playlists = getPlaylist();
$spotify = array();
foreach($playlists AS $playlist) {
$newSpotifylist = array();
foreach($playlist AS $track) {
Here we look up the Spotify link for an individual track using the Spotify Metadata API:
$newSpotifylist[] = getSpotifyLink($track);
}
$spotify[] = $newSpotifylist;
}
When we are done, we store the resulting array in a file, encoded as JSON.
file_put_contents('cache.json', json_encode($spotify));
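Because index.php may read the cache while cron.php is writing it, one possible refinement is to write to a temporary file first and then rename it into place. This is only a sketch of that idea, not part of the original scripts: the function names writeCache and readCache, the ".tmp" suffix and the example path are all made up, and how atomic the rename is depends on the platform and filesystem.

```php
<?php
// Hypothetical helper: write the cache via a temporary file, then move it into place.
function writeCache(string $path, array $data): bool {
    $json = json_encode($data);
    if ($json === false) {
        return false; // data could not be encoded
    }
    $tmp = $path . '.tmp';
    if (file_put_contents($tmp, $json) === false) {
        return false;
    }
    return rename($tmp, $path);
}

// Hypothetical helper: read the cache, falling back to an empty array.
function readCache(string $path): array {
    if (!is_readable($path)) {
        return array();
    }
    $data = json_decode(file_get_contents($path), true);
    return is_array($data) ? $data : array();
}

// Round trip with made-up data in the system temp directory.
$path = sys_get_temp_dir() . '/p3-cache-example.json';
writeCache($path, array(array(array('search' => 'Example Artist - Example Song'))));
print_r(readCache($path));
unlink($path);
```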
Presenting the result
We now need to create index.php, a script that reads the cached JSON file and presents it in a simple XHTML page.
We start by reading the cached file:
$spotify = json_decode(file_get_contents('cache.json'), TRUE);
We define the human-readable names of the playlists (in Norwegian):
$listnames = array('A-lista', 'B-lista', 'C-lista');
We send an appropriate HTTP header:
header('Content-type: text/html; charset=utf-8');
Then we output the start of the page (somewhat truncated):
echo '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<title>NRK P3 Spotify Playlists</title>
<style>[..snipp...]
</style>
</head>
<body><h1>NRK P3 Spotify Playlists</h1>';
With the page header in place, we iterate through the lists and output them:
foreach($spotify AS $k => $spotlist) {
We create a list header, such as ‘A-lista’:
echo '<h2>' . $listnames[$k] . '</h2>';
echo '<ul>';
And we iterate through the tracks on each playlist:
foreach($spotlist AS $spotitem) {
If a Spotify link was found, we create an HTML link that opens the track in Spotify:
if (isset($spotitem['link'])) {
echo '<li><a href="' . $spotitem['link'] . '">' .
$spotitem['artist'] . ' - ' . $spotitem['track'] . '<br /></a><span style="color: #999; font-size: 80%">' . $spotitem['clean'] . '</span></li>';
} else {
If not, we just output the track name in grey:
echo '<li><span style="color: #888">' . $spotitem['search'] . '</span></li>';
}
}
echo '</ul>';
}
echo '</body></html>';
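The page above echoes artist and track names straight into the markup. Names containing characters such as & would then produce invalid XHTML, so one option is to escape them first. The following is a sketch under the assumption that the cached values are plain text rather than already entity-encoded; the function name renderItem is made up, and the grey search-phrase line from the original is left out for brevity.

```php
<?php
// Hypothetical helper: render one playlist item with HTML-escaped values.
function renderItem(array $item): string {
    if (isset($item['link'])) {
        return '<li><a href="' . htmlspecialchars($item['link'], ENT_QUOTES, 'UTF-8') . '">'
            . htmlspecialchars($item['artist'] . ' - ' . $item['track'], ENT_QUOTES, 'UTF-8')
            . '</a></li>';
    }
    // No Spotify link found: show the original search phrase in grey.
    return '<li><span style="color: #888">'
        . htmlspecialchars($item['search'], ENT_QUOTES, 'UTF-8')
        . '</span></li>';
}

// Made-up items for illustration.
echo renderItem(array('link' => 'spotify:track:EXAMPLE', 'artist' => 'A & B', 'track' => 'Song')) . "\n";
echo renderItem(array('search' => 'Unknown <Artist> - Song')) . "\n";
```

If the scraped text turns out to contain entities already, escaping it again would double-encode them, so the values would need to be decoded first.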
The author
Andreas Åkre Solberg works as a scientist for UNINETT and has a more than average interest in open access to data. From time to time he makes useful web services and open source software, and he likes companies and data owners that make their data available to the public.
Erol Haagenrud
Nice work! 🙂
Kristian J.
Very nice!
I see the track «Blog» by Jaa9 & OnklP isn’t included in the result, even if it is in the Spotify library. I think this is because the XHTML entity &amp; is being sent to the search instead of a simple &. It should probably be replaced in the cleanSearch() function?
Andreas Solberg (NRK)
Thanks for the tip. I added
$out = html_entity_decode($out);
and now it looks like it found the Blog track.
Øyvind M
Cool! Hopefully Spotify will publish an open api for creating playlists soon. That would make it possible to create dynamic spotify playlists based on playlists published on the web. Mixing that ability with something like last.fm’s api would be extremely cool.
NRK P3/Spotify mashup | 73% geek, the rest is girly-bits
[…] P3 and also use Spotify then Andreas Solberg over at NRKBeta has got a present for you. He’s written some nifty code which uses Spotify’s new Metadata API to create a page showing NRK P3’s A, B and C […]
Marius S
Well, well, isn't that tezla from BeShare 😛
Have you tried Haiku? It works quite well, just 8 years too late 🙂
Jarle Hammen Knudsen
Imagine if one could publish podcasts through Spotify. Then NRK could make them available with the music intact, since it is streaming, and I could listen to them on my iPhone without being online.
Palme
Fantastic! Thanks a lot for the code 🙂
Håkon
Cool! Could it also be published as dynamic playlists on spotify.erlang.no/ and not just as individual songs?
Endre H.
Nice 🙂 How about collecting all the songs in one single playlist (similar to We are hunted on Spotify: wearehunted.com/)?
Knut Helges verden » NRK P3s ABC-liste
[…] Update: I see that NRK has published a page with links to each individual song on the P3 list that is in Spotify, but these are not dynamic playlists that are updated in any way, as my lists are. You can find their page here: spotify.erlang.no The article about the page can be found here: nrkbeta.no […]
Eirik
Thanks for sharing!
This is certainly the kind of post I appreciate the most. A problem, an idea, some tools and the explanation on how the problem is solved. And; the full source code as a download is great!
Månhus » Länksprutning – 30 October 2009
[…] NRK P3 Spotify Playlists – The beauty of open access to data […]
Ole K
I very much agree with what Endre H. says. It would be really cool to have automatically updating A, B and C playlists!
@Endre H: thanks for the link to wearehunted, by the way. It seems like a good site 🙂
Andreas Solberg (NRK)
It is not possible to create a finished playlist with the Metadata API. Maybe it is possible with libspotify? Otherwise one would have to do something manually, and I don't like that 😛
By the way, I have made some updates. I now fetch the playlists from NRK P3, Mp3, Radio1 and VG-lista. I have put them in tabs, and collected all the songs (with duplicates removed) under All tracks.
http://spotify.erlang.no
akai
A-ye, Andreas. This thing is brilliant!
So, next time you are heading out, send a /msg on good old IRC and I'll buy you a beer or three..
- On behalf of a generation of music and sound addicts who like to stay up to date on radio songs: Thanks! 🙂
Ole
+1 for playing whole lists (e.g. the A-list)
Andreas Solberg (NRK)
Thanks to a contribution from Kristian Klette, we now have updated Spotify playlists. There are links to the lists from here:
spotify.erlang.no/
Kristian is another developer here with us at UNINETT who is also a Spotify user.
Kristian Klette
The last song in the list still sometimes shows up twice in the playlist, but that will do for now.
libspotify is a pain to work with, by the way 😛
Simon
Is it possible to do this on Wimp as well??
bjorn
cool stuff 🙂
I’ve been pondering doing something similar with the Last.FM API + Spotify Metadata lookup to have a direct link to the songs I listen to in the sidebar on my blog.
Adrian Bengtson
Thanks for publishing this script!
I used it as start when I created p3spotify, a site that fetches tracklists from Swedish P3.
Ole K
Thanks to Kristian Klette for sorting out the playlists 🙂
» Kule ting med Spotify API – Blogg?
[…] NRKbeta has already tested this with its P3 lists, and going forward I will try to find other useful lists. I have made a separate page for this, and would like suggestions for lists other than the ones I have. You will find the page under Spotifylister at the top of lervaag.net. I have done my first own test of this, fetching the 20 most played songs on last.fm. […]
Toak
I have been playing around a bit with Spotify's Metadata API lately, and made a similar service. I saw that you had published the source code, but not for the script that automatically creates the playlist. Could you share that code as well? 🙂
Luke
On the site you say that you are using libspotify to autogenerate playlists. How are you doing this?
Guruslask
Hey guys, this doesn’t seem to work anymore. I really had a lot of use of this playlist – any chance you’ll fix it?
activist
The NRK P3 list no longer works (that is, it contains only one song). I was able to add the other lists to Spotify.
jervi
I have now built Spotify integration for Radio Revolt's playlists using the code here. Works perfectly, great article 🙂
Lars
Use Wimp. Wimp is Norwegian. Spotify is Swedish. Support what is made here in Norway… NRK, as the Norwegian state broadcaster, should do so, I think…
Oer
open.spotify.com/user/oer/playlist/0w2CKzElQf8zCcp8yUZ4sN
Epic party playlist ;D