The other day, a client asked me to create something that would display a listing of cars on their website so that users didn’t have to leave their site to see their inventory. The big problem was that the website they were using to list cars didn’t have any kind of news feed or an API to pull from.
Luckily, jQuery is awesome and we can (easily) make this happen. We’re even going to go one step further and add a carousel to the data we retrieve. Click here to see a demo of what we’re going to do.
I’ve divided this up into sections as well. Some folks will only want the bare bones solution so they can add their own stuff, while some will want to use all of this for their projects. It’s not necessary to go beyond the first section to get the jQuery scraper to work. However, you might find some awesomeness.
First of all, it’s always good to have a few different tutorials to reference. No job is exactly the same, and occasionally you have to borrow ideas from a few sources to get what you want. Here are the two posts on this topic that I found the most useful. They also include source files.
How to Syndicate Content Without Utilizing a News Feed
Use jQuery and PHP to scrape page content
For the sake of this example, I’m just going to use Antithesis Comedy. Even though this website has an RSS feed, you can pretend it doesn’t. It’s important to note that this technique should only be used with permission from the owner and in compliance with the Terms of Use. If they don’t offer a syndication service, there’s probably a reason.
Build the jQuery Scraper
So, let’s get this stuff out of the way. The first thing we need to do is include jQuery between the <head> tags of your page. Then, to access information from another website, we need to use jQuery’s AJAX functions.
<script src="http://code.jquery.com/jquery-latest.js"></script>
<script type="text/javascript">
$(document).ready(function() {
$("#content").load("http://antithesiscomedy.com/");
});
</script>
Okay, now we’ve loaded the latest version of jQuery and loaded Antithesis Comedy into an element with the id #content. The problem is, this doesn’t work in modern browsers: the same-origin policy stops a script from reading a page on another domain unless that site explicitly allows it. Never fear, there is a workaround.
Let’s make a separate PHP file which will use the cURL library to load the web page on the server instead. Call this new file “curl.php” to keep things simple. Put the following code in it.
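To make that restriction concrete, here is a minimal sketch of the comparison the browser makes: scheme, host, and port of the page against those of the requested address. It uses the standard URL class; isSameOrigin and the example.com address are hypothetical names for illustration, not part of jQuery, and the real browser check has more rules than this.

```javascript
// Illustration only: a rough version of the same-origin comparison.
// isSameOrigin is a hypothetical helper, not a jQuery or browser API.
function isSameOrigin(pageUrl, requestUrl) {
  var page = new URL(pageUrl);
  // Relative addresses such as "curl.php" are resolved against the page.
  var target = new URL(requestUrl, pageUrl);
  // origin is the scheme, host, and port combined
  return page.origin === target.origin;
}

// A page on example.com (hypothetical) asking for another site: blocked.
console.log(isSameOrigin("http://example.com/index.html", "http://antithesiscomedy.com/")); // false

// The same page asking for a file on its own server: allowed.
console.log(isSameOrigin("http://example.com/index.html", "curl.php")); // true
```

This is why the workaround routes the request through a file on your own server.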
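<?php
// Fetch the remote page on the server, where the browser's restriction doesn't apply.
$ch = curl_init("http://antithesiscomedy.com/");
// Return the response as a string instead of printing it straight to the output.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);
echo $html;
?>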
Now, go back to our original JavaScript in the head of our document and make the necessary changes.
<script src="http://code.jquery.com/jquery-latest.js"></script>
<script type="text/javascript">
$(document).ready(function() {
$("#content").load("curl.php .excerpt");
});
</script>
The .load() method now pulls from the curl.php file on your own server, so the browser no longer sees a cross-domain request. What about the element with class .excerpt? Where did that come from? We only want to pull the excerpts from Antithesis Comedy, not everything on the page. A quick look at the site’s source code shows that each excerpt is wrapped with <div class="excerpt">. Adding this selector after the URL, separated by a space, tells jQuery to insert only those elements.
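If the "curl.php .excerpt" string looks odd, this sketch shows how it is read: everything before the first space is the address to request, and the rest is a selector applied to the response. splitLoadArgument is a hypothetical helper written for this illustration, not a jQuery function.

```javascript
// Sketch of how the string passed to .load() is interpreted.
// splitLoadArgument is a hypothetical helper, not part of jQuery.
function splitLoadArgument(arg) {
  var firstSpace = arg.indexOf(" ");
  if (firstSpace === -1) {
    // No selector given: the whole response would be inserted.
    return { url: arg, selector: null };
  }
  return {
    url: arg.slice(0, firstSpace),
    selector: arg.slice(firstSpace).trim()
  };
}

var parts = splitLoadArgument("curl.php .excerpt");
console.log(parts.url);      // curl.php
console.log(parts.selector); // .excerpt
```

Keep in mind that the whole page is still downloaded through curl.php. The selector only controls which part of it ends up in your document.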
Okay, now you need a place in your document to insert this content. jQuery is already looking for an element with id #content. So, in the <body> portion of your page, place the following code.
<div id="content"> <img src="ajax-loader.gif" alt="loading" /> </div>
The scraped content will replace the ajax-loader.gif image once it has finished loading. Let’s take a look at it now.

Styling the Content
So, it works, but it looks terrible. Scraping the content of the page doesn’t bring the style with it. However, that’s a good thing. Now we can style it ourselves to make it suit whatever website we’re putting this syndication on. When I was developing this for my client, I had to work within very strict width and height parameters.
By looking at the source code of the website you’re scraping, it’s easy to write some CSS. I threw this together really quickly. Of course, it would look better if the author of the website would actually write some content. Nevertheless, here it is.
#content {
float:left;
width:600px;
}
.thumbnailleft {
float:left;
width:170px;
margin-right:10px;
}
.excerptcontent {
float:right;
width:420px;
min-height:126px;
margin-bottom:10px;
}
.excerptcontent h2 {
font-size:18px;
font-weight:bold;
margin:0;
padding:0;
}
.readmore {
width:600px;
clear:both;
margin:0 0 20px;
padding-bottom:20px;
border-bottom:1px solid #CCC;
}
This was just a quick styling. You talented folks out there will make things look much better. But I’m short on time, so now it looks like this.

For all intents and purposes, you’re done. But, if you’re like me, you’ll want to take it a few steps further. If you’re syndicating this content, chances are you’ll want the links to open in new tabs and you won’t have this much space to display the content.
Open Links In New Window / Tab
A lot of people don’t want users to leave their site when they open an external link, so they have their outgoing links open in a new tab. Even though we don’t have control over the source material, we can use the magic of jQuery to set attributes on links. In this case, we want to add target="_blank" to every link inside an element with the class .excerpt.
Go back to our original JavaScript in the header and add this line.
$('.excerpt a').attr('target', '_blank');
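The code should now look like this. The new line goes inside the callback function passed to .load(), because the links don’t exist on the page until the content has finished loading. All links inside of <div class="excerpt"> now open in new windows / tabs. Pretty nifty.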
<script src="http://code.jquery.com/jquery-latest.js"></script>
<script type="text/javascript">
$(document).ready(function() {
$("#content").load("curl.php .excerpt",{},function(){
$('.excerpt a').attr('target', '_blank');
});
});
</script>
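Two optional refinements are worth considering here, sketched below under some assumptions. If the source site uses relative links, they will resolve against your domain once the markup is on your page, so the sketch resolves each one against the source address first. It also adds rel="noopener noreferrer", which keeps the newly opened page from reaching back to yours. scrapedLinkAttrs is a hypothetical helper, and the "/shows/" path is made up for the example.

```javascript
// Hypothetical helper: build the attributes for one scraped link.
// sourceBase is the address of the page that was scraped.
function scrapedLinkAttrs(href, sourceBase) {
  return {
    // Resolve relative links against the source site, not the page showing them.
    href: new URL(href, sourceBase).href,
    target: "_blank",
    // Prevents the opened page from using window.opener.
    rel: "noopener noreferrer"
  };
}

// "/shows/" is an invented path used only to show the resolution.
console.log(scrapedLinkAttrs("/shows/", "http://antithesiscomedy.com/").href);
// http://antithesiscomedy.com/shows/

// Only runs in a page where jQuery has been loaded.
if (typeof jQuery !== "undefined") {
  jQuery("#content").load("curl.php .excerpt", function () {
    jQuery(".excerpt a").each(function () {
      var raw = jQuery(this).attr("href");
      if (raw) {
        jQuery(this).attr(scrapedLinkAttrs(raw, "http://antithesiscomedy.com/"));
      }
    });
  });
}
```

If the source already uses absolute links, the href step leaves them unchanged.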
Adding A jQuery Carousel
Like I stated before, the website I was initially working on this project for had a very specifically sized space for me to work with. They wanted to use this scraper to list their five most recent cars. After I was done, I realized there simply wasn’t enough space to make it look good. The solution? Display all of their cars, but in a jQuery carousel that the user could scroll through with a nice animation. It would now fit in the space and exceed their desires, making me look good and them happy. That’s what it’s all about, right?
I used bxSlider because it’s simple, awesome, and easy to use. Make sure to check out their website and their awesome product. For most jQuery sliders and carousels to work, we need to have our markup follow a particular format. For bxSlider, it’s like this.
<ul id="carousel_name"> <li>Item One</li> <li>Item Two</li> <li>Item Three</li> <li>Item Four</li> </ul>
Let’s prepare our HTML markup right now to fit with the format. It’s just a simple change.
<div id="content">
<ul id="listings">
<li><img src="ajax-loader.gif" alt="loading" /></li>
</ul>
<div id="go-next" class="slidercontrol">Next <span style="font-size:10px;">▼</span></div>
<div id="go-prev" class="slidercontrol">Previous <span style="font-size:10px;">▲</span></div>
</div>
Our inherent problem right now is that when the excerpts from Antithesis Comedy are loaded, they’re not wrapped with <li> tags. Once again, jQuery is a badass with handling things like this. Let’s look back at the JavaScript in our header. We need to add another line.
$('.excerpt').wrap('<li />');
This will take every element with the class .excerpt and wrap it in <li> tags. Super convenient. Next, we need to add the code to start bxSlider and give it some instructions on how to operate. You can find a whole list of options on their website. The code looks like this.
$(function(){
var slider = $('#listings').bxSlider({
controls: false,
mode: 'vertical',
displaySlideQty: 3,
moveSlideQty: 3,
infiniteLoop: false
});
$('#go-prev').click(function(){
slider.goToPreviousSlide();
return false;
});
$('#go-next').click(function(){
slider.goToNextSlide();
return false;
});
});
I don’t like it when tutorials don’t reiterate the code as a whole. So, here’s the whole JavaScript part from the header. You’ll notice that we linked to the bxSlider js file as well, which has to be in the same location as the webpage utilizing the slider. We also changed what element the .load() method loads into. In this case, it’s now the element id #listings, which is the <ul> tag that we added to the markup.
<script src="http://code.jquery.com/jquery-latest.js"></script>
<script src="jquery.bxSlider.js" type="text/javascript"></script>
<script type="text/javascript">
$(document).ready(function() {
$("#listings").load("curl.php .excerpt",{},function(){
$('.excerpt').wrap('<li />');
$('.excerpt a').attr('target', '_blank');
//after content is loaded, fire the carousel
$(function(){
var slider = $('#listings').bxSlider({
controls: false,
mode: 'vertical',
displaySlideQty: 2,
moveSlideQty: 2,
infiniteLoop: false
});
$('#go-prev').click(function(){
slider.goToPreviousSlide();
return false;
});
$('#go-next').click(function(){
slider.goToNextSlide();
return false;
});
});
});
});
</script>
All that’s left to do now is add some more CSS to make it look better. Remember that you can change a bunch of the carousel’s properties as well. I currently have it displaying two excerpts and cycling two more on each click. You’ll want to add these new properties to your CSS. It’s important that the <ul> and <li> tags don’t have margins or padding.
#listings {
width:600px;
padding:0;
margin:0;
list-style-type:none;
}
#listings li {
padding:0;
margin:0;
list-style-type:none;
height:205px;
}
.slidercontrol {
float:right;
margin:0 10px;
color:#006;
cursor:pointer;
}
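When you are fitting the carousel into a fixed space, it helps to check the numbers before touching the CSS. The sketch below is plain arithmetic, not a description of how bxSlider works internally: it assumes each item has a fixed height (205px above), that the same number of items is shown and moved per click, and a made-up total of ten listings. planCarousel is a hypothetical helper.

```javascript
// Plain arithmetic for sizing the carousel; planCarousel is a hypothetical helper.
// Assumes fixed-height items and that displaySlideQty equals moveSlideQty.
function planCarousel(totalItems, visibleItems, itemHeightPx) {
  return {
    // Height needed to show the visible items with no margins or padding.
    viewportHeightPx: visibleItems * itemHeightPx,
    // Number of distinct positions, counting the first one.
    pages: Math.ceil(totalItems / visibleItems)
  };
}

// Ten listings is an invented figure for the example.
var plan = planCarousel(10, 2, 205);
console.log(plan.viewportHeightPx); // 410
console.log(plan.pages);            // 5
```

If the result doesn’t fit your space, change the item height or the number of visible items and run the numbers again.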
Cool, so now the entire thing is complete. Let’s take one last visual look at all of this.

Here’s the entire thing, one last time for you.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Web Scrape Syndication</title>
<link rel="stylesheet" href="style.css" type="text/css" media="screen" />
<script src="http://code.jquery.com/jquery-latest.js"></script>
<script src="jquery.bxSlider.js" type="text/javascript"></script>
<script type="text/javascript">
$(document).ready(function() {
$("#listings").load("curl.php .excerpt",{},function(){
$('.excerpt').wrap('<li />');
$('.excerpt a').attr('target', '_blank');
$(function(){
var slider = $('#listings').bxSlider({
controls: false,
mode: 'vertical',
displaySlideQty: 2,
moveSlideQty: 2,
infiniteLoop: false
});
$('#go-prev').click(function(){
slider.goToPreviousSlide();
return false;
});
$('#go-next').click(function(){
slider.goToNextSlide();
return false;
});
});
});
});
</script>
</head>
<body>
<h1>Super Awesome Web Scraper! Now With Animations!</h1>
<div id="content">
<ul id="listings">
<li><img src="ajax-loader.gif" alt="loading" /></li>
</ul>
<div id="go-next" class="slidercontrol">Next <span style="font-size:10px;">▼</span></div>
<div id="go-prev" class="slidercontrol">Previous <span style="font-size:10px;">▲</span></div>
</div>
</body>
</html>
And that’s it. Now you have what looks like syndicated content, even without an RSS feed or an API. It’s also very flexible. However, if the site you’re scraping changes its markup, your selectors will stop matching and the content will no longer show up.
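One way to soften that failure is to check what actually came back and show a fallback when nothing matched. This is a minimal sketch, assuming the #listings and #content elements from the markup above. scrapeStatus is a hypothetical helper and the fallback message is a placeholder. In practice the check would be folded into the existing .load() callback rather than making a second request.

```javascript
// Hypothetical helper: classify the result of a scrape by how many items matched.
function scrapeStatus(matchedCount) {
  return matchedCount > 0 ? "ok" : "empty";
}

// Only runs in a page where jQuery has been loaded.
if (typeof jQuery !== "undefined") {
  jQuery("#listings").load("curl.php .excerpt", function (responseText, textStatus) {
    var found = jQuery("#listings .excerpt").length;
    if (textStatus === "error" || scrapeStatus(found) === "empty") {
      // Placeholder wording; replace with whatever suits your site.
      jQuery("#content").html("<p>Listings are unavailable right now.</p>");
    }
  });
}
```

An empty result usually means the class name on the source site has changed, so that is the first thing to check.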
Feel free to leave suggestions or fixes in the comments. It helps us all get better. I’d also love to see what you are coming up with.
Nov 19th, 2011

