The other day, a client asked me to create something that would display a listing of cars on their website so that users didn’t have to leave their site to see their inventory.  The big problem was that the website they were using to list cars didn’t have any kind of news feed or an API to pull from.

Luckily, jQuery is awesome and we can (easily) make this happen.  We’re even going to go one step further and add a carousel to the data we retrieve.  Click here to see a demo of what we’re going to do.

I’ve divided this up into sections as well.  Some folks will only want the bare bones solution so they can add their own stuff, while some will want to use all of this for their projects.  It’s not necessary to go beyond the first section to get the jQuery scraper to work.  However, you might find some awesomeness.

First of all, it’s always good to have a few different tutorials to reference.  No job is exactly the same, and occasionally you have to borrow ideas from a few sources to get what you want.  Here are the two posts on this topic that I found the most useful.  They also include source files.

How to Syndicate Content Without Utilizing a News Feed
Use jQuery and PHP to scrape page content 

For the sake of this example, I’m just going to use Antithesis Comedy.  Even though this website has an RSS feed, you can pretend it doesn’t.  It’s important to note that this technique should only be used with permission from the owner and in compliance with the Terms of Use.  If they don’t offer a syndication service, there’s probably a reason.

Build the jQuery Scraper

So, let’s get this stuff out of the way.  The first thing we need to do is include jQuery between the <head> tags of your page.  Then, to access information from another website, we need to use jQuery’s AJAX functions.

  <script src="http://code.jquery.com/jquery-latest.js"></script>
  <script type="text/javascript">
    $(document).ready(function() {
      $("#content").load("http://antithesiscomedy.com/");
    });
  </script>

Okay, now we’ve loaded the latest version of jQuery and loaded Antithesis Comedy into an element with the id #content.  The problem is, this doesn’t work in modern browsers: the same-origin policy blocks AJAX requests to a different domain unless that domain explicitly allows them.  Never fear, there is a workaround.

Let’s make a separate PHP file which will use the cURL library to load the web page on the server, where the browser’s restriction doesn’t apply.  Call this new file “curl.php” to keep things simple.  Put the following code in it.

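<?php
// Fetch the remote page on the server so the browser only talks to our own domain.
$ch = curl_init("http://antithesiscomedy.com/");
// Return the response as a string instead of printing it straight to the output.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);
echo $html;
?>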
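To see why the detour through curl.php helps, it can be useful to spell out what the browser compares.  Roughly speaking, it looks at the origin (scheme, host, and port) of the page and of the URL being requested, and only lets the page read the response when the two match.  Here is a minimal sketch of that comparison using the standard URL class.  The function name isSameOrigin and the www.example.com address are made up for illustration; they stand in for your own site.

```javascript
// Hypothetical helper: reports whether two URLs share an origin
// (same scheme, host, and port).
function isSameOrigin(pageUrl, requestUrl) {
  var page = new URL(pageUrl);
  // Relative URLs such as "curl.php" are resolved against the page URL.
  var target = new URL(requestUrl, pageUrl);
  return page.origin === target.origin;
}

// "http://www.example.com/" is an assumed stand-in for your own site.
var page = "http://www.example.com/index.html";

console.log(isSameOrigin(page, "http://antithesiscomedy.com/")); // false
console.log(isSameOrigin(page, "curl.php"));                     // true
```

The direct request fails the check, while a request to curl.php passes it because that file sits on the same site as the page.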

Now, go back to our original JavaScript in the head of our document and make the necessary changes.

  <script src="http://code.jquery.com/jquery-latest.js"></script>
  <script type="text/javascript">
    $(document).ready(function() {
      $("#content").load("curl.php .excerpt");
    });
  </script>

The .load() method now pulls from the curl.php file.  That file lives on our own domain, so the browser allows the request, and the cross-domain fetch happens on the server instead.  What about the element with class .excerpt?  Where did that come from?  We only want to pull the excerpts from Antithesis Comedy, not everything on the page.  A quick look at the site’s source code shows that each excerpt is wrapped with <div class="excerpt">.  Adding this selector after the file name, separated by a space, tells jQuery to grab those elements specifically.
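If the single string passed to .load() looks odd, it may help to picture how it gets read: everything before the first space is treated as the URL, and everything after it as a selector.  The function below is only a simplified model of that behavior written in plain JavaScript, not jQuery’s actual code, and the name splitLoadArgument is made up for this example.

```javascript
// Simplified model (an assumption, not jQuery's source) of how .load()
// reads its first argument: URL first, then an optional selector.
function splitLoadArgument(arg) {
  var firstSpace = arg.indexOf(" ");
  if (firstSpace === -1) {
    // No space means the whole response is inserted.
    return { url: arg, selector: null };
  }
  return {
    url: arg.slice(0, firstSpace),
    selector: arg.slice(firstSpace + 1).trim()
  };
}

var parts = splitLoadArgument("curl.php .excerpt");
console.log(parts.url);      // "curl.php"
console.log(parts.selector); // ".excerpt"
```

Because the split happens at the first space, the selector part can itself contain spaces, such as ".excerpt a".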

Okay, now you need a place in your document to insert this content.  jQuery is already looking for an element with id #content.  So, in the <body> portion of your page, place the following code.

<div id="content">
   <img src="ajax-loader.gif" alt="loading" />
</div>

The scraped content will replace the ajax-loader.gif image when it’s done loading.  Let’s take a look at it now.

unstyled scrape

Styling the Content

So, it works, but it looks terrible.  Scraping the content of the page doesn’t bring the style with it.  However, that’s a good thing.  Now we can style it ourselves to make it suit whatever website we’re putting this syndication on.  When I was developing this for my client, I had to work within very strict width and height parameters.

By looking at the source code of the website you’re scraping, it’s easy to write some CSS.  I threw this together really quickly.  Of course, it would look better if the author of the website would actually write some content.  Nevertheless, here it is.

#content {
	float:left;
	width:600px;
}

.thumbnailleft {
	float:left;
	width:170px;
	margin-right:10px;
}

.excerptcontent {
	float:right;
	width:420px;
	min-height:126px;
	margin-bottom:10px;
}

.excerptcontent h2 {
	font-size:18px;
	font-weight:bold;
	margin:0;
	padding:0;
}

.readmore {
	width:600px;
	clear:both;
	margin:0 0 20px;
	padding-bottom:20px;
	border-bottom:1px solid #CCC;
}

This was just a quick styling.  You talented folks out there will make things look much better.  But I’m short on time, so now it looks like this.

Styled Scrape

For all intents and purposes, you’re done.  But, if you’re like me, you’ll want to take it a few steps further.  If you’re syndicating this content, chances are you’ll want the links to open in new tabs and you won’t have this much space to display the content.

Open Links In New Window / Tab

A lot of people don’t want users to leave their site when they open an external link, so they have their outgoing links open in a new tab.  Even though we don’t have control over the source material, we can use the magic of jQuery to add attributes to links.  In this case, we want to add target="_blank" to every link inside the elements with class .excerpt.

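Go back to our original JavaScript in the header and add this line.

$('.excerpt a').attr('target', '_blank');

The excerpts don’t exist on the page until the scrape has finished, so this line has to go inside a callback function that .load() runs once the content is in place.  The code should now look like this.  All links inside of <div class="excerpt"> now open in new windows / tabs.  Pretty nifty.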

<script src="http://code.jquery.com/jquery-latest.js"></script>
<script type="text/javascript">
   $(document).ready(function() {
      $("#content").load("curl.php .excerpt",{},function(){
         // Runs only after the excerpts have been inserted into #content.
         $('.excerpt a').attr('target', '_blank');
      });
   });
</script>
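One optional variation, which is my suggestion rather than part of the original setup: links that open in a new tab are commonly given rel="noopener noreferrer" as well, so the page you link to can’t reach back into yours.  Since .attr() also accepts an object, both attributes can be set in one call.  The helper name openLinksInNewTab is made up, and it takes jQuery as an argument only so the sketch stays self-contained.

```javascript
// Hypothetical helper. Pass in jQuery itself plus the container selector.
function openLinksInNewTab($, containerSelector) {
  // .attr() accepts an object, so both attributes are set in one call.
  // Note that this overwrites any rel value the links already had.
  return $(containerSelector + " a").attr({
    target: "_blank",
    rel: "noopener noreferrer"
  });
}

// Intended use inside the .load() callback from the tutorial:
// $("#content").load("curl.php .excerpt", {}, function () {
//   openLinksInNewTab(jQuery, ".excerpt");
// });
```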

Adding A jQuery Carousel

Like I stated before, the website I was initially working on this project for had a very specifically sized space for me to work with.  They wanted to use this scraper to list their five most recent cars.  After I was done, I realized there simply wasn’t enough space to make it look good.  The solution?  Display all of their cars, but in a jQuery carousel that the user could scroll through with a nice animation.  It would now fit in the space and exceed their expectations, making me look good and them happy.  That’s what it’s all about, right?

I used bxSlider because it’s simple, awesome, and easy to use.  Make sure to check out their website and their awesome product.  For most jQuery sliders and carousels to work, we need to have our markup follow a particular format.  For bxSlider, it’s like this.


<ul id="carousel_name">
   <li>Item One</li>
   <li>Item Two</li>
   <li>Item Three</li>
   <li>Item Four</li>
</ul>

Let’s prepare our HTML markup right now to fit with the format.  It’s just a simple change.

<div id="content">
   <ul id="listings">
      <li><img src="ajax-loader.gif" alt="loading" /></li>
   </ul>
   <div id="go-next" class="slidercontrol">Next <span style="font-size:10px;">&#9660;</span></div>
   <div id="go-prev" class="slidercontrol">Previous <span style="font-size:10px;">&#9650;</span></div>
</div>

Our inherent problem right now is that when the excerpts from Antithesis Comedy are loaded, they’re not wrapped with <li> tags.  Once again, jQuery is a badass with handling things like this.  Let’s look back at the JavaScript in our header.  We need to add another line.

$('.excerpt').wrap('<li />');

This will take every element with class .excerpt and wrap it in <li> tags.  Super convenient.  Next, we need to add the code to start the bxSlider and give it some instructions on how to operate.  You can find a whole list of options on their website.  That looks like this.

$(function(){
   // Start the slider on the <ul> and keep a reference so the buttons can drive it.
   var slider = $('#listings').bxSlider({
      controls: false,
      mode: 'vertical',
      displaySlideQty: 3,
      moveSlideQty: 3,
      infiniteLoop: false
   });

   $('#go-prev').click(function(){
      slider.goToPreviousSlide();
      return false;
   });

   $('#go-next').click(function(){
      slider.goToNextSlide();
      return false;
   });
});

I don’t like it when tutorials don’t reiterate the code as a whole.  So, here’s the whole JavaScript part from the header.  You’ll notice that we linked to the bxSlider js file as well, which has to be in the same location as the webpage utilizing the slider.  We also changed what element the .load() method loads into.  In this case, it’s now the element id #listings, which is the <ul> tag that we added to the markup.

<script src="http://code.jquery.com/jquery-latest.js"></script>
<script src="jquery.bxSlider.js" type="text/javascript"></script>
<script type="text/javascript">
    $(document).ready(function() {
        $("#listings").load("curl.php .excerpt",{},function(){
            // Wrap each excerpt in <li> so the markup fits what bxSlider expects.
            $('.excerpt').wrap('<li />');
            $('.excerpt a').attr('target', '_blank');

            // After the content is loaded, fire the carousel.
            $(function(){
                var slider = $('#listings').bxSlider({
                    controls: false,
                    mode: 'vertical',
                    displaySlideQty: 2,
                    moveSlideQty: 2,
                    infiniteLoop: false
                });

                $('#go-prev').click(function(){
                    slider.goToPreviousSlide();
                    return false;
                });

                $('#go-next').click(function(){
                    slider.goToNextSlide();
                    return false;
                });
            });
        });
    });
</script>

All that’s left to do now is add some more CSS to make it look better.  Remember that you can change a bunch of the carousel’s properties as well.  I currently have it displaying two excerpts and cycling two more on each click.  You’ll want to add these new rules to your CSS.  It’s important that the <ul> and <li> tags don’t have margins or padding.

#listings {
	width:600px;
	padding:0;
	margin:0;
	list-style-type:none;
	}

#listings li {
	padding:0;
	margin:0;
	list-style-type:none;
	height:205px;
	}

.slidercontrol {
	float:right;
	margin:0 10px;
	color:#006;
	cursor:pointer;
	}

Cool, so now the entire thing is complete.  Let’s take one last visual look at all of this.

Styled jQuery Scrape with jQuery Carousel

Here’s the entire thing, one last time for you.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Web Scrape Syndication</title>

<link rel="stylesheet" href="style.css" type="text/css" media="screen" />
<script src="http://code.jquery.com/jquery-latest.js"></script>
<script src="jquery.bxSlider.js" type="text/javascript"></script>

<script type="text/javascript">
    $(document).ready(function() {
        $("#listings").load("curl.php .excerpt",{},function(){
            // Wrap each excerpt in <li> so the markup fits what bxSlider expects.
            $('.excerpt').wrap('<li />');
            $('.excerpt a').attr('target', '_blank');

            // After the content is loaded, fire the carousel.
            $(function(){
                var slider = $('#listings').bxSlider({
                    controls: false,
                    mode: 'vertical',
                    displaySlideQty: 2,
                    moveSlideQty: 2,
                    infiniteLoop: false
                });

                $('#go-prev').click(function(){
                    slider.goToPreviousSlide();
                    return false;
                });

                $('#go-next').click(function(){
                    slider.goToNextSlide();
                    return false;
                });
            });
        });
    });
</script>
</head>

<body>
<h1>Super Awesome Web Scraper!  Now With Animations!</h1>

<div id="content">
    <ul id="listings">
        <li><img src="ajax-loader.gif" alt="loading" /></li>
    </ul>

    <div id="go-next" class="slidercontrol">Next <span style="font-size:10px;">&#9660;</span></div>
    <div id="go-prev" class="slidercontrol">Previous <span style="font-size:10px;">&#9650;</span></div>
</div>

</body>
</html>

And that’s it.  Now you have what looks like syndicated content, even without an RSS feed or an API.  It’s also very flexible.  However, if the site you’re scraping changes its markup, the selector will stop matching and nothing will be retrieved.
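Since a markup change on the other site fails silently, it might be worth checking the result before starting the carousel.  Here’s a rough sketch of one way to do that.  The function name scrapeFallbackMessage and the wording of the messages are made up; the only jQuery detail it relies on is that the .load() callback receives the response text and a status string as its first two arguments.

```javascript
// Hypothetical helper: decides what to show after .load() finishes.
// textStatus is the second argument jQuery passes to the .load() callback;
// matchCount is how many .excerpt elements ended up on the page.
function scrapeFallbackMessage(textStatus, matchCount) {
  if (textStatus !== "success" && textStatus !== "notmodified") {
    return "The listings could not be loaded right now.";
  }
  if (matchCount === 0) {
    return "No listings were found. The source page may have changed.";
  }
  return null; // Everything looks fine, so no fallback is needed.
}

// Intended use inside the tutorial's callback:
// $("#listings").load("curl.php .excerpt", {}, function (responseText, textStatus) {
//   var message = scrapeFallbackMessage(textStatus, $(".excerpt").length);
//   if (message) {
//     $("#listings").html("<li>" + message + "</li>");
//     return;
//   }
//   // ...wrap the excerpts and start the slider as before...
// });
```

That way visitors see a short note instead of an empty box, and you get a visible hint that the selector needs updating.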

Feel free to leave suggestions or fixes in the comments.  It helps us all get better.  I’d also love to see what you are coming up with.