Ars Technica penned an interesting article this weekend on ad blocking and how it impacts an outfit like theirs. This spawned discussions of varying quality ranging from thoughtful discourse on the problems of client-side ad implementations, all the way to "WTF FUCK ADVERTISING".
I've been in the advertising game in some capacity for nearly a decade now, so I have a bit of a small- to medium-size publisher perspective in all this. Ad blocking has been around for the majority of that time, and that we're still in a battle about it is interesting to me.
It turns out monetizing entirely new mediums is difficult.
Hey, let's start with users. A large (and vocal) chunk of users block ads. There are three reasons:
This covers a slew of ads ranging from poorly-designed ads (audio, graphically substandard, attention-grabbing, invasive) to resource-intensive ads (otherwise known as the "Flash is garbage" movement). In other words, people block ads because they fear ads may otherwise impact their productivity.
This covers everything from low-brow ads ("LOSE WEIGHT FAST!") to adult ads to ads that just don't interest people on a broad level. Personally, I don't even mind ads that aren't targeted to me if the ads themselves are well-done; even though I might not be in the market for a BMW, it's nice to see a well-designed, full-page ad for one of their cars.
A small group of ad blockers literally won't care and will block regardless.
I'm going to argue that, for those of us who own advertising-funded sites, the first and last are irrelevant. The freeloaders you can't deal with regardless (see: file sharing), and those that block from a technical standpoint are going to be extremely difficult to please, for the simple reason that we've hammered it into their subconscious for years that advertising consists of invasive content. Even if Ars, for example, promises no popups, no Flash, no audio, and no interstitials, users just don't think that way. Ads change over time. This page view may be different than the next. It's easy to block, so you might as well block rather than risk having something jump out at you down the line.
Instead, targeting is a far better option. This includes typical content relevancy (a la AdSense), but it also includes better, high-brow ads. It's the same reason people watch the Super Bowl commercials instead of flipping away: ads are just better then. Far easier to watch. The problem, of course, is that these two areas are extremely hard to make work.
If you're a small- to medium-sized publisher (even, say, up to Ars-sized), finding advertising is both difficult and a pain in the ass. Handling billing, making schedules, monetizing every page view, setting up default chains... just the baseline concepts are a grind to deal with, and every hour you spend reworking ads you take away from doing the stuff you really enjoy doing: namely, working on your site and creating something new.
The path of least friction is to outsource to third-party networks. Google, Tribal Fusion, Amazon, whoever. It's far easier than doing direct sales, and they ensure that you can monetize your entire traffic stack from top to bottom. But that's exactly the reason people block: the ads you get are usually shitty, irrelevant, and a pain to deal with. That's the core of the problem: the path of least resistance is the path of most suck.
There are ways to get around this. Google AdSense, for example, is far less invasive, but you're still gambling on the quality of text results to be contextual and not low-brow, and that's a risky gamble. Ideally, networks like The Deck and BuySellAds start taking over as the easiest thing to implement. I run BuySellAds on Good-Tutorials and, though it doesn't offer me enough of a breadth of advertisers to run them exclusively, it does consistently run ads that are topical, neatly-designed, and more clickable than anything else currently.
I don't think this is going to happen. If the last decade was instructive (which it should have been), it's taught us that site owners are scum or otherwise don't care enough to make advertising an effective long-term revenue model. I doubt advertising is going to completely collapse, of course, but between ad blocking, the user's natural avoidance inclination, and the general decreasing effectiveness of advertising, things aren't going to get better. And the problem with ad blockers is that the online advertising industry has been so messed-up for so long that honest publishers like Ars get slotted into the same grouping as your Viagra vendor and they feel the squeeze financially because of it.
There are plenty of great technical and non-technical writeups on REST. In short, REST is one way to model and present your data to your users in a consistent, logical fashion that lends itself to being accessible to humans, machines, and your own developers. (I don't know if I just called developers neither human nor machine, but let's go with it.)
There are a number of great technical reasons to move to an architecture like REST. In fact, there are a number of great reasons to implement SOAP, XML-RPC, or a similar architecture on your app, the distinctions being less relevant for this post. But REST isn't just for API clients or web browsers: REST is for people, too.
Part of what REST does really well is help you to define the resources for your app. User, Post, Tag, Comment... all of these form the concept of a blog. By defining your structure this way, not only can you craft a machine-readable and machine-writeable interface for your site, but you can end up centering your UI around this, too. It's easy enough to scaffold your usual CRUD of an app: a barebones form to create a new Post, a page to show that Post, an index page to list Posts.
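To make the "usual CRUD" a little more concrete, here's a rough sketch of the seven conventional actions a Rails-style resource tends to get. The restful_routes helper is made up for illustration (it isn't part of any framework), and PUT is shown for updates, though newer Rails versions also accept PATCH.

```ruby
# Hypothetical helper: list the seven conventional actions for a resource
# as [HTTP verb, path, action name]. ":id" stands in for a record's id.
def restful_routes(resource)
  [
    ["GET",    "/#{resource}",          "index"],
    ["GET",    "/#{resource}/new",      "new"],
    ["POST",   "/#{resource}",          "create"],
    ["GET",    "/#{resource}/:id",      "show"],
    ["GET",    "/#{resource}/:id/edit", "edit"],
    ["PUT",    "/#{resource}/:id",      "update"],
    ["DELETE", "/#{resource}/:id",      "destroy"]
  ]
end

# Print a small routing table for the Post resource.
restful_routes("posts").each do |verb, path, action|
  puts format("%-6s %-16s %s", verb, path, action)
end
```

Swap "posts" for "comments" or "tags" and the shape stays identical, which is a big part of why the pattern is easy for people to pick up.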
Humans understand this. I'd wager even your stereotypical I-don't-understand-computers Farmville player has a grasp of the basics of REST, even though they haven't the foggiest of what a "resource" is. They might not know how to do something upon first visit, but they know what resource they want to interact with. Posting a comment on a blog post? Look for the form labeled "New comment". Signing up for an account? Look for the "New account" button. As long as the resources are basic and intuitive ("create a new TransientSubscriberSubscriptionNode" might be slightly too complex, for example), humans are going to have a good chance at figuring out how to work with your objects.
This is all pretty straightforward, of course. A comment on a blog is simple enough that it's hard for anyone to get that wrong. The problem is when you bleed between objects or devise complex on-page ajax interactions. I have nothing but reckless intuition to back this up, but I'd bet that the majority of the most-confusing forms and interactions online stem from a failure to separate resources sufficiently, or a failure to properly define resources for your app (one resource doing the job instead of splitting things up into smaller pieces, for example).
Admin screens are a magnet for this. The noble idea is to have a few screens to manage as much data as possible, since we really care about efficiency and flexibility and productivity and other words ending in -y. This is where the minefield of checkboxes and radio buttons and dropdown filters comes from when you're trying to graft one screen onto multiple interactions with resources that are sort of related but not exactly related. Yes, it can certainly work, but the more you drift from that simplistic, single-resource mindset, the harder it is to intuitively grasp what the hell's going on here. And sure, you can add help text and documentation and mouseovers and plenty more to explain how all these doohickeys interact with the data, but that just means you have a more complex screen that's harder to use and harder to get others to use. This doesn't even take into account the harder technical hurdles you face with complex screens, either.
It's not just admin screens, of course. Complex, multi-page signup forms may expand beyond just the User object and into other areas (Profile, Social Networks, Preferences, Billing, and so on). There may be a tendency to have a listing page that offers inline editing, state changes, and child creation for each object on the page. The listing page may, in fact, list a mixture of four separate resources rather than having four discrete lists. By bypassing the simple and straightforward, the user has to sit and think about how they might go ahead and accomplish their goal. I don't know about you, but users aren't the brightest tools in the shed, so leaving them to their own devices to think for themselves probably will sink your company and likely will cause California to sink into the ocean.
I'm not saying any of these are wrong, of course. REST is meant to be broken, really. It's not entirely all-encompassing; we've all skimped on a show action when the data is small enough to put on the index action, or we've added a few actions to help separate out complex state changes. But every time you cram more interactions into your controller than those magic seven actions or try to consolidate multiple resources into one page, you're not just going against the REST implementation grain; you're going against what might be the most intuitive route for the user.
I alluded to this yesterday, but finally have been able to release Boastful, a jQuery plugin that grabs "tweetbacks" for your blog. Tweetbacks are like trackbacks: every time someone mentions your blog on Twitter, you can pull in those mentions and print them out on your blog. This has gained in prevalence lately with third-party services like Disqus, which grabs all those tweets and integrates them with your normal blog comments. That's a really crappy solution.
Every year or so, the blogosphere circles the wagons and self-germinates on the topic of blog comments. Most of the elitist bastards of the internet — myself included — go sans-comments for ease of living without spam and for added focus on your writing. So I'm not terribly interested in pulling in tweets themselves and tossing them on my blog, particularly since I don't have traditional blog comments in the first place. But I like the idea of at least offering some aspect of dialogue, some aspect of community to the site. The Disqus way of doing things — regurgitating tweets on-page like a comment — never made any sense to me for the simple fact that tweets aren't comments. It's not just Disqus, either; almost every implementation I've seen works this way. This is why if you see this functionality on a site it's almost always tweets that take the form of "RT @someone RT @someoneelse RT @originator Hey! A link! http://example.com". 140 characters (minus your short URL) is not a lot of space to add your personal flavor. But it's kind of cool to see who's linking to you, what the general sentiment is, that sort of thing.
So I tackled this a bit differently. You can see it in my blog footer now (once this post gets mentioned online, that is). It looks like this:

Just the avatars, with a tooltip that lets you look at the tweet if you really are interested. Simple, not obtrusive, and hopefully something where the average reader might say "Hey, I can get some exposure just by tweeting this? Nifty!" (Actually, it might be nice to pull in follower numbers or Twitter bios here, too, as a way for existing readers to find more interesting people to follow. But that'll have to wait for a future version.)
The other difference with this is that it uses Topsy's API, which handles all the yucky details of old data that Twitter normally wouldn't return results on, and breaks through URL shorteners to get to the actual page in question (usually a search for zachholman.com/some-page won't match a shortened URL like is.gd/some-string). So no matter how they link to your blog post, it should still be picked up by Boastful.
The code's on GitHub, and you can browse the readme there for some additional background on the technical side of things. Hope you dig it, fork it, contribute back, and be merry. Or tweet a link to this page over Twitter. Your mug will look good down there.
iTunes is old. It was first released in 2001, with the majority of its code then based on SoundJam MP, which can trace its origins way back to 1999. After QuickTime's recent Snow Leopard rebuild, iTunes is left as one of Apple's oldest apps. That leaves it on a relatively ancient Carbon foundation, which carries some implications John Gruber detailed.
If we accept that iTunes will eventually go through The Big Rewrite — and that assumption might warrant some discussion, but let's go with it for now — what form does iTunes take? Does the existence of iPad fundamentally change iTunes' future? What can Apple's direction as of late tell us about "iTunes X"?
There are three main concepts that form iTunes: Your Syncing, Your Library, iTunes Store.
Syncing is the eight billion pound gorilla in the app. There are no huge mysteries behind playing a song. You don't need to innovate very hard on that. But syncing in iTunes has gone from very humble and very music-only iPod beginnings to full-fledged broad media iPhone syncing to hybrid iPod+iPhone+iPad syncing. In The Big Rewrite of iTunes, this is absolutely the area that will receive the most attention.
When I first talked about the iPad, part of the problem I had was the concept of where your data lived. In the current scheme, everything revolves around your main machine. You hook up your devices (iPod, iPhone, and now iPad) and they become slaves to the content hosted on your machine. And it is machine. Not machines. This has been a common gripe for years: how can you keep your music and library metadata in sync between your iMac and your MacBook? It's a huge pain in the ass, and even if you do that your devices are still basically tethered to one specific machine. It's not just a "hardcore techie problem", either; I've heard this gripe from a variety of sources, including my dad. With a more digital lifestyle, more people want to use and sync their stuff in more places.
So where does that put us? The natural direction, of course, is the cloud. It's the holy grail, really: over-the-air syncing of all of your data. You become effectively free from having to worry about where your data is housed, which device is up to date and which is doing the updating, and hell, you don't even have to worry about files (which is one of the things Apple seems poised to dismantle, and for good reason: the file system is still a confusing concept to many). This solves a lot of my initial worries about the iPad, too; if you're not tied to a certain machine, the iPad itself becomes far more capable as a standalone machine.
This is a technical nightmare. Part of it is sheer bandwidth, part of it is reliability, and part of it is that it's just too damn new of a concept. The bandwidth issue can be solved by, you know, building a massive, massive data center. But the rest is typical cloud computing criticism, and for good reason. There have been quite a few high-profile mishaps and bouts of unreliability. Pushing everyone's media and data to the cloud would be, in a lot of instances, just frightening.
But still achievable. I'm an avid supporter of MobileMe. Between three Macs and an iPhone, it's managed all of my data beautifully, to the point where I no longer care or worry about where my contacts or email settings are. But Apple did have its fair share of hiccups when MobileMe launched, and one has to take into account that the underlying technology, .Mac, has been around for nearly a decade, slowly being built upon and improved the whole time. Expanding that same tech would be a substantial feat, to say the least. But the payoffs are fantastic.
I'd imagine direct, over-the-line syncing would be here for quite some time, and it likely would be the only option for the large initial sync and any large video or music additions, mostly for the sake of speed (even wireless N is painfully slow compared to USB 2.0 and 3.0). But after that initial sync, one could imagine the incremental downloads and uploads of your library metadata — play counts, skip counts, titles, album art, and so on — could be done entirely over-the-air, likely when you're not even using your device. Download an album on your MacBook and, like MobileMe contacts, it gets pushed simultaneously to your iMacs at work and home and your iPhone in your pocket.
One big question mark is how much data gets pushed to Apple's servers. Can you only send Music Store-purchased songs to the cloud, or can you send your songs pirated from Napster in 1999, too? How much will the EFF piss themselves when they realize how much information Apple would have on you? Does the central server even make sense in this scenario, or will you instead deem one of your own machines as the "host" and push all the updates out directly over your own private pipes? A lot of questions, a lot of problems, but still one tantalizing prospect.
Your library is, and always will be, the bedrock of iTunes. It's a way to access your media: music, video, podcasts, and likely soon e-books. It may seem fairly straightforward at first; list your media, play your media. But I wouldn't be surprised if it's some of the gnarliest code Apple's got. The problem is that, over a decade of use, feature additions, and rewrites, there's a lot of stuff in there. Between playlist subtleties, library sorting, album art display, multiple views (including Cover Flow, List, and Grid), iTunes Store hooks, library metadata, audio file conversion, iTunes DJ, iTunes Genius, ringtones, album art downloading, ratings, and a slew of underlying code tying it all together... it's a feat it works as well as it does in its current form. And the major problem with changing any of this is that iTunes is arguably the most entrenched software out there. Regardless of technical skill level, iTunes is the place most users call "home". It's their media. And everyone has very specific, custom ways of accessing their media. Removing, say, Skip Counts may utterly ruin things for some subset of users that tend to shuffle their library based on songs they've never skipped, for example.
But in The Big Rewrite, this is an area I suspect would get reworked. There's just so much stuff to handle, and the feature bloat of iTunes has grown substantially. I can't think of one feature release where Apple has ever removed anything from iTunes. That's pretty significant, given Apple is the type of company willing to completely throw away iMovie and replace it with a new version entirely. If they were to start stripping out smaller, lesser-used features for the sake of streamlining and simplicity, this would be the area that gets the most griping and criticism. I suspect they'll keep the majority of the UI and look and feel around as they swap out the underlying engine, but those inconsistencies will creep through the cracks.
The Store, interestingly enough, has gone through The Big Rewrite. And no one's really noticed. While the Store had previously used some type of web-oriented technologies, as of iTunes 9 it's basically bleeding-edge HTML in a thin WebKit wrapper. This is a big deal. It means for one, they can leverage their existing investment in WebKit for cross-platform compatibility, and two, they can draw from the comparably bigger well of web designers and developers for a more flexible storefront. As a byproduct, this has let them shift more of their content onto the "proper" web; you can browse most of the music store without iTunes already, and just this week they've flipped on the ability to view apps from the App Store on the web.
In terms of the next version of iTunes, the Store is effectively independent of any new development work. It's not nearly as big of a factor as it once was.
This just feels like the way to do it. But I wouldn't even wager on a timeline. Apple's been very cautious with iTunes for quite some time, and if they push something this substantial to market and they experience widespread failures, it'll take some time to turn it around. But given iTunes' lag behind other 64-bit apps in Snow Leopard, Apple's move to Cocoa, the sheer "oldness" of the codebase, and the much larger focus as of late on satellite devices with iPhone and iPad leading the way, it seems like now's the best time to really make a dent in how iTunes is positioned and built.
If there's one thing I've learned from my experiences with women, it's that they secretly can't control themselves if a man is there to sweep them off their feet with tales of legacy data migration suavity. (If you're wondering why I didn't title this in the reverse, it's simply because I didn't feel that it was any particular secret that every man is most attracted to legacy data conversions.)
So, you're a few weeks away from launch day and you have a fair amount of data in an older database structure and you want it accessible to your new app. The possibilities here are really quite endless, and it depends entirely on a multitude of questions: do you need access to the old data in the old app? Is speed an issue? Can you fill records after the fact, or do you need them all immediately in the new format pre-launch? Is this a one-off or a regular thing? Will anyone notice if you just add a couple hundred thousand rows of quotes from Dr. Quinn, Medicine Woman instead of bothering with a possibly delicate transfer? These are all equally important questions. I recently did one such transfer, and here's my process.
I actually chose a few different routes. A lot of the data was going to be in a similar structure in the new system and the quicker the better, please. For those cases, you're best off doing everything via database dumps and raw SQL commands. By far, that's going to be the fastest way of doing things. Dropping out into Ruby land is going to be painfully slow. But in some cases, that's a tradeoff worth making. In this post, I'm going to detail the latter process. The benefit of pushing data through Rails is that you have access to model-specific validations, callbacks, and relationships. If you're dealing with complex data structures spread over multiple models and relationships (both in the old and new apps), sometimes the longer processing time is a decent tradeoff, particularly if you can kick off the process and slowly migrate post-deploy.
The first thing to do is make the data accessible to your new app. I've found it's easiest to create a temporary table that you can nuke when you're done. If we're migrating Users between totally different codebases, I'd use mysqldump to dump your old users table and import it with a prefix: old_and_busted_users, for example.
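For the raw-SQL route mentioned earlier, once the old table sits next to the new one, a straight copy can be a single statement. Here's a minimal sketch that just builds that statement; copy_sql is a hypothetical helper, and the name and email columns are assumed to exist under the same names in both tables.

```ruby
# Hypothetical helper: build an INSERT ... SELECT that copies the given
# columns from one table to another. Identifiers are backtick-quoted,
# MySQL style, so only pass in table and column names you trust.
def copy_sql(from_table, to_table, columns)
  column_list = columns.map { |column| "`#{column}`" }.join(", ")
  "INSERT INTO `#{to_table}` (#{column_list}) " \
    "SELECT #{column_list} FROM `#{from_table}`"
end

sql = copy_sql("old_and_busted_users", "users", %w[name email])
puts sql
```

Inside a Rails app you could hand the resulting string to ActiveRecord::Base.connection.execute. It skips validations and callbacks entirely, which is exactly why it's fast.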
To actually use this data I've seen a few different approaches. The full-blown route is to script/generate model OldAndBustedUser, which gives you access to an ActiveRecord instance of the old_and_busted_users table. That's fine, and it offers you some additional freedom in terms of testing and relationships, but I've found that in a number of cases you don't really need to scaffold yourself out a full-fledged model. For the most part, you just need access to the table and to some relationships. Besides which, generating an entirely new model for your app means you're more likely to keep this old code polluting your new code, which really kind of sucks if you're only going to be doing this once.
The trick I've used is to define all of your models inline. For example, if you're running this as a rake task:
    namespace :import do
      desc "Import our old and busted users into the new hotness."
      task :users => :environment do
        class OldAndBustedUser < ActiveRecord::Base ; end
        class OldAndBustedProfile < ActiveRecord::Base ; end
      end
    end
It's a little funky defining classes like that, but for most scenarios you just don't need much scaffolding. If your task balloons to hundreds of complex, intricate lines, definitely think about refactoring quite a bit more. The model names should correspond to your temporary tables. If they don't, you can point a model at the right table with set_table_name (self.table_name = in newer versions of Rails).
Once you have your temporary models set up, you can use them as you'd expect:
    task :users => :environment do
      class OldAndBustedUser < ActiveRecord::Base ; end
      class OldAndBustedProfile < ActiveRecord::Base ; end

      OldAndBustedUser.find_each(:batch_size => 2000) do |old_and_busted_user|
        User.create(:name => old_and_busted_user.name, :email => old_and_busted_user.email)
      end
    end
...the find_each being important since we're in Ruby Land, where instantiating huge object arrays willy-nilly will deplete your available memory faster than the alcohol you experienced on your 21st birthday. You also can set up relationships as needed, too:
    task :users => :environment do
      class OldAndBustedUser < ActiveRecord::Base
        # has_one keeps its foreign key on the profiles table, so it needs to
        # match the user_id column that belongs_to :user expects (assumed here).
        has_one :profile, :class_name => 'OldAndBustedProfile', :foreign_key => 'user_id'
      end
      class OldAndBustedProfile < ActiveRecord::Base
        belongs_to :user, :class_name => 'OldAndBustedUser'
      end

      # Assumes every old user has a profile; a missing one raises on .twitter.
      OldAndBustedUser.find_each(:batch_size => 2000) do |old_and_busted_user|
        User.create(:name => old_and_busted_user.name,
                    :email => old_and_busted_user.email,
                    :twitter_handle => old_and_busted_user.profile.twitter)
      end
    end
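If you're curious why find_each is so much kinder to memory, here's a plain-Ruby sketch of the general idea: grab a batch ordered by id, process it, then ask for the next batch after the last id you saw. This is an illustration only, not ActiveRecord's actual code; ROWS, fetch_batch, and each_in_batches are all made up, and the array stands in for the old_and_busted_users table.

```ruby
# Hypothetical in-memory stand-in for the old_and_busted_users table.
ROWS = (1..10).map { |id| { :id => id, :name => "user#{id}" } }

# Mimics "WHERE id > ? ORDER BY id LIMIT ?": up to `limit` rows whose id
# is greater than `after_id`, in ascending id order.
def fetch_batch(rows, after_id, limit)
  rows.select { |row| row[:id] > after_id }
      .sort_by { |row| row[:id] }
      .first(limit)
end

# Yield every row one at a time while only holding one batch at once.
def each_in_batches(rows, batch_size)
  last_id = 0
  loop do
    batch = fetch_batch(rows, last_id, batch_size)
    break if batch.empty?
    batch.each { |row| yield row }
    last_id = batch.last[:id]
  end
end

seen = []
each_in_batches(ROWS, 4) { |row| seen << row[:id] }
```

With a batch size of 4 and ten rows, that's three round trips (4, 4, and 2 rows) instead of one giant array, which is the whole point when the table has a few hundred thousand rows.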
Now for the Pro Tips: if you're doing a lot of data importing, you're bound to have to wipe your database and repopulate a gaggle of times. I found a lot of the manual steps in dumping data to be a pain, particularly in terms of importing into the temporary table. So I'd suggest making use of sed for some fun file replacement (which sure beats having to pick through vi multi-hundred-megabyte-db-dump every time). Assuming you're using regular mysqldump output:
    sed -i '' -e 's/`users`/`old_and_busted_users`/g' multi-hundred-megabyte-db-dump.sql
Your mileage may vary on that particular sed command, though. OS X uses BSD sed, Linux uses GNU sed. The difference is in how -i (in-place substitution) takes its backup suffix: BSD sed wants it as a separate argument, so the empty '' above means "no backup file", while GNU sed only accepts a suffix attached directly to the flag, so on Linux you'd drop the '' and write sed -i -e '...' instead. Editing in place is usually safe enough, considering you can always re-dump from MySQL if the file gets corrupted.
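If juggling sed flavors gets old, the same substitution is easy enough to sketch in Ruby, which behaves the same on OS X and Linux. Both helper names here are made up for illustration. Like the sed version, this rewrites every backtick-quoted occurrence of the name, so a column that happens to be called `users` would get renamed too.

```ruby
# Hypothetical helper: rewrite backtick-quoted references to one table
# name within a single line of mysqldump output.
def rename_table(line, old_name, new_name)
  line.gsub("`#{old_name}`", "`#{new_name}`")
end

# Stream the dump line by line so a multi-hundred-megabyte file never
# has to sit in memory all at once. Writes to a new file rather than
# editing in place.
def rename_table_in_dump(input_path, output_path, old_name, new_name)
  File.open(output_path, "w") do |out|
    File.foreach(input_path) do |line|
      out.write(rename_table(line, old_name, new_name))
    end
  end
end
```

Calling rename_table_in_dump("dump.sql", "renamed.sql", "users", "old_and_busted_users") would leave the original dump untouched, which saves a re-dump if something goes sideways.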
Since we're dealing with Rails, we get all the fun magic methods, but they're not all fun and games: they'll overwrite your created_at and updated_at fields. Have no fear; record_timestamps is here:
    ActiveRecord::Base.record_timestamps = false
    begin
      method_that_does_crazy_zany_stuff
    ensure # switch timestamps back on even if the import raises
      ActiveRecord::Base.record_timestamps = true
    end
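You'll also want to think about turning validations on or off (with a user.save or user.save(false) in the Rails 2 syntax used here; Rails 3 and later spell it user.save(:validate => false)). Validations are good for ensuring the validity of your data, but they're another point of slowness, and if you're importing legacy data there's a good chance your data might break validations anyway. Keep in mind that User.create, as used above, runs validations and quietly hands back an unsaved record when they fail.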
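If you do leave validations on, it's worth keeping track of the rows that get rejected rather than letting them vanish. Here's a rough sketch using a stand-in model so it runs without Rails; FakeUser, its email rule, and the import helper are all hypothetical, and save(false) mirrors the Rails 2 convention above.

```ruby
# Hypothetical stand-in for an ActiveRecord model, so this runs without Rails.
FakeUser = Struct.new(:name, :email) do
  # Made-up rule for illustration: a user needs a non-blank email.
  def valid?
    !email.to_s.empty?
  end

  # Mirrors the Rails 2 convention: save(false) skips validations.
  def save(perform_validation = true)
    return false if perform_validation && !valid?
    true # a real model would write to the database here
  end
end

# Import rows, collecting the ones that fail validation for a later look.
def import(rows)
  failures = []
  rows.each do |row|
    user = FakeUser.new(row[:name], row[:email])
    failures << row unless user.save
  end
  failures
end

rows = [
  { :name => "Ann", :email => "ann@example.com" },
  { :name => "Bob", :email => nil }
]
failed = import(rows)
```

Here failed ends up holding Bob's row, which you could log or retry with save(false) once you've decided the legacy data is good enough.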
If all else fails, IMDB maintains a startlingly large repository of Dr. Quinn quotes.