How to Perform an A/B Test on Headlines, Tweets, Traffic, and More


Do you ever wonder how often you’re being A/B tested?

The practice is so commonplace among websites and marketers these days that at any given point at any given website you could be part of a grand experiment in optimization. I often hope this is the case. I love the science and analysis behind improvements—both on the web and in the real world—so I find myself clicking a blue button and hoping my participation is making a website better.

I love participating in A/B tests, and I love performing them. We get an opportunity to test a number of different elements on the Buffer blog, always striving to add more value for our readers. It’s an ongoing process for us, and it’s one that I’m excited to show you.

But first off, just so we’re all on the same page …

What is A/B testing?

I imagine that most of you have some idea of the way that A/B testing works (a large chunk of the definition is right there in the name). In essence, an A/B test is a way to measure two versions of something to see which is more successful. This description from LKR Social Media is the perfect way of putting it in layman’s terms:

Have you ever gotten into an argument with a friend about which route is fastest to get from your house to theirs? How’d you settle that bet? You tested it! You both left the same place at the exact same time, went your separate routes, and found out once and for all whose way is the best.

I do this all the time, much to the chagrin of people in my caravan.

Another way of looking at A/B testing is this description from Shopify, which replaces cars with Biology 101.

(A/B testing) is not unlike your high school science experiment. Except, instead of dissecting a frog you’re analyzing which is the better scalpel to slice it open. Which one cuts more smoothly, is easier to handle, saves time and increases overall dissection efficiency (so to speak of course).

You may also have heard about multivariate testing, an advanced version of A/B testing that introduces a grid of variables that combine to form a number of different tests. The idea here is the same as A/B, but you end up with more complicated data and more specific results.

(There’s a third type of testing that I won’t go into here but thought was too cool to not mention: multi-armed bandit testing. This type of testing is essentially an automated A/B test that promotes a winning experiment as the data comes in.)
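For anyone curious how that automatic promotion might work, here is a minimal epsilon-greedy sketch of the bandit idea in Python. The variant names and click rates are invented purely to simulate traffic—real bandit tools are considerably more sophisticated than this.

```python
import random

# Hypothetical click-through rates for two headline variants.
# In a real test these are unknown; they're hard-coded here only to simulate traffic.
TRUE_RATES = {"headline_a": 0.03, "headline_b": 0.05}
EPSILON = 0.1  # fraction of traffic reserved for exploring the non-leading variant

shows = {name: 0 for name in TRUE_RATES}
clicks = {name: 0 for name in TRUE_RATES}

def observed_rate(name):
    return clicks[name] / shows[name] if shows[name] else 0.0

def choose_variant():
    """Mostly exploit the current best variant; occasionally explore at random."""
    if random.random() < EPSILON:
        return random.choice(list(TRUE_RATES))
    return max(TRUE_RATES, key=observed_rate)

for _ in range(10_000):
    variant = choose_variant()
    shows[variant] += 1
    if random.random() < TRUE_RATES[variant]:
        clicks[variant] += 1

for name in TRUE_RATES:
    print(name, "shown", shows[name], "times, observed CTR", round(observed_rate(name), 4))
```

The effect is that the better-performing variant gradually soaks up more of the traffic as evidence accumulates, instead of waiting for a fixed test to end.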

A straightforward A/B testing workflow

One of the deepest resources on A/B testing comes from Unbounce’s ebook on the subject. Conversions are at the heart of what Unbounce does best: landing pages. As such, they have a huge amount of expertise on A/B tests. Their workflow goes something like this:

  1. Brainstorm what you want to test and why. 
  2. Create alternatives to your control page. 
  3. Run your A/B test. 
  4. Promote the winner to be the next control. 
  5. Do it all again. 

Unbounce caps off the A/B discussion with this helpful chart about how to move on after an A/B test. The takeaway here is that it’s okay for a test to fail so long as you understand why.

Unbounce a/b test table

How we A/B test at Buffer

Testing and experimentation are at the core of our company, both the business and the culture. We document a lot of our experimentation—on Buffer products and on personal productivity—over at the Buffer Open blog (which is a sort of new experiment in and of itself).

Our main Buffer blog has been a fertile playground for A/B tests as we are always interested to learn more about what content performs best. Here are some of the recent tests we’ve tried.

How we A/B test our headlines

Writing a must-click headline requires so many different elements: research, experience, intuition, style, and in the case of the Buffer blog, A/B testing. We test all our headlines in hopes of finding one that really resonates with our audience.

Our primary testing ground is Twitter. A/B testing on social is a rather inexact science, but so far, we have been able to find reliable data with our headline tests. Here is what we do in a nutshell:

Post multiple headline variations to Twitter, and track which one performs best.

Our specific headline process looks like this:

  1. For each post, we brainstorm five to 10 headlines and decide among the marketing team which ones we like best.
  2. The winners from step one become our test candidates. We take three headline variations and post them as updates to our Buffer Twitter account. Ideally, the closer together we can post them (e.g., all in the morning or all in the afternoon), the more reliable data we can expect to receive.
  3. We track the results in Buffer analytics to see which headline performed best. The winner becomes the new headline on the post (or stays the same, depending on what we started with).
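To make the comparison in step 3 a bit more concrete, here’s a rough sketch (in Python) of the kind of tally you could run once the tweet stats are exported. The numbers below are hypothetical, not pulled from our analytics.

```python
# Hypothetical tweet stats for two headline variants; the figures are made up for illustration.
variants = {
    "A Scientific Guide to Hashtags": {"retweets": 28, "favorites": 12, "mentions": 3, "clicks": 410},
    "The Research Behind Hashtags":   {"retweets": 51, "favorites": 25, "mentions": 7, "clicks": 380},
}

# Count how many metrics each headline leads on.
wins = {name: 0 for name in variants}
for metric in ["retweets", "favorites", "mentions", "clicks"]:
    leader = max(variants, key=lambda name: variants[name][metric])
    wins[leader] += 1

print(wins)
print("Headline to promote:", max(wins, key=wins.get))
```

A simple majority-of-metrics tally like this is only a starting point; as discussed below, you may weigh clicks and retweets differently depending on your goal.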

We end up changing a good number of headlines on the blog, like our post about hashtags, which went from “A Scientific Guide to Hashtags” to “The Research Behind Hashtags.” Note the big difference in retweets, favorites, mentions, and reach in the comparison below.

First tweet:

Twitter headline test

Second tweet:

Twitter headline test

 

(Some commenters asked for a bit more detail on our headline testing process, and I’m happy to go deeper here. Thanks for the nudge!)

When we look at the comparison of tweets, we would like to see that one of the posts has a significant edge in a particular stat or that the majority of statistics lean in favor of a particular tweet. In the example above, the second tweet’s success with retweets, favorites, mentions, and potential reach indicated that it might be a better headline than the first.

Retweets and clicks are both useful metrics to observe, and we don’t necessarily value one more than the other. Retweets are a great signal that people find the headline worth sharing with their own audience, and clicks are a helpful indicator that the headline drives enough curiosity to click through and read. In some ways, retweets and clicks tell us a headline’s appeal to a broad audience vs. a specific individual; both stats have value in determining the right headline.

Of course, it’s worth pointing out that A/B tests on social media are not perfect. The varying times of day in our testing can be significant variables, as can the images we share along with the headlines. In the end, we’re simply interested in gaining any edge we can to make a headline more meaningful for our audience, and this Twitter test has been a useful indicator so far.

How we A/B test adding photos to social shares

When Twitter tweaked its layout to show image previews right in the Twitter stream, we took it as an opportunity to test: Do expanded images make any difference in engagement and interaction with the content we share on Twitter?

The answer is a resounding yes.

Images in Tweets

To test this, we A/B tested tweets with pictures and tweets without, and we analyzed the clicks, retweets, and favorites from each group. The pictures group was the clear winner, outperforming text tweets in every category:

  • 18 percent more clicks
  • 89 percent more favorites
  • 150 percent more retweets

We’ve continued to focus on pictures in our Twitter shares, and we typically have around a 70/30 split of picture posts to text posts. The visual emphasis continues to pay off.
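If you’d like to run the same kind of comparison on your own tweets, here’s a minimal sketch of the lift calculation. The totals are placeholders chosen only to reproduce the percentages reported above; they are not our real data.

```python
def percent_lift(test_value, control_value):
    """Relative change of the test group over the control group, as a percentage."""
    return (test_value - control_value) / control_value * 100

# Placeholder totals for tweets with images (test) vs. text-only tweets (control).
with_images = {"clicks": 1180, "favorites": 189, "retweets": 250}
text_only   = {"clicks": 1000, "favorites": 100, "retweets": 100}

for metric in with_images:
    print(f"{metric}: {percent_lift(with_images[metric], text_only[metric]):+.0f}%")
```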

How we A/B test our Friday publishing schedule

Content on the Buffer blog is typically new every Monday through Thursday, four days a week. We’ve found these to be the most active days on the site and that Fridays tend to be less busy. But we wanted to be sure.

So we tested just how much traction we could gain from Fridays by posting a few times on that day and comparing the stats to traffic on other days.

Our most recent test came on Friday, March 21, when we published a guest post from James Clear about the power of imperfect starts.

The test was to see a) how Friday, March 21, compared in traffic to other Fridays where we did not post new content and b) how this particular Friday post compared in traffic to other posts on weekdays.

Friday, March 21, traffic vs. other Fridays

On a typical Friday at the Buffer blog, we average 28,000 visits. On our A/B test Friday, we had 30,796 visits—an 8 percent increase over our average. Not bad! It was a larger bump than we expected, and gave us optimism that we shouldn’t completely write off Friday posts.

Was it a sure sign that Fridays are worth publishing? Not necessarily. As you can see, there was also a nearly 8 percent jump over average back on February 21, and we didn’t run an A/B test or post that day.

Friday Buffer blog traffic

 

Friday blog post vs. weekday blog posts

The comparison to weekday blog posts told quite a different story for the value of Friday content. Our test post was the 17th-most popular post from the past 30 days, surpassed in traffic by 16 others (which equals just about the entire month, give or take a few of the most recently published articles).

Knowing this, we might hypothesize that an extra 2,000 or 3,000 people come to the site on Fridays to see the new content (and thus bump up the Friday average relative to other Fridays), but that overall, new content on Fridays gets about the same viewership as our weekday articles.

How we A/B test with Hellobar

You may have noticed the orange bar at the top of our blog. That’s the Hellobar, and it has been a fun experiment in optimizing for conversions.

We’ve A/B tested a number of variables with the Hellobar to see which work best, predominantly copy and color. Do visitors prefer an elevator pitch or a call to action? Do they prefer green or orange?

Here are some stats:

  • Green bar with the text “Buffer is the easiest way to manage social media accounts.” — 0.4% conversion rate
  • Green bar with the text “Enjoyed reading this article? Share it at the best time on Twitter, Facebook” — 0.6% conversion rate
  • Orange bar with the text “Easily post to multiple social media accounts” — 0.6% conversion rate
  • Orange bar with the text “Buffer is the easiest way to publish on Social Media” — 0.7% conversion rate
Hellobar tests

So far, the data on this experiment is leaning toward the orange bar and the “Buffer is …” text. We’re continuing to see how these experiments go, so you may see the bar keep changing over the next few weeks.
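If you want a sanity check on whether a gap like 0.6% vs. 0.7% is more than noise, one common approach is a two-proportion z-test. Here’s a minimal sketch; the visitor counts are hypothetical, since sample sizes aren’t shown above.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return the z statistic and two-sided p-value for two observed conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample sizes: 30,000 visitors per variant at 0.6% and 0.7% conversion.
z, p = two_proportion_z_test(180, 30_000, 210, 30_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # a p-value above 0.05 suggests the gap could still be noise
```

With those made-up sample sizes, the difference wouldn’t yet be statistically significant, which is one reason to keep a test like this running for a while before declaring a winner.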

The conversion trinity, form friction, and other areas to A/B test

There are countless ways to A/B test on your website, and it’s always interesting to see the wild success stories that come from small or off-the-wall changes.

Mad Libs form A/B test

The takeaway here is that you can A/B test just about anything.

So where might you begin?

Search Engine Watch offers some great advice on a few important areas to consider: friction, anxiety, and the conversion trinity.

Cut down on friction for your visitors

Common friction elements include form fields, steps in a lengthy process, and page length. Any of these can be difficult to endure for a visitor to your website, so the more you can optimize with an A/B test, the better off your conversions will be in the long run. You might consider testing a form with fewer fields or trying a multi-page versus a one-page signup process.
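If you’re wiring up a test like this yourself, one common pattern is to bucket each visitor deterministically (so a returning visitor always sees the same variant) by hashing a visitor ID. Here’s a minimal sketch; the variant names and 50/50 split are entirely made up for illustration.

```python
import hashlib

def assign_variant(visitor_id: str, experiment: str, test_share: float = 0.5) -> str:
    """Deterministically bucket a visitor so they always see the same form variant."""
    digest = hashlib.md5(f"{experiment}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform value in [0, 1]
    return "short-form" if bucket < test_share else "long-form"

# Example: decide whether a returning visitor sees the 3-field or 7-field signup form.
print(assign_variant("visitor-12345", experiment="signup-form-length"))
```

Because the assignment comes from a hash rather than a coin flip on each visit, the split stays stable across sessions, which keeps the measurement clean.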

Avoid information-entering anxiety 

Anxiety is created when people aren’t sure if they are going to be rewarded for all of their work.

With this in mind, think about your checkout process, surveys, or subscription forms. Visitors will bail if they get anxious about the value of their time on your site. This doesn’t mean you should abandon all forms or checkout steps, but rather that you should A/B test different copy and design to ensure the visitor feels confident they’ll get what they’re after.

Pay attention to the conversion trinity

  • Relevance
  • Value
  • Call to action

Visitors will seek relevance on a landing page, making sure your site fits their wants or needs. Likewise, the value proposition for your product or service will need to show the right solutions and benefits. And don’t forget the call-to-action; it should be crystal clear so the visitor knows exactly what to do.

What do you A/B test on your website?

We always love hearing stories of experimentation and improvement. Do you have a good one to share? Let us know in the comments which elements you test on your website or blog.

P.S. If you liked this post, you might also like A Scientific Guide to Writing Great Headlines and How We Increased Landing Page Conversions by 16%

Image credits: TheBusyBrain via photopin cc, Unbounce, Lukew.

  • Alex Debecker

    Great article. I’m in the process of planning some A/B Testing myself.

    One question semi-related to your post. Which metrics do you give the most importance to when you A/B test your headlines on Twitter?

    There is so much choice (favorites, retweets, clicks and mentions). In your example above you actually show that you picked the headline that drove fewer clicks. I’d love for you to expand on that :)

    Thanks!

    • http://www.briangerald.com/ Brian Gerald

      Co-sign on this.

    • http://blog.bufferapp.com Kevan

      That’s such a great question, Alex! I can see from the post that I could have been clearer here. :)

      In my experience, we value each of the metrics when we’re testing headlines, as they all provide some insight into what people are interested in reading. When the majority of stats skew in one direction, we go with that headline.

      But I know you’re probably interested in a more specific answer, too, so I will say this: We find retweets to be super valuable in gauging a headline’s appeal to a mass audience because the person retweeting is effectively vouching for the quality of the headline by sharing with all his/her followers. Clicks tell us that the headline works on a more individual level. Basically, it’s a matter of a headline that’s good to share vs. a headline that’s good to read. Hope this clears things up a little and doesn’t muddy it further. :) I’d love to hear your thoughts, too.

      • Alex Debecker

        Thanks a lot for your reply!

        I agree. I think favorites can kinda be disregarded (although they make us happy). Retweets show love for the headline, mostly because I believe a lot of people retweet without reading the content – just because they like the headline.

  • http://www.brand.com/blog James R. Halloran

    “Seriously, a kitten just died because of you.”

    Haha! That’s exactly how I think when something doesn’t work for us here. Thanks for sharing how you guys do this! I’m always curious how you guys are always on top. (But now I see why.)

    I definitely agree with you about using images in tweets! I think a good strategy would be to use images for headlines that are on the mild side. A strong headline alone may get by without one, but it never hurts to have a corresponding image if you can find one.

    • http://blog.bufferapp.com Kevan

      I like your thinking here, James! Iffy headlines could certainly use a boost with images. I often find myself looking at images before I check the text of a tweet. :)

  • CaptainPingu

    I was hoping for some concrete advice here but I’ve come away very little the wiser.

    E.g., headlines. HOW do you determine which different types to use? Your description that you “decide among the marketing team which ones we like best” seems like a shotgun approach.

    And the process itself. You say you test different headlines on Twitter. But HOW do you achieve this? I.e., what is the process to post two different headlines? Are you literally posting one in the morning and one in the afternoon and seeing which one gets clicked on? Or is there a system that makes half of your users see one headline, and half another?

    No offence meant. Just trying to learn :)

    • http://blog.bufferapp.com Kevan

      So glad you asked this question! No offense taken at all. :)

      I definitely could have been more specific here. So our process for testing headlines begins with me (the author) sending around 5 to 10 alternative headlines in an email to our team. That results in some discussion about which we believe will work best, based on past experience and what we’ve researched.

      Once we reach a consensus on a few headlines, those are then used when we Buffer the post to Twitter. Overall, we post 14 times per day via Buffer, so three of those slots are taken by our headline tests. Test 1 and Test 2 typically are separated by one other post (so we’re maintaining some variety on the feed) and then Test 3 goes out a few tweets later. We’re looking for retweets and clicks or a decisive number of data points that lean a certain way.

      How is this description for you? I’d be happy to elaborate on anything else if you’d like. :)

  • http://www.lukethomas.com/ Luke Thomas

    Your example of A/B testing tweets doesn’t make sense.

    You illustrated with 2 variations: one received the most “engagements” on Twitter, while the other (the losing variation) received MORE clicks than the “winner.” IMO, the winner is the one that gains more click-throughs. Isn’t that the goal? More visits to the website?

    Also, A/B testing tweets at different times in the day introduces the time of day as a variable, which muddles the data.

    • http://blog.bufferapp.com Kevan

      Really glad you brought up these points, Luke! The clickthroughs vs. engagements data is an interesting one to discuss. You’re absolutely right about the value of clickthroughs. We also use clickthroughs as an indicator of headline success – in the hashtag example, there was just more overwhelming data (retweets, faves, engagement) that swung our decision in a certain direction. I definitely see how I could have been clearer on this in the post. Thanks for seeking clarification on this.

      You’re right on about variables, too! Not only does time of day change things but also the image that we share with the headline. Social is not exactly the perfect testing ground for A/B. We’re looking for any edge we can get with headlines.

      Let me know if I can elaborate on anything else. Really glad to have the discussion here. :)

  • Mike

    Hey Kevan,
    Great post like always, full of knowledge and real case scenarios.
    Keep up the good work!

    • http://blog.bufferapp.com Kevan

      Awesome, Mike! So glad you enjoyed it. :)

  • http://mattragland.com/ Matt Ragland

    Kevan, thanks again for an awesome post, here’s another info-sketch in your honor!

    Also, I agree it would be great to hear more on the Buffer team’s headline generation sessions. I’m sure it would be helpful for everyone :)

    There’s a great episode of This American Life where The Onion team of writers did their headline pitches, it’s fantastic (http://www.thisamericanlife.org/radio-archives/episode/348/tough-room). Thanks again!

  • http://www.tone.co.uk/ Anthony

    Thorough and actionable advice as always Kevan! So you like hearing stories of experimentation and improvement? Well, I published an A/B test case study yesterday with interesting results. In short, we hypothesised that by removing all references to pricing, signups to our free beta product would increase (eventually it would be a paid-for product). Our hypothesis was proven correct with a 31% increase, BUT on closer inspection I noticed that we included the word “free” in the challenger version. This could have easily influenced the increase in signups, just like in the example you give above from Shopify: “Two magical words near a signup button (“It’s free”) increase conversions by 28 percent.” Would be interesting to get your thoughts on the case study and the results – http://www.tone.co.uk/removing-pricing-improves-signup-conversion-rates/ Do you think the word “free” had any influence?

  • http://hubskills.com/ Partha Bhattacharya

    Great post Kevan. I’ve a small query. You have the sharing buttons at the bottom of the posts, and not on the side or at the top. Is that too a result of an A/B test?

    • http://blog.bufferapp.com Kevan

      Great question, Partha! The sharing buttons changed before I joined on at Buffer, but I do believe there was some experimentation/testing behind the change. From what I understand, sharing buttons are often more helpful at the bottom of posts as readers will have hopefully read the whole thing and can better decide if it’s worth sharing or not. How does that explanation feel to you? I think it’s a fun theory (although I imagine people share without reading, too). :)

      • http://hubskills.com/ Partha Bhattacharya

        Thanks for your views, and I do agree readers would find it convenient to share an article after reading a post. Honestly though I find it a bit confusing since many people suggest having the buttons at the top and/or on the side. And yes… people do share without reading (I know because I do it sometimes).

  • http://dbproductreview.tumblr.com/ DB Product Review

    A/B testing is a simple way to test changes to your page against the current design and determine which ones produce positive results.

  • http://www.drwebsitesuk.com mark Zub

    Really nice post. I really like it. It gave me a better understanding. Thumbs up to you. Keep up the good work.

  • http://spinsucks.com Gini Dietrich

    I have a technical – and very tactical – question about this. Let’s say your headline has to be changed, based on your test. Do you change just the headline and leave the URL alone? Do you change the URL and do a redirect? Or how do you handle the URL with the headline name change?

    • Courtney Seiter

      Hey Gini! So great to see you stop by, and with an awesome question to boot! We try to create the URL in such a way that it can remain the same regardless of which test wins. For example,
      “20 WordPress Plugins You Can Install Today for Easier Sharing, Better Posting, and a More Powerful Blog” has this URL: “http://blog.bufferapp.com/best-wordpress-plugins.” So basically just the major keyword phrase in the URL and not much else to tie it to a specific headline. That’s worked pretty well for us. I’d love to hear from an SEO perspective whether there’s any downside to having a URL be quite different than a headline. Not sure about that. :)

      • http://spinsucks.com Gini Dietrich

        The URL can be different…it’s more about the keyword or phrase than the headline. So that makes perfect sense. Thank you!

  • Karen McCamy

    Excellent article…as usual! And, the technical details really clarify HOW to do it! I’m a freelancer (1-person business). It seems like it really needs a team of people to pull off all of this, as well as having tons of traffic/”followers” in order to make the A/B testing valid. I can see the value of running the test on social media, since you also get pretty much instant feedback, not to mention the “share” features…but I’m just wondering if there is any “benchmark” about how many followers you need to have on a practical basis for these tests to be valid…
    Thanks!

  • Oisin o slatarra

    Hi Kevin,
    Just wondering if you have written any articles on how to write a headline for a tweet. Thanks,
    Oisin