Archive for the ‘Ab testing’ Category

Part 3 – Presentation of tests

Wednesday, November 25th, 2009

In part 1 we looked at what is involved with allocating different users to different groups. The next stage after allocation is to actually present the test to the user.

There are two basic choices when it comes to presenting the interface changes: client side or server side. Generally, if you work with a server-side scripting technology such as ASP.NET or PHP, you are likely to harness this to load the AB testing data from your database and conditionally present the different versions of the site to the different groups. A sample client-side implementation is Optimost, where JavaScript is used to do the tracking and the showing of different versions. The client-side approach is generally a lot less powerful, as you are limited to showing and hiding HTML, whereas on the server you have complete control over the presentation.

Test storage is usually done with a database, so you would register all your tests there. When the page code runs, it looks in the database to see if any tests are assigned to this page type, and returns them all. Now that the page knows about the tests that apply to it, the individual tests coded on the page can check whether the test is running, and whether this user will get the A, the B, the C and so on. The code then executes to modify the interface for the user. We then need to gather the key information about that user, which is where your tracking comes in. Traffic tracking is quite a large area on its own, but for the purposes of this article we just need to be aware that the following things are tracked:

Which version the customer saw
Whether the customer converted or not, and how much they spent.

The analysis script will later crunch all this data to find out which test is winning.

There are often dilution effects to consider as well. For instance, if some functionality will only be shown for certain product types, yet the test is registering across all users, then the results are diluted. It may also be that the test is only shown within an AJAX piece on the page, which only a certain proportion of users will actually use. Dilution is not a huge problem, but if you can reduce it, it certainly helps.
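
To get a feel for how much dilution can hide, here is a small arithmetic sketch in Python. The numbers are made up, and it assumes that users who never see the change convert at the same rate in both groups.

```python
def diluted_rates(exposed_share, rate_a, rate_b, unexposed_rate):
    """Conversion rates measured across ALL users when only a share see the test.

    exposed_share   fraction of registered users who actually see the change
    rate_a, rate_b  conversion rates among exposed users in groups A and B
    unexposed_rate  conversion rate of users who never see the change
                    (assumed identical in both groups)
    """
    measured_a = exposed_share * rate_a + (1 - exposed_share) * unexposed_rate
    measured_b = exposed_share * rate_b + (1 - exposed_share) * unexposed_rate
    return measured_a, measured_b


# Hypothetical figures: a quarter of users see the change, B converts at 10% vs 8%.
a, b = diluted_rates(0.25, 0.08, 0.10, 0.08)
print(f"measured A: {a:.3%}, measured B: {b:.3%}")
```

With these figures the two-point difference among exposed users shrinks to half a point across everyone registered in the test, which is why a diluted test needs much more traffic to show the same result.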

The next phase we will look at is the reporting and analysis side. We will be looking at power functions, which are used to calculate the sample size we need to test against, and a few different confidence calculation formulas, along with charting and decision making. This is probably the hardest part of AB testing as it's a subjective area, and we will soon be there.

Persuasion Architecture

Friday, November 13th, 2009

Persuasion Architecture is a term describing a method of content targeting to appeal to different personas.  The idea is to break down your customers into basic groupings.

  • Spontaneous
  • Humanistic
  • Competitive
  • Methodical

As well as this, you determine where they are in the buying cycle:

  • Just browsing
  • Want a product but not sure which
  • Know exactly what they want

Knowing that all your customers can be categorised in these ways, you then plan content tests to target them, so that you appeal to a broad section of your audience. You must answer the questions of everyone in your target audience if you want to maximise your potential conversion rate.

Part 1 – Ab testing – deciding what to test

Thursday, October 8th, 2009

Now the hard part. What to test? Your prioritisation method should consider 2 main variables: the implementation difficulty and the expected gains. You have to accept that you are not going to be right most of the time, or even much of the time. Prepare for your customers to turn all your ideas on their heads. Knowing this, you have to give the time to implement the highest weighting when considering which test to run first.
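
One way to picture that weighting is a simple sort, sketched here in Python. The `Candidate` class, the example tests and their numbers are all hypothetical; the only idea taken from the text is that time to implement comes first, and the guessed gain is just a tie-breaker.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    days_to_implement: float
    expected_gain: float  # a guess at relative uplift, and probably a wrong one


def prioritise(candidates):
    """Cheapest tests first; the guessed gain only breaks ties."""
    return sorted(candidates, key=lambda c: (c.days_to_implement, -c.expected_gain))


backlog = [
    Candidate("rework basket layout", 10, 0.05),
    Candidate("bold the call to action", 0.5, 0.01),
    Candidate("reword 'change your requirements'", 0.5, 0.02),
]
for candidate in prioritise(backlog):
    print(candidate.name)
```

Because the expected gains are guesses that customers will often overturn, sorting on effort first keeps the site testing while the expensive ideas are still being built.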

Initially, it's definitely best to go with small changes, such as changing font sizes or bolding to highlight calls to action. Find out what colours work best for what type of content. Red for discounts? Blue for CTA buttons? It depends on how they contrast with your site's colour scheme, but there will be optimum settings for your site. This is level 1: modifying the formatting of existing content. Tests such as removing images would also fall under this category.

You can also test changes in text. Try to use less technical language and talk to your customers in the language that they understand. Instead of ‘change your requirements’ you might say ‘start over’. We can call this level 2: modifying the text of short sentences.

Next are larger content changes, where you are changing paragraphs. Almost every content area on your site should be considered as a candidate for testing. Any links within the content need to be thought out carefully, and you should consider their removal as C or D variants. Mouseovers and popup text should also be tested, as these can have a great impact on conversion. Rewrite the content for different audiences: try to appeal to technical types as well as humanistic, competitive and spontaneous types. This is the essence of personalisation and it is the most advanced stage of testing. I will cover this in a separate article. These are level 3 changes.

The next level of change is functional changes. You might change the options in a drop-down box, or the layout of a dynamic piece of content. This is where our custom AB testing system shines, as we have programmatic control over all the elements in a page, so our change is not restricted to one area of the page, or restricted in any way whatsoever. We can change functionality across the whole page and consider it a single test. These are level 4 changes, and they are the most costly to implement.

Finally, we can talk about multi-page tests. Again, the 3rd party packages cannot deal with this, as they don't understand the concept of a page type, which requires an understanding of the structure of your site. If you have a shopping site, and all the product pages have the same test applied to them, the test needs to register for all those pages, which together we call a page type. Some tests may require that more than one page type is updated, so both the product pages and the shopping basket page may be affected by a promotion test. With the custom AB testing system you are able to span tests across multiple pages, and see the overall impact they are having.

With everything, you can test removal as a C version, as the general rule is: if it isn't helping conversion, it's detracting from it. Less is more, and every element of your site should have a purpose. What you must remember is that changing a font colour can have the same conversion impact as reworking a whole section of functionality, so you must carefully consider which tests to run, given which tests are currently running on the site. You must try to keep your site constantly testing, so when few tests are running you put out easy tests to make sure the site is capturing data, and in that time develop the more complex functional tests. Level 2 and 3 tests should be running all the time, as all they require for setup are text changes.

AB testing – a DIY approach. Part 2 – Test Allocation, who gets what?

Wednesday, October 7th, 2009

Now for a break from building our software, to talk about something completely different: AB testing.

AB testing is a complex and difficult beast. There are a few commercial offerings, the most notable being Optimost and Google's Website Optimizer, which is integrated into the AdWords site. Google has been pushing the agenda on this, because of course better conversion for their customers means more PPC spending, so in the end Google themselves stand to reap the rewards. I am not going to discuss these packages, but I am going to give an overview of how to build your own AB testing framework, as I did exactly this for a large, but cannot-be-mentioned, dotcom.

This is a really big subject area, and this is just an introduction, so let's break it down into the following areas:

1. Collecting Data

2. Presenting Tests

3. Analysing Results

Collecting data, at its simplest level, is just registering which customers did what. Generally the KPI that is most interesting (due to its low variance, unlike other KPIs such as average margin per unique visitor) is conversion. This is simply the number of conversions made divided by the number of unique visitors to the site. For AB testing, when there are only A and B groups, you would calculate this separately for each group.

Example:

‘A’ group had 445 unique visitors and 34 conversions. It has a conversion rate of 34/445, which is 7.6%.

‘B’ group had 421 unique visitors and 29 conversions. It has a conversion rate of 29/421, which is 6.9%.

Was there a significant difference? This will be covered in the confidence calculations articles.
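
The per-group calculation is trivial, but here it is as a small Python sketch using the figures from the example. The function name is just an illustration.

```python
def conversion_rate(conversions, unique_visitors):
    """Conversions divided by unique visitors, as a fraction between 0 and 1."""
    if unique_visitors <= 0:
        raise ValueError("need at least one unique visitor")
    return conversions / unique_visitors


# (conversions, unique visitors) for each group, taken from the example above.
groups = {"A": (34, 445), "B": (29, 421)}
for name, (conversions, visitors) in groups.items():
    print(f"{name}: {conversion_rate(conversions, visitors):.1%}")
# prints "A: 7.6%" then "B: 6.9%"
```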

The sample size required to get confidence is less about the number of people than it is about the number of conversions. We aimed to get 10k conversions per group to obtain significant results. Of course, if the test is a bigger dial mover then it will happen much quicker. The further down the funnel your test page is, the quicker you will get your results. If the test is on the landing page where you convert, then 100% of the visitors must hit this page to convert. If it's on a side information page, then not all visitors must hit it to convert, and so it will take longer to get results, as you must only count the people that hit this page and did or did not convert.

The key part of the data collection is that you divide your users equally between the groups. This can be done using the IP address, a cookie, or at the server level. It is of the utmost importance to make sure that your allocation method has these properties:

- Repeatable at the visitor level, so every time they revisit the site during the period of testing they will see the same test version.

- Allocates evenly between groups.

We found the cookie-based allocation method was best of all. The reasons are:

IP addresses in parts of the world cluster around large proxies, so a lot of users from certain countries get placed in A or B (or C or D etc.), and as users from different countries generally convert at reliably different levels, this puts a bias on the test. Certain regions, such as Europe, don't suffer from this problem as much.

Traffic diverting at the server level has the problem that it depends on the servers' performance. Not all servers perform equally, and actually managing the allocation is not straightforward as it involves dealing with load balancers etc.

Cookie-based allocation turned out to be beautiful. This is based on ASP.NET, where the cookie ID is actually a GUID (globally unique identifier). I did some analysis on generated GUIDs and they turned out to be almost perfectly random. With a test program I had written that produced a million new GUIDs per group, we had each group to within less than a hundred of each other. However, the really, really cool thing about GUIDs is that they are hexadecimal, and contain 32 hex characters (ignoring the hyphens). Like this:

28121de6-85cc-4aef-acbc-19c2e5cb57d3
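
A rough way to repeat that kind of randomness check is sketched below in Python. This is not the original test program: it uses Python's `uuid.uuid4()` rather than ASP.NET GUIDs, and the sample size and the position inspected are arbitrary choices.

```python
import uuid
from collections import Counter


def hex_digit_counts(sample_size, position):
    """Count which hex digit appears at a 0-based position across fresh GUIDs."""
    counts = Counter()
    for _ in range(sample_size):
        # .hex is the GUID as 32 lowercase hex characters, hyphens removed.
        counts[uuid.uuid4().hex[position]] += 1
    return counts


if __name__ == "__main__":
    counts = hex_digit_counts(160_000, position=20)
    for digit in "0123456789abcdef":
        print(digit, counts[digit])
```

If the digits are close to uniform, each of the 16 values should turn up in roughly one sixteenth of the sample.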

Why is this so good? Well, the reason is that you want to have more than one test running at once. You also want the allocation to be repeatable per person, and you want to be able to have more than just AB. Maybe ABCD or ABCDEFG, and you are verging on multivariate possibilities (2 x 2 x 2 x 2, or 2 to the power 4, so that's 4 multivariate variables you could run simultaneously).

So somewhere we have a matrix that says for this test ID, which slots belong to which groups.

  0123456789ABCDEF
A xxxxxxxx
B         xxxxxxxx
C
D
E.. etc

Say the test ID is 21. We take it modulus 32 (this divides by 32 and returns the remainder), which in this case gives us 21, as 21 is less than 32.

If we then take the 21st element of the GUID (counting from 1 and ignoring the hyphens), we get ‘1’. The 1 then goes into our lookup table and tells us this is an A user. That's it! Then all the presentation tier has to do is show the A version for that user.
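
The whole lookup can be sketched in a few lines of Python. This is a minimal illustration rather than the original code: `SLOT_MATRIX` and `group_for` are hypothetical names, positions are counted from 1 with hyphens stripped to match the worked example, and wrapping a remainder of 0 round to position 32 is an assumption, since the text does not say what happens there.

```python
import uuid

# Hypothetical slot matrix: test ID -> {group: hex digits that belong to it}.
SLOT_MATRIX = {
    21: {"A": "01234567", "B": "89abcdef"},
}


def group_for(test_id, guid, matrix=SLOT_MATRIX):
    """Map a visitor's cookie GUID to a test group, repeatably."""
    hex_chars = uuid.UUID(str(guid)).hex    # 32 lowercase hex chars, no hyphens
    position = test_id % 32 or 32           # assumption: remainder 0 means position 32
    digit = hex_chars[position - 1]         # positions are counted from 1
    for group, digits in matrix[test_id].items():
        if digit in digits:
            return group
    raise LookupError(f"no group owns slot {digit!r} for test {test_id}")


print(group_for(21, "28121de6-85cc-4aef-acbc-19c2e5cb57d3"))  # prints "A"
```

Because the group depends only on the test ID and the cookie GUID, a returning visitor always lands in the same group without anything extra being stored. One caveat worth checking: if the GUIDs are standard version 4 ones, position 13 is always ‘4’ and position 17 is always one of 8, 9, a or b, so tests that land on those positions would not split evenly.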

That's all for now, my hands are tired. We need to cover a lot more stuff, such as analysis, the presentation tier, page types, and tons of other problems in this area, never mind all the other SEO stuff I have going on in my head. I have made no headway into those areas yet, maybe tomorrow or the day after. Go wild with the comments if you have questions, and I will do my best to answer them all.