Building a Mailing List using Excel and Amazon Mechanical Turk

Amazon mTurk and Excel or CSV files

Amazon Mechanical Turk LogoI was working with a friend on building a mailing list, but all he had was an Excel Spreadsheet with a column full of Business Names and a need for more information. The question was how do we quickly build out a list of qualified customers without spending tons of money actually buying a list? In addition to Business Name we needed, Address, City, State, Zip Code, Email, Phone Number, Contact Name, and Website Address.

We decided to use Amazon Mechanical Turk and Human Intelligence Tasks (HITS). AKA mTurk for tasks. So consider this my comprehensive mTurk Howto. We got our completed mailing list in less than 20 hours by offering 10 cents for every entry. This doesn’t seem like a lot but when you are doing a list of thousands of records, it’s not bad for either the customer or the person doing the actual work.

Using mTurk is not entirely self explanatory and sometimes counterintuitive. We are used to uploading a CSV file and then defining the other fields we need populated. You must first disassociate the spreadsheet you have from the data you need. In the end, you will only provide the data you already have, not the data you will need.

I am going to create fake mTurk HITS so that you can see how this all works. Screenshots and all. I will use a list of 10 dog groomers in West Hills, Canoga Park, Chatsworth, and Woodland Hills and ask for all of the additional fields to be provided for us.

Formatting Excel data for mTurk

The list you see is all the information I am going to need and use to start creating a task for workers on mTurk to complete. From Excel, I will save the file as a CSV file. I am going to define what else I need in order to complete my list on Amazon Mechanical Turk by creating a project. Visit http://www.mturk.com

Excel Spreadsheet Needed Columns for mTurk

Using the mTurk website to create a project

When you visit the mTurk site you will use your existing Amazon login to access the requestor option.

Under Create, choose New Project, and then Data Collection. We are going to start this project by using the template they have already provided.
Creating a Data Collection project in mTurk

Choose Create Project. Name your new Project and fill out a brief explanation of your HIT.

Describing your HIT to workers mTurk

Setting up your HIT or how you will pay per assignments

How you are going to pay for this work or the Setting up your HIT section really could use a revamp as I find it confusing as hell. However, after creating a job and then editing it 10 times, it finally made sense.

In the screenshot below, you can see Reward per assignment. I have 10 individual assignments that I am going to pay 10 cents each for $1.00 total. The Number of assignments per HIT is 1. I could say I want each tasks to be worked on by 5 different people to ensure my data is accurate but 1 worker uniquely working on my assignment is perfectly fine. The time allotted per assignment is how long I think it will take to complete the assignment or task. I want to run this for 3 days before it expires. I need it done today so I can write this article, but we shall see if I get my results earlier. I think it will. Lastly, not included in this screenshot is the part where you can say you need qualified workers. In my opinion, being qualified means you can set how many accepted assignments someone has done before you let them work on your project among other “qualifications”. If you want, you can get workers who are very good and have a ton of accepted HITS but that also slows down your project by limiting it to established workers. If the work is not good you can always reject it later in the process for others to work on. This is simple and we really don’t care about the quality of the data so I am saying let everyone do it.

Setting up your HIT in mTurk

mTurk form creation or design layout to collect data

mTurk Design Layout is the most intimidating part of this whole process because we are still thinking about how to get our spreadsheet uploaded and additional fields defined. We should be doing that now, right? Don’t worry, this is so simple you won’t have any problems. We are going to define what we want on this screen and then we are going to provide workers with the information we already have. You can download a template I am using from here and tidy it up a bit. It’s basically just HTML and should be pretty simple to figure out. If you have questions about the template, leave a comment.

mTurk Design Layout CSV definition

You can simply edit the text of the template. The screenshot above shows the template. I am going to define it for my use case. I am going to define any field in my form by using the column name from my CSV/Excel spreadsheet so that when a customer accesses this HIT, they will automagically have that information populated in their view. I have added some other special rules or instructions for workers to my form. I have to accommodate a global public who might not understand how US mailing addresses work.  I still have some editing to do but this was particularly confusing to me so I wanted to give a special section to explain.

Spreadsheet Columns in mTurk

I am going to round out the form a little bit more and define the fields I am trying to capture for these businesses.

Click the source button at the top and identify/change/add to this line. Even if you don’t know basic HTML you can simply copy this bit of code, change a couple of the text strings, copy this line and paste it under the next to define your fields to collect. The id/name specified in your code will be the column name you get when you download your completed list. The label is what will show up for that field when you are looking at the rendered form.

<div><label>Website:</label> <input class=”form-control” id=”web_url” name=”web_url” placeholder=”http://” required=”” type=”text” /></div>

Below is the finished product.

Creating your input forms in mTurk

I will now preview the form, it will show the variables or the text, ${Groomers} until I upload the data sheet after Step 3. Click Finish and you will be sent to a list of your projects. Your new one should be there.

Choose Publish Batch, and now finally we will upload our data from the CSV or Excel Spreadsheet with just the Groomers column.

uploading your csv or excel spreadsheet to mTurk

Submitting HITs for Workers to Complete

OMG, we have provided useful data that can be used and now previewed to start our project. Notice how the business name now shows where our Groomers variable was. We are ready to submit our tasks and get them done. The rest should be self explanatory to start your job. We will move on to acceptance, rejection, and finally completion.

Excel Spreadsheet Columns showing in mTurk

Acceptance and rejection of HITs.

Workers are starting to work on my assignments. Here is a view of the status. Click Results to see the information workers are gathering for the project.

Managing progress in Amazon mTurk

While viewing the results, you will need to approve or reject entries, I specifically asked that Website addresses not be included from Yelp or CitySearch, but this worker did anyways, I will now reject their work because they didn’t follow the directions. Those HITs will go back into the queue for others to complete.

Rejecting HITs in Amazon mTurk

Rejecting HITS with comments in mTurk

I will check back in shortly to find the status of my jobs throughout the day to check the results. Anything not rejected after the time you picked will be automatically approved so don’t be lazy. Check often. If they are correct, click the box and then choose Approve.

Completing your mTurk project

I will next go over how to complete your assignments once you have all of the information you need and everything looks good. After 6 hours I have 13 submissions, only 10 of them useful but the list should be done.

Completed Assignment in Amazon mTurk

We will now need to download our results by clicking Download CSV. Once you download the file you should open it in Excel or other spreadsheet program and start massaging the data to your liking. In my case I have to delete the columns inserted from mTurk to be useful.

Download your completed assignments in CSV from mTurk

I wish the junk data wasn’t there, but you can see it there. Delete those columns

Completed assignments in mTurk columns of junk

Here is what the finished data would look like. Because I didn’t care about some of the data being accurate, you can see some data is crap but you can “require” those fields in your project and not have workers leave them blank.

Completed assignments show in Amazon mTurk

 

That’s it. This is what I learned about using Amazon Mechanical Turk and I really hope that as people search for how to work with an Excel spreadsheet and mTurk find this article. The workflow didn’t make sense to me but now it looks and works exactly like I needed it to.

One thought on “Building a Mailing List using Excel and Amazon Mechanical Turk

Leave a Reply

Your email address will not be published. Required fields are marked *