journalism, personal

Getting started in data science: One journalist’s journey

Let me first say up front what you will not get in this blog post: A step-by-step guide to using whatever data tool you want. Using spreadsheets is beyond the scope of a blog post. Using R (and, I’d guess, using Python) is beyond the scope of the classes I’ve taken that purport to teach me how to use R.

What you will get is one person’s take on how to dip your toes into an ocean. I hope you’ll be able to get some advice on how to go from the occasional spreadsheet user to Data Journalism Deity and perhaps get some idea of where to go next. If this is the first thing you read about data science or data journalism, fine — I’m assuming no prior knowledge.

First: Reconsider what you mean by “education.” Seriously.

Remember back in college when you knew a bunch of annoying dudes who had figured they just needed to make a lot of great contacts in school, and the classes themselves were secondary? While you busted your butt studying and working, they were shaking hands and drinking beer?

I’m not going to say they were right. But they were on the right track. Here’s why:

Working with data is less about learning how to do it and more about learning where to ask.

Don’t believe me? Ask Vik Paruchuri, who made the liberal arts-to-data leap himself and has this to say about it:

[Image: data-problem]

Check out his whole video. It’s 32 minutes, but you can skip the first couple of minutes because he took it from a Google Hangout and spent the first bit of it waiting around. (Expert on data science but doesn’t edit video in YouTube? Knowledge is specialized.)

He devoted a year or so to learning data science. But he also just jumped in. He started doing projects (because you learn by doing in this field) and going to meetups, all before he knew much code.

On the other hand, here’s how *I* did it:

I signed up at Coursera, an online-learning hub, for a nine-course series offered by Johns Hopkins University. I figured I would plow through the courses and get a spiffy certificate at the end, proving to myself and everyone else that I know my way around R (the en-vogue data programming language today) and everything else in the world of data.

Around the 13:30 mark of Paruchuri’s video, he says MOOCs (like Coursera’s content) are not the best way to learn. But by the time I watched his video, I had already gone past the “no refund” part of the Hopkins specialization. Oops.

That’s not to say I’ve wasted my time and money. Check that: I do think I’ve wasted quite a bit of time trying to pass quizzes that I really didn’t need to pass.

In retrospect, I wish I had known there are two ways to approach the Hopkins specialization:

  1. Make this your life, as if you were a full-time student, particularly if you don’t have a ton of prior programming experience or stats background. (The Pascal I learned in college and the JavaScript I learned 20 years ago weren’t enough. In stats, I’m comfortable talking about medians, means and even standard deviations, but I have little idea what a “linear regression” even means.) You’ll finish up with a certificate that might get you employed somewhere.
  2. Browse. Learn what you want. Attempt a few quizzes, but feel free to bail.

The seductive part of data science is that it seems so accessible. It seems like everyone’s doing it, from political bloggers breaking down government data to 14-year-old fantasy football wizards. But in reality, they’re just doing a small part of data science. When you start digging around and finding powerful data applications, you’ll find they’ve been developed by people with “PhD” in their LinkedIn profiles, not “BA in philosophy and music.”

Consider a music analogy. As Radiohead sang, anyone can play guitar. It might be a high school kid figuring out Rush songs (like me, many years ago) or your friend’s dad who suddenly whips out an old acoustic and plays Classical Gas. But how many people do you know who can sight-read just about anything on piano? Or teach band in an elementary school, helping kids learn every woodwind and brass instrument?

You don’t go to Berklee for four years to learn how to play Purple Haze or even to write your own guitar riffs. So why would you work your way through everything in the Hopkins data specialization to learn a few tools to use in journalism?

The funny thing here: The quizzes in the Hopkins sequence helped teach me that lesson and the importance of knowing where to look for the answers. Those quizzes — at least, once you get past the simple multiple-choice stuff in the intro class — are programming assignments. And the classes don’t teach you how to do them.

Kind of a weird way to approach teaching, isn’t it? And very frustrating if you, like me, don’t know what you’re getting into.

To pass the quizzes, you have to look around the Web for help. You may quickly find that the regulars at Stack Overflow, an impressive online forum for sharing programming tips, are getting sick of answering questions from people who are stuck on the Hopkins programming assignments. But you can often find a couple of things that help.

The course itself has an online forum that substitutes for the interaction you’d have with the teacher if you were taking this class in person. But the forum can only give you general tips, not answers. You click an honor-code pledge with every submission, just like we did at Athens Academy. (All together now: “I have neither given nor received any aid on this work, nor have I observed any infraction of our Honor System.” One kid made a rubber stamp with all those words to speed along his test-taking.)

The forum is manned by mentors who have survived the class already. And the general message is to get used to “hacking.” Get out on Stack Overflow and other sites, then figure it out. Because that’s what you’ll be doing in the real world.

“Sure,” you may say, “but what am I paying for?” You’re really paying for the lectures, a nifty set of online tutorials, and a basic intro to some of the tools you need, like RStudio (a bit like Notepad with a whole lot of tools to help with your code) and GitHub (a sharing site). And if you have hours upon hours (other students have reported spending months on quizzes with an estimated time of “30 minutes” or so), you may be able to plow your way through and get the specialization.

At some point — and I’m writing this so you’ll do it before you take the course rather than partway into it like I did — you have to stop and ask what you really want to accomplish. Even if you want a full-time data job, there are so many different ones. Data scientist? Data engineer? Data journalist?

You’re probably better off playing around with online data tools first, and then signing up for a course. That’s true whether you’re just looking to supplement your knowledge and skillset (like me) or going to become a Full-Time Data Science Person (like Paruchuri).

One example: Paruchuri says 90 percent of the work is data “cleaning” (if you’ve ever seen a spreadsheet in which some entries say “Miscellaneous” and some say “Misc,” you get the idea). You could use R for that. It’s powerful. Or you could use a former Google tool called OpenRefine. Knowing a bit of programming logic may help with that, but it’s not as intense as learning complex operations in R.
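
To make the “Misc” problem concrete, here is a minimal Python sketch of that kind of cleanup. It is not taken from Paruchuri’s video or the Hopkins courses, and it is only one of many ways to do it. The sample labels, the CANONICAL mapping and the clean_label function are all made up for illustration.

```python
# Hypothetical category labels, invented for illustration only.
raw_labels = ["Miscellaneous", "Misc", "misc.", " MISC ", "Travel", "travel"]

# Assumed mapping from variant spellings to one canonical label.
CANONICAL = {"misc": "Miscellaneous", "miscellaneous": "Miscellaneous"}


def clean_label(label):
    """Trim whitespace and trailing periods, ignore case, then map known variants."""
    key = label.strip().rstrip(".").lower()
    # Unknown labels are kept, just with consistent capitalization.
    return CANONICAL.get(key, key.capitalize())


cleaned = [clean_label(label) for label in raw_labels]
print(cleaned)
```

A tool like OpenRefine does this sort of grouping through a point-and-click interface, which is why it can be the gentler starting point; the code just shows the logic underneath.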

So now that I’ve spent four months learning what I can, I’ve managed to define my goals.

First, what do I want to do? 

  1. Find an efficient way to do Olympic medal projections. In the past, I’ve used spreadsheets to track results and a few formulas to generate the projections, but it’s safe to say I spent far too much time gathering and processing data.
  2. Learn enough to try other projects on my own, perhaps a survey of North American curling clubs, for example.
  3. Learn enough to tell a potential part-time or full-time employer that I might not be a full-fledged data scientist, but I know the tools and have a good sense of what’s feasible.

Now bear in mind everything else I want to do in the next 2-3 years:

  1. Continue writing epic soccer pieces and other content for The Guardian.
  2. Finish retooling parts of my unpublished MMA book into a series of posts at Bloody Elbow.
  3. Finish retooling the other parts of that book into a small self-published book.
  4. Write another book on youth soccer.
  5. Write a bit more for FourFourTwo and OZY.
  6. Maybe find a steady outlet for Olympic-sports content (which could include a lot of data work).
  7. Maybe start working for a nonprofit (maybe even with data).
  8. Maybe even start the definitive book (or multimedia project) on creativity.

I’m not including high priorities like “be a good parent” or even low but unavoidable priorities like “mow the danged lawn.”

So from a data perspective, here’s what I should be able to do:

  1. Understand what I’m looking at when I check Kaggle, which turns data-science sharing into fun things like a March Madness contest.
  2. Navigate GitHub.
  3. Use OpenRefine and any other good web tools I can find.
  4. Scrape data from reputable sources.
  5. Present the output in some coherent and engaging form.

I’ll pick my way through the rest of the Hopkins courses. I’ve also enrolled in a cost-friendly course at Udemy, which I started taking so I could figure out enough to pass the R programming course at Hopkins. (I passed two. The rest? You may consider me an auditor.)

And then I’ll just explore, like I did when I was figuring out Rush songs on my guitar. (Hmmm … can I process songs in R?)

Uncategorized

Google not playing nice (or, don’t drop your Kindle)

You may have arrived at this post by searching for ways to set up your Kindle Fire so your Gmail, Google Calendar and related contacts flow seamlessly into your wonderful device.

I have good news and bad news. The good news is that you can just read this post rather than doing what I just did — spend two hours perusing outdated info on how to make Gmail, your calendar and your contacts sync with ease.

I have to give a bit of background. This is my second Kindle Fire. The first fell out of my hand. The case opened up, and it landed screen-first. That’s not good.

Amazon’s customer service, though, is wonderful. They’ve made the replacement as simple as can be. And when you open your new Kindle Fire, you’ll find so many things automatically syncing with your old one. Even some game data. (Though NOT in Bloons TD 5, unfortunately.)

But I couldn’t get my Google apps to sync up as easily as they did before. On my old Kindle, once I got Gmail set up (easily), then the calendar and contacts appeared as well.

I found one work-around to help me get Gmail at the very least. Instead of typing (username)(at)gmail.com, type (username)(at)googlemail.com. That works. But you won’t get calendar or contact info.

So does a quick session of chatting with Amazon’s helpful customer service people. You may take a longer route to get Gmail set up, but it still works.

I went through that process because I was still hoping to get my calendar and contacts. And the tech support person thought it would work.

Nope. Here’s why:

Google makes Gmail sync harder on rival platforms by dropping Exchange ActiveSync for consumers | The Verge.

So that’s a bummer. I suppose I can still get to it through the Web browser, but it just looks like another case of Google violating its “Don’t be evil” dictum. (Another one: Search results are getting less and less helpful. Go ahead — try to get the most recent information on a given topic. They would be driving people to Bing — if Bing were any better.)

But in any case, I’m posting this info in the hopes that someone searching for “kindle fire gmail calendar” or something like that will see that this is recent info.

For now. In three months, it might be outdated. We can hope, right?

So I hope I’ve saved you from a wild goose chase. And I hope you don’t drop your Kindle.

personal, web

How I ditched the smart phone

I’m officially a former Blackberry user. I fell in love with the devices while covering the Olympics, where USA TODAY would set us all up so we could communicate from everywhere. Snapping pictures and Tweeting from Beijing was a new and wonderful experience.

My older cell phones had their charms. But without a keyboard, texting was virtually impossible. The tiny screens weren’t good for the “mobile Web.” And so I was thrilled to get a Blackberry — emailing, Tweeting and using Facebook any time I was otherwise idled.

But my Blackberry also had an annoying habit of freezing at inopportune times. It was a decent email device but a terrible phone. And obviously, the iPhone and Android phones had overtaken my old semi-reliable companion.

Then Verizon introduced its “Share” plans. Unlimited talk and text! And data plans that were … ridiculously expensive!

The idea is to capitalize on the masses’ demand for smart phones and tablets, apparently so we can go out into the woods and watch silly videos on YouTube or something like that. They’re not even making a pretense of productivity any more. If you can’t share your kids’ origami project in 2.3 seconds over a 4G network … well … you’re just lame.

Hi. I’m lame.

I noticed that the “basic phone” today is not the same keyboard-less wonder I had in 2004. Today’s “basic phones” have touch screens. Wide screens. Slide-out keyboards.

So when Verizon called and told me I could get a discounted “basic phone” and a reasonable data plan, I made the call. Blackberry out. Brightside (Samsung) in.

So far, it’s a little disappointing. The ad copy made it seem that I could do everything I was doing with the Blackberry — email, Twitter, Facebook and occasionally GPS. Sort of.

The GPS is the real aggravation. Like the old days, Verizon offers voice navigation — for a fee. If you’re thinking you’ll just get around that by using Google Maps, think again. Go to Google Maps through the ever-clumsy Opera Mini browser, and you’re prompted to download an app. Then the phone won’t support a download.

And that little hitch prevents me from doing a lot with Twitter and Facebook as well. The app market for non-smart phones has basically died. I was able to get more apps back in the old days — even had a playable EA Tiger Woods golf game. Today, I can’t even find Freecell.

The upside: You can still do a lot through texting, and now that I have unlimited texts, that’s a viable way to keep up with Twitter if I’m not at a laptop.

But the bottom line is this: What do I really need? I need email and phone. These days, I need more texting as well. I can reach a lot of sources that way. Parents can contact me if their kids are running late for soccer.

The biggest difference between my new phone and my 2004 phone: I can easily text and keep up with email. Easily. It runs Gmail with no hitches at all.

I don’t need games. I don’t need Facebook. I only occasionally need Twitter. I never go on long hikes and stop to watch Hulu. And so I really can’t justify the expense of a smart phone.

Now if someone wants to get me a Kindle Fire for Christmas …