Open Sourcing Precautions

Making Your First Open Source Project Public

Last week, I open-sourced Teamline, my final group project for Flatiron. I’d created many public repos on my own account before, and contributed to open source by creating the OmniAuth strategy for Jawbone, but this was the biggest project I’ve ever open-sourced. From that process, I learned about a few steps a beginner might have to take before open-sourcing his or her project. I’ve distilled that process here, plus a few other things I’ve recently learned about open source and some thoughts on how to improve my project now that I’ve made it open.

Check out my repo here btw.

Step 1: Create an application.yml file

This is a local file that is listed in your .gitignore and thus is not pushed up to GitHub. This is the place where you keep your secret keys and tokens for the different third-party applications you might be authenticating with. If you already have one, you can use that.
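To make the idea concrete, here is a minimal sketch of what loading such a file into environment variables might look like. The key names and values are placeholders I made up, and in a real Rails app a gem such as Figaro usually does this loading for you, so treat this as an illustration of the concept rather than something to paste in.

```ruby
require 'yaml'

# Example contents of config/application.yml (all values here are placeholders).
EXAMPLE_APPLICATION_YML = <<~YAML
  APP_SECRET_TOKEN: "not-a-real-token"
  APP_DEVELOPMENT_DATABASE: "example_development"
YAML

# Copy every key/value pair from the YAML text into env (ENV by default),
# without overwriting variables that are already set.
def load_settings_into_env(yaml_text, env = ENV)
  YAML.safe_load(yaml_text).each do |key, value|
    env[key.to_s] ||= value.to_s
  end
  env
end

load_settings_into_env(EXAMPLE_APPLICATION_YML)
puts ENV['APP_DEVELOPMENT_DATABASE']
```

Because existing variables win, anything you set on the server itself takes priority over the local file.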

Step 2: Hide your secret token

Rails uses a secret token to verify the integrity of signed cookies. For security reasons, you don’t want everyone to be able to see and use that token. It is found in config/initializers/secret_token.rb. If you’ve already open-sourced your project without hiding your secret token, you’ll probably want to run rake secret to generate a new token, and then restart your server. If you haven’t open-sourced your project yet, you can keep your existing token, but you’ll want to move it to your application.yml file and reference it in your secret_token.rb file as an environment variable. (Environment variables are available throughout the whole application; the task :TASK_NAME => :environment do line in a rake task is a good illustration, since it loads the whole application environment so the task can use it.) Create a line in your application.yml file that is something like APP_SECRET_TOKEN: "xxxxxxxxxxxxxxxxxxxxxxxxxxx", with the x’s being your secret token from the secret_token.rb file. In your secret_token.rb file, change the code to App::Application.config.secret_token = ENV['APP_SECRET_TOKEN'].
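Here is a small sketch of the two halves of this step in plain Ruby: producing a fresh random token, and reading it back from the environment. I’m assuming a long random hex string is what you want (as far as I know that is comparable to what rake secret prints), and the helper names are my own.

```ruby
require 'securerandom'

# Generate a long random hex string suitable for use as a secret token.
def generate_secret_token
  SecureRandom.hex(64) # 64 random bytes, written as 128 hex characters
end

# Read the token from the environment and fail loudly if it is missing,
# rather than silently starting the app with a nil secret.
def secret_token_from(env = ENV)
  env.fetch('APP_SECRET_TOKEN') do
    raise KeyError, 'APP_SECRET_TOKEN is not set; add it to application.yml'
  end
end

puts generate_secret_token.length
```

Using fetch instead of ENV['APP_SECRET_TOKEN'] is a design choice: a missing key raises an error right away instead of leaving you with a nil token.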

Step 3: Hide your database names

Exposing your database names opens you up to attack. Changing them from the default names Rails suggests will also help heighten your security. Either way, you definitely don’t want them out in the open. In your config/database.yml file, in the development, test, and production sections, change the line that starts with “database” to something like database: <%= ENV['APP_DEVELOPMENT_DATABASE'] %>, database: <%= ENV['APP_TEST_DATABASE'] %>, and database: <%= ENV['APP_PRODUCTION_DATABASE'] %>. I tried it without the erb tags, and it didn’t work, so I recommend keeping them. In your application.yml file, define the matching keys APP_DEVELOPMENT_DATABASE, APP_TEST_DATABASE, and APP_PRODUCTION_DATABASE, which your app then reads as ENV['APP_DEVELOPMENT_DATABASE'], ENV['APP_TEST_DATABASE'], and ENV['APP_PRODUCTION_DATABASE'].
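The erb tags matter because YAML on its own never evaluates Ruby; something has to run the template through ERB first and only then parse the result. This sketch shows that two-step process on a trimmed-down file. The adapter and database name are made up for the example.

```ruby
require 'erb'
require 'yaml'

# A trimmed-down database.yml; the adapter and variable name are illustrative.
DATABASE_YML_TEMPLATE = <<~YAML
  development:
    adapter: postgresql
    database: <%= ENV['APP_DEVELOPMENT_DATABASE'] %>
YAML

# Step 1: evaluate the erb tags. Step 2: parse the resulting plain YAML.
def render_database_config(template)
  YAML.safe_load(ERB.new(template).result)
end

ENV['APP_DEVELOPMENT_DATABASE'] = 'example_development'
config = render_database_config(DATABASE_YML_TEMPLATE)
puts config['development']['database']
```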

Step 4: Hide your deploy user and server

Exposing your server username and IP address also opens you up to attacks – you can imagine someone taking that username@IP and trying out passwords to ssh into your server. Better that this doesn’t happen. In your config/deploy.rb file, change the set :user line to set :user, ENV['APP_USER'], and change the three role :web, role :app, and role :db lines to role :web, ENV['APP_SERVER'], role :app, ENV['APP_SERVER'], and role :db, ENV['APP_SERVER'], instead of listing your actual server IP address in those lines.

Step 5: Git fixes

Okay, push those changes up to master, but maybe run rails s and test first for any mistakes you might have made or bugs that have emerged. Now, you’ll want to clear your history of any reference to your actual database, server, or secret token. Copy the files you’ve just edited somewhere else, outside of your app. Then run git filter-branch --tree-filter 'rm -f config/deploy.rb' HEAD, git filter-branch --tree-filter 'rm -f config/database.yml' HEAD, and git filter-branch --tree-filter 'rm -f config/initializers/secret_token.rb' HEAD to remove all traces of each file from your branch. After the first run, git keeps a backup of the original history and refuses to run filter-branch again unless you pass -f, so you may need to force the later commands. You’ll then want to push the rewritten history up; because the commits have changed, you will likely need to force push. It’s what you want to do though: afterwards your branch has no traces of those files. You’ll want to do this for every branch you keep in your final project (you could also consider squashing vestigial branches where all changes are merged). Finally, create those files in your app once again, and paste in the contents you copied elsewhere.

Step 6: Open source!

This is an exciting moment. Open source your project for the world to use! I recently went to an interview, and one of the people I met with suggested to me that the best way to get acquainted with an open-source project would be to run its test suite. That way, you literally get to see how it “should” work. In terms of thinking about my next steps now that I’ve open-sourced Teamline, making my test suite really robust is one of the big ones. Another big thing I’ve thought about in making my app more robust is changing its scope to be a multi-tenant application, so that the “teamlines” (i.e. stories of a team over time) of multiple teams can be created. Improving my documentation and identifying and fixing bugs is another one. I’ve also got the refactoring from good to great video on my YouTube queue, so going through my code and refactoring it will be important as well. I’m sure for your project, you also have some steps you’d like to take to improve it, and open-sourcing it (safely!) is a great motivation for that. Plus people can fork you, and that’s always exciting.

Let me know if you have any feedback/improvements/tips/success stories to add!

Setting Up Sidekiq With Cron Jobs


For our final projects for the Flatiron School, we broke up into groups and developed web applications that related in some way to the Flatiron School. Groups made web applications to help potential students apply to Flatiron, a CRM program to help potential employers connect to students, and a new and improved version of Piazza (the Blackboard-like class website we use). You can see all these projects and more at our Science Fair on Thursday, August 13.

My group worked on Teamline, a digital storytelling app that aggregates, filters, and displays student-generated data sources. This included implementing multi-provider authorization using Omniauth; collecting and parsing user data from the GitHub, Twitter, and Feedzirra APIs; creating cron jobs to regularly pull data using the Whenever gem; and setting up asynchronous background processing to perform those jobs using Sidekiq and Redis.

That last one is what I want to talk about here. My classmate Matt Schmaus asked me for advice about how to implement regular scraping of high-volume data sources, and I suggested that he take the same approach I did (described below).

My challenge here was to regularly pull data from Twitter and process it in the background, so that’s why the Twitter domain comes into play in this code, and why I chose to use cron jobs with background processing.

Creating Cron Jobs Using Whenever

First I needed to set up a regular time for my site to go and check whether there were new tweets by students whose feeds we were parsing. To do this, I set up a cron job. Learn more about these in [Railscast #164](http://railscasts.com/episodes/164-cron-in-ruby).

Cron jobs generally look something like this:

1 0 * * * printf "" > /var/log/apache/error_log

0 */2 * * * /home/username/test.pl

or in other words, clear the error log at 12:01AM each day and run the test every 2 hours (thank you Wikipedia).
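If the five leading fields look cryptic, this sketch splits a crontab line into named parts. It is only a toy parser of my own for reading the format (it does not interpret values like */2), not something cron or any gem provides.

```ruby
# Names of the five time fields, in the order cron expects them.
CRON_FIELDS = %i[minute hour day_of_month month day_of_week].freeze

# Split one crontab line into its schedule fields and the command to run.
def parse_cron_line(line)
  parts = line.strip.split(/\s+/, 6)
  raise ArgumentError, "expected 5 time fields and a command: #{line.inspect}" if parts.length < 6

  schedule = CRON_FIELDS.zip(parts.first(5)).to_h
  { schedule: schedule, command: parts.last }
end

entry = parse_cron_line('0 */2 * * * /home/username/test.pl')
puts entry[:schedule][:hour]
puts entry[:command]
```

Read this way, the second example says: at minute 0 of every second hour, on any day, run the script.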

Luckily in Ruby, we have a gem that can do that for you. I used Whenever, which provides a clear, Ruby-like syntax for writing cron jobs.

To Use Whenever:
1. Add gem 'whenever', :require => false to your Gemfile
2. Run bundle install
3. Cd into your app in the terminal and then use the command $ wheneverize . to create a schedule.rb file in your config folder
4. Customize your schedule.rb file to perform the cron jobs you want

Here’s the one I used:

every 1.minute do
  rake "import_tweets:tweets"
end

It reads pretty much like English: every 1 minute, run this rake task.

That brings us to my rake task.

Creating a Custom Rake Task That Calls A Sidekiq Worker

require 'twitter'
require_relative '../../app/workers/tweet_scrape_worker'
 
namespace :import_tweets do
 
  task :tweets => :environment do
 
    Student.all.each do |student|
      TweetScrapeWorker.perform_async(student.id)
      sleep 1
    end
  end
end

What I’ve done here is create my own custom rake task, which is actually pretty easy. Beginner Rails developers are familiar with rake tasks such as rake db:migrate. You can set up your own namespaces (db in this case) and tasks (migrate in this case). You namespace your task, as I’ve done when writing namespace :import_tweets do, and then you create a task by writing inside that do task :tweets => :environment do. Then, for this specific use case, I’ve gone through each student in our Student class, and on each, called the TweetScrapeWorker (more on that in a moment) to “perform_async” on that student, passing in the student’s id. You’ll see that I’ve also included sleep 1.

So what does this all mean?

  • Starting with the last part of this code, I’ve included sleep 1 to create a pause between queuing each student’s job, so we don’t hit Twitter rate limits when polling the site.
  • namespace :import_tweets and task :tweets are pretty straightforward. They’re just a way for me to call $ rake import_tweets:tweets in my terminal to perform this task on my app.
  • TweetScrapeWorker.perform_async(student.id) – this is a bit more complex. It leads me to my next tool, Sidekiq…
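To see the namespace and task mechanics on their own, here is a minimal sketch that runs outside Rails. The names import_demo, DemoTasks, and the array standing in for the Sidekiq queue are all made up for the example, and the pause is shortened so it finishes quickly.

```ruby
require 'rake'

# A stand-in for the real task: integers play the role of student ids,
# and an array plays the role of the Sidekiq queue.
module DemoTasks
  extend Rake::DSL

  ENQUEUED = []
  STUDENT_IDS = [1, 2, 3].freeze
  PAUSE_SECONDS = 0.01 # the post uses `sleep 1`; shortened so the demo runs quickly

  namespace :import_demo do
    task :tweets do
      STUDENT_IDS.each do |id|
        ENQUEUED << id       # the real task calls TweetScrapeWorker.perform_async(id)
        sleep PAUSE_SECONDS  # pause between students
      end
    end
  end
end

# Equivalent to running `rake import_demo:tweets` from the terminal.
Rake::Task['import_demo:tweets'].invoke
puts DemoTasks::ENQUEUED.inspect
```

Note that this demo task leaves out the => :environment dependency, since there is no Rails app to load here.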

Sidekiq

Sidekiq is efficient background processing for Ruby. Learn more about it in Railscast #366. There are several steps involved in setting up Sidekiq, and I would highly recommend watching the Railscast over and over again until you understand the concept and execution. One note for beginners is not to get too lost (right now) in the last part of this episode, where Ryan Bates talks about multithreading, concurrency, and thread safety. This is important and interesting stuff, but perhaps not your top priority when you’re just starting out. Also note that I was able to set this up without using Celluloid directly, which provides a Ruby-esque object-oriented way of creating concurrency in Ruby.

So, going back to my code, the first thing you see after the rake task is set up is that I’ve called a method called “perform_async” on a “TweetScrapeWorker”. In Sidekiq, you set up workers to do different jobs for you, and then send those jobs to a queue, so that when the workers are not busy, they can do your job as part of this background process. So, after including the ‘sidekiq’ gem and the dependencies described in the Railscast, you’ll also want to create a folder in your app directory called “workers”, where you can save your workers. I’ll go through my worker in a moment. You call “perform_async” (a Sidekiq convention) to send your jobs to the Sidekiq queue.

You’ll also note that I’ve passed in “student.id”. This relates to the server I’m using to hold the background jobs, which is Redis. Redis is an in-memory data store, and Sidekiq saves each job there in serialized form, so you should pass in simple values like the id of a student object rather than the whole object. To install Redis, use brew install redis and then start it up in your terminal using this command: $ redis-server /usr/local/etc/redis.conf.
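Here is a toy model of why ids travel well through a queue. It is not Sidekiq’s actual implementation, just a sketch under the assumption that a job is stored as JSON text and rebuilt later; ToyQueue and ToyWorker are names I invented.

```ruby
require 'json'

# Toy stand-in for Sidekiq + Redis: jobs are stored as JSON strings in an array.
class ToyQueue
  def initialize(worker_classes)
    @workers = worker_classes.to_h { |klass| [klass.name, klass] }
    @jobs = []
  end

  # Serialize the worker name and its arguments into plain text.
  def push(worker_class, *args)
    @jobs << JSON.generate('class' => worker_class.name, 'args' => args)
  end

  # Deserialize each job and run it, oldest first.
  def drain
    results = []
    until @jobs.empty?
      job = JSON.parse(@jobs.shift)
      results << @workers.fetch(job['class']).new.perform(*job['args'])
    end
    results
  end
end

# Hypothetical worker: it receives only an id and would look the record up itself.
class ToyWorker
  def perform(student_id)
    "scraping tweets for student #{student_id}"
  end
end

queue = ToyQueue.new([ToyWorker])
queue.push(ToyWorker, 42)
puts queue.drain
```

An integer survives the trip through JSON unchanged, while a full model object would not, which is why the worker looks the student up again by id.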

That sums up the code I’ve written in my rake task, so now let’s go to what’s being called in my worker class.

Worker Class

class TweetScrapeWorker
  include Sidekiq::Worker
  
  def perform(student_id)
    student = Student.find(student_id)
    TweetScraper.new(student).scrape_feed
  end
end

I’ve created a camelcased class that ends with “Worker” to follow Sidekiq worker class conventions. Then I’ve included the Sidekiq::Worker module, which mixes the Sidekiq functionality (such as perform_async) into this class. As part of this convention, I’ve created a perform method (which Sidekiq looks for), passed in the student id, and then called the TweetScraper model I’ve set up in my models directory to do my tweet scraping, creating a new instance, passing in a student, and then calling the scrape_feed method on that instance. All you really need to understand here as a beginner is which conventions you get from Sidekiq.

My Tweet Scraper Model

class TweetScraper
  attr_accessor :student

  def initialize(student)
    @student = student
  end

  def scrape_feed
    # Fetch up to 200 of the student's most recent tweets
    Twitter.user_timeline(student.twitter_handle, :count => 200).each do |tweet|
      # Only save tweets that have not been stored on an earlier run
      if Tweet.where(:tweet_id => tweet.id).empty?
        tweet_post = Tweet.new
        tweet_post.student_id = student.id
        tweet_post.tweet_id = tweet.id
        tweet_post.tweet_content = tweet.text
        tweet_post.tweet_published_at = tweet.created_at
        tweet_post.profile_image_url = tweet.profile_image_url
        tweet_post.save!
      end
    end
  end
end

This is the easy part. I make calls to the Twitter API (which I’ve authenticated with using Omniauth elsewhere) using the Twitter gem to get tweets, and meta-content about them. One thing you might note if you are new to scraping is how I’ve set it up so that if a tweet has been scraped before, my program will recognize its unique id, and not scrape it each time. With high-volume scraping, this is pretty important. And don’t forget to tweet_post.save!
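The “skip what you’ve already scraped” idea can be shown without a database. In this sketch a Set of ids stands in for the tweets table, and the hashes are made-up stand-ins for tweets.

```ruby
require 'set'

# Return only the fetched items whose ids have not been stored yet,
# and remember the new ids so a later run skips them too.
def select_unseen(fetched, seen_ids)
  fetched.select { |item| seen_ids.add?(item[:id]) }
end

seen = Set.new([101, 102])
batch = [{ id: 102, text: 'old' }, { id: 103, text: 'new' }]
fresh = select_unseen(batch, seen)
puts fresh.map { |item| item[:text] }.inspect
```

Set#add? returns nil when the id is already present, so the same call both checks and records the id.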

Last thing
Remember that when running your app in development, you should run the Redis server and Sidekiq alongside rails server. You can also check out your Sidekiq activity and whether everything is working at the ‘/sidekiq’ route, after you set up the route in your routes.rb file like so: mount Sidekiq::Web, at: '/sidekiq' (with require 'sidekiq/web' at the top of that file).

The End


And in this way, you can create cron jobs easily in Ruby using whenever, and perform them regularly in the background using Sidekiq with Redis. Happy scraping Matt (and others)! Please post in the comments if you have any questions or if there’s anything I can clear up.

Setting Up Custom Git Hooks

Ten Steps to Custom Git Hooks

This past weekend, I participated (along with Flatiron students Adam Waxman, David Manaster, Max Jacobson, and Sarah Duve) in a hackathon sponsored by Jawbone, Tumblr, the Clinton Foundation, and Ace Hotel. It was an awesome event where we worked with the soon-to-be-public Jawbone API to create hacks that encourage healthy behavior using sleep data from the Jawbone UP bands, as a way to create behavioral change around social determinants of chronic illness (in this case sleep).

I’ll write more in depth about this hackathon later, and about the project Sarah, Max, and I created, where we wrote a custom pre-commit git hook which allows or blocks commits to a project based on a threshold of user sleep data parsed from the Jawbone UP API. However, I wanted to describe how to make custom git hooks in this post, as a fun little hack that can also provide more powerful functionality (such as not allowing people to commit their code if they haven’t slept enough) based on how you design it.

Before I start, I wanted to describe a little more about what hooks are and why they are cool and important. In programming, there is a concept of hooking into a certain moment in time in your program. A common one that Ruby developers might be familiar with is the initialize method, which allows you to hook into the moment in time when a new instance of a class is created and set certain attributes, or set other programs in motion.
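As a quick illustration of hooks inside Ruby itself, here is a sketch with two of them: initialize, which runs when an object is created, and inherited, which runs when a class is subclassed. The class names are invented for the example.

```ruby
class Student
  attr_reader :name, :created_at

  # Ruby calls initialize automatically whenever Student.new is invoked,
  # so this is the moment to set up the new object's attributes.
  def initialize(name)
    @name = name
    @created_at = Time.now
  end
end

# Another hook: inherited fires at the moment a class is subclassed.
class Plugin
  def self.subclasses_seen
    @subclasses_seen ||= []
  end

  def self.inherited(subclass)
    super
    Plugin.subclasses_seen << subclass
  end
end

class MarkdownPlugin < Plugin; end

puts Student.new('Ada').name
puts Plugin.subclasses_seen.inspect
```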

Git also has hooks, and I’m going to describe here how you can set them yourself.

Step 1

Create a new directory by running $ mkdir wellthisisfun or whatever you want to call it

Step 2

Set it up as a git repo by calling git init in your directory.

Step 3

Here you could push the repo up to GitHub, but it’s not necessary to create a git hook. What you do need to do is cd into .git by running $ cd .git. You won’t see it when you call ls on your directory, because it’s a dotfile, which means it’s hidden (ls -a will show it). However, after you initialize a git repo (step 2), it’s there in your directory, whether or not it’s on GitHub.

Step 4

You can ls your .git directory. You’ll see some interesting stuff that’s fun to explore here, such as config, HEAD, info, and objects. If nothing else, it’s an interesting way to start learning more about what git is and how it works. In this step, we want to get into the hooks folder by calling $ cd hooks

Step 5

Open up your hooks. I’d suggest using Sublime here, so you can do something like $ subl . and look at all of them.

Step 6

Cool! Here are your git hooks. You can see the moments in time you can hook into through git here. In our project, we played around with the pre-commit.sample hook, but you can browse through all of these, and figure out what the best tool is for what you want to do (or which sandbox interests you the most).

Step 7

Setting your Ruby environment is necessary if you want to be able to write this hook in Ruby. Put this at the top of your code: #!/usr/bin/env ruby

Step 8

Write your hook. You can write stuff like puts "hello there #{ENV["USER"]}" to say your name, puts Time.now to print the current time, or things like exit(1) to make it exit the program (for a pre-commit hook, a non-zero exit status blocks the commit). Play around. You can require gems (since they have global scope on your computer, this works), and do all sorts of Rubyesque/Railslike stuff, such as making calls to APIs, etc.
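To give a feel for a hook with a decision in it, here is a rough sketch in the spirit of our sleep-threshold idea. It is not our hackathon code: the threshold and the HOURS_SLEPT environment variable are made up and stand in for a call to the Jawbone UP API.

```ruby
#!/usr/bin/env ruby

# Hypothetical threshold; the real project read sleep data from the Jawbone UP API.
MINIMUM_HOURS_SLEPT = 6.0

# Return the exit status git should see: 0 allows the commit, non-zero blocks it.
def hook_exit_status(hours_slept, minimum = MINIMUM_HOURS_SLEPT)
  hours_slept >= minimum ? 0 : 1
end

# HOURS_SLEPT is a made-up environment variable standing in for an API call.
hours = Float(ENV.fetch('HOURS_SLEPT', '8'))
status = hook_exit_status(hours)
puts status.zero? ? "hello there #{ENV['USER']}, commit allowed" : 'get some sleep first'
# In a real hook file, the last line would be: exit(status)
```

Keeping the decision in a small method that returns a status makes it easy to try out in irb before wiring it into git.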

Step 9

Make your hook executable. This requires either one or two steps. The first is to take away the “.sample” at the end of your file name (e.g. rename “pre-commit.sample” to “pre-commit”). If you do this by renaming your pre-commit.sample file, your file is already executable, so that’s it. If you decide instead to add a new file to hooks called something like pre-commit (or whatever moment you want to hook into), you’ll need to run this in the command line: $ chmod a+x YOURFILE (e.g. $ chmod a+x pre-commit). The point of this step is to make your computer able to execute the file. Also, just a note – make sure to copy the same format of the filename as the existing hook samples. For example, pre-commit will work, since there is a pre-commit.sample hook, but precommit will not work.
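If you would rather script this step, here is a small sketch that writes a hook file and sets the executable bit from Ruby, the same effect as chmod a+x. The install_hook helper is my own, and a temporary directory stands in for .git/hooks so the demo does not touch a real repo.

```ruby
require 'fileutils'
require 'tmpdir'

# Write a hook script into hooks_dir and mark it executable (like chmod a+x).
def install_hook(hooks_dir, name, source)
  path = File.join(hooks_dir, name)
  File.write(path, source)
  FileUtils.chmod('a+x', path)
  path
end

Dir.mktmpdir do |dir|
  # A temporary directory stands in for .git/hooks in this demo.
  path = install_hook(dir, 'pre-commit', "#!/usr/bin/env ruby\nputs 'hook ran'\n")
  puts File.executable?(path)
end
```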

Step 10

Test it out. Cd to the top of your directory, create a new file by doing $ touch myfile.rb or make changes to a file, run git add ., commit it by running git commit -am 'my message', and you should see your hook executed when you do.


I think this is a fun way to explore the concept of hooks, to think more deeply about git, and to conceptualize what you can do by hooking into those moments in git and linking them to APIs, other processes you are running, etc. Please let me know if you have any questions or thoughts in the comments!

So Many Resources…

I’m starting to feel both excited and overwhelmed by how many resources there are for Ruby and Rails online, especially now that I’m at a point where I have built up a base so that I can really start to take advantage of them.

Types of resources:

Blog Posts There are a lot of blog posts out there that a beginner can take advantage of. For me, though I’ve been on and off blogging for basically my entire sentient life, it’s new and different to read technical blogs. I find them useful both for trying to solve my problems, and for trying to become a better code communicator and storyteller. On Tuesday, Peter Cooper from Ruby Weekly spoke to our Flatiron School class via Skype. I really enjoyed hearing about his background, his approach, and how his role in the community has evolved. It made me feel inspired to try my own hand at email newsletters – never thought I’d say that!

One of the things that stuck with me from Peter’s presentation was what he said about how to tell a good story or teach someone about code. He said that he has an ability to always remember what it was like to be a beginner, and leverages that mindset in order to most effectively teach people about code. Between that comment, and reading the thought-provoking blog posts of my peers at Flatiron and of the Ruby community in general, I am trying to become more comfortable with blogging about code as a beginner. Sometimes I think that perhaps my realizations are so minute that they might be irrelevant to others – “oh, I need to bundle update” or “I just needed to drop the database” or “so THAT’S how application.yml works.” At the same time, I know that whenever I find posts or discussions that help me quickly resolve a problem that was leading me down an unnecessary rabbit hole, I feel so thankful.

I don’t often go on Hacker News, but when I do I usually find some really cool stuff, so maybe I should start going on more… Yesterday I found this blog which really inspired me: http://blog.jenniferdewalt.com/post/56319597560/im-learning-to-code-by-building-180-websites-in-180 Jennifer DeWalt is teaching herself to code by building 180 websites in 180 days, starting from a place of no experience. I so admire the learning approach, what she’s actually been able to make, and her openness to exposing her learning in this way. If you go back to her first posts (http://blog.jenniferdewalt.com/page/15), she is able to write about her learning as a beginner in a concise, technical, and elegant way. Goals!

So there are a lot of blog posts out there to read, to find when I search for answers to my questions, and to try to emulate as I become a better communicator of my code ‘journey’.

Railscasts Omg Railscasts. I’m pretty sure this was the reaction of me and all my classmates when we first started listening to them. Sometimes when you’re starting out, you’re not quite sure how to begin your learning and identifying topics to investigate. Ryan Bates knows. And he explains them so well. At first I was listening to them while working out (the app is awesome, thank you to whoever made it!) or on the go, but I’ve more recently started to code along with them. Each approach provides different benefits. It’s intellectually fascinating to better understand rails initialization. However, when learning about dynamic forms or omniauth, it might be better to code along. Anyways, Railscasts is an addictively useful and interesting resource – and this is coming from someone who is really averse to online videos, especially ones on the longer side.

Curated Resources (e.g. Ruby Weekly) I just signed up for Ruby Weekly (and a bunch of other stuff too) after Peter Cooper spoke to our class, but I’m excited to get my first batch of newsletters! Again, I expect these to provide more blogs and online resources to help me overcome specific challenges in my programs.

Speakerdecks I’ve written about these in an earlier blog post, but I LOVE speakerdecks. I could watch Zach Holman teach me about Github all day. I find the speakerdeck content as a whole really engaging and high quality. It’s another resource you could spend a long time exploring. Additionally, it’s another expression of how people think about presenting and teaching others about the programs and apps they are building or what they are learning as a result.

Books There are so many books. I’ve started an ebook folder in my Dropbox which is already burgeoning. I could write a whole post about deciding which book to read, where to start, how to read it, etc. I could code along with a book, and might learn more, but I would read it much more slowly. I could go for breadth, and try to get the ‘sound’ of the approach of the author in my head. You just have to start somewhere, but in this field it feels like there is always so much more to learn that figuring out the most efficient use of your time is challenging.

And there’s more… For me, the wealth of resources about Ruby and Rails is both incredibly exciting and a bit of a challenge. I love to understand how things work and how different parts of a system operate in tandem, so I definitely can fall into reading, watching, and learning from these resources in the few hours I have between the end of Flatiron on one day and the start of school on the next. At the same time, I’ve seen my understanding solidify markedly over the last week and a half, as we’ve moved at Flatiron from lecture/lab mode to the application-building phase. I know that the more code I write, the better I will be. At the same time, reading Practical Object Oriented Design in Ruby or watching Railscasts feels like a master class in understanding how to code better and helps close some of the gaps in my understanding. I feel thankful that I’ve discovered a field where there is so much to learn all the time. The question now is how best to do that…

In some way, learning to program reminds me of when the iPhone first came out. My father was the first one in our family with one. At first, it was this surprising new thing, where each time we had a debate or an unanswered question, he would just default to looking up the answer on his iPhone. This is kind of obvious behavior by now, but it was a totally different approach than remembering to look it up when you get home or whatever. In some way, I think these resources present an opportunity to do just that. As I try to build apps and programs, there are answers to my questions and new paths they can lead me down in terms of learning, as long as I figure out how and when to ask.

Slides From My Presentation

Slides from my talk about conceptualizing Rails after Sinatra (after one week of Rails) below:

https://speakerdeck.com/ruthienachmany/conceptualizing-rails-after-sinatra