What a n00b!

MySQL Secure Installation Using Ansible

Recently I was setting up MySQL using Ansible and wanted to ensure the mysql_secure_installation script, or an equivalent, was run to get rid of the default users and database. It turned out that writing such a task in Ansible wasn't all that bad using the check_implicit_admin feature of the MySQL modules, and it could even be made idempotent.
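For illustration, such a task might look something like the following. This is a sketch from memory, not my actual playbook: the mysql_root_password variable is an assumption, and module parameters may differ between Ansible versions.

```yaml
# Remove the anonymous users that mysql_secure_installation would drop.
# check_implicit_admin first tries a passwordless root login (the state
# of a fresh install), then falls back to the supplied credentials, so
# the task works on the first run and stays idempotent afterwards.
- name: remove anonymous MySQL users
  mysql_user:
    name: ''
    host_all: yes
    state: absent
    check_implicit_admin: yes
    login_user: root
    login_password: "{{ mysql_root_password }}"

- name: remove the default test database
  mysql_db:
    name: test
    state: absent
    login_user: root
    login_password: "{{ mysql_root_password }}"
```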

Weekend Project: Library Checker

This weekend I finally took some time to work on a project that I've been toying with for a while. I built a small app that, for now, goes by the name LibChecker. It's a Flask app that takes the URL of a public Amazon wishlist and searches for each item on the list at my local library (Hennepin County Library, hclib.org). Here's a screenshot of the app in action:

screenshot

The biggest challenge here was that neither Amazon wishlists nor the library offer any sort of API, so the app uses the BeautifulSoup library to scrape results. Since this is just screen scraping, there are quite a few ways it can break or be wrong. The search on the library side is pretty aggressive, stripping out editions and subtitles, and then matches against a single author (the library returns only one author in search view). Despite all of this, it does the job I set out to do. The links given point to a specific result within a set of search results, so the user can easily browse for related titles if the result isn't quite correct.
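For the curious, the matching logic described above boils down to something like this. This is an illustrative sketch, not LibChecker's actual code; the function names and exact normalization rules are my own.

```python
import re

def normalize_title(title):
    """Strip subtitles and edition markers so titles from the wishlist
    and the library catalog can be compared loosely."""
    # Drop anything after a colon (the subtitle).
    title = title.split(':')[0]
    # Drop edition markers like "2nd edition" or "(Second Edition)".
    title = re.sub(r'\(?\b\w+ edition\)?', '', title, flags=re.IGNORECASE)
    return title.strip().lower()

def is_match(wishlist_item, library_result):
    """A result matches if the normalized titles agree and the single
    author the library returns appears among the item's authors."""
    return (normalize_title(wishlist_item['title']) ==
            normalize_title(library_result['title']) and
            library_result['author'] in wishlist_item['authors'])
```

Fuzzy matching like this is exactly why results can be wrong, which is why linking into the full search results (rather than a single record) is a useful fallback.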

The code for this is available on BitBucket. The app has been deployed on OpenShift at libchecker.wyattwalter.com.

2015 Reading List - Part 2

Continuing on with my post from earlier this year, here's my list of books for Q3 of 2015. This time I read most of the books during the first month or so and then slowed down considerably for a while (too busy with other things).

  • Good Boss, Bad Boss: How to Be the Best... and Learn from the Worst - Picked this one up at the recommendation of my own boss. A great read, for sure. The biggest takeaway for me reading it the first time is an understanding and acceptance that I'll never truly know what it's like to work for me. And that's ok. This one was packed with advice and definitely worth the read for me.

  • The Practice of System and Network Administration, Second Edition - This one is a classic in my field. I would definitely recommend at least a skim if you're a SysAdmin and have never read it. I plan on picking up the later-released The Practice of Cloud System Administration at some point in the near future.

  • Zero to One: Notes on Startups, or How to Build the Future - This book is a long-form exploration of seven questions to answer when considering launching a business. It changed my perspective a bit and helped me evaluate previous failures more clearly.

  • The One Minute Negotiator: Simple Steps to Reach Better Agreements - I picked this book up at the library by accident: I had read The One Minute Manager in the past and confused it with this one sitting on the shelf. I'm glad I did. It's a short read, but a great prescriptive plan for analyzing whether or not a situation is a negotiation and what a good approach may be.

  • Resilience and Reliability on AWS - I had this one on my wish list for a while; based on the title and description, I thought it would be really helpful. It was a total disappointment to me. The content wasn't terribly deep and was mostly source code that could've been a repository on GitHub.

  • Execution: The Discipline of Getting Things Done - Thinking about getting things done at a level larger than myself is something I've thought about often, but rarely been responsible for. This book was a really interesting one to me, but it's focused on the executive layers of an organization.

  • Made to Stick: Why Some Ideas Survive and Others Die - While there's a good deal of creativity and art in advertising and other types of campaigns, Dan and Chip Heath argue that there's a framework to follow in order to convey a message that "sticks". Definitely one that I would recommend to anyone looking to be heard with an important message (who isn't?).

DevOpsDays Minneapolis 2015

I had the pleasure of attending DevOpsDays Minneapolis again this year. This is my third time at a DevOpsDays, the 2nd such event here in Minneapolis. Below are some of my highlights and takeaways from the conference this year.

Designated Ops

During Katherine Daniels' talk on DevOps at Etsy, she described what they call "Designated Ops": a person from the Ops team designated (but not dedicated!) to each smaller group within their product engineering team. Each team's designated Ops person attends that team's standups most days, develops a relationship with the team over time, and serves as its first point of contact into the Ops team. This builds familiarity and trust and gives Ops a view of things in the development phase, rather than reacting when some new feature ships without any input.

"Developer of the week"

Relatedly, a similar idea came up later during an open space session (unfortunately, I don't know the person or company involved, so I can't give proper credit): the development team had a rotation they called the "developer of the week". During a developer's week, they would physically sit amongst the customer support team, helping answer questions about the company's product, seeing pain points for customers and support folks, and sharing tips and tricks they might not have realized would be useful.

Remote Work

During another open space, we talked quite a bit about remote work and how to improve bonds between team members even though we can't often be physically in the same space (for some people, ever). Some action items for me:

  • buy a real camera! - Using the little webcam built into the laptop is not flattering to anyone; you're almost always looking at someone at a weird angle. Video chat really is the best way we have today to interact with a remote team, so invest in making the experience as good as it can (reasonably) be.
  • everyone uses their camera - There can be occasional exceptions to this rule, but if you're having a meeting, everyone should be engaged and connecting, which is made far better by looking at each other and watching responses. If the team is not engaged, what's the point of the meeting?
  • establish a protocol for communicating and asking/answering questions - This is especially important when working with cross-cultural teams. When asking questions via text or video chat, it can be difficult or impossible to read someone's body language. Examples from the discussion:
    • When answering a question, be sure to ask back "did that answer your question?" if no positive response is given.
    • If someone is intimidated by asking a group and asks via private message, ask the question to the group on their behalf (not revealing who asked it).

"Finished"

Another open space that I got a lot of value from was one proposed to discuss how to "finish" work in fields like IT and development, where the work can never really be "done". This is a reality faced in a lot of fields, and if you're not careful, the list of things you'd like to get back to can start to become depressing. The topic deserves a post of its own, but I'll add the takeaways that were helpful for my situation.

You don't really need to accomplish more to feel accomplishment. Morale is really the most important thing: while getting tasks completed is important, accomplishing more tasks more quickly will happen naturally as a team accrues wins and gains momentum. In a lot of cases, "done" really means finishing the current iteration. While you should start with an end goal in mind, don't create grandiose tickets with everything you'd like to do on a project that stay open indefinitely.

Related to the previous point, retrospectives and demos are really important. Demos are often difficult in Ops work (patching and maintenance don't demo well), so this one may be used sparingly by some teams. Retrospectives are something I can't say I've ever participated in; I'll certainly be trying them.

TODO

Recommended followup reading and watching:

  • The Year Without Pants: WordPress.com and the Future of Work - a book on remote work at Wordpress, recommended to me by someone in the remote work open space
  • The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger - a book on literal shipping containers, about friction in the shipping industry and how it was changed, recommended by Mary Poppendieck during her talk "The New Software Development Game"
  • Focus: Use Different Ways of Seeing the World for Success and Influence - another recommendation by Mary, on motivation and how we view the world
  • This is Lean: Resolving the Efficiency Paradox - I believe also recommended during Mary's talk, don't remember the context for this one now
  • "How DevOps Can Fix Federal Government IT" by Mark Schwartz at DevOps Enterprise Summit 2014 - recommended by Joshua Zimmerman during his talk on DevOps and public sector

2015 Reading List - Part 1

Inspired by other folks posting reading lists and wanting to start writing again, I've decided to start compiling my list of books read and post them periodically. So far this is the list I've recorded for the first half of 2015. Since I didn't start recording the books I read until recently, this list is only the highlights (and lowlights) that I can remember.

  • How to Win Friends & Influence People by Dale Carnegie - This one is a classic in the self-help genre. The book was less about winning friends as the title suggests, and more about influence and getting along with strangers. I'd definitely recommend it to anyone who struggles with conflicts and affecting behavior from others.

  • How to Fail at Almost Everything and Still Win Big: Kind of the Story of My Life by Scott Adams - Yes, that's the same Scott Adams of Dilbert fame. His storytelling is amazing and this book has actually had a major impact on my year. Specifically, his generic life advice - diet, exercise, and "using systems not goals" - has helped me quite a bit since I read the book. Not bad for a book by a guy that creates comics for a living.

  • Quiet: The Power of Introverts in a World That Can't Stop Talking by Susan Cain - This one was recommended at a local tech Meetup, so I decided to check it out from the library. As someone who considers himself an introvert, it was eye-opening to read examples of how my world experiences are vastly different from others with less of this trait. Definitely worth a read, whether you consider yourself introverted or not. I did skip over a bit of the first section of the book as it got a bit repetitive to me while describing all the reasons that it's an "extroverted world".

  • Managing Humans: Biting and Humorous Tales of a Software Engineering Manager by Michael Lopp - This book had been on my shelf for a while; I don't even remember where or why I picked it up. Since I recently became a manager (again), I figured now was as good a time as any to finally read it. It's a pretty generic read about managing software developers, but it covered quite a few topics I hadn't thought about as someone who went from engineer to manager without a lot of formal training.

  • Great by Choice: Uncertainty, Chaos, and Luck--Why Some Thrive Despite Them All by Jim Collins - I've read a lot of the other popular books by Collins (Good to Great, Built to Last, How the Mighty Fall) and this one definitely didn't disappoint. None of the concepts about "greatness" and achievement introduced here are new, but reiterating them all in one compilation and reinforcing their importance was good stuff.

  • Start with Why: How Great Leaders Inspire Everyone to Take Action by Simon Sinek - I read this one immediately after Great by Choice, which was kind of interesting since several of the companies studied are exactly the same. The two authors came to slightly different (though not necessarily competing) conclusions about what made those companies great, which is expected, since there's really no single recipe for being "great". The book was a bit repetitive, but perhaps my perspective was skewed by reading two books on such similar topics back to back.

  • The Goal: A Process of Ongoing Improvement by Eliyahu M Goldratt and Jeff Cox - I really enjoy well-done educational fiction, and The Goal definitely fits into that category. The novel serves as an introduction to lean manufacturing techniques and does so in a really easy-to-read way.

  • Rework by Jason Fried - This book is a collection of essays from 37signals capturing lessons learned while building their business. It had been on my list of books to read for a while, and I was a bit disappointed at the lack of depth.

That's it for now. Hopefully a longer list for the second half of 2015 since I've started to record which books I read in a consistent place.

Cleanup After Conversion from Aperture to Photos

I finally got around to upgrading my Mac to Yosemite this past weekend. With the upgrade, Aperture stopped working, and since Apple has stopped supporting it and pulled it from the App Store (jerks), I was stuck converting my library over to the new Photos app.

The actual conversion was fairly smooth. My photos all imported, and the old library was renamed "Aperture Library.migrated" or something like it on disk. That's where it got a bit confusing for me. I don't have that large a library (130G or so), but it's larger than it really needs to be. Anyone who shoots in raw can likely relate: at 20M+ per image, even with a really low-end and outdated DSLR, things add up quickly, especially when you don't clean up the 30 blurry shots of that bear we encountered at Sequoia National Park a few years ago.

I had been wanting to clean up my library for quite some time, and since I'd be learning the new Photos app anyway, now seemed like a great time to do some cleaning. Since I wanted to know how much impact my cleanup efforts actually made (what engineer doesn't measure before they start?), I did my usual thing for checking disk space: hopped over to iTerm, cd'd to my Pictures directory, and started poking around. It turned out du was showing ~130G for my Aperture library, but something like 6G for Photos. This was sort of weird to me, since I thought my library had been "migrated", and now I didn't know what to do with the old Aperture library.

I sort of ignored this situation at first and started cleaning out photos (since I did know roughly where I had started). However, all of my efforts (I calculated they should've freed at least 10G at some point) had only produced more disk usage than when I started. Some of that was due to Photos generating thumbnails, which I expected, but no progress was being made on the Masters part of the library. I read a few forum threads about older versions of Aperture not deleting originals and was dreading the mess I was likely going to have to clean up.

After exploring for a bit more (okay, like an hour), I discovered that du showed the Photos library at ~120G when I ran it directly against the library, rather than using "*" in the Pictures directory. Clever: Photos was using hardlinks to keep the same file structure in the "Masters" directory without duplicating space after the migration. After quickly confirming that the inodes were indeed the same between the two libraries for a few random masters, I realized the only thing left to do was delete the old Aperture library to get rid of the references to the deleted files (after safely archiving it to an additional external drive, of course).
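The inode check is quick to do from Python (or with stat from the shell). This is the idea, not a transcript of what I actually ran; the file names are made up:

```python
import os
import tempfile

# Simulate what the Photos migration does: two directory entries
# hardlinked to the same underlying file.
workdir = tempfile.mkdtemp()
aperture_master = os.path.join(workdir, 'IMG_0001.CR2')
photos_master = os.path.join(workdir, 'Masters_IMG_0001.CR2')

with open(aperture_master, 'wb') as f:
    f.write(b'raw image bytes')
os.link(aperture_master, photos_master)

# Hardlinked files share an inode, so the data is stored once;
# st_nlink is the number of directory entries pointing at it.
same_inode = os.stat(aperture_master).st_ino == os.stat(photos_master).st_ino
link_count = os.stat(aperture_master).st_nlink
print(same_inode, link_count)  # True 2
```

This also explains the du behavior: within a single invocation, du counts each inode once, so running it over both libraries at the same time attributed the shared masters to the Aperture library and showed Photos at only ~6G.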

Except for one more problem. It turns out Time Machine will create a local backup copy of files (not cleverly using hardlinks!) as you delete them, until they can be transferred safely to your backup volume. This is great, except that I didn't have an extra 130G on my disk at the time to store them, and there's apparently no safety mechanism for Time Machine to detect this exact situation. Fortunately, I caught on around the 94% mark while it was emptying my trash. Cancelling the trash emptying (which took a long time) made some of the temporary files disappear, but not all; I was still stuck around 90% full.

Looking at the storage data in the "About This Mac" window, it clearly showed that a lot of storage was still being used by "Backups". Since I had actually planned on switching Time Machine volumes, a quick sudo tmutil disablelocal cleared out that storage. Since this is normally a useful feature, I also turned it back on right away with sudo tmutil enablelocal.

Good times. I'm pretty late on this one, but hopefully I can help someone else avoid the same pains I went through!

Documentation and Monitoring

At DevOpsDays Minneapolis a few weeks ago, we discussed documentation in the context of Operations/DevOps/IT/whatever. After I talked a bit about what we do at our company, I realized our approach was somewhat unique and that others found the technique useful, so I thought I'd share a bit about what we're doing. We're certainly not all the way there yet, but we're striving to improve over time.

Playbooks

For any check that gets added to the monitoring system, what we call a "playbook" must be written before the pull request is approved and merged. A playbook is essentially a document describing the following things:

  1. What this check is checking - this seems obvious, but should include things like data sources; the "what"
  2. What the impact of this alert could be - what services would be affected; the "why"
  3. Where to go digging for more info on what could've caused this state; at a minimum, start with a log file
  4. Bonus: an idea of what "normal" looks like

A playbook doesn't necessarily need all that much info, just enough to give the person who's on call a fighting chance. This also seems to be a fairly nice format for beginning to document something: it provides practical knowledge at the time it's needed.
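As a concrete (entirely made-up) example, a playbook for a disk usage check might read:

```
Check:  disk_usage on the web tier
What:   alerts when / is over 90% full (data source: df via the local agent)
Why:    if the disk fills, the app can no longer write logs or accept uploads
Dig:    start with /var/log and the application's upload spool directory
Normal: hovers at 60-70%, with slow growth between log rotations
```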

Notifications

We've recently migrated to Sensu, which helps us out a lot here. Since it acts as a "monitoring router" and lets us set everything up how we want, we can easily add arbitrary data to checks and display it in alerts however we like. All checks are defined in Puppet, and we add a playbook in the custom data on a check like so:

$wiki = 'http://wiki.example.com'

check { 'elasticsearch':
  ...
  custom => {
    ...
    playbook => "${wiki}/Elastic_Search#Dealing_with_Pages",
    ...
  },
}

Once the playbook is defined as an attribute on the check, you can easily add it to the message that goes out. In this case, we're using the standard mailer handler from the community repo, with lines added something like this:

playbook = "Playbook: #{@event['check']['playbook']}" if @event['check']['playbook']
...
body = <<-BODY.gsub(/^ {14}/, '')
              #{@event['check']['output']}
              Host: #{@event['client']['name']}
              Timestamp: #{Time.at(@event['check']['issued'])}
              Address: #{@event['client']['address']}
              Check Name: #{@event['check']['name']}
              Command: #{@event['check']['command']}
              Status: #{@event['check']['status']}
              Occurrences: #{@event['occurrences']}
              #{playbook}
BODY

This adds a link in the email body to the playbook if it exists (like I said, not perfect yet :)).

Conclusion

When faced with the challenge of building out documentation for an environment, writing down the "what", "why", and "where to start digging" for anything that pages is an excellent (and seemingly often overlooked) first step. No one has time to read a 10-page manual in that scenario, which forces the writing to be concise and as helpful as possible. Obviously, implementation of this concept will vary wildly depending on which monitoring solution you use.

Remote Access to AirPort via SSH Port Forwarding

At home I use an AirPort Extreme as my firewall / access point / all of that, with a few ports forwarded through to some services running on a small box at my house (among them SSH access to manage things). I ran into a situation while traveling where I wanted to make a couple of changes to the port forwarding configuration, but I had not enabled "Allow setup over WAN" and have no desire to. I've always had home routers with some web GUI, so previously I would just use SSH port forwarding and hit the web interface in my browser. As it turns out, it's not much more difficult with an AirPort Extreme, and it still lets me keep the AirPort management port unexposed to the Internet. In my situation, I have my MacBook along and a Linux machine at home that I access via SSH.

The AirPort Utility communicates with the AirPort via TCP port 5009. You can set up a port forward with something like:

ssh -L 5009:172.16.0.1:5009 home.mydomain.com

Of course, you'll want to change the internal IP address (172.16.0.1 in my case) to whatever the internal IP of your AirPort is, and change the SSH hostname to whatever you use. If you don't know the IP of your AirPort and have a fairly typical setup, you can use 'route' on Linux or Mac systems to find the IP of your default gateway. For most people, this will be the IP of their AirPort.

Leave the above SSH session open in the background and launch the AirPort Utility on your Mac. When it launches, it probably won't find an AirPort (unless you happen to be on another network with one). Go to File -> Configure Other. In the pop-up, enter "localhost" in the Address field and your password in the Password field. You should then be able to manage the AirPort via the utility. Note that, as usual, any changes you save to the AirPort will trigger a reboot and probably cut off the connection you established above; you will likely have to restart the tunnel each time the box reboots.

Deploying Your Nikola Site to S3 (and CloudFront)

A few weeks ago, I made a post about moving my site onto S3 and a few unexpected issues that I ran into. What I didn't mention was my migration from WordPress to Nikola and the actual deployment into S3. I've outlined most of the deployment steps I went through below. A basic understanding of AWS services (specifically S3 and CloudFront) is assumed, but I try to be as helpful as I can.

S3 Setup

This is fairly well covered in the S3 documentation, but to host a website you first need to create a bucket named after the hostname you want to use. I have two buckets: whatan00b.com and www.whatan00b.com (more on the two buckets later). Once the buckets are created, you'll want to enable website hosting. In the S3 interface, click Properties for the bucket, expand the "Static Website Hosting" section, and check the box for "Enable website hosting". The default document root of index.html is perfectly fine. Error Document is something you'll likely want to come back to later.

If you created two buckets like I did, only do that step on the primary bucket you want to host the website through. For the other bucket (for me, whatan00b.com), instead check the radio button for "Redirect requests to another host name", fill in the hostname you used for the other bucket, and save.

"Pretty URLs"

This is somewhat related to the migration from WordPress: I really didn't want URLs that pointed directly to static HTML files; I wanted to hit a directory index to view pages. This was mainly to avoid adding redirects in S3, but I also just like the cleaner look of the URLs. If you left index.html as the document root in the S3 setup step, things should be fine on the S3 side.

On the Nikola side, I set the PRETTY_URLS option to True in conf.py. At the time of this writing, I had to install Nikola from the master branch on GitHub to make this work (see example). It appears this feature will be released in 5.4.5. Enabling this option creates a directory for each blog post, with an index.html inside. Note that S3 will redirect and add a trailing slash if the user doesn't include one (this was fine for me).
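The relevant bit of configuration is just one line (illustrative; check the Nikola docs for your version):

```python
# conf.py
# Render posts/my-post/index.html (served at /posts/my-post/)
# instead of posts/my-post.html.
PRETTY_URLS = True
```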

Deploying the files!

I have been using Fabric for other projects lately, so I skipped the deployment bits inside Nikola altogether. Nikola's deployment piece currently only lets users specify a list of commands that make up their deployment anyway (I assume this is generally an rsync to a list of servers or something along those lines for most folks). That's probably easily extendable, but I'm already in the habit of running 'fab deploy' when I'm ready to push out code, so I chose to stick with Fabric. You could easily write the same type of function and add it to the deployment section of your Nikola config file. You can view my deployment script over on Bitbucket (and the whole site, for that matter!). It's probably not the most efficient file sync function ever written, but it's mine and it works for me.

If you don't want to go through all of this, there are lots of FTP / SFTP / whatever clients out there that support S3; you can easily use something like Cyberduck on a Mac or Windows to push the files to your bucket.

CloudFront Setup

The CloudFront setup is the one that bit me. I mentioned this in my previous post, but I'll reiterate: for this setup, don't point CloudFront at the S3 bucket itself. Set the origin to be a "Custom Origin", using the S3 bucket's website hosting endpoint as the URL (found in the bucket properties section you looked at in the S3 setup above). If you don't do this, the site index will work, but directory indexes below it won't.

Once the distribution is created and origins set, you should be able to hit the distribution's URL given in the AWS console. Once you are happy with the way things look, you can add an alias on the CloudFront distribution with the hostname you want to use for your site. After that's complete, you're just a CNAME away from the site being hosted on CloudFront. I happen to use Route53 as my DNS provider, so I just added an alias for the CloudFront distribution and things work nicely.

Adventures in Static Site Hosting on S3

I recently switched this site from WordPress to be static pages stored on Amazon S3. I had significantly more problems than I anticipated with hosting a simple static site. This is my story.

Why static?

Every static site generator out there has a section on its front page posing essentially the same question. While I don't particularly want to duplicate those lists, I did want to spend a moment on my own motivations.

  • No server needed! - I've been using WordPress basically since I started blogging, about 4 years on and off. During that time, I had to find a place to host my blog with PHP and MySQL installed. I moved from shared hosting to hopping between AWS Free Tier accounts, which had been getting old for a while. I chose S3 specifically because the service is far more available than anything I could run myself and far simpler to use. Plus, the cost of a little bit of storage in S3 is stupidly cheap compared with running an EC2 instance or a dedicated or shared host at any provider.

  • Security - With a CMS, it's very important to keep your systems patched. You have to worry about things at a platform level (for me: OS, PHP, Apache, MySQL) as well as the application layer (and don't forget about all the plugins!). With static hosting via S3, I have nothing to worry about except keeping my AWS credentials safe. Reducing this exposure was important to me as I don't have time to keep up on vulnerabilities at all these layers.

I'm sure there are plenty of other benefits, but I was already sold.

Pitfalls

As I said earlier, I hit quite a few more snags (and used a few more workarounds) than I anticipated when initially evaluating S3. Below are the major ones; hopefully I can help someone else avoid them in the future.

  • Lack of CloudFront aliases in Route53 - This was a bit of an issue for me, as I used to host my blog off the domain apex (whatan00b.com). I had to add a 'www.' in front to make this work properly using a CNAME. Of course, less than 24 hours after posting this, I got an email saying the feature had been released... :)

  • Lack of solid on-the-fly compression support - I greatly misunderstood this one at first. S3 does support serving gzip-compressed content, but you have to upload and link directly to the gzipped files. You essentially gzip your HTML, CSS, etc. and serve them up directly, hoping that every client that connects supports gzip. That's probably not a terribly unsafe assumption for browsers, but I'm a little worried about crawlers and not wanting to cause issues with SEO. It may be a non-issue, but it's certainly something to consider carefully. For now I have opted to leave things plain.

  • CloudFront does not forgive. CloudFront does not forget. Expect CloudFront. - CloudFront can cache things for a while, which is of course a big part of why it's awesome and useful. Unfortunately, if you cache badness, badness lives on for a while. In my case, I changed the site to use "www." and changed the S3 bucket redirects in the correct order, but I did forget to change the origin for my CloudFront distribution to use the new bucket without a redirect. This caused infinite redirect loops for the site at various places throughout the world, but not all. It was impossible for me to catch without a tool like Pingdom. This took a couple of invalidation requests, but seems to have quieted down for me now.

  • Don't set CloudFront to front-end your S3 bucket directly - CloudFront does have the option to use an S3 bucket as the origin when setting up an origin. This works great if you aren't concerned about needing the index option or error document. To get both of those to work properly, use the endpoint URL from the Static Website Hosting setup section for the S3 bucket instead.

Performance

I use Pingdom for monitoring my website (and would highly recommend it) and saw some weird results when switching to the static site on S3. The average response time globally went from ~500ms with WordPress on a single EC2 micro instance to ~700ms with S3. I would probably blame the lack of gzip, but I didn't take the time to investigate properly. CloudFront more than made up for the difference: after the switch, the response time hangs out just above 150ms as a worldwide average.

Pingdom graph

Cost

I've been running the site on S3 + CloudFront for just shy of 10 days and so far have racked up a giant bill of 28 cents. Around half of that was due to a stupid number of PUTs I generated while experimenting with deployment tools. Oops. Running systems on my own can't possibly compete with that pricing, and performance (and price!) is far better than any shared provider I have found.