CascadiaPHP 2019 - Day Two
CascadiaPHP is an amazing conference in Portland, Oregon hosted at University Place Conference Center. I attended 2019's edition and it was full of great content and fantastic speakers.
This is the second of a three-part series about the conference.
« Day One: Thursday, Sept 19th
» Day Three: Saturday, Sept 21st
Friday Sept 20th
Opening Keynote: The Knowledge Grows
This was the opening keynote for CascadiaPHP, and it was fantastic.
Beth talked about the importance of being a mentor. How it can be challenging, but also supremely rewarding on both an individual and company level. It's important to define your needs and goals, particularly with an eye on achievable definitions. "I want to implement a customer contact form", not "I want to learn PHP".
Setting boundaries is also important for the mentor-mentee relationship, including establishing plans upfront for how to handle going out-of-bounds. Another aspect is establishing whether your goals are aligned or have an overlap, and whether you can communicate comfortably with each other.
Plan deadlines ahead of time (e.g., 3 deadlines out) so that the work doesn't accidentally slip or languish. Aim for three tangible deliverables for those deadlines. Schedule meetings 2-3 out ahead of time. This will help with the routine of the relationship, and will help to ensure a path to an effective mentoring relationship.
And make sure to evaluate the relationship. Have retrospectives to better understand how the relationship is working for each other, and whether it might be time to go another way or temporarily step away from things.
Some tools were also suggested, like calendar reminders and custom Wunderlists as well as Asana/Trello. One suggestion that stuck out for me was using scheduled emails to remind "we're meeting in 2 days, make sure to be ready for the meeting". This will help the mentee bring their A-game, even if the deadline itself might be getting missed for some reason. The mentee will be ready to discuss the why at the meeting, or will have an opportunity to reschedule.
Machine Learning: A Beginner's Practical Guide
Michael talked about different types of machine learning algorithms, but more importantly, about how important it is to handle the data that is being used to feed the ML algorithms.
The data can come from any source (logs, databases, telemetry from IoT devices, etc). When collecting it, the formatting isn't a primary concern.
Once you have your data, you have to prepare it to be analyzed. One key point was that ML is not magic, it's math - so the data has to be reduced down to numbers. Later in the presentation, there was a great example where cities (Miami, Tampa Bay, and... Cedar Rapids?) had to be converted to numbers. Most information system developers, such as myself, would reach for a simple mapping: 1 - Miami, 2 - Tampa Bay, 3 - Cedar Rapids. But this doesn't work in ML land! The ML algorithm might think that the size of the number matters for identifying the city. Instead, a column is added to each row in the training set for each city, and it contains a `0` or a `1` based on whether the row is in that city or not.
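Here's a minimal sketch of that one-hot encoding idea in PHP (the talk's tooling presumably lived in Python-land, and the column names below are my own invention, not the actual training set):

```php
<?php
// Minimal sketch: one-hot encoding city names instead of mapping them to 1, 2, 3.
$cities = ['Miami', 'Tampa Bay', 'Cedar Rapids'];

function oneHotEncode(string $city, array $cities): array
{
    $row = [];
    foreach ($cities as $candidate) {
        // One column per known city: 1 if this row is that city, 0 otherwise.
        $column = 'city_' . str_replace(' ', '_', strtolower($candidate));
        $row[$column] = ($city === $candidate) ? 1 : 0;
    }
    return $row;
}

print_r(oneHotEncode('Tampa Bay', $cities));
// Array ( [city_miami] => 0 [city_tampa_bay] => 1 [city_cedar_rapids] => 0 )
```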
Then, he advised using visualizations to analyze the data. This can involve scatter plots, heat maps, count-plots, and more. The goal is to identify which values might have an effect on what you're trying to predict.
Collecting, cleaning, and analyzing data takes 90% of data scientists' time. Once those phases are complete, you can move on to the fancy stuff: Model Training.
Michael outlined three main categories of ML algorithms:
- Supervised learning, where each row has had a human determine the definition, and where the algorithms concentrate on classification (e.g., tagging photos and making other predictions) or regression (e.g., trying to predict a continuous range of numbers like weather trends or stock markets).
- Unsupervised learning, where the rows have not been human-vetted, and where the algorithms concentrate on Clustering and Association (e.g., recommendation engines).
- Reinforcement learning, where the algorithms are driven by positive and negative rewards to gradually 'learn' to be more accurate.
He demonstrated a three-set approach to training ML models. Dividing validated data into a training set, a validation set, and a test set. Then, he demonstrated several runs of an SGDClassifier versus a RandomForestClassifier and showed what accuracy they were able to attain against the validation and test sets, and how sometimes the SGDClassifier would be more correct than the RandomForestClassifier, despite the latter having a higher general accuracy rating.
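The classifiers themselves (SGDClassifier and RandomForestClassifier come from scikit-learn) are Python territory, but the three-set idea is tool-agnostic. Here's a rough PHP sketch of such a split; the 70/15/15 proportions are my assumption, not numbers from the talk:

```php
<?php
// Rough sketch of a train/validation/test split.
function splitDataset(array $rows, float $trainPct = 0.7, float $validationPct = 0.15): array
{
    shuffle($rows); // randomize so each set is representative

    $total    = count($rows);
    $trainEnd = (int) floor($total * $trainPct);
    $validEnd = $trainEnd + (int) floor($total * $validationPct);

    return [
        'train'      => array_slice($rows, 0, $trainEnd),                     // fit the model here
        'validation' => array_slice($rows, $trainEnd, $validEnd - $trainEnd), // compare/tune models here
        'test'       => array_slice($rows, $validEnd),                        // final, untouched accuracy check
    ];
}

$sets = splitDataset(range(1, 100));
printf("train=%d validation=%d test=%d\n",
    count($sets['train']), count($sets['validation']), count($sets['test']));
```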
Converting Your Dev Environment to a Docker Stack
Dana's introduction to Docker, Docker Swarms, and `docker-compose.yml` was extremely useful for me.
She began by talking about the bad old days of sharing multi-gigabyte VM images, and how Docker improves on that by allowing developers to declare the environment in a teeny-tiny YAML file which is then used to spin up a container configured exactly to the specification.
One particularly helpful part of the presentation was the overview of the hierarchy of Docker components. An "image" is sort of an unchanging constant snapshot, a "Container" is an instantiated/running version of an image, a "Service" is a group of Containers based on the same Image, etc etc until you get to the Swarm which lets all your services/nodes work together to deliver your application's functionality.
Docker makes it so you can stand on the shoulders of giants - images, both official and 3rd-party, exist for a ton of services. This takes much of the difficulty out of configuring systems. You'll want to target a specific SHA to ensure consistency, though.
Docker commands follow the model `docker (object) (action) (args)`, like `docker image ls` to perform the `ls` action for all `image`s. To pull one of those images of giants, you would run `docker image pull mysql:latest` (or a SHA instead of `latest`).
After talking about how to define local and named storage volumes, the talk hit on the parts that were the most helpful and enlightening for me: logs! Well, specifically, there was an example of how to drill down from the top of a malfunctioning stack all the way to the individual container that was not working right. And then a demonstration of using logs with following to see live requests (`docker service logs cascadia_nginx -f`, along with `--no-trunc` if the lines are being truncated too soon).
I was very happy with this talk and I'm looking forward to applying what I learned. :)
Lunch Keynote: Empathy as a Service
This talk was a powerful and moving overview of mental health in the workplace, and what can be done to improve things for everyone.
It's also a challenge to summarize because it was very content-rich but also personal in a way that makes me not want to leave anything out.
Nara spoke about her friend Ni Mu who, after relocating to San Francisco for a tech job, took her own life. She had previously struggled with bipolar disorder, and it seemed that the relocation had played no small part in what happened.
Nara had been scheduled to speak at a conference just a handful of days later, and she wrote the organizers and told them she wouldn't be able to speak on her original topic. They agreed to let her present an entirely new talk about mental health. Nara used the OSMI resources as a reference point for the talk's content.
In her friend's case, relocation seemed to be the trigger, so Nara presented some observations about how relocation is difficult. It takes you away from your existing support systems and building new systems & making new friends is hard. She suggests that remote employment can be more inclusive because it can alleviate relocation-related concerns, but it has its own issues.
"Stop saying that the best way to succeed in tech is to relocate to Silicon Valley"
For relocated employees, a welcome wagon approach can jumpstart their experience. This can mean taking the new employee out for lunch and/or dinner on the very first day - but be careful to avoid alcohol. (Alcohol is a sensitive topic, and needs to be handled with care. Best not to jump into it on the first day.)
Take turns introducing the new employee to activities and events: Bocce! Rock climbing! Book club! This will help to form connections that can be difficult to organically grow.
Remote work is largely a good thing from a mental health perspective. It's better for appointments (health professionals also work 9-5 days). It's easier to take breaks and there's less anxiety or peer pressure about sitting at a desk. It can also be peaceful and quiet, assuming you have such a working environment.
Companies should make funds available for coworking spaces. They should fly employees into the office & do company-wide offsites. And companies should work to follow the latest research on remote work culture.
Work/Life balance can be harder as a remote worker. It can be isolating, and it's harder to build relationships with coworkers. The lack of proximity can also make it harder to tell if a coworker is doing OK or what their health status is.
And then there are open-plan offices. Good for budgets and fire codes, not ideal for productivity and privacy. Can cause stress and peer pressure, and makes it easier for diseases (e.g., colds and flu) to spread amongst the workforce. Companies with this type of office should provide noise-cancelling headphones & heavily encourage employees to work from home if sick. Encouraging employees to decorate their workspace can help to reduce stress and increase productivity.
Many offices with multiple meeting rooms end up with a tiny closet-like one. Companies should consider setting that one aside as a quiet workspace that people can book.
On-call employees are particularly susceptible to mental health challenges in the workplace. Interrupted sleep, stress, anxiety, burnout -- all challenges in an on-call rotation. Invest in fixing tech issues before they can trigger on-call notifications. Ensure woken-up folks receive time to catch up on sleep (e.g., don't come in until the afternoon). If an employee has an anxiety disorder, see if they can be removed from the on-call rotation. But, because they're humans, and team members, and really want to contribute, make sure to find other meaningful ways for them to do so.
Alcohol in the office or as a cornerstone of "company culture" is problematic. Non-drinkers feel unwelcome. Recovering alcoholics feel threatened. And perhaps there are more functioning alcoholics in tech than we are aware of. With that in mind, companies should always offer nice alternatives - not just a can of pop, but fancy sodas and mixed non-alcoholic drinks. There should be a limit on drinks, as well. And activities other than drinking should be emphasized. Another approach would be to alternate Dry and Non-Dry events. Workplace substance abuse wellness programs have also proven to be effective.
Tech leads should lead by example. No 2am Saturday email bursts. Regular vacations. Be open to talking about your mental health. And don't micromanage.
Nara also presented the ALGEE action plan, in case you notice that someone is struggling in significant ways or they come to you for help:
- Assess for risk of suicide or harm
- Listen nonjudgementally
- Give reassurance and information
- Encourage appropriate professional help
- Encourage self-help and other support strategies
By better supporting marginalized people, embracing neurodiversity, reducing peer pressure on social media, breaking the stigma around talking about our feelings, and dispelling the myth that engineers don't have empathy, we can build a healthier workplace for everyone.
Code Review for Me & You
Steve outlined the two basic types of code review: Formal and Informal.
Formal code review exists as part of a workflow, usually sitting at the point between developing the code and merging it. It can also take the form of Pair Programming or "bug sweeps".
Informal code review is more of a culture of support. That is, you might ask coworkers for gut-checks of your code, or you might do periodic checkins on their work to ensure things are not spinning off the rails.
Code review is useful for catching bugs and architectural issues, but one undervalued component of it is educational. More eyes on different parts of the codebase helps to reduce the bus factor (or lottery factor) within the company. It's also great for developer growth - junior devs can learn from senior devs, and senior devs can learn from juniors as new ideas are brought into the company.
Who should perform code review? EVERYONE!
There are some arguments against code review, such as how it can devolve into a charade if not done in earnest.
"I have reviewed the code and can confirm that it is, in fact, code."
It can also cause a death-by-1000-cuts effect, where every little bit is nitpicked and the feedback is very personal rather than objective.
Don't get too attached to your code. Allow your coworkers to provide their input and guide you toward what might be a better solution that you hadn't thought of.
When reviewing code, it helps to come at it in a certain order:
- Does it work?
- Does it work well?
- Does it meet our standards?
It doesn't make sense to apply a non-automated coding standards analysis to a review if the solution itself doesn't work, for example.
When performing reviews, be objective, not subjective. Provide resources where necessary to back up statements. And ask questions! If something doesn't make sense to you, say so!
It's important to highlight successes. Celebrate the small victories. Dole out the warm fuzzies, because they are few and far between in adulthood in general. Some examples:
- "Great catch! A lot of people would have missed that condition"
- "Where did you learn about this function? It's super useful!"
- "This documentation is excellent, thank you!"
Steve wrapped up with a ton of advice "From the Trenches":
- Practice atomic commits. Don't be the person who slams a bunch of stuff into the commit at 5pm with the description "hitting the pub".
- A history made up of arbitrary save points, instead of logical commits, is very difficult to work with.
- Practice atomic PRs. A PR should be as small & focused as reasonably possible. This enables a shorter feedback loop.
- Leverage GitHub tags to provide context for your PRs. (Presumably this advice also extends to labels.)
- Use "Work In Progress" (WIP) reviews to get extra eyes on code before going too far down the rabbit hole. The whole change might not be ready for review, but the gnarliest bits can benefit from early feedback.
- Static code analysis is useful for detecting errors and overcomplicated bits, but doesn't replace human reviewers - and doesn't improve the bus factor.
- Use `.gitignore` to simplify the repo and avoid committing assets that can be built by the computer.
- Use git hooks locally to hook into the lifecycle of git. (Use GitHub Actions to remotely hook into the lifecycle of git.) A minimal example of a local hook follows this list.
- Treat inline documentation as code! Nothing is worse than comments that don't match the code they're supposed to document.
- Do your part! Fight Doc Rot!
- Periodically audit the Whitelist. Various tools have ways of disabling their settings for pieces of code (such as ignoring linter/sniffer rules). Make sure that expectations haven't changed over time, causing the exemptions to become invalid.
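As promised above, here's one way a local hook might look for a PHP project: a hypothetical `.git/hooks/pre-commit` script that refuses the commit if any staged PHP file fails `php -l`. The hook path and the git/php commands are standard; the script itself is my own sketch, not something shown in the talk.

```php
#!/usr/bin/env php
<?php
// Hypothetical .git/hooks/pre-commit: abort the commit if any staged .php file
// fails PHP's built-in syntax check (php -l).
// Remember to make the hook file executable so git will run it.
exec('git diff --cached --name-only --diff-filter=ACM', $staged);

$failed = false;
foreach ($staged as $file) {
    if (substr($file, -4) !== '.php' || !is_file($file)) {
        continue; // only lint PHP files that still exist on disk
    }

    $output = [];
    exec('php -l ' . escapeshellarg($file), $output, $exitCode);
    if ($exitCode !== 0) {
        echo implode(PHP_EOL, $output), PHP_EOL;
        $failed = true;
    }
}

if ($failed) {
    echo 'Commit aborted: fix the syntax errors above first.', PHP_EOL;
    exit(1); // a non-zero exit code makes git cancel the commit
}

exit(0);
```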
Overall, the goal is to arrive at the best possible solution to a problem. Good code review takes time, but in the end it's worth it.
Algorithms in Context
Margaret's talk was aimed at demystifying some of the terminology that folks might hear every now and then. Algorithms can be intimidating, particularly if one doesn't have a background in computer science.
One such term is "Big-O" notation, which is a way to measure the complexity of a method. You start by removing as much of the context as you can, so that you're basically left with only "iterations" to count.
A method that doesn't loop on a set of inputs, for example, might be expected to finish in "constant" time, whether that's 1ms or 20 minutes. This would be an `O(1)` method.
A method that looped over an entire data set would be `O(n)`, because it has to operate on `n` items. If each iteration had an inner loop, meaning that it looped over `n` items `n` times, this would be `O(n^2)`.
Some algorithms can be pretty bad (`O(n^x)`) or better than one might expect (`O(n log n)`). Worth keeping in mind that the O notation sometimes measures best-case performance, but most often measures worst-case performance.
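To make those labels concrete, here's roughly what each shape looks like in PHP (toy functions of my own, not the speaker's examples):

```php
<?php
// O(1): constant time - does the same work no matter how big $items is.
function firstItem(array $items)
{
    return $items[0] ?? null;
}

// O(n): linear time - touches every item once.
function sumItems(array $items): int
{
    $total = 0;
    foreach ($items as $item) {
        $total += $item;
    }
    return $total;
}

// O(n^2): quadratic time - for every item, loop over every item again.
function hasDuplicates(array $items): bool
{
    foreach ($items as $i => $a) {
        foreach ($items as $j => $b) {
            if ($i !== $j && $a === $b) {
                return true;
            }
        }
    }
    return false;
}
```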
One quote that stands out is "Fancy algorithms are slow when n is small, and n is usually small." Until you know that n is frequently going to be big, don't get fancy. Constantly focusing on Big-O can cause people to lean too heavily on fancy algorithms to unnecessarily get better Big-O ratings.
She highly recommends the book "Thinking in Systems" by Donella H. Meadows.
The subject then shifted to Data Structures. This was a quick overview of the actual data structures that several other languages require you to implement manually, but PHP handles invisibly for you (unless you need it to be more visible).
Stack: Memory is a stack. You can push onto a stack, you can pop off of a stack, and you can peek into a stack. When running a program, each function claims a part of the stack while it executes; if you have a recursive function, each nesting level is claiming a new portion of the stack. If you run out of memory in your application's stack, you'll get a stack overflow - hence the name of that particular website.
Dictionary: This is an associative array that links string "keys" to in-memory locations for retrieving the data that has been stored. Some usages of PHP's array structure are similar to dictionaries.
Hash Table: Similar to a dictionary, but a hash table runs each key through a hashing function to determine where its value is stored, which accelerates lookups.
Tree: Sort of like multidimensional arrays. One parent node can contain multiple children (the number of allowed children varies by tree type). Filesystems are a common example of tree structures. When examining a tree algorithmically, maximize your chances of breaking out early (depth first vs breadth first is a common decision that can help with this). If you're tuned for an early exit, the Big-O notation doesn't matter as much.
Binary Tree: Each node has one parent, but a max of two children.
Heap: This is a type of sorted tree. In a "Max Heap", the root/parent value is always bigger than its children's values. In a "Min Heap", the root/parent is always smaller than its children's values.
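As it happens, PHP's SPL ships heap implementations, so you rarely build one by hand. A quick illustration of my own using `SplMinHeap`:

```php
<?php
// SplMinHeap keeps the smallest value at the root; SplMaxHeap keeps the largest.
$minHeap = new SplMinHeap();
foreach ([42, 7, 19, 3] as $value) {
    $minHeap->insert($value);
}

echo $minHeap->top(), PHP_EOL;     // 3 - peek at the root without removing it
echo $minHeap->extract(), PHP_EOL; // 3 - remove and return the root
echo $minHeap->top(), PHP_EOL;     // 7 - the next-smallest value is now the root
```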
Binary Search Tree: This is a fun one. A binary tree where the right child node is always greater than the parent node & left child node, and the left child node is always less than the parent node & right child node. Very useful for accelerating certain types of searches.
Digital Trie: Not a typo; "trie" is taken from the word "retrieval", and this is said to be the best data structure for word completion (aka winning at Scrabble).
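A trie is simple enough to sketch with nested PHP arrays. This is my own minimal version, just to show the shape of the structure, not anything from the talk:

```php
<?php
// Minimal trie sketch using nested arrays: each letter is a key, and a special
// '$' key marks the end of a complete word.
function trieInsert(array &$trie, string $word): void
{
    $node = &$trie;
    foreach (str_split($word) as $letter) {
        if (!isset($node[$letter])) {
            $node[$letter] = [];
        }
        $node = &$node[$letter];
    }
    $node['$'] = true; // mark a complete word
}

function trieHasPrefix(array $trie, string $prefix): bool
{
    $node = $trie;
    foreach (str_split($prefix) as $letter) {
        if (!isset($node[$letter])) {
            return false;
        }
        $node = $node[$letter];
    }
    return true;
}

$trie = [];
trieInsert($trie, 'cat');
trieInsert($trie, 'cascade');
var_dump(trieHasPrefix($trie, 'cas')); // bool(true)
var_dump(trieHasPrefix($trie, 'dog')); // bool(false)
```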
Graph: Graph terminology can be confusing, but it can be a powerful structure. Nodes are connected to other nodes by edges, with the goal of moving between the nodes with the fewest constraints in order to find the connections you want to find.
Much other ground was covered, including the joys of PHP arrays. Also, how using `foreach` on an Iterator will conserve memory (basically, like Generators).
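For example, a generator lets you `foreach` over a huge sequence without ever materializing it as an array. A quick sketch of my own:

```php
<?php
// range(1, 10000000) would build a ten-million-element array in memory.
// A generator yields one value at a time instead, so foreach stays cheap.
function lazyRange(int $start, int $end): Generator
{
    for ($i = $start; $i <= $end; $i++) {
        yield $i;
    }
}

$total = 0;
foreach (lazyRange(1, 10000000) as $n) {
    $total += $n;
}
echo $total, PHP_EOL;
```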
Another topic covered was scaling. Every component that's vulnerable to scaling issues needs two estimates: a conservative case and an optimistic case. Neither should be pulled out of thin air; they require investigation and data gathering. Conservative is the "we have to serve it forever" cost. Optimistic is the "we can totally fix that before it's an issue to users" case.
Two topics you should research for those pesky interview questions, but also for computer science practice in general, are Merge Sort and Quick Sort. We don't usually have to worry about them in PHP, because the language provides sorting for us, but it can be helpful to know how to implement the algorithms.
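For what it's worth, merge sort fits in a handful of lines of PHP if you ever want to walk through it yourself. This is my own textbook-style sketch; in real code you'd just call `sort()` or `usort()`:

```php
<?php
// Textbook merge sort: split the array in half, sort each half recursively,
// then merge the two sorted halves back together. O(n log n).
function mergeSort(array $items): array
{
    if (count($items) <= 1) {
        return $items;
    }

    $middle = intdiv(count($items), 2);
    $left   = mergeSort(array_slice($items, 0, $middle));
    $right  = mergeSort(array_slice($items, $middle));

    $merged = [];
    while ($left && $right) {
        // Take the smaller of the two "front" items each time.
        $merged[] = ($left[0] <= $right[0]) ? array_shift($left) : array_shift($right);
    }

    // One side is empty now; append whatever remains of the other.
    return array_merge($merged, $left, $right);
}

print_r(mergeSort([5, 3, 8, 1, 9, 2])); // 1, 2, 3, 5, 8, 9
```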
Other useful tidbits:
Side Effects: A side effect is when a function affects data or resources outside of itself. Sometimes necessary, but on the flip side, if a function has no side effects, its result can be cached.
Memoization: Memoization is a way of storing a value in the current process that might be repeatedly referenced & is costly to look up or recompute. This delivers a quick optimization.
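In PHP, that can be as small as a `static` array inside the function. A sketch of my own, assuming a hypothetical slow lookup:

```php
<?php
// Memoization sketch: cache results of an expensive, side-effect-free lookup
// inside the current process so repeat calls are essentially free.
function expensiveLookup(string $key): string
{
    static $cache = [];

    if (!array_key_exists($key, $cache)) {
        // Stand-in for the costly part (a slow query, an API call, heavy math...).
        usleep(200000);
        $cache[$key] = strtoupper($key);
    }

    return $cache[$key];
}

expensiveLookup('portland'); // slow the first time
expensiveLookup('portland'); // instant: served from the in-process cache
```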
Above all, don't let some abstract idea of perfection stand in the way of solving real problems with quick solutions. Don't dive into a Big-O hole that you needn't bother with, unless you're totally confident that there's an issue you need to solve for your millions of users.
Closing Keynote: Building a World Class Developer Organization
Josh's closing keynote for the day was all about leadership, goals, and the importance of different types of diversity.
He covered some myths about leadership:
- Your Manager Is Your Leader (FALSE)
- The one who gets the credit is the leader (FALSE)
- There can be only one leader (FALSE)
- You should make your best people managers (FALSE)
- Leadership is easy to learn (FALSE)
- You will learn leadership from this talk (SADLY FALSE)
- There is only one way to lead people (FALSE)
He talked about styles of leadership, from the autocratic to the task-oriented and servant-leadership styles.
The best style of leadership depends upon the position you're in, the people around you, and the situation you're facing.
After years of management advice attempting to grow his weaknesses to complement his strengths, his best manager once said, "I want to play to your strengths and mitigate your weaknesses".
He told a story about how he came up with a creative solution to a big challenge, and how his manager embraced it and worked to get the pieces of the plan in place to make it happen. Real outside-the-box thinking. It all came together and was a big success, thanks to servant-leadership (and a bit of task-oriented leadership for some of the people who helped with the details).
Goals. SMART goals. He offered some advice for that:
- Specific (e.g., "Buy a membership and go to the gym 3 times a week", not "get in shape")
- Measurable (e.g., "Go to the gym 3 times a week")
- Attainable (make sure you have the skills, abilities, and financial capability to achieve the goal)
- Realistic (High goals drive action, low goals enable lethargy)
- Timely (Too soon makes for an impossible deadline. Too far out makes for no urgency.)
Josh also outlined different types of diversity, from the core of "inherent" diversity to the outer layers. What companies need is a diversity of thought, BUT in order to achieve that they first need inherent diversity and a diversity of lived experience. And we're so terrible at achieving that, at least in tech, that we're robbing ourselves of the chance to have diversity of thought.
If you suggest an idea to a group of people who have the same experience as you, such as "straight married white guy with decent eyesight and a passable beard", you might end up getting a green light on that idea with no other options offered. Does that mean it was the best idea? Probably not. It was just the first idea. But if your team can offer 100 different ideas, or even have 100 different life experience perspectives on 1 idea, you're way more likely to arrive at an actual great idea.
When hiring, ask "what are we missing? let's go find what we're missing". And make sure to define your team's north star. If you don't know what your company's mission is, or you don't know what your team's contribution to that mission is, or you don't know what your contribution to your team is, you need to straighten all that out in order to be effective.
Summary of Day Two
Day Two was packed with great talks. I learned a lot and got to meet lots of new people. Stay tuned for my summary of Day Three, and in the meantime you can check out my road trip & conference photos on Flickr:
Published: September 27, 2019
Tags: conference, news, dev, development, coding, php