Text Analysis: Getting into Google

Two weeks ago I returned from the concrete jungle of San Francisco, lungs full of West Coast air and patience depleted by West Coast traffic. Many of my peers asked how I did it. Who did I know? How did I draw up my resume to get through the process to an onsite interview? The answer was, “I don’t know.” It was mostly luck.

I had the opportunity to interview onsite at Google. I killed the interview. Not to brag, but this past week I got my rejection email and a phone call, as if an email wasn’t enough. In any case, many told me that it was a success to even be considered by the internet giant. Even after I was denied, many asked me if there was a method to how I got in the door. I didn’t have an answer, but it did get me thinking… “What if I pulled Google job postings and ran some analysis to increase a person’s odds of getting an interview?”…

Methodology

Fortunately, thanks to the internet, I was able to find a data set where someone had already done the scraping work. Google job postings have a standard layout, making it easy to scrape for the necessary information. Niyamat, whom I found on Kaggle, used Selenium to scrape the Google Careers page for the job location, category, responsibilities, minimum qualifications, and preferred qualifications of every job posting available a little under a year ago. I used his data set as the basis of my analysis.

Job Locations

job-map.png

One clear way to increase your probability of reaching success is to consider regions that have a higher proportion of job opportunities. Not surprisingly, when this data was taken, there were 640 job postings for offices in the United States. But, what were the top countries outside of the United States?

Top 5 Countries (Omitting the United States):

  1. Ireland (87)
  2. United Kingdom (62)
  3. Germany (54)
  4. Singapore (41)
  5. China (38)

One technique could be to target regions that are less popular as technology hubs but make up a significant proportion (omitting the US) of Google job opportunities. When you think of Google, you think of Silicon Valley, and so does everyone else. You don’t necessarily think of the areas surrounding the United Kingdom or Singapore. These might be locations to target. Beyond job location, I wanted to look at the teams within Google and their relationships with job postings.

Job Categories

job category bar.png

In addition to understanding where jobs are being posted, it is beneficial to know which teams garner the largest proportion of job postings. Two teams, Sales & Account Management and Marketing & Communications, are in a league of their own, making up ~27% of all job postings. But, what are the teams with the smallest number of job postings?

Toughest Teams to Get Into (based on count of job postings):

  1. Data Center & Network
  2. Technical Writing
  3. IT and Data Management
  4. Developer Relations
  5. Network Engineering

If you look at the midway point of the graph and move right, you’ll notice that a larger proportion of the teams sit on the more technical end of the job spectrum. Though Google is a technology company, its hiring focus is on client-facing opportunities and business operations. In my own experience, and that of peers I have talked to, targeting business-facing jobs increases your chance of getting an interview, and consequently a job, at Google, which can allow you to work up the ladder and earn a technical position later. This seems to be backed up by the data.

But, beyond targeting opportunities based on job location and team, how can you increase your chance of getting an interview? One common thing you’ll hear in any job search is to “match” your resume to the job posting. While this is true, text analysis may help us uncover what is more important to Google representatives as a whole (not just by position), allowing you to craft a resume to increase your odds of landing an interview across the organization.

Responsibilities

 

response dash

After running text analysis on the “Responsibilities” section of 1,250 job postings, here are the terms that came up the most. Two things to call out. First, these terms are the stems, used to group similar words (English is a complex language after all). Second, the first number in the tree map is the total number of times that the term was seen (total count). The following number is the number of jobs that this term was seen in (job count).
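For readers who want to reproduce the two numbers in the tree map, here is a minimal Python sketch of counting stems per posting. It is not the code behind the chart: the post does not say which stemmer was used, so `crude_stem` is a deliberately simple, hypothetical stand-in, and the sample postings are made up.

```python
import re
from collections import Counter

# Hypothetical suffix list; a real analysis would use a proper stemmer.
SUFFIXES = ("ing", "ed", "es", "e", "s")


def crude_stem(word):
    """Strip one common suffix so similar words share a stem (assumption)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word


def term_counts(postings):
    """Return (total_count, job_count) Counters keyed by stem."""
    total_count = Counter()  # every occurrence of a stem
    job_count = Counter()    # number of postings that contain the stem
    for text in postings:
        stems = [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())]
        total_count.update(stems)
        job_count.update(set(stems))  # a set counts each posting once
    return total_count, job_count


# Made-up postings, only to show the call.
sample = ["Manage teams and manage projects", "Managing a team"]
total_count, job_count = term_counts(sample)
print(total_count.most_common(3))
print(job_count.most_common(3))
```

The only design point worth noting is the `set(stems)` call: it is what separates the job count from the total count.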

Collaboration, development, management, and dealing with product appeared in ~50% or more of the job postings. Other notable terms include: support, market, process, design, project, drive, solution, identify, etc. Having experience in these areas is very important. Even more important for getting your resume seen: including these terms, no matter the job posting (within reason), will increase your odds of having your resume match a posting, which could lead to an interview.

In addition to looking at the most frequent terms, I also wanted to see their relationships to each other. Below is a graphical representation of the hubs, authorities, and clustering of the terms and job responsibility descriptions.

This analysis reflects my previous point: many of the important responsibility terms are shared across job descriptions and link to many of the other terms, flowing to and from each other regardless of the type of job or description. The second graph is meant to show clusters of terms based on job descriptions. What is interesting is that no distinct clusters of jobs can be easily discerned from the responsibilities alone. What does this mean? It means that these terms are influential across all types of jobs and their related job postings. This echoes my previous sentiment: include terms with high frequencies across job postings when casting a wider net to increase your odds, since job postings tend to be similar, at a high level, across the stated responsibilities.

Minimum Qualifications

In addition to the responsibilities that you as an applicant want to match, it is also important to understand what minimum qualifications are required.

min qual dash

While minimum qualifications will differ depending on the team, location, and level of the job, there are important things to look for across most job descriptions. Across ~80% of job postings, having the appropriate practical experience and degree is a minimum requirement. This shouldn’t be news to anyone reading this post. What is novel are the other terms, which you could use to set yourself apart on your resume.

At a minimum, Google HR is looking for people who are efficient communicators, both in writing and in conversation. Fluency in a foreign language will also help to set you apart. An understanding of programming, marketing, and engineering, along with general experience with technology, can help you clear this minimum qualification bar across a significant proportion of job descriptions.

Again, I wanted to look at the relationships of these terms and their associated job descriptions. The first graph shows terms that appear in 5% or more of job postings.

minimum qual .05

Unlike responsibilities, we see two clear clusters emerge: jobs for interns, MBAs, and new graduates, and jobs for more experienced employees. For experienced employees, they are looking for specific skills and experiences: SQL, Java, foreign languages, development, media, strategy, etc. For new graduates, they are looking for basic requirements – are you a student? And, they are looking at your availability – can you start in May or June? While they want experience and skills coupled with these minimum requirements, education and start date are uniquely important to this cohort.

While this is interesting, I wanted to cut down on the number of important terms. Below is a network analysis of terms that appear in 10% or more of job postings.

Similar to responsibilities, there is only one cluster for minimum qualifications. Understanding these minimum qualifications and understanding how to craft them into your resume will help you to get past the proverbial resume bot.
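The thresholding behind these term networks (5% or more of postings, then 10% or more) can be sketched in a few lines of Python. This is an illustration under assumptions, not the code behind the graphs: the function name `cooccurrence_edges`, the input shape (one list of terms per posting), and the choice to link two terms whenever they share a posting are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(job_terms, min_share):
    """Keep terms found in at least `min_share` of postings and count
    how many postings each pair of kept terms shares."""
    n_jobs = len(job_terms)
    job_count = Counter(t for terms in job_terms for t in set(terms))
    kept = {t for t, c in job_count.items() if c / n_jobs >= min_share}

    edges = Counter()
    for terms in job_terms:
        present = sorted(set(terms) & kept)  # sorted so (a, b) is stable
        edges.update(combinations(present, 2))
    return kept, edges


# Made-up postings, already reduced to lists of terms.
postings = [
    ["degree", "experience", "sql"],
    ["degree", "experience"],
    ["degree", "student"],
    ["experience", "degree"],
]
kept, edges = cooccurrence_edges(postings, min_share=0.5)
print(kept)
print(edges.most_common())
```

Raising `min_share` shrinks the set of kept terms, which is why a higher cutoff produces a less crowded graph.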

Preferred Qualifications

The last section I analyzed was “Preferred Qualifications”. Below are the most frequent terms.

pref qual dash

You’ll notice that many of these terms are similar to the minimum qualifications. So, why have two different sections? Well, here they are looking for you to demonstrate your unique abilities and skills. They want you to have project experience. But, interestingly enough, across many of the positions, they want to see that you are knowledgeable, that you know how to handle relationships, that you work well in diverse teams and environments, that you are effective in the work that you do, that you know how to work with data, and, finally, that you know how to use Microsoft Excel. A lot of these are gimmes, but don’t miss out on getting an interview because you didn’t include them in your resume; they are in the “Preferred Qualifications” section for a reason.

After analyzing these frequencies, I ran a network analysis. Originally I ran it at the same 5% level as I did for the minimum qualification analysis, but the result was too crowded and not useful for interpretation. So, I ran the analysis on terms that appeared in 10% or more of job postings and some groups actually started to emerge.

Preferred qualifications are where job descriptions start to take on their own character; this is the only place we see numerous clusters. These clusters seem team oriented: solutions, science (presumably data), and design. There also appears to be a preference for master’s students, judging by the green cluster. Even so, many of the terms are shared between clusters and make up a giant cluster of their own, reinforcing what we’ve already uncovered: many of the job descriptions share similar components in the way they describe their preferred qualifications.

Final Thoughts

So what? Well, first off, your guidance counselor apparently knows what they are talking about. Google does care if you have a degree (they might even prefer a master’s degree). Google does care if you have their minimum qualifications and skills. But, even so, job postings may not be as different as you or I previously believed.

There might be ways to increase your odds of getting your resume through the job submission black hole and into the right hands, across job postings, which may lead to an interview. Here are some final thoughts from this analysis:

  • Target the United States. In addition, look for Google locations that have a high number of offerings but may not be seen as a technology hub.
  • Target business and operation facing teams. They have the lion’s share of the job postings and may allow you to cast a wider net for opportunities. Avoid technical leaning positions, especially if you are earlier in your job journey.
  • Look for ways to streamline your resume for Google’s needs. They clearly have items they look for, no matter the position or team. Here are some examples of what to include:
    • Responsibilities: Experience with collaboration, development, management, product, support, marketing, processes, design, projects, driving solutions, identifying challenges and solutions, etc.
    • Minimum Qualifications: Efficient communication (written and spoken), fluency in foreign languages, experience with programming, marketing, engineering, technology, etc.
    • Preferred Qualifications: At a high level, you should definitely include knowledge, relationship management, experience in teams and environments, effectiveness, experience with data, and Excel expertise. Customize further to fit the job description, depending on the team.
    • Know Excel: Apparently data-driven tech companies still feel that this needs to be stated…

Last Note: Google revolutionized how we engage with information. Though it is an amazing company to work for, there are many other companies paving the way in Big Data, martech, and digital solutions. I hope you find this analysis helpful and guiding, but it doesn’t replace hard work, dedication, and passion. If you are on the job hunt, enjoy the journey! You will be rejected. It happens. But as they say, when one door closes another one opens.

“I don’t want to live in a world where someone else is making the world a better place better than we are.” (Gavin Belson)

If you have any questions (about this blog post, the job search, or anything else), feel free to reach out! For those on the job hunt, I highly suggest this book for alleviating unnecessary anxiety and work:

 

Enjoy this post? Take a look at past posts:

World Happiness

Tips on Restaurant Success

Airbnb in the Big Apple

Michael Jordan vs Lebron James

NYC: Where to go for a night out?

Risky Business and Rare Cooked Steaks

Fact or Fiction: NFL Home-field Advantage

Fact or Fiction: NFL Home-field Advantage

Imagine you are in “Death Valley”. It’s a concrete jungle, with Clemson Tigers and orange-plastered season ticket holders ready to defend their beloved rock. Now imagine you are an NC State fan, the only one in a sea of South Carolinian hopefuls. Your only comrade is a Georgia Tech graduate, who has decided to become part of the “pack” for this one day against #3 Clemson, since your tipsy wife haphazardly offered up her ticket at a wedding the weekend before…

Like many of my posts, my analysis stems from a question. This time, my Georgia Tech friend got me thinking. He turned to me in the middle of the 3rd quarter, as NC State was down 31 to 0, and exclaimed, “Man it’s loud in here! Do you think there is really such a thing as home-field advantage?”

If there is, I haven’t seen it at NC State… but this did get my brain churning…

Methodology

I decided to analyze data on the NFL because it is more readily available, NFL stadiums are closer to each other in size than college stadiums, and there is more accurate data on weekly attendance numbers. I consulted one of my favorite sites for such data, Pro Football Reference. I pulled weekly performance data, by season for every team, from 2007 to 2017. I was able to obtain the final score, stadium attendance, and yard and turnover spread for the winning and losing teams, which I coerced into home vs away data.

While I understand that many may point to defensive play as something that benefits from home-field advantage, weekly defensive data was not easily accessible. Since I looked at aggregates of team performance when they played at home and away, as well as their overall performance, I believe that my methodology was a good ol’ duct tape fix over this concern and I felt comfortable moving forward.

The Gridiron

Home teams won 58% of the time from 2007 to 2017.

While not conclusive, discovering this point added some validity to my hunch. Here is how the rest of the variables (points, yards, and turnovers) played out…

Home Team Aggregate Data

This graph looks at the differences between home team performance and away team performance at a macro level, as averages across the NFL. On average, home teams tended to score 12% more points, produce 4% more yards, and commit 5% fewer turnovers than away teams. While this is a valuable discovery, I soon found that analyzing the NFL as a whole may not be the best approach. So, I started focusing on team performance.
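The league-wide comparison above boils down to a percentage difference between home and away averages. Here is a minimal Python sketch of that arithmetic, assuming one record per game with hypothetical keys such as `home_pts` and `away_pts`; the two games in the example are made up and are not Pro Football Reference data.

```python
from statistics import mean


def pct_diff(home_value, away_value):
    """Percentage by which the home figure differs from the away figure."""
    return (home_value - away_value) / away_value * 100.0


def league_summary(games):
    """Home win rate plus home-vs-away percentage differences.

    Ties count as non-wins for the home team (an assumption).
    """
    summary = {
        "home_win_pct": mean(
            1.0 if g["home_pts"] > g["away_pts"] else 0.0 for g in games
        ) * 100.0
    }
    for metric in ("pts", "yds", "to"):  # points, yards, turnovers
        home_avg = mean(g["home_" + metric] for g in games)
        away_avg = mean(g["away_" + metric] for g in games)
        summary[metric + "_pct_diff"] = pct_diff(home_avg, away_avg)
    return summary


# Made-up games, only to show the call.
games = [
    {"home_pts": 28, "away_pts": 20, "home_yds": 400, "away_yds": 350,
     "home_to": 1, "away_to": 2},
    {"home_pts": 17, "away_pts": 30, "home_yds": 320, "away_yds": 370,
     "home_to": 2, "away_to": 2},
]
print(league_summary(games))
```

A negative `to_pct_diff` is the good direction here, since it means home teams committed fewer turnovers.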

Teamwork Makes the Dream Work

We often hear about the 12th man on a team – the audience and fans that roar throughout the stadium. Here I take a look at the differences between home and away past performance on a key metric – win-percentage.

Team Win Perc

As can be seen, the win percentage of games played at home tends to be higher than the win percentage of games played as a visiting team. Out of all of the teams over this 11-year period, there are only two teams where this does not hold: the Dallas Cowboys and Los Angeles Rams. The LA Rams have a much smaller sample size, due to their move in 2016, causing unsurprising variability from the norm. But, for Dallas, how can you claim to be America’s team when you aren’t even Texas’ team?…

The 3 teams that saw the largest percentage increase in win percentage when playing at home vs away were the Cleveland Browns (82%), Baltimore Ravens (75%), and the Minnesota Vikings (69%). While being bad over a long period of time will affect these numbers, causing home wins to be even more valuable, you have to chalk this up as a small victory if you are a Browns fan. Maybe there is a trophy out there for being “less bad” at home. Even if this is the case, I want to look at home-field advantage’s effect on our other performance metrics. Below are treemaps of teams and their data. The size and color of the regions indicate the magnitude of the difference between their home and away performance relative to the other teams on that metric. It must be noted that these graphs only reflect teams that did better at home.

Team Points

88% of teams scored more points, on average, when playing at home than when visiting. The teams that tended to score more at away games were the Tampa Bay Buccaneers, Indianapolis Colts, Carolina Panthers, and Los Angeles Rams.

Team To

85% of teams gained more yards, on average, when playing at home than when visiting. The teams that tended to have better production at away games were the Tampa Bay Buccaneers, Indianapolis Colts, Cleveland Browns, and Philadelphia Eagles.

Team Yards

74% of teams committed fewer turnovers, on average, when playing at home than when visiting. The teams that tended to have fewer turnovers at away games were the Oakland Raiders, New York Jets, Dallas Cowboys, Tennessee Titans, San Francisco 49ers, Philadelphia Eagles, Kansas City Chiefs, Carolina Panthers, and Los Angeles Rams.

Now, you may be wondering, “Camden, across these 3 metrics, who tended to perform better at home games than away games? I want to have bragging rights as the best fan-base and the true 12th man, even though I know this analysis isn’t causal.” Well let me tell you, based on the mean percentage difference in performance across points scored, yards gained, and turnovers produced.

Top 5 (Home Performance vs Away Performance):

1. Los Angeles Chargers

2. Arizona Cardinals

3. Pittsburgh Steelers

4. Baltimore Ravens

5. Green Bay Packers

Bottom 5 (Home Performance vs Away Performance):

28. Washington Redskins

29. Philadelphia Eagles

30. Kansas City Chiefs

31. Carolina Panthers

32. Los Angeles Rams

The teams playing in Los Angeles only include their data since the move. The Chargers moved in 2017 and the Rams moved in 2016, meaning they have a limited sample size compared to the other teams. What is interesting is that if the Chargers were still in San Diego, they would be ranked 25th on our scale of home performance vs away performance. The Rams would be 5th if still in St. Louis. Have they flipped positions for the foreseeable future? Or is it a case of the Law of Small Numbers? I’ll let you be the judge of that.
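The ranking above rests on a mean percentage difference across points, yards, and turnovers. A minimal Python sketch of one way to compute it follows. The post does not say how the turnover sign was handled, so flipping it (fewer turnovers at home counts as a positive) is an assumption, as are the function names, the dictionary keys, and the two made-up teams.

```python
from statistics import mean


def pct_diff(home_value, away_value):
    """Percentage by which the home figure differs from the away figure."""
    return (home_value - away_value) / away_value * 100.0


def home_edge(avgs):
    """Mean percentage difference across points, yards, and turnovers."""
    points = pct_diff(avgs["home_pts"], avgs["away_pts"])
    yards = pct_diff(avgs["home_yds"], avgs["away_yds"])
    # Assumption: the sign is flipped so fewer home turnovers is positive.
    turnovers = -pct_diff(avgs["home_to"], avgs["away_to"])
    return mean([points, yards, turnovers])


def rank_teams(team_avgs):
    """Team names ordered from largest to smallest home edge."""
    return sorted(team_avgs, key=lambda t: home_edge(team_avgs[t]), reverse=True)


# Made-up per-game averages for two hypothetical teams.
team_avgs = {
    "Team A": {"home_pts": 25, "away_pts": 20, "home_yds": 360,
               "away_yds": 300, "home_to": 1.0, "away_to": 1.25},
    "Team B": {"home_pts": 20, "away_pts": 20, "home_yds": 300,
               "away_yds": 300, "home_to": 1.5, "away_to": 1.5},
}
print(rank_teams(team_avgs))
```

With only a season or two of games feeding the averages, as for the Los Angeles teams, a score like this can swing widely.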

Data Squib Kick

After all of this discovery, I desperately hoped there would be a correlation between differences in actual game-day attendance and performance. After hacking the data through function after function to get correlations by metric by attendance by team, I was left with my version of a data squib kick.

Team Corr

It wasn’t pretty and it wasn’t exciting. Interestingly enough, the newer Los Angeles teams are the ones with the highest correlation between attendance and the different metrics – I expect this to normalize over the coming years. While strong correlations do not exist, we can see that the data varies by team – indicating that attendance is more important for certain teams. In addition, attendance seems to be most highly correlated with wins across teams not located in Los Angeles. But, is this important?
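For anyone who wants to rerun this step, here is a minimal Python sketch of a per-team correlation between attendance and a performance metric. It uses the Pearson coefficient, which is an assumption since the post does not name the measure, and the row layout (`team`, `attendance`, plus a metric column) and sample rows are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def attendance_correlations(rows, metric):
    """Correlation between attendance and `metric`, computed per team."""
    by_team = defaultdict(list)
    for row in rows:
        by_team[row["team"]].append((row["attendance"], row[metric]))
    return {
        team: pearson([a for a, _ in pairs], [m for _, m in pairs])
        for team, pairs in by_team.items()
    }


# Made-up home games for one hypothetical team.
rows = [
    {"team": "Team A", "attendance": 60000, "pts": 17},
    {"team": "Team A", "attendance": 65000, "pts": 24},
    {"team": "Team A", "attendance": 70000, "pts": 21},
]
print(attendance_correlations(rows, "pts"))
```

A correlation computed this way says nothing about direction of cause, which is the chicken-and-egg problem in a nutshell.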

Finally

This leads us to the final conundrum,

What came first? The chicken or the egg?

Or, in this case, the size of attendance or a good performance? While I cannot solve this, as of today, I have shown that there may be some truth to the nature of home-field advantage, so attendance numbers probably help to some un-quantifiable degree (we could clone the teams and have them compete with varying attendance numbers as part of an experiment – oh wait, this is happening in LA as we speak).

If you live in Cleveland, Baltimore, or Minnesota, opt for a home game; you are significantly more likely to see a win there. If you are a fan of the Dallas Cowboys or LA Rams, save your money; your likelihood of seeing a win is higher for away games. If you are a fan of the Cardinals, the Steelers, the Ravens, or the Packers, you can *claim* that your fan-base makes your team better on game-day (*though I cannot be held responsible for this flawed logic…*). Lastly, the game of football is complex. There are a million opportunities in a game for luck, skill, and even the audience to influence the outcome. Though not definitive, I would state that there is some truth to the age-old adage of home-field advantage. I mean c’mon, look at how Clemson throttled NC State…

“Sure, the home-field is an advantage – but so is having a lot of talent.” (Dan Marino)

 

NYC: Where to go for a night out?

There were 91,199 noise complaints filed by police in New York City at establishments categorized as a bar, club, or restaurant in 2016.

Some friends of mine had just come back from a road trip to New York City. Though I am sure they used my Airbnb post to efficiently find housing, they debated whether they had made the most of their short fall break and the nightlife that the city offers. None of them had been before and with so many offerings it is hard to chisel down where to go.

This left me thinking: I don’t drink or party, but could I help my friends find their 5 o’clock somewhere?

Methodology

Hmm… In a well populated area with a well funded police force, could I use noise complaints as a proxy of party magnitude?

Thanks to the New York City Open Data Portal, I was able to pull such data for 2016 and filter on locations deemed as bars, clubs, or restaurants. As stated before, this subset had 91,199 noise complaints and 2,456 locations. Upon diving in to find the perfect partier’s paradise, I discovered that my party-going friends may also want to know which subway line or station could get them to the best locations. Better yet, which stations and lines should they target to get the best that the night has to offer…

This task complicated my analysis and made this my most difficult post yet. Again, thanks to the New York City Open Data Portal, I was able to map the 1,868 subway stops in the city and their information to every bar with a noise complaint, calculate the distance between every pairing, and move forward with my analysis. As an added bonus, I decided to map all the data I had. For simplicity, I refer to restaurants, clubs, and bars in the data moving forward as bars. But, before the maps, let’s look at some fast facts.
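The pairing step described above (measure the distance from every bar to every stop, then keep the closest) can be sketched in Python as follows. The post does not say which distance formula was used, so the haversine great-circle distance here is an assumption, and the names, dictionary keys, and coordinates are made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_station(bar, stations):
    """Return the station record closest to the given bar."""
    return min(
        stations,
        key=lambda s: haversine_km(bar["lat"], bar["lon"], s["lat"], s["lon"]),
    )


# Made-up coordinates, only to show the call.
stations = [
    {"name": "Station A", "lat": 40.70, "lon": -74.00},
    {"name": "Station B", "lat": 40.80, "lon": -73.95},
]
bar = {"name": "Some Bar", "lat": 40.71, "lon": -74.00}
print(nearest_station(bar, stations)["name"])
```

Checking every pairing is a brute-force approach, but with a few thousand bars and stops it is still a manageable number of distance calculations.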

Fast Facts

1) Pick your borough wisely…

borough

The height of the bars represents the average number of noise complaints by bar by year for each borough. The width indicates the number of bars, relative to the other boroughs. It’s obvious that in terms of quantity and quality of bars, restaurants, and clubs for a night out a person should target Manhattan, Brooklyn, or Queens. You can probably skip the Bronx…

2) The subway line matters…

line plot

I predicted the number of noise complaints for a subway line based on the number of bars that are closest to stations on that line, to determine which lines over-performed, that is, which had more noise complaints than predicted. The labels to the left show which lines you should ride to maximize the craziness of your night based on the number of bars and past noise complaints. A good rule of thumb is to stick to the avenues: the 8, 6, and 4 Avenue subway lines. They each had a higher proportion of bars mapped to stations on their lines, as well as a higher number of noise complaints. If you are looking to avoid the decision making, hop on Nostrand. It has ~20% of the bars that 8 Avenue has but likes to party just as much, if not more, judging by the distance from the line.
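One way to read "over-performed" is as a positive residual from a fitted line: the actual complaints minus the complaints predicted from the bar count. Below is a minimal Python sketch of that idea. The post does not state which model was used, so the ordinary least-squares straight line is an assumption, and the line names and figures are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals_by_line(lines):
    """Map each line to (actual complaints - predicted complaints).

    `lines` maps a line name to a (bar_count, complaint_count) pair.
    """
    names = list(lines)
    bars = [lines[name][0] for name in names]
    complaints = [lines[name][1] for name in names]
    slope, intercept = fit_line(bars, complaints)
    return {
        name: y - (slope * x + intercept)
        for name, x, y in zip(names, bars, complaints)
    }


# Made-up figures for four hypothetical lines.
lines = {
    "Line 1": (10, 100),
    "Line 2": (20, 200),
    "Line 3": (30, 300),
    "Line 4": (20, 260),
}
residuals = residuals_by_line(lines)
print(max(residuals, key=residuals.get))
```

A line with few bars can still sit far above the fitted line, which is the pattern the Nostrand comment describes.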

3) Or just target the “party” stations directly…

Top 5 Stations (Based on Number of Bars):

  1. 96: Bedford Av
  2. 91: 2nd Av
  3. 70: 86th St
  4. 65: 95th St
  5. 62: 1st Av

Top 5 Stations (Based on Number of Complaints):

  1. 3,875: 2nd Av
  2. 2,975: Bedford Av
  3. 2,634: 86th St
  4. 2,302: 95th St
  5. 2,294: Dyckman St-200th St

If you don’t want to target boroughs or lines, then maybe just go for specific stations. 2nd Av, Bedford Av, 86th St, and 95th St might be your best bets. If you are looking for a rowdy time, especially compared to the number of bars, then check out Dyckman St-200th St station. If you are looking for the variety of a top 5 station without the commotion, then 1st Av is for you.

Now that we have created some baselines for you to plan your night, it is time for the creme de la creme – actual maps to base your decisions on.

Maps

These maps can assist you in planning your time wisely in the busy city. While these are static, if you click the images you can use the interactive maps.

Color depicts the subway line. Size indicates either the number of noise complaints or number of bars (if looking at the subway maps). For those of you that like to stay out late and get a little hangry, I’ve even taken the liberty of mapping which stations have vending machines (simply scroll over the points in the interactive map after clicking on the static images).

Bar Map

Subway Map Dashboard

***I attempted to embed these in the post, but WordPress declined since I am a free user…

Final Thoughts

Notice how busy the maps are?

“There is something in the New York air that makes sleep useless.” (Simone de Beauvoir)

I am a man of the people. If you are looking for a sleepless night in the city, then this post is for you. If this analysis can help your friends, like I hope it will help mine, then feel free to pass it along and bookmark the map data for your next trip to “The City That Never Sleeps”!