Monthly Archives: November 2016

Creating a searchable and sortable list of draft-eligible CHL players using Python and AngularJS

Looking at past NHL Entry Drafts, a large number of the players selected come from the Canadian Hockey League (CHL), namely its three constituents: the Quebec Major Junior Hockey League (QMJHL), the Ontario Hockey League (OHL), and the Western Hockey League (WHL). A quick analysis of the last six drafts shows that more than 46% of the selected players had been playing in one of the three major junior leagues at the time. Even though this percentage used to be higher, as other regions and leagues have become and are still becoming more important providers of NHL talent, it is safe to assume that the CHL will remain a major source in the foreseeable future.

For the devoted fan of an NHL team this is one more reason to follow the CHL closely, something that has become more feasible with the online dissemination of game scores and player statistics, even if you’re not able to attend games in person. Yet while it is possible to regularly visit each league’s website to retrieve this information, I have found it unexpectedly hard to keep track of which players have already been drafted by an NHL team (it is very common for them to return to junior for one or more seasons) and which are still too young to be selected in the upcoming draft. As I am not aware of any list that compiles all candidates eligible for the upcoming NHL Entry Draft, I have decided to create one myself. The result is already available; in the lines below I will outline the work that went into it.

Technically, the process consists of two separate operations:

  1. On the back end we have to retrieve the data from the aforementioned websites according to a number of criteria and finally create a suitable compilation containing all desired players and accompanying information. In the example presented here we have a Python script implementation providing the collected data in a JSON file.
  2. On the front end we want to provide means for searching and sorting the compiled data presented on a corresponding website of our own. This is done using the AngularJS framework which enhances regular HTML for dynamic content display.

Back end data retrieval

Let’s start by having a look at the back end. The general workflow for the data retrieval is made up of three steps. First we retrieve a list of all teams playing in each of the leagues concerned. For each team we then fetch roster information, i.e. all players associated with the given team. In doing so we register basic information about each player, such as height, weight and position, but also age and NHL draft status, which allows us to select only draft-eligible individuals. In a last step we retrieve up-to-date player statistics to be included in the compiled list, which itself will be made available as a JSON file.

The complete process is implemented in a Python script that has been made available in the Portolan GitHub repository. Here we are going to shed light on a few selected aspects of it.

To temporarily hold information I have learned to appreciate named tuples as they have been introduced in Python’s collections module with version 2.6. If you don’t need the flexibility and mutability of real objects but still want to have your data well structured and easily accessible, named tuples should be your first choice. The following definitions keep information about teams, players and player statistics:

from collections import namedtuple

# definition of named tuples to hold some data
# team information
Team = namedtuple('Team', 'id name city code')
# player information
Player = namedtuple('Player', 'id first_name last_name team league '
                              'dob draft_day_age is_overager position '
                              'height weight shoots url')
# single season player statistics
Statline = namedtuple('Statline', 'id season '
                                  'games_played goals assists points plus_minus penalty_minutes '
                                  'power_play_goals power_play_assists power_play_points '
                                  'short_handed_goals short_handed_assists short_handed_points '
                                  'shots shooting_percentage points_per_game')
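To illustrate why named tuples are convenient here, the following minimal sketch creates a Team instance and accesses it in a few different ways. The values are taken from the sample output shown later in this post; the variable names are just for illustration.

```python
from collections import namedtuple

Team = namedtuple('Team', 'id name city code')

# creating an instance, values taken from the sample output later in this post
team = Team(2, 'Acadie-Bathurst Titan', 'Acadie-Bathurst', 'Bat')

# fields are accessible by name as well as by position
name_by_attribute = team.name
name_by_index = team[1]

# named tuples are immutable, _replace returns a modified copy instead
renamed = team._replace(code='BAT')

# _asdict converts the tuple into a dictionary, e.g. for later JSON serialization
as_dict = team._asdict()
```

Because the tuple is immutable, an accidental assignment such as team.code = 'BAT' raises an AttributeError instead of silently changing the data.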

The main criterion to differentiate between players that are draft-eligible and those that are not is age. The exact rule is laid out in the NHL’s Hockey Operation Guidelines. For the upcoming draft it boils down to the following concrete dates (note that the cutoff date does not coincide with the draft date itself, hence the separate definition):

from dateutil.parser import parse

# defining dates
# lower date of birth for draft-eligible players, older players do not need to be drafted
LOWER_CUTOFF_DOB = parse("Jan 1, 1997").date()
# regular cutoff date of birth for draft-eligible players, younger ones weren't draft-eligible in the previous draft
REGULAR_CUTOFF_DOB = parse("Sep 15, 1998").date()
# upper cutoff date of birth for draft-eligible players, younger ones are only draft-eligible in the next draft
UPPER_CUTOFF_DOB = parse("Sep 15, 1999").date()
# date of the upcoming draft
DRAFT_DATE = parse("Jun 23, 2017").date()

(Obviously, I am a friend of the dateutil module and you should be, too.)
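How these dates translate into an eligibility check can be sketched as follows. This is a simplified version under a few assumptions: the function names are mine and not necessarily those used in the actual script, the cutoff dates are treated as inclusive, and only age is considered (players that have already been drafted need to be excluded separately). To keep the sketch free of dependencies it uses datetime.date from the standard library instead of dateutil.

```python
from datetime import date

# same dates as above, defined via the standard library
LOWER_CUTOFF_DOB = date(1997, 1, 1)
REGULAR_CUTOFF_DOB = date(1998, 9, 15)
UPPER_CUTOFF_DOB = date(1999, 9, 15)


def is_draft_eligible(dob):
    """Check whether a date of birth falls within the eligible range.

    Assumption: both cutoff dates are inclusive.
    """
    return LOWER_CUTOFF_DOB <= dob <= UPPER_CUTOFF_DOB


def is_overager(dob):
    """Check whether an eligible player was already old enough for the previous draft."""
    return is_draft_eligible(dob) and dob <= REGULAR_CUTOFF_DOB
```

A player born on January 5, 1999 (as in the sample output further below) is eligible for the first time and therefore not an overager, whereas a player born in March 1998 could already have been selected a year earlier.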

As with a lot of current websites, those of the three leagues in question no longer embed their data in regular HTML markup but load it from associated data feeds, usually formatted as JSON. In our case this holds true for team overviews, roster summaries and even team player statistics. Whilst it is somewhat awkward to find these links in the first place, it is actually quite awesome for scraping as the data is already well-structured and therefore easily accessible. Hence we’re defining lookup dictionaries for each league and dataset type (see source code for actual values):

# league-specific template urls for team overview pages
TEAM_OVERVIEW_URLS = {
    'QMJHL': "http://cluster.leaguestat.com/feed/...",
    'OHL': "...",
    'WHL': "...",
}

# league-specific template urls for team roster pages
TEAM_ROSTER_URLS = {
    'QMJHL': "...",
    'OHL': "...",
    'WHL': "...",
}

# league-specific template urls for team statistic pages
TEAM_STATS_URLS = {
    'QMJHL': "...",
    'OHL': "...",
    'WHL': "...",
}
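A rough sketch of how such a lookup dictionary might be used is given below. Everything in it is an assumption made for illustration: the example.com templates and the team_id placeholder are hypothetical stand-ins for the real feed URLs, and build_url and fetch_json are not functions from the actual script. The sketch also assumes that a feed returns plain JSON.

```python
import json
from urllib.request import urlopen

# hypothetical templates, the real feed URLs are listed in the actual source code
EXAMPLE_ROSTER_URLS = {
    'QMJHL': "http://example.com/qmjhl/roster?team_id={team_id}",
    'OHL': "http://example.com/ohl/roster?team_id={team_id}",
}


def build_url(templates, league, **params):
    """Look up the template for a league and fill in its placeholders."""
    try:
        template = templates[league]
    except KeyError:
        raise ValueError("unknown league: %s" % league)
    return template.format(**params)


def fetch_json(url):
    """Download a feed and decode it, assuming the response body is plain JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode('utf-8'))
```

Keeping the templates in one dictionary per dataset type means that the retrieval functions only need the league abbreviation to find the right feed.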

The retrieval itself is actually quite straightforward and follows the workflow outlined above:

if __name__ == '__main__':

    tgt_path = r"junior.json"

    # setting up result containers for rosters and player stats
    full_rosters = dict()
    full_stats = dict()
    
    # doing the following for each league
    for league in ['QMJHL', 'OHL', 'WHL']:
        # retrieving teams in current league
        teams = retrieve_teams(league)
        for team in list(teams.values()):
            # retrieving roster for current team
            roster = retrieve_roster(team, league)
            # updating container for all rosters
            full_rosters.update(roster)
            # retrieving player statistics for current team
            stats = retrieve_stats(team, league, roster)
            # updating container for all player statistics
            full_stats.update(stats)

    # dumping rosters and stats to JSON file
    dump_to_json_file(tgt_path, full_rosters, full_stats)

For the implementation of the functions used here, again see the actual source code over at GitHub.

Finally we have a JSON file with all draft-eligible skaters from the major junior leagues looking like this:

[
..., 
  {
    "assists": 1, 
    "dob": "1999-01-05", 
    "draft_day_age": 18.174, 
    "first_name": "Cole", 
    "games_played": 16, 
    "goals": 1, 
    "height": 6.02, 
    "id": "14853", 
    "is_overager": false, 
    "last_name": "Rafuse", 
    "league": "QMJHL", 
    "penalty_minutes": 2, 
    "plus_minus": 2, 
    "points": 2, 
    "points_per_game": 0.13, 
    "position": "LW", 
    "power_play_assists": 0, 
    "power_play_goals": 0, 
    "power_play_points": 0, 
    "season": "", 
    "shooting_percentage": 11.1, 
    "shoots": "L", 
    "short_handed_assists": 0, 
    "short_handed_goals": 0, 
    "short_handed_points": 0, 
    "shots": 9, 
    "team": [
      2, 
      "Acadie-Bathurst Titan", 
      "Acadie-Bathurst", 
      "Bat"
    ], 
    "url": "http://theqmjhl.ca/players/14853", 
    "weight": "205"
  }, 
...
]

Please note the draft_day_age and is_overager entries: they hold the age of the player on draft day and a boolean variable indicating whether the player is considered an overager, i.e. could have already been drafted in the previous draft.
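How roster and statistics data could be combined into such a file is sketched below. This is not the actual implementation of dump_to_json_file but a minimal version under several assumptions: the named tuples are shortened, merge_records and build_output are hypothetical helpers, and the timestamp format is made up. What the sketch does rely on is visible in this post: the keys in the output are sorted alphabetically, the team appears as a plain list (which is how the json module serializes a named tuple), and the front end code further below expects a last_modified entry in the first element of the list.

```python
import json
from collections import namedtuple
from datetime import datetime

Team = namedtuple('Team', 'id name city code')
# shortened, illustrative versions of the tuples defined earlier
Player = namedtuple('Player', 'id first_name last_name team is_overager')
Statline = namedtuple('Statline', 'id games_played goals assists points')


def merge_records(rosters, stats):
    """Combine player and stat tuples sharing the same id into dictionaries."""
    records = []
    for player_id, player in rosters.items():
        record = dict(player._asdict())
        statline = stats.get(player_id)
        if statline is not None:
            record.update(statline._asdict())
        records.append(record)
    return records


def build_output(rosters, stats, last_modified):
    """Prepend a header element holding the modification timestamp."""
    return [{'last_modified': last_modified}] + merge_records(rosters, stats)


def dump_to_json_file(tgt_path, rosters, stats):
    """Write header and merged records to a JSON file with sorted keys."""
    # assumption: the timestamp format is arbitrary
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M')
    with open(tgt_path, 'w') as handle:
        json.dump(build_output(rosters, stats, timestamp), handle,
                  indent=2, sort_keys=True)
```

Note that _asdict does not convert nested tuples, so the team stays a Team instance within the record and only becomes a list once it is serialized.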

Front end data display

The collected data can now be displayed in tabular form. Whilst regular HTML is perfectly sufficient for this task, the user can easily be enabled to search, filter and sort the data comfortably by utilizing AngularJS, a JavaScript framework that extends traditional HTML to allow for dynamic information display. Angular builds on the model-view-controller architecture, and it’s not my business to introduce here what has been explained much better somewhere else (for example at the Chrome Developer Page). Directives are an important feature of Angular: they are basically additional HTML attributes that extend the behavior of the corresponding tag. These directives can usually be recognized easily as they start with the prefix ‘ng-’. Always striving to create valid HTML, I will additionally prepend ‘data-’ to each directive, as suggested in the AngularJS docs. Otherwise being fairly new to Angular, I have based my work on an example presented at scotch.io.

The solution I have come up with consists of three parts:

  1. An HTML page outlining the basic layout of our page (junior.html).
  2. A JavaScript file containing the logic of our solution (junior.js).
  3. The actual data – this is the JSON file that has been produced by the back end (junior.json).

A very basic version of junior.js could look like the following snippet. We just create an application called showSortJuniorApp and define the main controller. Within this controller there is just one function. It reads the JSON data file and saves its contents within the scope.

angular.module('showSortJuniorApp', [])

.controller('mainController', function($scope, $http) {

  // loading stats from external json file
  $http.get('junior.json').then(function(res) {
      $scope.last_modified = res.data[0]['last_modified'];
      $scope.stats = res.data.slice(1);
  });

});

Now let’s have a look at a basic version of the accompanying junior.html. After importing CSS definitions from Bootstrap and the AngularJS source from Google, we finally include our very own JavaScript file. In the body part (among a few other things) a container element is linked with the app we created above (using the ng-app and ng-controller directives) and a table is defined and populated via the ng-repeat directive. The latter basically provides a loop over the array we loaded in our junior.js and creates a new table row for each element.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8" />
    <title>Draft-eligible players from QMJHL, OHL and WHL: Summary</title>

    <!-- css -->
    <link rel="stylesheet" href="http://maxcdn.bootstrapcdn.com/bootswatch/3.2.0/spacelab/bootstrap.min.css">
    <style>
        body { padding-top: 40px; }
    </style>

    <!-- javascript -->
    <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/angularjs/1.5.8/angular.min.js"></script>
    <script type="text/javascript" src="junior.js"></script>

</head>
<body>
<div class="container" data-ng-app="showSortJuniorApp" data-ng-controller="mainController">
  
  <h1>Skaters from Major Junior Eligible for the 2017 NHL Entry Draft</h1>

  <hr />

  <div class="alert alert-info">
    <p>The following sortable and searchable table contains all skaters from QMJHL, OHL and WHL that are eligible for the 2017 NHL Entry Draft. For a description of the workflow and a detailed explanation of the methodology refer to this <b><a href="http://portolan.leaffan.net/creating-a-searchable-and-sortable-list-of-draft-eligible-chl-players-using-python-and-angularjs/">post</a></b> on the <b><a href="http://portolan.leaffan.net/">Portolan Blog</a></b>.</p>
    <p><b>Last modified:</b> {{ last_modified }}</p>
  </div>

  <table class="table table-bordered table-striped">
     <thead>
         <tr>
             <td>Name</td>
             <td>Team</td>
             <td>Draft Day Age</td>
             <td>GP</td> 
             <td>G</td> 
             <td>A</td> 
             <td>Pts.</td> 
             <td>SH</td> 
             <td>S%</td> 
             <td>P/G</td> 
         </tr>
     </thead>
     <tbody>
         <tr data-ng-repeat="stat in stats">
             <td class="col-md-2"><a data-ng-href="{{ stat.url }}">{{ stat.first_name }} {{ stat.last_name }}</a></td>
             <td class="col-md-2">{{ stat.team[2] }}</td>
             <td class="col-md-1">{{ stat.draft_day_age.toFixed(3) }}</td>
             <td class="col-md-1">{{ stat.games_played }}</td>
             <td class="col-md-1">{{ stat.goals }}</td>
             <td class="col-md-1">{{ stat.assists }}</td>
             <td class="col-md-1">{{ stat.points }}</td>
             <td class="col-md-1">{{ stat.shots }}</td>
             <td class="col-md-1">{{ stat.shooting_percentage.toFixed(1) }}</td>
             <td class="col-md-1">{{ stat.points_per_game.toFixed(2) }}</td>
         </tr>
     </tbody>
</table>

</div>
</body>
</html>

Now how to allow for sortable columns? This can be achieved quite easily. First we define a default sort criterion and order in junior.js:

$scope.statsSortCriterion = 'points'; // default sort criterion
$scope.statsSortDescending = true;    // descending as default sort order

We may then modify the ng-repeat directive in junior.html to make the whole table sort by points in descending order as the default configuration:

<tr data-ng-repeat="stat in stats | orderBy:statsSortCriterion:statsSortDescending">

To create clickable column headings that allow for varying sort criteria, an anchor tag with the ng-click directive has to be added to each header cell of the table in junior.html:

<td>
     <a href="#" data-ng-click="statsSortCriterion = 'draft_day_age'; statsSortDescending = !statsSortDescending">Draft Day Age 
     </a>
</td>
<td>
     <a href="#" data-ng-click="statsSortCriterion = 'games_played'; statsSortDescending = true">GP 
     </a>
</td>

Here we set a fixed descending sort order on the games played column. However, to allow for sorting in both directions we can instead assign the variable its negated value, as in the example for draft day age above. This configuration will toggle the sort order every time we click on the column heading.

Finally we would like a search function allowing for the filtering of last names and a checkbox to hide overagers. To do so we first have to add a suitable form to the HTML:

<form>
    <div class="form-group">
        <div class="input-group">
            <div class="input-group-addon"><i class="fa fa-search"></i></div>
            <input type="text" class="form-control" placeholder="Filter by name" data-ng-model="nameFilter" />
        </div>
        <div class="checkbox">
            <label>
                <input type="checkbox" id="a" data-ng-model="hideOveragers" value="Hide overage players" />Hide overage players
            </label>
        </div>
    </div>
</form>

After adding some variables and a short filter function to junior.js…

$scope.nameFilter = '';               // empty name filter
$scope.hideOveragers = false;         // per default overagers are shown

// hiding overagers if corresponding checkbox is checked
$scope.overageFilterFunc = function(a) {
    // a record is only hidden if the checkbox is checked and the player is an overager
    return !($scope.hideOveragers && a.is_overager);
};

… we can complete the ng-repeat directive for our tabular data in the following manner:

<tr data-ng-repeat="stat in stats | orderBy:statsSortCriterion:statsSortDescending | filter:nameFilter | filter:overageFilterFunc">
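The same chain of sorting and filtering can also be reproduced outside the browser, for instance to sanity-check the contents of junior.json before publishing it. The following is a rough Python counterpart and not part of the actual solution: select_players is a hypothetical function and the sample records apart from the first one are made up. Unlike Angular’s filter, which matches the search string against all properties of a record, this version only looks at first and last names.

```python
def select_players(stats, name_filter='', hide_overagers=False,
                   sort_criterion='points', descending=True):
    """Filter records by name and overage status, then sort them."""
    needle = name_filter.lower()
    selected = [
        s for s in stats
        # case-insensitive substring match on the full name
        if needle in (s['first_name'] + ' ' + s['last_name']).lower()
        # overagers are only dropped if explicitly requested
        and not (hide_overagers and s['is_overager'])
    ]
    return sorted(selected, key=lambda s: s[sort_criterion], reverse=descending)


# sample records, the last two are made up for illustration
sample = [
    {'first_name': 'Cole', 'last_name': 'Rafuse', 'points': 2, 'is_overager': False},
    {'first_name': 'Example', 'last_name': 'Overager', 'points': 10, 'is_overager': True},
    {'first_name': 'Another', 'last_name': 'Example', 'points': 5, 'is_overager': False},
]

# default view: all players sorted by points in descending order
default_view = select_players(sample)
# view with overagers hidden
first_time_eligibles = select_players(sample, hide_overagers=True)
```

The defaults mirror the initial values set in junior.js: an empty name filter, overagers shown, and a descending sort by points.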

After a few more modifications to the HTML and the JavaScript code, the final version of our front end data display also includes the ability to switch table contents between basic statistics (as used above), player information (such as height, weight, etc.) and additional information (i.e. special team stats). You may refer to the GitHub repository to review the most recent version of this solution.