The Portland Beavers don't seem to offer their schedule in any electronic format other than their web page.
I like to have these things in my calendar, so that I know what I'm missing when I'm away. So I scraped their "printable" page, ran it through a little script, and produced a standard iCalendar file. (Yes, it really is a standard (RFC-2445), and it doesn't have anything to do with Apple's iCal.)
You can subscribe to it if you want.
To start with, I cut-n-pasted the text from the "printable" schedule page and saved it in a file.
Then I ran it through this little script:
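#!/usr/bin/perl -n
# Reads the pasted schedule text on stdin and writes an iCalendar file to stdout.

BEGIN {
    # Calendar header: printed once, before the first line of input.
    print <<'EOF';
BEGIN:VCALENDAR
METHOD:PUBLISH
X-WR-TIMEZONE:US/Pacific
PRODID:-//Apple Inc.//iCal 3.0//EN
CALSCALE:GREGORIAN
X-WR-CALNAME:Portland Beavers
VERSION:2.0
EOF
}

chomp;
next unless /^\d/;    # Game lines start with a date; skip everything else.

my ($mon, $day, $year, $time, $off, $away, $what, $outcome, $score) = ($_ =~ m#(\d+)/(\d+)/(\d+)\s+(?:(\d+:\d+\s+(A|P)M|TB[AD]))\s+(@?)(.*?)(?:\s+([WL])\s+(\d+-\d+))?\s*$#);

# Complain and move on if the line didn't parse, before using any of the fields.
if (!defined($mon)) {
    print STDERR "Unknown: '$_'\n";
    next;
}

my ($hr, $min) = ($time =~ /(\d+):(\d+)/);
if ($time =~ /^TB/) {
    # No start time yet: park the game at noon and note it in the title.
    $hr = 12;
    $min = 0;
    $what .= " [Time $time]";
}
if ($away) {
    $what = "@ $what";
} else {
    $what = "vs $what";
}
if ($outcome) {
    $what .= " ($outcome $score)";
}

# Convert to a 24-hour clock ($off is undefined for TBD/TBA lines).
$off = '' unless defined $off;
$hr += 12 if $off eq 'P' && $hr != 12;
$hr = 0 if $off eq 'A' && $hr == 12;
my $start = sprintf("%04d%02d%02dT%02d%02d00", $year, $mon, $day, $hr, $min);

# Assume a game runs 2.5 hours; work in minutes so the end time carries properly.
my $end_min = $hr * 60 + $min + 150;
if ($end_min >= 24 * 60) {
    # This shouldn't happen
    $end_min -= 24 * 60;
    $day += 1;
}
my $end = sprintf("%04d%02d%02dT%02d%02d00", $year, $mon, $day, int($end_min / 60), $end_min % 60);

print <<"EOF";
BEGIN:VEVENT
DESCRIPTION:$what
URL;VALUE=URI:www.portlandbeavers.com
DTSTART;TZID=US/Pacific:$start
SUMMARY:$what
DTEND;TZID=US/Pacific:$end
END:VEVENT
EOF

END {
    print "END:VCALENDAR\n";
}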
Like so: cat foo | perl sched.pl > beavers.ics
Then I imported the calendar into iCal and published it via the calendar DAV service I'd set up previously.
I did do a little manual fix-up; basically just removing the leading "vs" from the All-Star Game and related activities.
If I get really industrious, I'll repeat the process occasionally to keep the record up-to-date. I might do the promo schedule as well. But even if I don't do any of that, at least now we'll know date, time, and opponent.

Comments
except for this crazy statement
my ($mon, $day, $year, $time, $off, $away, $what, $outcome, $score) = ($_ =~ m#(\d+)/(\d+)/(\d+)\s+(?:(\d+:\d+\s+(A|P)M|T
my ($hr, $min) = ($time =~ /(\d+):(\d+)/);
And when I'm writing regexps for myself, I don't usually spend any effort to make them especially readable. :) But that's the heart of the whole thing -- the whole effort depends on the data that that one line extracts.
If you (or anyone else) want, I can break it down.
And a representative line of input:
In Perl, when you use forward slashes to delimit the regexp, the 'm' (which is an operator) is implied. But because I was going to have forward slashes as part of the regexp, I want to use a different delimiter, and thus 'm#' which is "match, delimited by hash symbols" (octothorpes, if you like).
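To see the difference the delimiter makes, here is a small sketch. The date string is made up for illustration and only the date part of the pattern is used.

```perl
use strict;
use warnings;

# Hypothetical date string; the real schedule text may look different.
my $date = '4/9/2009';

# Default delimiter: every literal slash in the pattern has to be escaped.
my @slashes = ($date =~ /(\d+)\/(\d+)\/(\d+)/);

# Explicit 'm' with '#' as the delimiter: the slashes can stand as they are.
my @hashes = ($date =~ m#(\d+)/(\d+)/(\d+)#);

# Paired delimiters such as braces work the same way.
my @braces = ($date =~ m{(\d+)/(\d+)/(\d+)});

print "@slashes\n";
print "@hashes\n";
print "@braces\n";
```

All three matches capture the same month, day, and year; the only thing that changes is how much escaping the pattern needs.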
If I'd used the "x" (eXtended legibility) option, I could've written it like this:
m{
  (\d+)      # Grab the month and put it in $1
  /          # Match the date delimiter
  (\d+)      # Grab the day and put it in $2
  /          # Match the date delimiter
  (\d+)      # Grab the year and put it in $3
  \s+        # There's some whitespace between the fields; eat it
  (?:        # Now it's time to match the time, which will either be "HH:MM AM"
             # "HH:MM PM", "TBD" or "TBA". This starts a group whose contents won't
             # be saved in a backreference. Hey, what do you know? This is totally
             # useless here. I should've left it out. Anyway...
    (        # Start a group that will save whatever's matched in $4
      \d+:\d+\s+(A|P)M   # Match the HH:MM [AP]M, and store the A or P in $5
      |      # OR, if it's not HH:MM...
      TB[AD] # Match TB followed by A or D. They use TBD, but I wanted to be prepared
             # for TBA as well.
    )        # This is the end of the time group
  )          # And this is the end of the useless group.
  \s+        # Once again, whitespace between the fields
  (@?)       # If there's an '@' before the team name, save it in $6. For home games
             # there's no '@', so $6 will be empty. The '?' means to match 0 or 1 times.
  (.*?)      # Match everything that comes next (".*") but don't be greedy ("?"). IOW,
             # don't consume stuff that things after this bit might match. All this
             # gets saved in $7.
  (?:        # Start another non-save group. This is just to accommodate the '?' a few
             # lines down, which basically means "it's okay if none of this stuff is here"
    \s+      # Match whitespace between the team name and the game record
    ([WL])   # Win or loss? Save it in $8
    \s+      # Eat whitespace between the outcome and the score
    (\d+-\d+) # Match NUM-NUM (the score) and put it in $9.
  )          # End of the game record group
  ?          # And it's okay if it doesn't match (as it won't for games that haven't been
             # played yet).
  \s*        # If there's any whitespace left after everything else matches, this will eat it.
  $          # This anchors to the end of the line
}x           # And this is the end of the regexp with the "x" modifier.

I did the matching and the variable assignment all in one step, but it could've easily been two, as in:
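The split might look roughly like this sketch. The sample line is made up, and parse_line is just a name for illustration, not something in sched.pl; the pattern itself is the one from the script.

```perl
use strict;
use warnings;

# parse_line is a made-up helper name for this sketch, not part of sched.pl.
sub parse_line {
    my ($line) = @_;

    # Step 1: only do the match.
    return unless $line =~ m#(\d+)/(\d+)/(\d+)\s+(?:(\d+:\d+\s+(A|P)M|TB[AD]))\s+(@?)(.*?)(?:\s+([WL])\s+(\d+-\d+))?\s*$#;

    # Step 2: copy the numbered captures into named variables.
    my ($mon, $day, $year, $time, $off, $away, $what, $outcome, $score)
        = ($1, $2, $3, $4, $5, $6, $7, $8, $9);

    return ($mon, $day, $year, $time, $off, $away, $what, $outcome, $score);
}

# Hypothetical input line; the real schedule text may be laid out differently.
my @fields = parse_line('4/9/2009 7:05 PM @Salt Lake W 5-3');
print join('|', @fields), "\n";
```

The one-step form in the script does the same thing, because a match in list context returns the captures directly.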
When you have a match that you don't bind explicitly to a variable, use of "$_" is assumed. In general I think use of that reduces readability, but for short scripts (especially ones that run with -n or -p) it's okay.
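As a small illustration of the two styles, here is a sketch that filters the same lines both ways. The sample lines are invented; only the /^\d/ test comes from the script.

```perl
use strict;
use warnings;

# Hypothetical lines; only the ones that start with a digit count as games.
my @lines = (
    'Date Time Opponent',
    '4/9/2009 7:05 PM @Salt Lake',
    '4/10/2009 TBD Tacoma',
);

my (@implicit, @explicit);

for (@lines) {
    # No binding operator, so the match is applied to $_.
    push @implicit, $_ if /^\d/;
}

for my $line (@lines) {
    # Same test, bound explicitly to a named variable with =~.
    push @explicit, $line if $line =~ /^\d/;
}

print scalar(@implicit), " ", scalar(@explicit), "\n";
```

Both loops keep the same two lines. With -n, Perl wraps the whole script in a loop that reads each input line into $_, which is why the bare /^\d/ in sched.pl works.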
Edited at 2009-02-12 03:29 am (UTC)