Saturday, February 25, 2017

Introducing Computer Science via Online Security: An Experience Report

Last weekend I spent two hours teaching an informal introduction to online security to an audience of political activists. I wound up teaching a fair bit of computer science in the process and I'm writing up this experience report because I think it's a valuable way to teach introductory computer science.

Before I put together my lesson plan I spent a fair bit of time looking at other people's introductions. Broadly, they fell into two categories:
1. Introductions for CS students, which would include things like how to write your own HTTPS server or proofs about why RSA works (too advanced for my audience)
2. Instructions for what software you should download to stay secure.

I'm a member of the political organization from which my audience came. People regularly post articles which fall into category 2 on the online community for the group. Not surprisingly, these articles have had limited effect on getting people to change their behaviour. This was why I'd volunteered to teach the workshop. I'd initially planned it to be all about the software to install to stay safe.

As I put together my lesson plan I changed my mind about the goal of the workshop. In my experience teaching introductory programming, students struggle for the first few weeks because they don't understand why they should be learning this or what it gets them. I started to think something similar might be going on here: a typical article telling you to install Signal and HTTPS Everywhere doesn't sufficiently motivate why it's necessary and what's going on technically.

Computer scientists like myself think of the internet in a very different way than my activist friends. My activist friends see the internet as a mystical black box.

My learning goal for the workshop hence became: to demystify the internet.

What I taught

I gave some homework to my "class": to watch this series of videos on how the internet works. I'd spent some time on YouTube watching videos on how the internet works and concluded those were the best out there.

The videos are quite lovely and well-produced. They, however, do something I don't like: they talk about data as being mystical 1s and 0s. So I started the workshop with demystifying how data is stored.

Files and Encodings

I went over character encoding. We talked about ASCII, Unicode, and encodings for languages other than English. I talked about how this entire setup was America-centric, and the pains that non-English writers have had as a result.
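For readers who want to see this concretely, here is a minimal Python sketch (not something I showed in the workshop). The sample text is my own choice for illustration; it shows the same string turning into different bytes under different encodings, and ASCII failing on accented letters.

```python
# The same text becomes different bytes depending on the encoding.
# The sample string is illustrative: "naive cafe" with two accented letters.
text = "na\u00efve caf\u00e9"

utf8_bytes = text.encode("utf-8")      # accented letters take 2 bytes each
latin1_bytes = text.encode("latin-1")  # every character takes 1 byte

# The numbers behind two plain ASCII letters.
hi_codes = list("Hi".encode("ascii"))
print(hi_codes)

# Character count versus byte counts.
print(len(text), len(utf8_bytes), len(latin1_bytes))

# ASCII only defines 128 characters, so accented letters cannot be encoded.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("fits in ASCII:", ascii_ok)
```

The point for novices is that "text" and "bytes" are different things, and the encoding is the agreement that maps one to the other.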

From there we talked about other file encodings. I walked through an extremely simplified bmp encoding. We then talked about compression and encodings like jpg. I hadn't expected to bring compression into the mix but it came up in the questions.
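A rough sketch of the idea, assuming a made-up image format of my own (this is not real BMP): two header bytes for width and height, then one byte per pixel. The sketch compresses the result with Python's `zlib`, which is lossless, unlike jpg, so it only illustrates the general idea that repetitive data can be stored in fewer bytes.

```python
import zlib

WIDTH, HEIGHT = 32, 32  # hypothetical image size for this toy format


def make_image():
    """Return rows of pixel values: 0 = white, 255 = black square in the middle."""
    rows = []
    for y in range(HEIGHT):
        row = []
        for x in range(WIDTH):
            inside = 8 <= x < 24 and 8 <= y < 24
            row.append(255 if inside else 0)
        rows.append(row)
    return rows


def encode_toy_bitmap(rows):
    """Toy format: one byte for width, one for height, then one byte per pixel."""
    header = bytes([len(rows[0]), len(rows)])
    body = bytes(value for row in rows for value in row)
    return header + body


raw = encode_toy_bitmap(make_image())
compressed = zlib.compress(raw)

print(len(raw), "bytes uncompressed")
print(len(compressed), "bytes compressed")

# Lossless compression: decompressing gives back exactly the original bytes.
restored = zlib.decompress(compressed)
```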

I then asked the group, "so what is a file?" I got the same blank looks I get from my first year computer science students when I introduce file I/O. Most computer scientists tend not to realize how much difficulty novices have with the concept of a file.

In our group discussion about files I wound up explaining what virtual memory is and some basics about file systems. This was another piece of computer science that came up through class discussion that I hadn't expected would come up (but was excited to see!)

We then talked about metadata and, from there, how much information you can get from somebody's metadata.


I then shifted gears to talk about "suppose we want to share a file". From the videos my audience already had seen the notion of a packet. We talked about how a file (and any other information) will be broken into packets to be sent over a network.
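As a hedged illustration of packetizing, here is a small Python sketch. The function names and the 8-byte packet size are my own inventions for the example, and real packets carry far more header information than a sequence number.

```python
import random


def split_into_packets(data: bytes, size: int):
    """Return a list of (sequence_number, chunk) pairs."""
    return [(i // size, data[i:i + size]) for i in range(0, len(data), size)]


def reassemble(packets):
    """Sort by sequence number and glue the chunks back together."""
    return b"".join(chunk for _, chunk in sorted(packets))


message = "Meet at the library at 7pm.".encode("utf-8")
packets = split_into_packets(message, size=8)

# Packets can take different routes and arrive out of order.
random.shuffle(packets)

rebuilt = reassemble(packets)
print(len(packets), "packets")
print(rebuilt.decode("utf-8"))
```

The sequence numbers are what let the receiver put the pieces back in order no matter how they arrived.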

I then talked about pre-internet networks. I talked about hubs and routers; in retrospect I should have left out hubs, which I think added confusion.

We then walked through an example of how UDP works.
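This is not the example from the workshop, but a minimal sketch of UDP in Python, assuming both "computers" are the same machine talking over the loopback address 127.0.0.1. One socket sends a single datagram and the other receives it; there is no connection setup and no delivery guarantee.

```python
import socket

# Receiver: bind to a free port on this machine (loopback only).
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # port 0 asks the OS to pick any free port
receiver.settimeout(2.0)          # give up rather than wait forever
address = receiver.getsockname()  # (ip, port) the sender should use

# Sender: no handshake, just address the datagram and send it.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello over UDP", address)

# Receive up to 1024 bytes, along with the sender's address.
data, source = receiver.recvfrom(1024)
print(data, "from", source)

sender.close()
receiver.close()
```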


Then I started talking about internetworking, and how the internet is a network of networks. I explained what a LAN is, then a WAN. Everybody had heard of an ISP before but was kind of fuzzy on what they do.

The videos didn't go into what ISPs do or how data is shared between ISPs. I talked about IXPs, the internet backbone, and the landing stations for intercontinental cable/fibre lines, and how those are common targets for government eavesdropping (see: Deibert's "Black Code").

In retrospect I wish I'd spent even more time on that part, and talked about tiers of ISPs, as well as net neutrality. I wish I'd also shown a couple examples of traceroutes and how data sent from a computer in Toronto to a computer in Vancouver will most likely go through the USA, which undermines legislation trying to keep sensitive data on Canadian servers.

Once we'd covered how the internet is structured, we talked about TCP and then IP. Again we returned to how the internet has been structured in an America-centric fashion: how IPv4 addresses were allocated.
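To give a sense of scale for IPv4, here is a small sketch using Python's standard `ipaddress` module. The specific address and block are illustrative picks of mine (a documentation address and a private range), not anything from the workshop.

```python
import ipaddress

# An IPv4 address is a 32-bit number, so the whole space is 2**32 addresses.
total_addresses = 2 ** 32
print(total_addresses)

# The dotted form is just a friendlier way to write that number.
addr = ipaddress.ip_address("192.0.2.1")
addr_as_int = int(addr)
print(addr_as_int)

# A "/8" block fixes the first 8 bits and leaves 24 bits free.
block = ipaddress.ip_network("10.0.0.0/8")
print(block.num_addresses)

# So a single /8 is 1/256 of all IPv4 addresses.
blocks_in_space = total_addresses // block.num_addresses
print(blocks_in_space)
```

With only 256 blocks of that size in existence, who was handed the large blocks early on matters a great deal.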

Were I to do this part again I'd spend more time on it and talk about RIRs and how they're governed.

From IP addresses we talked about DNS. Again, more American neocolonialism was discussed with how TLDs were set up. We talked about ICANN. My audience was fascinated by learning about ICANN and similar governance bodies and we wound up on a tangent about how FOSS works and how to get involved in FOSS projects.

We then talked about HTTP, and the protocol stack. We talked about some other applications such as SMTP, IMAP, XMPP, etc.
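One thing that helps demystify HTTP is seeing that a request is just structured text. The sketch below builds a request and reads the first line of a response without touching the network; the helper names, the host `example.org`, and the canned response are assumptions made up for the example.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the text a browser would send for a simple GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # a blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes):
    """Split the first line of a response into version, code and reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_get_request("example.org")
print(request.decode("ascii"))

# A hand-written response, standing in for what a server might send back.
sample_response = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
status = parse_status_line(sample_response)
print(status)
```

In the protocol stack, these bytes are what gets handed down to TCP, which in turn hands packets to IP.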

I talked about ports and sockets and regretted it because I don't think I did a good job of it. I don't think it added much to their understanding either.

At this point we'd been going for a bit over an hour and I figured this was a good place to stop and see if they had any questions about how networks work. One participant made an observation that the internet doesn't seem to have been designed to be secure (yep) and we talked about this in more detail. Another participant asked about VPNs so we talked about those, but probably not at a satisfactory level of detail. I mentioned Tor in this discussion but didn't do a very good job of explaining how Tor works; were I to do this again I would spend more time there.


After all this network talk, I shifted gears to talk about cryptography. I went over symmetric key encryption. As I went through it I wished I'd actually done this before talking about encoding, because there was confusion about whether the text or the encoding is what gets encrypted/decrypted.
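A toy sketch can help separate the two steps. It assumes a repeating-key XOR cipher, which is far too weak for real use and is chosen here only because it fits in a few lines; the message and key are made up. The text is first encoded into bytes, and it is the bytes that get encrypted with the shared key.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the key, repeating the key as needed.

    Toy cipher for illustration only; do not use it to protect anything.
    """
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


plaintext = "see you at noon"
key = b"secret"  # both sides must already share this key

encoded = plaintext.encode("utf-8")    # step 1: text -> bytes (encoding)
ciphertext = xor_bytes(encoded, key)   # step 2: bytes -> scrambled bytes (encryption)

# Decryption reverses the steps: unscramble the bytes, then decode to text.
decrypted_bytes = xor_bytes(ciphertext, key)
decrypted = decrypted_bytes.decode("utf-8")

print(ciphertext)
print(decrypted)
```

The same key locks and unlocks the message, which is why getting the key to the other person safely is the hard part.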

I talked about how the key is often the weakest link in symmetric key encryption and then started talking about Diffie-Hellman key exchange. I gave a high level overview of asymmetric key encryption. At this point I was running kind of behind where I expected to be so I rushed this, which was a shame. There was a fair bit of confusion about public vs. private keys, which is fairly confusing for novices (especially if you aren't shown the underlying mathematics).
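For those who do want a peek at the underlying mathematics, here is a toy sketch of the key exchange idea (usually called Diffie-Hellman). The numbers are tiny textbook values picked for readability; real systems use enormous primes, and this sketch leaves out everything needed to make the exchange safe in practice.

```python
# Public values that everyone, including an eavesdropper, can see.
p = 23  # a small prime modulus (toy value)
g = 5   # a public base (toy value)

# Private values that each person keeps to themselves.
alice_private = 6
bob_private = 15

# Each person computes a public value and sends it over the open network.
alice_public = pow(g, alice_private, p)
bob_public = pow(g, bob_private, p)

# Each combines the other's public value with their own private value.
alice_shared = pow(bob_public, alice_private, p)
bob_shared = pow(alice_public, bob_private, p)

print(alice_public, bob_public)
print(alice_shared, bob_shared)
```

Both sides end up with the same number without ever sending it, and that shared number can then serve as the key for symmetric encryption.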

I talked about why asymmetric key encryption was necessary for the internet to work as we know it. Had I more time I would have loved to get into talking about P ?= NP.

Secure Networking

We then got back to networking. I talked about SSL and HTTPS, and what it means when something is end-to-end encrypted. I did not talk about certificate authorities due to time constraints but I wish I had.

I then gave them this link to tools for security, and mentioned a few of my favourites. I explained that security is a process, not an end-result, and one of my participants asked, "so how do we keep up to date on what's secure?" and I still wish I had a good response for him. Most ways I keep up to date on these things are written for a tech-savvy audience.

Finally we talked about human factors in security. This xkcd came up. We talked about the DNC email leaks.

We then wrapped up. People told me I'd done a lot to demystify the internet for them. Hearteningly, a bunch of people at the seminar have since installed many of the tools I told them about.


One thing I really liked about teaching this workshop was how much the students could talk about what's going on. When I introduce CS via programming, it's much harder to teach it in a student-directed fashion because the students have very little idea where to go next. With "how does the internet work?" my students had so many questions.

I'd gone into the workshop with a lesson plan but then wound up covering things in a different order because somebody would ask a question and we'd go in that direction. It was quite exciting for me to teach CS this way.

Another nice thing about introducing CS via the internet vs. via programming is that this way we show the history of CS. CS is shown as a human endeavour that builds upon itself. You don't really get to show this in the process of teaching programming to novices.

How I'd Teach It Again

If I were to do the workshop again, I'd take four hours (two felt too rushed), with some breaks in there. I'd order it as:
- what is an algorithm?
- symmetric key crypto
- files and encoding
- computer networking
- internetworking
- asymmetric key crypto
- how to keep safe

But better yet I'd love to teach this as a 12-week university course. There was so much in there that could be used to introduce computer science and garner interest from new communities. This course would complement any intro-to-programming class and they could be taken at the same time.

I've written up what I'd cover in the 12 weeks here.

I think a student taking such a course would walk out with a better sense of what CS is about than if they'd taken an introductory programming class. Certainly programming is a useful skill that many people benefit from learning (not just CS people), but many people walk out of their first CS class with the misconception that CS = programming.

The material in this course is useful to a broad segment of society. Everybody uses the internet, but few people understand how it works. With internet security playing an increasing role in politics, this knowledge has become even more important in a democratic society.
