[SIGCIS-Members] What algorithms are and are not

Sat Oct 30 07:02:54 PDT 2021

Tom,
Thanks for that detailed comment on algorithms - what they are, and what they are not, and conversely, what can be considered an algorithm, and what cannot. Your explanations were very useful.

I just wanted to make two additional points. The first is that many machine learning algorithms these days have a stochastic component. That is, they make use of randomness while learning, and these are sometimes more effective than deterministic algorithms. Thus, the exact nature of the algorithm gets to be a bit fuzzy.

The second point is something that you fleetingly mentioned, but did not develop further, but is critical. That was your point about the training data. The law community (at least here at Yale) is increasingly focusing on the proprietary nature of the datasets that companies use in training their AI systems, and the legality and ethical issues emanating from these datasets (such as what private information is kept in the datasets, etc.).

And finally, I wanted to ask this question: If an equation has a stochastic component in it, can it then be considered as an algorithm?

Best regards,
-Ramesh

________________________________
From: Members <members-bounces at lists.sigcis.org> on behalf of thomas.haigh at gmail.com <thomas.haigh at gmail.com>
Sent: Saturday, October 30, 2021 5:42 AM
To: 'sigcis' <members at sigcis.org>
Subject: [SIGCIS-Members] What algorithms are and are not

CAUTION: This email originated from outside of the organization. Do not follow guidance, click links, or open attachments unless you recognize the sender and know the content is safe.

Hello SIGCIS,

A fascinating discussion. I’d like to pull a few points together and a few of my own.

An algorithm is indeed a description of a step-by-step way of accomplishing something. It differs from program code in not being tied to a specific computer language, and leaving out implementational detail relevant to specific computing platforms. It differs from a specification in being about HOW to do something, not just about WHAT needs to be done.

To answer Kimon’s most recent point first. It is entirely reasonable to suggest that alphabetical ordering may, in Winner’s terms, be a technology with politics. See, for example, Bowker and Star’s book Sorting Things Out. And building that social system into computers magnifies its power and makes it harder to change. But we should not blame sort algorithms for this. There are many sort algorithms – early on sorting efficiently from tape was a hugely important practical problem and the Quicksort algorithm was seen as a major achievement. But they all produced the same results, differing in their efficiency with particular kinds of input, memory needs, and so on. In other words, all the sort algorithms had the same specification and did the same job, but they did it in different ways.

In this case the potentially oppressive alphabetical order is not in the algorithm at all. It’s built into the computer hardware as part of EBCDIC or ASCI, or more recently into the OS or a library as part of Unicode. So pointing the finger at Tony Hoare (creator of Quicksort) for any oppressive impact of alphabetical order would be misleading. In this case, separating out the algorithm from the rest of the “blob” helps.

As Alexandre pointed out, when we talk about “the Facebook algorithm” this is grouping together the collective functioning of what must be hundreds or thousands of algorithms, implemented as programs, running on hardware and responding to vast amounts of data and many configurable parameters. So the issue is ignoring all the infrastructure, executable code, and materiality behind the system, and also missing the hugely complex interactions of all those pieces. In a way, it is privileging the history of abstract ideas over the history of technology.

Another point: in the case of AI systems the problem may not be the algorithm at all, but the data used to train it. When Google makes a racist autocomplete or an AI chatbot starts talking like a Nazi the more direct cause is the racism of Internet users who type queries or post tweets. The backpropagation algorithm certainly has specific affordances which may make it irresponsible to use it in such circumstances, but again the situation is complicated and calling everything “the algorithm” obscures more than it reveals.

Likewise, the devastating consequences of the system Facebook uses to rank items in its newsfeed surely has more to do with parameters set by Facebook managers and data scientists as it does with the algorithms crafted by programmers. There must be switches set to weight different characteristics, and the decision to set parameters that weight strong emotional reactions is likely not in the algorithm itself but in a matrix of weights processed by the algorithm. Just pragmatically, you wouldn’t want to have to throw the code away and change the algorithm every time you want to try out a different weighting scheme.

Of course Facebook does not want us to know anything about what is going on in its black box, so that’s deliberate. I loved the point that very early publication of Google’s PageRank algorithm helped to give the misconception that it continues to rely on a single algorithm.

Another important point is that you can’t understand everything an algorithm will do by reading it. The attempt to predict dynamic behavior from static code is one of the central concerns of computer science, but in many situations, particularly when multiple programs are interacting with each other and with data the results are unpredictable. Facebook surely has huge data science teams running constant experiments. I do not want to let programmers off the hook for their complicity in a truly reprehensible company, but we must also recognize that not all the emergent properties of code are legible to its creators.

A final point. A couple of years ago I found myself in conversation with a media scholar who had just written a book about the history of algorithms and their nasty effects. One of his case studies was what he called “the Black-Scholes algorithm” (used to price derivatives). I pointed out that this is a formula, not an algorithm. A mathematical formula is, in the terms I gave in the first paragraph, a specification. It defines a relationship between several terms. To calculate the value of one of the terms when you have the others you need an algorithm: a step-by-step method.

The business of coming up with a good algorithm to do that computation is not trivial. Apparently simple things like differential equations have given rise to huge fields devising what applied mathematicians call “methods” but computer scientists would call “algorithms.” (That is particularly relevant here because Black-Scholes relies on solving a partial differential equation). Many work well under some circumstances but not others. They will even produce different answers, because numerical solutions are often approximate. Implementing those methods is also a complex process, particularly in code that is portable. The entire field of numerical analysis exists only because equations are not algorithms. Also the entire field of computational complexity. Problems of optimization are trivial to specify but enormously hard to solve. Public key encryption only works because equations are not algorithms and screening all possible solutions is infeasible (while verifying a solution is easy).

I was, frankly, a little shocked that somebody who had written a book about algorithms would seek to erase that vital distinction between algorithms and equations. His response was that he has his own definition of algorithm and did not care what computer scientists thought an algorithm was. According to his definition, Black-Scholes was an algorithm. (I was reminded of Lewis Carrol: “’When I use a word,’ Humpty Dumpty said in rather a scornful tone, ‘it means just what I choose it to mean—neither more nor less’). My view is that algorithms matter hugely and can indeed embed social assumptions, but that we will never be able to understand how they matter if we call everything an algorithm. Code, infrastructures, equations, platforms, data, and infrastructures matter too. If we each invent our own definitions of what those things are then we will never be able to communicate with each other, or with the people who have the power to change them.

Best wishes,

Tom

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.sigcis.org/pipermail/members-sigcis.org/attachments/20211030/b775bee9/attachment.htm>