[SIGCIS-Members] What algorithms are and are not
thomas.haigh at gmail.com
thomas.haigh at gmail.com
Sat Oct 30 07:15:46 PDT 2021
Thanks Ramesh,
I am with you 100% on the training data point. The hype, at least, is that a
handful of learning algorithms can handle a very wide range of applications.
One term used in machine learning is "model free" which captures in an
extreme for the idea that there is a standard process that works across many
domains by automatically generalizing from training data.
But to answer the question: an algorithm that will not always generate the
same output when fed a given input is still an algorithm. There is a simple
term for this: a "nondeterministic algorithm." i.e. the output is not
determined by the input. The term has been around since 1967 and was
introduced by Robert Floyd. See
https://en.wikipedia.org/wiki/Nondeterministic_algorithm.
Best wishes,
Tom
From: Subramanian, Ramesh Prof. <Ramesh.Subramanian at quinnipiac.edu>
Sent: Saturday, October 30, 2021 9:03 AM
To: thomas.haigh at gmail.com; 'sigcis' <members at sigcis.org>
Subject: Re: [SIGCIS-Members] What algorithms are and are not
Tom,
Thanks for that detailed comment on algorithms - what they are, and what
they are not, and conversely, what can be considered an algorithm, and what
cannot. Your explanations were very useful.
I just wanted to make two additional points. The first is that many machine
learning algorithms these days have a stochastic component. That is, they
make use of randomness while learning, and these are sometimes more
effective than deterministic algorithms. Thus, the exact nature of the
algorithm gets to be a bit fuzzy.
The second point is something that you fleetingly mentioned, but did not
develop further, but is critical. That was your point about the training
data. The law community (at least here at Yale) is increasingly focusing on
the proprietary nature of the datasets that companies use in training their
AI systems, and the legality and ethical issues emanating from these
datasets (such as what private information is kept in the datasets, etc.).
And finally, I wanted to ask this question: If an equation has a stochastic
component in it, can it then be considered as an algorithm?
Best regards,
-Ramesh
_____
From: Members <members-bounces at lists.sigcis.org
<mailto:members-bounces at lists.sigcis.org> > on behalf of
thomas.haigh at gmail.com <mailto:thomas.haigh at gmail.com>
<thomas.haigh at gmail.com <mailto:thomas.haigh at gmail.com> >
Sent: Saturday, October 30, 2021 5:42 AM
To: 'sigcis' <members at sigcis.org <mailto:members at sigcis.org> >
Subject: [SIGCIS-Members] What algorithms are and are not
CAUTION: This email originated from outside of the organization. Do not
follow guidance, click links, or open attachments unless you recognize the
sender and know the content is safe.
Hello SIGCIS,
A fascinating discussion. I'd like to pull a few points together and a few
of my own.
An algorithm is indeed a description of a step-by-step way of accomplishing
something. It differs from program code in not being tied to a specific
computer language, and leaving out implementational detail relevant to
specific computing platforms. It differs from a specification in being about
HOW to do something, not just about WHAT needs to be done.
To answer Kimon's most recent point first. It is entirely reasonable to
suggest that alphabetical ordering may, in Winner's terms, be a technology
with politics. See, for example, Bowker and Star's book Sorting Things Out.
And building that social system into computers magnifies its power and makes
it harder to change. But we should not blame sort algorithms for this. There
are many sort algorithms - early on sorting efficiently from tape was a
hugely important practical problem and the Quicksort algorithm was seen as a
major achievement. But they all produced the same results, differing in
their efficiency with particular kinds of input, memory needs, and so on. In
other words, all the sort algorithms had the same specification and did the
same job, but they did it in different ways.
In this case the potentially oppressive alphabetical order is not in the
algorithm at all. It's built into the computer hardware as part of EBCDIC or
ASCI, or more recently into the OS or a library as part of Unicode. So
pointing the finger at Tony Hoare (creator of Quicksort) for any oppressive
impact of alphabetical order would be misleading. In this case, separating
out the algorithm from the rest of the "blob" helps.
As Alexandre pointed out, when we talk about "the Facebook algorithm" this
is grouping together the collective functioning of what must be hundreds or
thousands of algorithms, implemented as programs, running on hardware and
responding to vast amounts of data and many configurable parameters. So the
issue is ignoring all the infrastructure, executable code, and materiality
behind the system, and also missing the hugely complex interactions of all
those pieces. In a way, it is privileging the history of abstract ideas over
the history of technology.
Another point: in the case of AI systems the problem may not be the
algorithm at all, but the data used to train it. When Google makes a racist
autocomplete or an AI chatbot starts talking like a Nazi the more direct
cause is the racism of Internet users who type queries or post tweets. The
backpropagation algorithm certainly has specific affordances which may make
it irresponsible to use it in such circumstances, but again the situation is
complicated and calling everything "the algorithm" obscures more than it
reveals.
Likewise, the devastating consequences of the system Facebook uses to rank
items in its newsfeed surely has more to do with parameters set by Facebook
managers and data scientists as it does with the algorithms crafted by
programmers. There must be switches set to weight different characteristics,
and the decision to set parameters that weight strong emotional reactions is
likely not in the algorithm itself but in a matrix of weights processed by
the algorithm. Just pragmatically, you wouldn't want to have to throw the
code away and change the algorithm every time you want to try out a
different weighting scheme.
Of course Facebook does not want us to know anything about what is going on
in its black box, so that's deliberate. I loved the point that very early
publication of Google's PageRank algorithm helped to give the misconception
that it continues to rely on a single algorithm.
Another important point is that you can't understand everything an algorithm
will do by reading it. The attempt to predict dynamic behavior from static
code is one of the central concerns of computer science, but in many
situations, particularly when multiple programs are interacting with each
other and with data the results are unpredictable. Facebook surely has huge
data science teams running constant experiments. I do not want to let
programmers off the hook for their complicity in a truly reprehensible
company, but we must also recognize that not all the emergent properties of
code are legible to its creators.
A final point. A couple of years ago I found myself in conversation with a
media scholar who had just written a book about the history of algorithms
and their nasty effects. One of his case studies was what he called "the
Black-Scholes algorithm" (used to price derivatives). I pointed out that
this is a formula, not an algorithm. A mathematical formula is, in the terms
I gave in the first paragraph, a specification. It defines a relationship
between several terms. To calculate the value of one of the terms when you
have the others you need an algorithm: a step-by-step method.
The business of coming up with a good algorithm to do that computation is
not trivial. Apparently simple things like differential equations have given
rise to huge fields devising what applied mathematicians call "methods" but
computer scientists would call "algorithms." (That is particularly relevant
here because Black-Scholes relies on solving a partial differential
equation). Many work well under some circumstances but not others. They will
even produce different answers, because numerical solutions are often
approximate. Implementing those methods is also a complex process,
particularly in code that is portable. The entire field of numerical
analysis exists only because equations are not algorithms. Also the entire
field of computational complexity. Problems of optimization are trivial to
specify but enormously hard to solve. Public key encryption only works
because equations are not algorithms and screening all possible solutions is
infeasible (while verifying a solution is easy).
I was, frankly, a little shocked that somebody who had written a book about
algorithms would seek to erase that vital distinction between algorithms and
equations. His response was that he has his own definition of algorithm and
did not care what computer scientists thought an algorithm was. According to
his definition, Black-Scholes was an algorithm. (I was reminded of Lewis
Carrol: "'When I use a word,' Humpty Dumpty said in rather a scornful tone,
'it means just what I choose it to mean-neither more nor less'). My view is
that algorithms matter hugely and can indeed embed social assumptions, but
that we will never be able to understand how they matter if we call
everything an algorithm. Code, infrastructures, equations, platforms, data,
and infrastructures matter too. If we each invent our own definitions of
what those things are then we will never be able to communicate with each
other, or with the people who have the power to change them.
Best wishes,
Tom
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.sigcis.org/pipermail/members-sigcis.org/attachments/20211030/d6e40c35/attachment.htm>
More information about the Members
mailing list