Mandela’s Library of Alexandria

Man working with Internet-in-a-Box home page

Internet-in-a-Box learning content examples

Quality Content

Internet-in-a-Box shows you the latest Content Packs installable in the languages your community needs (from online libraries like Kiwix, OER2Go and Archive.org), then takes care of all the downloading details for you!

See Mexico’s live demo and our medical examples used by clinics in Asia and Africa especially, as hosted by Wikipedia.

Schools can also choose among almost 40 powerful apps for teachers and students, optionally with a complete LMS (learning management system) like Kolibri, Moodle, Nextcloud, Sugarizer or WordPress.

Two Haitian schoolgirls working on a laptop

Friendly Community

Internet-in-a-Box is a community product enabled by professional volunteers working side-by-side with schools, clinics and libraries around the world, and the Wikipedia community especially.

Thank you everyone for humbly being part of this OFF.NETWORK grassroots learning movement.

Please consider how you too might assist this epic effort. It’s astonishing how far we’ve come since Internet-in-a-Box’s original demo in 2013, and how far we will go together, if you too can help!


Extracting Hacker News Book Recommendations with the ChatGPT API

I love books and I enjoy reading through the Hacker News (HN) book recommendation threads. On HN, there are almost 200 stories so far this year that have the separate word “book” in the title and aren’t linked to another page. I wondered what the most commonly recommended or mentioned books are. Mainly, I was wondering whether SICP or PCL would be the top recommendation.
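As a rough illustration of that selection rule, here is a minimal sketch that keeps only stories whose title contains “book” as a standalone word and that have no external link. It assumes items shaped like those returned by the Hacker News API (with `type`, `title` and an optional `url` field). The function name and the sample data are made up for the example, and the exact filter used for this post may have differed.

```python
import re

# "book" as a standalone word: "Facebook" and "books" do not match.
BOOK_WORD = re.compile(r"\bbook\b", re.IGNORECASE)


def is_book_thread(item: dict) -> bool:
    """True for a story with the word "book" in its title and no external URL."""
    if item.get("type") != "story":
        return False
    if item.get("url"):  # link posts point at another page, so skip them
        return False
    return bool(BOOK_WORD.search(item.get("title") or ""))


# Hypothetical sample items, only for demonstration.
stories = [
    {"id": 1, "type": "story", "title": "Ask HN: Best book you read this year?"},
    {"id": 2, "type": "story", "title": "Facebook outage postmortem"},
    {"id": 3, "type": "story", "title": "A book about compilers", "url": "https://example.com/b"},
    {"id": 4, "type": "comment", "text": "Great book!"},
]

print([s["id"] for s in stories if is_book_thread(s)])
```

Whether plurals such as “books” should count is a judgment call. Loosening the pattern to `\bbooks?\b` would include them.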

After reading about the man who categorised his favourite podcast into Dewey Decimal classes using GPT, I was aware that the GPT API could be used to categorise data and output the information in JSON format. So, using the HN data fetched from the Hacker News API, I took the subset of stories that seem to be book recommendation threads and extracted book titles, authors and URLs from the text using calls to the Chat Completions API.

Here are the top 50 book recommendations:

# | Title | Author | Count | First Mention
1 | Structure and Interpretation of Computer Programs | Abelson and Sussman | 376 | 5675
2 | Gödel, Escher, Bach | Douglas Hofstadter | 293 | 56795
3 | How to Win Friends and Influence People | Dale Carnegie | 292 | 5584
4 | The C Programming Language | Brian Kernighan, Dennis Ritchie | 284 | 135262
5 | Dune | Brian Herbert | 263 | 57231
6 | Thinking, Fast and Slow | Daniel Kahneman | 244 | 3277457
7 | Meditations | Descartes | 233 | 134993
8 | Atlas Shrugged | Ayn Rand | 222 | 86114
9 | The Art of Computer Programming | Donald E. Knuth | 213 | 135245
10 | Sapiens: A Brief History of Humankind | Yuval Harari | 205 | 10028239
11 | Zen and the Art of Motorcycle Maintenance | Robert M Pirsig | 203 | 56941
12 | The Pragmatic Programmer | Andrew Hunt | 203 | 5704
13 | Introduction to Algorithms | Charles E. Leiserson, Clifford Stein, Ronald Rivest, Thomas H. Cormen | 171 | 55391
14 | The Selfish Gene | Richard Dawkins | 168 | 85867
15 | Code: The Hidden Language of Computer Hardware and Software | Charles Petzold | 160 | 135906
16 | The Mythical Man-Month | Fred Brooks | 159 | 5725
17 | The Black Swan | Nassim Nicholas Taleb | 158 | 56763
18 | Designing Data-Intensive Applications | Martin Kleppman | 153 | 8671875
19 | 1984 | George Orwell | 152 | 85938
20 | Code Complete | Steve McConnell | 149 | 56709
21 | Snow Crash | Neal Stephenson | 146 | 85862
22 | The Three-Body Problem | Cixin Liu | 143 | 8867599
23 | Ender’s Game | Orson Scott Card | 143 | 56704
24 | The Design of Everyday Things | Don Norman | 136 | 85860
25 | Bible | Unknown | 134 | 85859
26 | Founders at Work | Jessica Livingston | 133 | 5613
27 | Antifragile | Nassim Nicholas Taleb | 130 | 4966437
28 | Man’s Search for Meaning | Victor E. Frankl | 129 | 1634144
29 | The Hitchhiker’s Guide to the Galaxy | Douglas Adams | 128 | 56709
30 | Cryptonomicon | Neal Stephenson | 127 | 85940
31 | The Fountainhead | Ayn Rand | 127 | 135463
32 | Surely You’re Joking, Mr. Feynman! | Richard Feynman | 125 | 85858
33 | Fooled by Randomness | Nassim Nicholas Taleb | 125 | 57595
34 | Siddhartha | Herman Hesse | 124 | 86337
35 | Foundation | Isaac Asimov | 123 | 140379
36 | The Lord of the Rings | J. R. R. Tolkien | 121 | 56629
37 | Zero to One | Peter Thiel | 115 | 7968392
38 | Calculus | Charles B. Morrey Jr., Murray H. Protter | 114 | 193554
39 | Neuromancer | William Gibson | 112 | 56663
40 | The Phoenix Project | Gene Kim | 110 | 5569687
41 | The Lean Startup | Eric Ries | 110 | 1570888
42 | Never Split the Difference | Chris Voss | 108 | 12245967
43 | Design Patterns | Addy Osmani | 107 | 80916
44 | Guns, Germs, and Steel | Jared Diamond | 107 | 56777
45 | JavaScript: The Good Parts | Douglas Crockford | 106 | 259986
46 | Clean Code | Robert C. Martin | 106 | 1945860
47 | Deep Work | Cal Newport | 105 | 11702897
48 | The Elements of Computing Systems | Noam Nisan, Shimon Schocken | 104 | 1295307
49 | The Little Schemer | Daniel P. Friedman, Matthias Felleisen | 102 | 56629
50 | Influence: The Psychology of Persuasion | Robert B. Cialdini | 101 | 193848

Some things I discovered while doing this project:

  • When the API doesn’t return valid JSON, it is usually because ChatGPT is saying things like “I apologize for the confusion…” or “You’re welcome! If you have any more questions, feel free to ask.” in response to an HN comment that just says “thanks” or asks a question.
  • I designed the prompt so that I can discard responses with empty titles, because I was unable to get ChatGPT to stop including mentions of authors without the title of a particular book.
  • Processing 57k comments cost about $40 using the GPT-3.5 Turbo API.
  • Even with a temperature of 0, GPT’s results vary from call to call. Others have noticed this effect (HN discussion). It’s not just GPT-4 that is non-deterministic: GPT-3.5 Turbo exhibits greater variability compared to earlier GPT-3 models.
  • It can identify links in the text, but I had to remove the HTML tag and leave just the URL; otherwise GPT would pick up the truncated link text instead of the URL.
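The link clean-up described in the last point could be sketched as follows. This is an illustrative approach rather than the exact code used for this post: it assumes comment text arrives as HTML in which each link is an `<a href="...">` element whose visible text may be truncated, and the function name `links_to_urls` is invented for the example.

```python
import html
import re

# One anchor element; group 1 captures the full href value.
ANCHOR = re.compile(r'<a\s[^>]*?href="([^"]*)"[^>]*>.*?</a>', re.IGNORECASE | re.DOTALL)


def links_to_urls(comment_html: str) -> str:
    """Replace each <a> element with its href, then decode HTML entities."""
    # A function is used as the replacement so backslashes in URLs are not
    # interpreted as regex escape sequences.
    text = ANCHOR.sub(lambda m: m.group(1), comment_html)
    return html.unescape(text)


sample = (
    'See <a href="https://example.com/a/very/long/path" rel="nofollow">'
    "https://example.com/a/very...</a> for details"
)
print(links_to_urls(sample))
```

A regular expression is enough here because the markup in question is simple and regular. For arbitrary HTML, a real parser such as the standard library's `html.parser` would be a safer choice.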

Here’s an example of the JSON output by ChatGPT. For this comment, it got everything wrong except the link, but it shows the format of the data:

[
  {
    "match": "Hitchhiker's Guide Vms Unsupported Undocumented Can Go Away At Any Time Feature",
    "title": "The Hitchhiker's Guide to the Galaxy",
    "author": "Douglas Adams",
    "link": "http://www.amazon.com/Hitchhikers-Guide-Vms-Unsupported-Undocumented-Can-Go-Away-At-Any-Time-Feature/dp/1878956000"
  }
]
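Given replies in that shape, a defensive parser can drop both the chatty non-JSON replies and the author-only entries with empty titles. The following is a minimal sketch under those assumptions. The function name `parse_books` is hypothetical, and treating every malformed reply as “no books found” is a simplification.

```python
import json


def parse_books(raw: str) -> list[dict]:
    """Parse one model reply, keeping only entries that have a non-empty title."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # e.g. "You're welcome! If you have any more questions..."
    if not isinstance(data, list):
        return []  # valid JSON, but not the array the prompt asks for
    books = []
    for entry in data:
        if not isinstance(entry, dict):
            continue
        title = str(entry.get("title") or "").strip()
        if title:  # author-only mentions come back with an empty title
            books.append(entry)
    return books


reply = (
    '[{"match":"any Plath","title":"","author":"Sylvia Plath"},'
    ' {"match":"SICP","title":"Structure and Interpretation of Computer Programs",'
    '"author":"Abelson and Sussman","link":""}]'
)
print([b["title"] for b in parse_books(reply)])
```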

Edit: someone asked for the prompt, so here it is:

prompt = [
    # System message: describes the extraction task and the expected JSON output.
    {"role": "system", "content": "Assistant that identifies book titles and authors in the following document and shows the words you match to a book title from. Some titles may be abbreviated, please expand the abbreviated title. If the document talks about an author but doesn't mention a book, leave \"title\" blank. If you know who the author is, provide the author. Don't include the book's subtitle. If the text is asking for a recommendation, without mentioning a book, then return an empty array. Provide your answer in a json array."},
    # Few-shot examples: each user message is a sample comment, each assistant message is the expected reply.
    {"role": "user", "content": '''Wren's Explosion https://www.amazon.com/gp/395, and any Plath."'''},
    {"role": "assistant", "content": '''[{"match":"'Wren's Explosion","title":"Explosion","author":"P.C. Wren","link":"https://www.amazon.com/gp/395"}, {"match":"any Plath","title":"", "author":"Sylvia Plath"}]'''},
    {"role": "user", "content": "3-days free trial isn't freemium."},
    {"role": "assistant", "content": "[]"},
    {"role": "user", "content": "Miranda Hamilton"},
    {"role": "assistant", "content": '''[{"match":"Miranda Hamilton","title":"Hamilton","author":"Lin-Manuel Miranda, Jeremy McCarter","link":""}]'''},
]
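For readers wondering how a few-shot prompt like this is used, here is a hedged sketch of one call per comment. It assumes a client object with the `chat.completions.create(...)` method exposed by version 1 and later of the official `openai` Python package. The original code for this post is not shown and may well have used an older interface. The helper names, the shortened `few_shot` list and the offline stand-in client are all invented for illustration.

```python
from types import SimpleNamespace


def build_messages(few_shot: list[dict], comment_text: str) -> list[dict]:
    """Append the comment to analyse as the final user message."""
    return few_shot + [{"role": "user", "content": comment_text}]


def extract_raw(client, few_shot: list[dict], comment_text: str,
                model: str = "gpt-3.5-turbo") -> str:
    """Send one comment to the chat model and return the raw reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(few_shot, comment_text),
        temperature=0,  # reduces, but does not remove, run-to-run variation
    )
    return response.choices[0].message.content


# Shortened stand-in for the full prompt list (system message plus examples).
few_shot = [
    {"role": "system", "content": "Identify book titles and authors. Answer in a json array."},
    {"role": "user", "content": "3-days free trial isn't freemium."},
    {"role": "assistant", "content": "[]"},
]


class FakeCompletions:
    """Offline stand-in for the real API so the sketch runs without a key."""

    def create(self, model, messages, temperature):
        message = SimpleNamespace(content="[]")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))
print(extract_raw(fake_client, few_shot, "thanks"))
```

With the real package installed, `fake_client` would be replaced by `openai.OpenAI()`, which reads the API key from the `OPENAI_API_KEY` environment variable.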

The Data

Because I enjoy working with data and think you might find it interesting to analyse the results, here’s the raw data produced by GPT, sorted by title. Note that there’s a match column in there, which includes an excerpt of the comment where the book was identified. I also normalised the book titles by lowercasing them, removing ‘the’ if present at the start, and removing any subtitles. This enabled me to query the top books without missing too many items due to inconsistencies in the names that GPT came up with.
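The normalisation and counting step might look something like the sketch below. It is only an approximation of what is described above. In particular, treating everything after the first colon as the subtitle is an assumption, and the function name and sample titles are invented for the example.

```python
import re
from collections import Counter


def normalise_title(title: str) -> str:
    """Lowercase, drop a subtitle, and strip a leading 'the'."""
    t = title.strip().lower()
    t = t.split(":", 1)[0]         # assumption: the subtitle follows the first colon
    t = re.sub(r"^the\s+", "", t)  # only the whole word "the", so "theory" is untouched
    return t.strip()


# Hypothetical extracted titles with the kind of variation GPT produces.
titles = [
    "The Pragmatic Programmer",
    "Pragmatic Programmer: From Journeyman to Master",
    "the pragmatic programmer",
    "Structure and Interpretation of Computer Programs",
]

counts = Counter(normalise_title(t) for t in titles)
print(counts.most_common(2))
```

Grouping on the normalised key while keeping one original spelling for display is one way to report readable titles alongside the merged counts.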

Here is the input data in zipped CSV format; it expands to a 24 MB file.

Note

I have added Amazon affiliate links to the Amazon URLs in the tables above, mainly as a learning exercise.


October 4, 2023 | Entertainment
