This month I read Japanese books, read about generative AI and copyright, and read about computer architecture. I also wrote about why I wrote this article and why I tried to track everything I read this month.
Japanese Books
This month I felt a sense of crisis that I did not know anything about Japanese history despite being (ethnically) Japanese, so I decided to read about it. Being an extremist, I decided to start from the oldest book in Japan, the Kojiki (古事記), which details the Shinto (神道) myths and legends about how Japan was created and the lineage of the Imperial Family (which still continues from 660 BC!). I was also inspired by friends who have been reading some similarly old books (there are many old books).
To be honest, I didn’t find the book particularly exciting, but there’s a lot of interesting cultural context and history around the book, which I found more interesting than the text itself. For example:
The book isn’t very old. It was compiled around the turn of the 8th century AD (completed in 712), which makes it a few thousand years younger than the oldest books from other cultures. This is because the history of developed civilization in Japan is a lot shorter than in other nearby cool places like China.
The book features a lot of stories which are very similar to Greek mythology. The story of Orpheus and “Don’t Look Back” appears in the Kojiki in the same exact context (visiting the underworld), just with the equivalent Japanese gods. The common theory is that Greek mythology made its way through the Silk Road to Japan and the Japanese adopted the story when developing their own mythology. I find it amazing that the game of telephone managed to propagate so far.
The book wasn’t very well read throughout Japanese history. It was rediscovered in the Edo era (1600-1868), and in the Meiji era (1868-1912) it was used as a political tool to unify the Japanese people under a single religion (based on the same Shinto mythology described in the Kojiki) in preparation to compete with the West. The motivation came from Japanese scholars noticing that Christianity was an effective (pervasive?) bedrock for Western society, and initially some scholars even advocated adopting Christianity as the religion of choice.
Eventually this State Shinto led to extremism, imperialism, colonialism, and war. After Japan lost the big war, the GHQ (the post-war US occupation authority in Japan) banned the use of State Shinto texts like the Kojiki, which is why most (?) Japanese citizens today have not read it. The book still occasionally gets abused as a political tool to spread extremist views.
Reading the Kojiki was also a nice opportunity for me to learn about Japanese history more broadly (mostly through Wikipedia), which helped unify the bits and pieces of Japanese history I did know through US education, books, comics, etc.
I also read 池上彰の教養のススメ (Akira Ikegami's Guide to Cultivating Knowledge), a series of interviews conducted by Akira Ikegami with professors at the Tokyo Institute of Technology (TITech), all advocating for liberal arts education in Japan and specifically at TITech. The historical context is that liberal arts education was once ubiquitous in Japan, stemming from the Meiji-era push to study and learn from the West. Elite high schools of the time put a large emphasis on Western philosophers and scholars like Descartes, Kant, and Schopenhauer. This focus naturally extended to universities, until the post-war period of rapid economic growth, when schools started to focus on practical studies instead of the liberal arts. In the book, different professors give their opinions on how and why the lack of liberal arts education has significantly hurt Japan’s economic growth and society in the last N years. The interviews themselves are quite interesting, and I learned a lot of random things, like the politics of building dams in rural places and strategies for developing social consensus among stakeholders in dire conflict with each other… something that feels very relevant to the zeitgeist I live in, where things like artificial intelligence are rattling things left and right.
Finally, I also read コンビニ人間 (Convenience Store Woman), which, once I picked it up, I couldn’t stop reading until I finished it (it’s also a short book). I won’t spoil it, but it was an amazing book filled with wit. I recommend it. The last time I read a book I similarly couldn’t put down was The Vegetarian by Han Kang.
Generative AI and Copyright
Everyone is talking about artificial intelligence, and I am no exception. In the workplace, the focus of my research has shifted to generative AI (specifically text-to-3D generation), and in my daily life, I’ve been actively trying to incorporate ChatGPT into my workflow to imagine a life with AI.
I think there are a lot of interesting discussions around how to effectively use AI, but maybe not enough about the pragmatic consequences. No, this is not about the singularity or The Terminator, but more about how things like text-to-image models are already disrupting (in a negative sense) the labor market and potentially displacing workers. AI doesn’t seem to be going away anytime soon, so I’ve been thinking a lot recently about how we can develop technology that moves us towards (hand wave) a world we’d want to see, one in which technology provides incentives such that the optimal solution according to capitalism is one that benefits everyone. This, of course, might be quite an ambitious undertaking… but that sounds like something I should at least spend significant time thinking about. (If anyone wants to brainstorm, let me know.)
To first educate myself with concrete things, I’ve been studying what copyright law thinks about generative AI, specifically in the context of text-to-[visual media] generation systems (being a computer graphics person). The article from the Congressional Research Service was probably the most comprehensible source that disambiguates between the different classes of legal disputes that can arise from generative AI and provides references to relevant historical rulings. Although I technically didn't read it in March, while I was writing this article I also read this illuminating article from Stanford, which delves more broadly and deeply into how Fair Use under US copyright law could come into play with foundation models. I especially recommend reading sections 2.6 and 3 for anyone working on computer graphics.
It’s interesting to see how these laws differ by country. For example, in the United States, the question of whether training on copyrighted data constitutes copyright infringement relies on the interpretation of fair use in US copyright law. Meanwhile, Japan does not have a fair use system; instead, Japanese copyright law was recently (2018) amended to explicitly allow AI systems to train on copyrighted data. There is of course fine print: training is allowed only as long as the goal of the AI system isn’t redistribution, which can be disputed for current text-to-image models. This means that Japan is (currently) a unique country where AI implementers have free rein to train on Japanese IP, and Japan is a country with a lot of popular IP through things like (but not limited to) video games, manga, and anime. This seems very good for AI system builders in Japan, but very bad for Japanese content creators.
A question I’m interested in but wasn’t able to find much legal opinion on is whether the trained models themselves can be copyrighted. On one hand, you could argue that hyperparameter tuning, dataset curation, and architecture building are a form of creative expression. But given that the current ruling by the US Copyright Office is that at least prompt engineering does not yet constitute creative expression, it seems questionable whether hyperparameter tuning counts. (If anyone knows any experts who can enlighten me with wisdom, please let me know.)
I am of course also reading lots of technical research papers on this topic… and I hope I can write more intelligently about specific ideas and explorations soon.
Computer Architecture
This semester I am taking ECE1755: Parallel Computer Architecture, taught by Professor Mark Jeffrey, which also means I’ve been reading lots of papers on computer architecture (from the likes of ISCA and MICRO). Prior to this semester, I had never taken an advanced course in computer architecture, and my only prior knowledge comes from:
CS 251 at Waterloo which teaches the very basics of CPU design (and concepts like pipelining and cache hierarchies)
Intellectual osmosis through 5 years of working at a GPU hardware company
Self-taught snippets of knowledge from having to write performant code for work and for research
Hobbyist readings of The Copetti Site, which has fun explanations and analysis of the architecture of various video game consoles
Being able to close the loop on bits and pieces of knowledge I’ve accumulated has been nice, but learning the depth and complexities of (multi-core) computer architecture has also been rather scary. Even something that feels as benign (at least to naive me) as cache coherence involves what feel like messy finite-state machines and protocols. The lesson learned, a recurring one, is that simplicity and performance (usually) do not coexist… but also that challenging that status quo should always be at least one goal of academic research. I definitely recommend this course if you are a UofT CS graduate student, and Mark is a super engaging teacher (this alone is a good reason to take the course!).
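To make the “messy finite-state machine” comment concrete, here is a minimal sketch of a simplified MSI-style coherence state machine. This is my own toy illustration of the idea, not the exact protocols covered in the course; the event names and the little scenario at the end are made up for the example.

```python
# Toy, single-line MSI coherence state machine (a simplification for illustration):
# each cache tracks one line as Modified / Shared / Invalid, and reacts to its own
# processor's accesses and to requests it snoops from other caches on a shared bus.

from enum import Enum

class State(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"

# (current state, event) -> (next state, bus action)
TRANSITIONS = {
    (State.INVALID,  "proc_read"):    (State.SHARED,   "BusRd"),
    (State.INVALID,  "proc_write"):   (State.MODIFIED, "BusRdX"),
    (State.SHARED,   "proc_read"):    (State.SHARED,   None),
    (State.SHARED,   "proc_write"):   (State.MODIFIED, "BusUpgr"),
    (State.SHARED,   "snoop_BusRdX"): (State.INVALID,  None),
    (State.MODIFIED, "proc_read"):    (State.MODIFIED, None),
    (State.MODIFIED, "proc_write"):   (State.MODIFIED, None),
    (State.MODIFIED, "snoop_BusRd"):  (State.SHARED,   "WriteBack"),
    (State.MODIFIED, "snoop_BusRdX"): (State.INVALID,  "WriteBack"),
}

def step(state, event):
    """Advance one cache's state for a single line; unlisted (state, event)
    pairs are no-ops (e.g. snooping a read while Invalid)."""
    return TRANSITIONS.get((state, event), (state, None))

# Core 0 writes a line, then core 1 reads it: core 0 must write back and
# downgrade to Shared so both caches end up with a consistent copy.
c0, c1 = State.INVALID, State.INVALID
c0, _ = step(c0, "proc_write")        # c0: I -> M (BusRdX)
c0, action = step(c0, "snoop_BusRd")  # c1's read forces c0: M -> S + WriteBack
c1, _ = step(c1, "proc_read")         # c1: I -> S (BusRd)
print(c0, c1, action)                 # State.SHARED State.SHARED WriteBack
```

Even this toy version leaves out everything that makes real protocols hard (transient states, races between in-flight requests, the bus or directory itself), which is roughly where the scary part begins.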
One of my favorite papers that I read for this course this month was the chiplets paper, alongside the follow-up paper from AMD that details the real-world implementation of this technology. The basic premise of chiplets is a computational geometry problem: knowing that random meteors will fall in random locations on a 2-dimensional circle, how do you subdivide the circle into parts such that you maximize the number of assemblies that survive the meteors? Traditionally, this problem was thought of in terms of subdividing the circle into entire assemblies, but in this paper they subdivide it into modular parts such that the survival rate of entire assemblies increases.
This of course comes at the cost of overhead from having to assemble those modular parts and ensure efficient communication between them, and that’s where the meat of the paper’s technical contributions lies. I thought this was a fun paper that starts from key manufacturing constraints (like how silicon wafers have to be circular due to the crystal-growing process, how die size is limited by lithographic constraints, and how defects are inevitable) and communicates a well-guided, well-principled solution to the problem.
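To get some intuition for why subdividing helps, here is a back-of-the-envelope version of the meteor argument using a standard Poisson yield model. This is my own toy calculation with made-up numbers, not figures from the paper: a region of area A survives with probability e^(-D·A) for defect density D, so small, independently tested pieces survive far more often than one monolithic die.

```python
# Toy yield comparison (my own simplification of the meteor analogy, not from the
# paper): with a Poisson defect model, a die of area A survives with probability
# exp(-D * A). A monolithic die is scrapped whole if any defect lands on it;
# chiplets are tested individually, so a defect only scraps the piece it hits.

import math

def monolithic_good_fraction(area_mm2, defect_density_per_mm2):
    """Fraction of monolithic dies with zero defects."""
    return math.exp(-defect_density_per_mm2 * area_mm2)

def chiplet_good_piece_fraction(area_mm2, num_chiplets, defect_density_per_mm2):
    """Fraction of pieces that are usable when the same area is split into
    num_chiplets independently-tested chiplets."""
    piece_area = area_mm2 / num_chiplets
    return math.exp(-defect_density_per_mm2 * piece_area)

A = 800.0   # large die area in mm^2 (roughly a reticle-sized die)
D = 0.001   # defects per mm^2 (made-up number for illustration)

print(f"monolithic: {monolithic_good_fraction(A, D):.2%} of dies usable")
for k in (2, 4, 8):
    print(f"{k} chiplets: {chiplet_good_piece_fraction(A, k, D):.2%} of pieces usable")
# monolithic: ~44.9%; 2 chiplets: ~67.0%; 4 chiplets: ~81.9%; 8 chiplets: ~90.5%
```

The catch, as noted above, is that the saved silicon has to be paid back in packaging and inter-chiplet communication overhead, which is exactly what the paper works through.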
The readings we did on on-chip networks inspired me to (re-)read a lot of transformer / attention papers from machine learning: ones with preset communication patterns like Pixelated Butterfly, ones with dynamic routing patterns like the Routing Transformer, and some other related papers.
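As a toy illustration of the analogy (my own sketch, not the actual factorization used in the Pixelated Butterfly paper), a butterfly-network-style topology translates into a fixed sparsity mask where index i connects only to indices that differ from it in one bit:

```python
# Hypothetical illustration of the network-topology / sparsity analogy: token i may
# attend to token j only if they are neighbors in a hypercube / butterfly-style
# graph, i.e. their indices differ in exactly one bit (plus the diagonal).

import numpy as np

def butterfly_style_mask(n):
    """n-by-n boolean attention mask; n is assumed to be a power of two."""
    mask = np.eye(n, dtype=bool)                 # every token attends to itself
    for bit in range(n.bit_length() - 1):        # one "stage" per bit of the index
        for i in range(n):
            mask[i, i ^ (1 << bit)] = True       # neighbor differing in this bit
    return mask

n = 8
mask = butterfly_style_mask(n)
print(mask.astype(int))
print(f"nonzeros: {mask.sum()} of {n * n} (O(n log n) instead of O(n^2))")
```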
Why did you write this?
The reason I wrote this article is that I’ve been thinking a lot about how to balance my input and output, so I wanted to keep track of at least the things I read (input) as a mental reference point. This doesn’t include things I watch or hear or talk to people about, and I didn’t include any visual media even if it involves reading.
In a sense this is an experiment to see how keeping track of what I read will affect my life and work. Some observations:
I read a lot more than I expected, especially when it comes to research papers. 57 papers is almost two papers a day, and this doesn’t include the numerous papers whose figures I only skimmed.
Adding items to a list is something some people find fun, so keeping track of my reading likely biases me to read even more.
My emotional response to writing this was that I probably read too much, because writing this blog post and maintaining the list of things I read was a lot of work.
Although this was a fun experiment, I’m not entirely sure I should continue because this feels like it gamifies reading.
I want to be careful about this experiment, because (thanks to my reading) I learned about motivation crowding theory, which posits that injecting external motivation for doing something (usually money in the context of the theory, but in my interpretation this includes things like ‘keeping track of a list’) actually undermines the intrinsic motivation for it. I have a feeling that doing this experiment for too long could rewire my brain to count the number of words I read instead of actually reading them.
In fact, motivation crowding seems to be a widespread issue. Computer science has a trap that many people fall into: forgetting their real (intrinsic) motivations and starting to optimize for some external metric like the number of publications, the number of citations, or money. My hypothesis for why this happens is the simple fact that so many problems in computer science can be formulated as an optimization problem where you minimize some loss function, so this mindset naturally leaks into daily life. But so many problems in reality (and in computer science) are not so simple in practice, and ‘min-maxing’ is almost always the wrong strategy for finding solutions. One of the books I read this month (池上彰の教養のススメ) anecdotally highlighted the plight of education systems that promote ‘min-maxing’ as the only path towards success.
I’m not immune to this fallacy either. I grew up playing lots of music. Although I genuinely enjoy making music, there was a period of my life where I did not enjoy playing, and that was when external signals like placements and competitions started to creep into a larger part of my motivation. I started to (maybe unconsciously) optimize for them, and I really struggled. At one point, I “gave up” and started playing for the sake of enjoyment instead of stressing about things… and maybe surprisingly, this actually led to more success in those external signals anyway. I try to adopt a similar philosophy in my life and avoid anything that I sense could lead to Goodhart’s Law: "When a measure becomes a target, it ceases to be a good measure".
In any case, regardless of whether I continue doing this, I think this was a fun experiment to take a snapshot of what a month of reading looks like for me. The other reason I did this is that I found it useful / interesting to read other people’s What I Read posts and wanted to try doing the same for others. Maybe a good strategy would be to write about the broad themes I read about each month instead of explicitly tracking everything.
Full List of Things I Read (that I successfully kept track of)
Books:
古事記 (Kojiki), Natsuki Ikezawa version
コンビニ人間 (Convenience Store Woman)
池上彰の教養のススメ (Akira Ikegami's Guide to Cultivating Knowledge)
耳と感性でギターが弾ける本 (a book about playing guitar by ear and feel)
Chokepoint Capitalism (just started)
走ることについて語るときに僕の語ること (What I Talk About When I Talk About Running) (just started)
Blog Posts / Articles:
Changing Melodies: Art and Research are Often Open-Ended Exploration
Digital technologies support new workflows in drill bit forensics
Surprising Results of a New Study of Copyright Substantial Similarity Analyses
(I’m terrible at keeping track of this, so there were probably others too)
Research Papers:
6 papers for SIGGRAPH Reviews
Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement
Generative 3D models: a key to more information within less bandwidth at higher quality
TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering
Compositional 3D Scene Generation using Locally Conditioned Diffusion
Debiasing Scores and Prompts of 2D Diffusion for Robust Text-to-3D Generation
Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation
Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior
Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors
Two Techniques to Enhance the Performance of Memory Consistency Models
InvisiFence: Performance-transparent Memory Ordering in Conventional Multiprocessors
Enabling Interposer-based Disintegration of Multi-core Processors
Pioneering Chiplet Technology and Design for the AMD EPYC and Ryzen Processor Families
Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation
Visual Classification via Description from Large Language Models
Top-K Off-Policy Correction for a REINFORCE Recommender System
Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models
Efficient Content-Based Sparse Attention with Routing Transformers
From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification
Modeling content creator incentives on algorithm-curated platforms
Approximate Nearest Neighbor Search through Modern Error-Correcting Codes
Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models
Poisoning-Assisted Property Inference Attack Against Federated Learning
Membership Inference Attacks Against Text-to-image Generation Models
Are Diffusion Models Vulnerable to Membership Inference Attacks?
Prompt Stealing Attacks Against Text-to-Image Generation Models
Students Parrot Their Teachers: Membership Inference on Model Distillation
Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy
The Stable Signature: Rooting Watermarks in Latent Diffusion Models
Anti-DreamBooth: Protecting users from personalized text-to-image synthesis
Perceptual Image Quality Assessment through Spectral Analysis of Error Representations