Woohyeuk Lee
Folder: research/zotero
This folder contains 32 items, all dated Oct 29, 2024:

A Roadmap to Pluralistic Alignment
Ablation Programming for Machine Learning
Artificial Intelligence, Values, and Alignment
Constitutional AI - Harmlessness from AI Feedback
Direct Preference Optimization - Your Language Model is Secretly a Reward Model
ELIZA — a computer program for the study of natural language communication between man and machine
Generative AI Misuse - A Taxonomy of Tactics and Insights from Real-World Data
Harms from Increasingly Agentic Algorithmic Systems
How Culture Shapes What People Want From AI
Language Models Learn to Mislead Humans via RLHF
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Measuring the Algorithmic Efficiency of Neural Networks
MemGPT - Towards LLMs as Operating Systems
On the Dangers of Stochastic Parrots - Can Language Models Be Too Big? 🦜
Red Teaming Language Models to Reduce Harms - Methods, Scaling Behaviors, and Lessons Learned
Refusal in Language Models Is Mediated by a Single Direction
Removing RLHF Protections in GPT-4 via Fine-Tuning
Scaling Laws for Neural Language Models
Shadow Alignment - The Ease of Subverting Safely-Aligned Language Models
The Canceling of the American Mind - Cancel Culture Undermines Trust and Threatens Us All—But There Is a Solution
The Llama 3 Herd of Models
Towards Bidirectional Human-AI Alignment - A Systematic Review for Clarifications, Framework, and Future Directions
Training language models to follow instructions with human feedback
Universal and Transferable Adversarial Attacks on Aligned Language Models
chandrasekharan2018
chandrasekharan2019
jiang2023
koshy2023
li2022
santurkar
weld2022a
‘Hi Chatbot, let’s Talk about Politics!’ Examining the Impact of Verbal Anthropomorphism in Conversational Agent Voting Advice Applications (CAVAAs) on Higher and Lower Politically Sophisticated Users