Papers

  • DeepSeek Scaling

    February 5, 2025

    Explores how to improve open-source language models by studying their scaling behavior. The researchers developed the DeepSeek LLM models (7B and 67B parameters), trained on 2 trillion tokens and then fine-tuned with supervised learning and preference optimization. Their findings show that DeepSeek 67B outperforms LLaMA-2 70B on coding, mathematics, and reasoning tasks, and that it surpasses GPT-3.5 in open-ended evaluations, demonstrating superior conversational abilities in both English and Chinese.
  • Introducing DeepSeek-R1

    January 29, 2025

    The paper introduces DeepSeek-R1, a reasoning-focused AI model built using reinforcement learning (RL) without traditional supervised fine-tuning. This approach significantly improved reasoning capabilities, with benchmarks showing performance comparable to leading models like OpenAI-o1-1217. Notably, DeepSeek-R1-Zero, an earlier version, showed emergent reasoning skills purely through RL, though it faced issues like poor readability. The refined DeepSeek-R1 addressed these with better training data and techniques, demonstrating success in solving math, coding, and logic tasks. This breakthrough highlights that RL alone can incentivize advanced problem-solving behaviors, paving the way for more efficient AI reasoning systems.
  • Neural Networks are Decision Trees

    January 22, 2025

    Shows that any neural network, a type of computer system used for tasks like recognizing images or making predictions, can be turned into a decision tree, a simple, step-by-step guide to how decisions are made. This transformation doesn’t change the network’s accuracy but makes it much easier to understand. For example, instead of guessing why a system made a choice, you can follow clear rules in the tree. These trees can also speed up calculations in smaller networks. This work helps make AI more transparent and easier to trust in areas like healthcare and safety.
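
    The equivalence is easiest to see for a one-hidden-layer ReLU network: each tree split tests the sign of one hidden unit's pre-activation, and each leaf applies the linear rule that holds in that region. A minimal NumPy sketch with toy weights chosen purely for illustration:

```python
import numpy as np

# Toy one-hidden-layer ReLU network; weights are arbitrary, for illustration.
W1 = np.array([[1.0, -1.0], [0.5, 1.0]])  # hidden-layer weights (2 units, 2 inputs)
b1 = np.array([0.0, -0.5])
W2 = np.array([1.0, 2.0])                  # output weights
b2 = 0.1

def net(x):
    h = np.maximum(W1 @ x + b1, 0.0)       # ReLU activations
    return W2 @ h + b2

def tree(x):
    # Each tree split tests the sign of one pre-activation; the path taken
    # fixes which hidden units fire, and the leaf applies the linear rule
    # valid in that region. The output is identical to the network's.
    z = W1 @ x + b1
    active = z > 0                          # the path through the tree
    return (W2 * active) @ z + b2           # the leaf's effective linear rule
```

    Following the sign tests down the tree gives exactly the network's output, which is what makes the transformation accuracy-preserving.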
  • Language Models in Blockchain Applications

    January 15, 2025

    Explores the use of Large Language Models (LLMs) for blockchain security, highlighting their potential to improve smart contract auditing, detect abnormal transactions, and support cryptocurrency governance. Key points include the ability of LLMs to automate vulnerability detection in smart contracts, alongside the importance of ethical considerations, regulatory compliance, and sustainability given the energy demands of LLM training. While promising, challenges remain, such as ensuring accuracy, managing evolving cyber threats, and balancing efficiency with fairness. The paper stresses the need for interdisciplinary collaboration and continuous refinement of LLMs to maximize security benefits.
  • Six Emotional Shapes of Storytelling

    January 8, 2025

    Examines the emotional arcs of stories and identifies six basic shapes that underpin narrative structures. Using text analysis of 1,737 books from Project Gutenberg, the authors applied sentiment analysis and machine learning to map emotional trajectories. Stories following the "Cinderella" arc (rise-fall-rise) tend to be more successful, as measured by download counts. This research highlights the universality of emotional patterns in storytelling and offers tools for analyzing and even generating compelling narratives.
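
    The core technique, a sliding-window sentiment score over the text, can be sketched in a few lines. The mini-lexicon below is a hypothetical stand-in for the full sentiment dictionaries the authors used:

```python
# Hypothetical mini-lexicon; the study used full sentiment dictionaries.
LEXICON = {"joy": 1, "love": 1, "win": 1, "hope": 1,
           "loss": -1, "grief": -1, "fear": -1, "dark": -1}

def emotional_arc(words, window=5):
    """Word-level sentiment smoothed into a trajectory by a sliding window."""
    scores = [LEXICON.get(w, 0) for w in words]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]
```

    Plotting the resulting trajectory over a whole book reveals the arc shape (rise, fall, rise-fall-rise, and so on) that the paper clusters into six families.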
  • Watermarking Large Language Model Output

    January 1, 2025

    Proposes a method to mark text generated by AI in a way that is invisible to humans but detectable by algorithms. This watermarking embeds patterns into text without affecting its quality and allows detection from short excerpts. The watermark can reliably identify AI-generated text from just 25 tokens, with a minimal chance of false positives. This approach could help mitigate risks like misinformation or misuse of AI-generated content.
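
    A minimal sketch of one published scheme of this kind (a "green-list" watermark; not necessarily this paper's exact method): seed a PRNG with the previous token to pick a favored subset of the vocabulary, bias generation toward it, and detect the bias with a z-score. The toy vocabulary and 50% green fraction are assumptions for illustration:

```python
import random

VOCAB = list(range(1000))   # toy vocabulary of token ids
GREEN_FRACTION = 0.5

def green_list(prev_token):
    # Seed a PRNG with the previous token, so a detector can recompute the
    # same green/red vocabulary split without access to the model.
    rng = random.Random(prev_token)
    ids = VOCAB[:]
    rng.shuffle(ids)
    return set(ids[: int(len(ids) * GREEN_FRACTION)])

def detect(tokens):
    # Count tokens that land in their context's green list; a z-score far
    # above zero indicates the generator was biased toward green tokens.
    hits = sum(t in green_list(p) for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    mean = n * GREEN_FRACTION
    std = (n * GREEN_FRACTION * (1 - GREEN_FRACTION)) ** 0.5
    return (hits - mean) / std
```

    Because the expected green-hit rate for unwatermarked text is just GREEN_FRACTION, even a short excerpt of biased text pushes the z-score well past any reasonable false-positive threshold.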
  • Decentralized Currency

    December 25, 2024

    This is the revolutionary paper that introduces Bitcoin, a system for electronic payments that eliminates the need for a trusted third party like a bank. It uses a peer-to-peer network and cryptographic proof, including a "blockchain," to timestamp and secure transactions, solving the problem of "double-spending." The network remains secure as long as honest participants control the majority of computing power, making it computationally impractical for attackers to alter the transaction history. Bitcoin's decentralized approach ensures transparency, security, and lower transaction costs, paving the way for a new era of digital currency.
  • Fine-tuning with LoRA

    December 18, 2024

    Introduces LoRA (Low-Rank Adaptation), a method for adapting large pre-trained language models efficiently. By freezing the pre-trained weights and training small low-rank matrices, LoRA reduces the number of trainable parameters by up to 10,000 times compared with full fine-tuning while maintaining or improving task performance. It lowers memory requirements and introduces no extra inference latency. LoRA matches or outperforms full fine-tuning on various tasks, showing that large models can adapt using far fewer resources, and its simplicity allows seamless integration with existing architectures without compromising quality.
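
    The update can be sketched in a few lines of NumPy: freeze W and learn only the low-rank pair (A, B), so the trainable parameter count drops from d_out * d_in to r * (d_in + d_out). Dimensions and the scaling factor below are illustrative, not the paper's exact configuration:

```python
import numpy as np

d_in, d_out, r = 64, 64, 4              # r << d is the low-rank bottleneck
alpha = 8.0                             # LoRA scaling factor (illustrative)
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))      # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-init

def forward(x):
    # Frozen path plus low-rank correction; at init B = 0, so the adapted
    # model starts out exactly equal to the pre-trained one.
    return W @ x + (alpha / r) * (B @ (A @ x))

trainable = A.size + B.size             # r * (d_in + d_out) = 512
frozen = W.size                         # d_out * d_in = 4096
```

    Zero-initializing B is what makes the adapted model start from the pre-trained behavior, and at deployment B @ A can be folded into W, which is why LoRA adds no inference latency.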
  • Categorizing Languages Through Metadata

    December 11, 2024

    The LinguaMeta project, detailed in the paper, unifies metadata for over 7,500 languages, covering aspects like language codes, speaker counts, writing systems, and regions. While metadata for widely spoken languages is robust, gaps exist for smaller, endangered languages. LinguaMeta offers comprehensive, traceable data aimed at supporting technology development for underrepresented languages.
  • AI Medical Chatbot

    This paper was actually written by me! This research introduces an AI medical chatbot trained on real patient-doctor conversations. Using advanced language models, it combines generative AI with retrieval-based methods like BERT to deliver accurate and context-aware responses. The chatbot aims to provide accessible preliminary healthcare insights, emphasizing it’s not a substitute for professional advice. This study highlights how fine-tuning large models for specific domains, paired with ensemble techniques, can improve conversational AI for real-world healthcare applications.
  • Subgraphs in Road Networks

    November 27, 2024

    Explores creating compact yet versatile road network models for navigation systems. It proposes algorithms to extract minimal subgraphs that preserve near-optimal routes for diverse travel preferences (e.g., shortest time, avoiding highways). The proposed greedy algorithm can reduce subgraph size by up to 60% compared to existing methods while maintaining accuracy. The research highlights how smaller subgraphs enable computationally intensive tasks, like dynamic routing or vehicle logistics, making navigation systems more adaptable and efficient.
  • Optimization of Quantum Measurement

    November 20, 2024

    Illustrates a way to improve how accurately we measure tiny quantum systems used in advanced computing. These systems, called superconducting circuits, are extremely sensitive and prone to errors during measurements. The researchers developed a smarter, faster method to fine-tune the measurement process, reducing errors to just 1.5% while keeping the system stable. This breakthrough could make quantum computers more reliable and able to handle bigger, more complex problems in the future.
  • Connecting Languages with Technology

    November 13, 2024

    This research explores the vast, untapped potential of data for thousands of languages, emphasizing that the main barrier to building language technology isn’t scarcity but the scattered nature of resources. A key insight reveals that, with better aggregation and community involvement, we could harness existing data to support language technologies for many under-resourced languages, making digital tools more accessible globally.
  • Improvements with Beam Search

    November 6, 2024

    Discusses methods to improve the confidence estimation in generative sequence labeling, a process used for tasks like entity extraction in AI. Traditional models rely on token-level probabilities, but this approach may miss the full scope of uncertainty. The authors propose using "beam search" statistics, leveraging multiple prediction candidates to better gauge model confidence. A key finding shows that methods like "Aggregated Sequence Probability" and "Adaptive Aggregated Sequence Probability" can reduce errors in confidence estimation, making predictions more reliable across various datasets. This improvement has practical implications for applications needing precise AI outputs, like virtual assistants or search engines.
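
    A simplified reading of "Aggregated Sequence Probability": sum the beam-search probability mass over candidates that decode to the same prediction, and report the winner's normalized mass as the confidence (the paper's exact formulation may differ):

```python
from collections import defaultdict

def aggregated_sequence_probability(beams):
    """beams: list of (prediction, probability) pairs from beam search.

    Sum the probability mass over beam candidates that decode to the same
    prediction; the winner's normalized mass is the confidence score."""
    mass = defaultdict(float)
    for pred, p in beams:
        mass[pred] += p
    top = max(mass, key=mass.get)
    return top, mass[top] / sum(mass.values())
```

    The point of aggregating over the whole beam is that a prediction backed by many moderately probable candidates can be more trustworthy than one backed by a single high-probability candidate.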
  • Human-Algorithm Collaboration

    October 30, 2024

    Explores human-algorithm collaboration, specifically when an algorithm provides a shortlist for a human to make the final choice. The study finds that collaboration (where the algorithm suggests a subset of choices rather than making a solo decision) often improves outcomes when both human and algorithmic errors are independent. Using a shortlist of items, rather than a single option, can improve accuracy due to complementary strengths in decision-making, especially when neither the human nor the algorithm is perfect.
  • Accessible AI for Creative Tasks

    October 23, 2024

    Describes how generative AI (GAI) is transforming creative practices, especially for people with disabilities. Through interviews with 10 creatives with various disabilities, the study reveals how they adapt GAI to enhance both accessibility and artistic expression across different mediums, from painting to audio engineering. The participants shared insights on balancing creative practices with accessibility hacks, offering a unique perspective on integrating AI into artistic workflows.
  • Length Generalization in Large Language Models

    October 16, 2024

    Explores how large language models (LLMs) handle "length generalization," or the ability to solve longer problems using knowledge from shorter ones. The study finds that finetuning alone is ineffective at improving this skill, even with larger models. However, using a "scratchpad" method, where models break down tasks into steps, significantly enhances performance, especially when combined with in-context learning (learning from a few examples). LLMs can improve length generalization more through in-context learning than by traditional finetuning, offering a new approach to reasoning tasks like math and code execution.
  • Merging Large Language Models

    October 9, 2024

    Focuses on efficiently creating high-performing large language models (LLMs) by merging multiple existing fine-tuned models rather than training new models from scratch. The challenge is finding methods to combine capabilities from various specialized models to generalize across tasks without the need for extensive retraining, thereby saving significant computational resources. A competition encourages participants to merge models under an 8 billion parameter limit using minimal compute resources. Merging expert models can potentially outperform existing models while dramatically reducing the cost and complexity of training large models from scratch.
  • Transformer-based Video Generation

    October 2, 2024

    Introduces VideoPoet, a model for generating high-quality videos from various input signals like images, text, and audio. It uses a transformer-based architecture similar to large language models, allowing it to generate videos in a 'zero-shot' manner, meaning it can create videos without being specifically trained for each task. A key finding is that VideoPoet outperforms existing video generation models, especially in creating fluid, realistic motions by incorporating multimodal inputs and a two-stage training process.
  • Attention is All You Need

    September 25, 2024

    Introduces the Transformer, a groundbreaking architecture in AI for processing sequences. Unlike traditional models that rely on complex recurrence or convolution, the Transformer uses only attention mechanisms to handle dependencies in data. This design allows for faster training and better performance in tasks like language translation. One key finding is that the Transformer outperforms previous state-of-the-art models in translation while requiring significantly less computational time. This is one of the most important papers to be released in recent history!
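
    The core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, which can be written directly in NumPy (single-head, batch-free version for clarity):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # similarity of each query to each key
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V, w                        # weighted values and attention map
```

    Every output row is a convex combination of the value rows, weighted by query-key similarity; stacking several of these heads in parallel gives the paper's multi-head attention.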
  • Bias in Natural Language Processing

    September 18, 2024

    Addresses the issue of bias in Natural Language Processing (NLP) models, which can produce harmful stereotypes and unequal outcomes for different social groups. Despite efforts to assess and mitigate these biases, current measurement methods are flawed. The authors propose using psychometrics, a field that measures abstract concepts, to improve how biases in NLP are evaluated. They focus on two key psychometric concepts: construct validity (ensuring measures capture the intended bias) and reliability (ensuring consistent results). One key finding is that NLP bias measures need more reliable and valid tools to prevent hidden biases from causing unintended societal harm.
  • Text Summarization

    September 11, 2024

    Examines the balance between redundancy and cohesion in extractive summarization, particularly for long, redundant texts like scientific papers. Two systems are introduced: one reward-based, optimizing for cohesion and informativeness, and another unsupervised, using psycholinguistic theories to simulate human memory. The reward-guided approach produces more cohesive summaries, though sometimes at the cost of informativeness. A key finding is that models focusing on cohesion create more structured and readable summaries, and can maintain or even improve informativeness, compared to those aimed solely at reducing redundancy.
  • Language Models as a Service

    September 4, 2024

    Discusses 'Language Models as a Service' (LMaaS), which refers to the use of powerful language models offered through APIs or web interfaces. LMaaS presents challenges such as limited accessibility, reproducibility, reliability, and trustworthiness due to its black-box nature and commercial restrictions. These challenges hinder efforts to understand and control these models. A key finding is that LMaaS exacerbates inequalities, as its pay-per-use model disproportionately affects lower-resource users, making it difficult for these groups to benefit from advances in AI.