Template-type: ReDIF-Paper 1.0
Author-Name: Kyle Higham
Author-Workplace-Name: Motu Economic and Public Policy Research
Author-Name: Hannah Kotula
Author-Workplace-Name: Motu Economic and Public Policy Research
Author-Name: Emma Scharfmann
Author-Workplace-Name: University of California, Berkeley
Author-Name: Steve Gong
Author-Workplace-Name: Google
Author-Name: Gaétan de Rassenfosse
Author-Workplace-Name: Ecole polytechnique fédérale de Lausanne
Title: A dataset of scientific citations in U.S. patent Office Actions
Abstract: We present a curated dataset of about 850,000 citations extracted from Office Actions issued by examiners at the United States Patent and Trademark Office. These references, historically underused due to accessibility challenges, provide a granular view into the patent examination process and complement traditional front-page citation data. We classify each citation into one of 14 categories and focus on the 265,000 references to scientific literature, which we parse, clean, and disambiguate using machine learning and external bibliographic services. To enhance reusability, disambiguated records are linked to OpenAlex, a comprehensive research metadata platform. The dataset enables new research on examiner behavior, science–technology linkages, and the construction of citation-based metrics. All data and code are openly available to facilitate reuse across disciplines.
Classification-JEL: O34; K29; D83; C81
Keywords: citation; patent; office actions; open data; non-patent literature; NPL
Length: 12 pages
Creation-Date: 2026-02
File-URL: https://cdm-repec.epfl.ch/iip-wpaper/WP31.pdf
File-Format: application/pdf
Handle: RePEc:iip:wpaper:31