Toward Enduring Web Citations
An attempt to solve the twin problems of impermanence and imprecision.
deepcite
n. A citation that combines a textual reference with a durable hyperlink to that exact passage in a preserved copy of the source.
v. (–cited, –citing) To create a citation by joining a selected text passage to a permanent, pinpoint hyperlink of its archived source.
The Twin Flaws of a Standard Hyperlink
The architecture of the web is fundamentally at odds with the demands of lasting citation. Any link we use as a reference is undermined by two distinct problems: one of permanence, the other of precision.
The permanence problem—link rot—is well-known. A normal hyperlink is a fragile, hopeful pointer to a resource you don't control. Pages change, URL schemes evolve, and critical information simply disappears. The Internet Archive has been fighting this battle for decades.
The precision problem is more subtle, a chronic friction we’ve just grudgingly accepted. A link to a 10,000-word article isn’t a citation; it’s a research assignment you’ve foisted upon your reader. In effect, they are asked to (a) take you at your word, (b) guess the right keywords for a ⌘F search from whatever contextual clues you’ve provided, or (c) resign themselves to reading the document in full.
The web has an emerging tool for the precision problem: the text fragment URL. By appending #:~:text=... to the end of a link, you can direct nearly every modern web browser to scroll to and highlight a specific passage on that page. At first blush, this recent web standard (specified by the W3C's Web Incubator Community Group) seems incredibly useful. Provide a colleague the precise pincite to one consequential fact you spot on line 2416 of a dense environmental report. Or point your future self back to a key insight near the end of an obscure scientific study whose significance took you multiple reads to appreciate. This looks promising...
But on its own, this technology only exacerbates the permanence problem. It creates a citation so specific that a single punctuation fix on the live page will break it—a phenomenon one might aptly call “fragment rot.”
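To make the mechanics concrete, here is a minimal Python sketch (an illustration, not the author's client script) that builds a text fragment URL and then shows how brittle an exact-text pointer is. The helper names are hypothetical, and `fragment_matches` is a deliberately naive substring test; real browsers apply additional normalization rules when matching.

```python
from urllib.parse import quote, urldefrag


def encode_text_directive(passage: str) -> str:
    """Percent-encode a passage for use after '#:~:text='."""
    # quote(safe="") escapes ',' and '&' but leaves '-' untouched; the text
    # fragment syntax gives '-' a special meaning, so escape it as well.
    return quote(passage, safe="").replace("-", "%2D")


def text_fragment_url(page_url: str, passage: str) -> str:
    """Return page_url with any existing #fragment replaced by a text directive."""
    base, _existing_fragment = urldefrag(page_url)
    return f"{base}#:~:text={encode_text_directive(passage)}"


def fragment_matches(page_text: str, passage: str) -> bool:
    """Naive stand-in for a browser's matcher: exact substring only."""
    return passage in page_text


if __name__ == "__main__":
    passage = "critical habitat, downstream of the dam"
    print(text_fragment_url("https://example.com/report", passage))

    # The same sentence before and after a one-comma edit on the live page.
    before = "It documents loss of critical habitat, downstream of the dam."
    after = "It documents loss of critical habitat downstream of the dam."
    print(fragment_matches(before, passage))  # True
    print(fragment_matches(after, passage))   # False: fragment rot
```

Deleting a single comma on the live page is enough to make the second check fail, which is exactly the fragility described above.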
A truly useful web citation needs to solve both problems at once: it must point precisely to the relevant portion of a cited source, and it must do it enduringly. For that to happen, the source itself must be frozen in time, exactly as it appeared when the citation was made.

Calling All Archivists
Inspired by my earlier work on a similar script for DEVONthink (which also has significant new features and improvements, as I'll detail in a follow-up post), I saw the potential a text fragment tool could have. I enlisted my trusty AI coding assistant, and within five minutes I had a working proof of concept. But my initial success was short-lived. I experienced fragment rot firsthand after just a few uses, and then again a few days later. The theoretical risk I’d anticipated was a practical, frequent reality.
This technical frustration soon collided with a much larger concern. Immediately after the presidential inauguration on January 20, 2025, information began disappearing from federal government websites: sporadically at first, then very soon systematically. Databases and other resources that conservationists have long relied on, from agencies like the EPA and NOAA among others, abruptly went dark.
As a public interest environmental lawyer, I build my work on this data. My cases under the Endangered Species Act (ESA), Clean Water Act (CWA), and National Environmental Policy Act (NEPA, RIP) depend on a stable, verifiable administrative record. This wasn’t just another vaguely menacing news item portending yet more symbolic violence on the Rule of Law. It represented (and still represents) a clear, concrete, and immediate threat to my clients' interests and to the science-based, mission-driven advocacy my colleagues and I have built our careers on.
I had already explored the world of self-hosting enough to have come across ArchiveBox, an open-source tool that creates high-fidelity, personal archives of web content. Its recent beta API made it the perfect engine. But ArchiveBox alone wasn’t sufficient. The URL for each archived snapshot includes a timestamp with microsecond precision, making it impossible to predict from the client side. I needed a custom bridge to sit between my browser and my archive.
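One way to build such a bridge, sketched in Python under stated assumptions: the bridge mints a short random ID the moment a citation is created, hands it to the browser immediately, and fills in the real snapshot URL once the archiver reports back. The class and method names are hypothetical, storage is an in-memory dict rather than a database, and falling back to the live page before the snapshot exists is one possible design choice, not necessarily what the author's backend does.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CitationRecord:
    original_url: str
    cited_at: float                     # Unix time when the citation was made
    snapshot_url: Optional[str] = None  # unknown until the archiver finishes


@dataclass
class CitationRegistry:
    """Hypothetical bridge between the browser and an archive."""
    records: Dict[str, CitationRecord] = field(default_factory=dict)

    def create(self, original_url: str) -> str:
        """Mint an unpredictable short ID the client can use right away."""
        slug = secrets.token_urlsafe(6)  # 6 random bytes -> 8 URL-safe chars
        while slug in self.records:      # guard against an unlikely collision
            slug = secrets.token_urlsafe(6)
        self.records[slug] = CitationRecord(original_url, time.time())
        return slug

    def attach_snapshot(self, slug: str, snapshot_url: str) -> None:
        """Record the archive's own URL once it is known."""
        self.records[slug].snapshot_url = snapshot_url

    def resolve(self, slug: str) -> str:
        """Where a visitor following this ID should be sent."""
        record = self.records[slug]
        return record.snapshot_url or record.original_url
```

Because the client only ever sees the slug, it never needs to guess the archive's microsecond timestamp.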
The Web Deepcite Tool
My solution is composed of two parts that work together: a script that runs in your browser, and an optional backend you can host yourself.
1. Browser Deepciter (client)
The heart of the system is a single JavaScript file. I run it in Orion Browser using its excellent Programmable Button functionality with a keyboard shortcut, but it works just as well as a standard bookmarklet in any other modern browser.
When you select text on a page and run the script as-is, it assembles a deepcite formatted in rich text and stores it in your system clipboard. The citation is hyperlinked, using a text fragment, to the original page:
https://example.com/#:~:text=This%20domain%20is
...
When you configure the script by pointing PROXY_BASE_URL to your self-hosted backend and specifying URL_PREFIX to match your backend’s configuration, it creates a deepcite that looks the same, except that the citation's hyperlink points to the archived webpage.
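The client's core job can be sketched in a few lines. The author's client is JavaScript; this is a Python rendering of the same idea that returns the HTML a browser would place on the clipboard as rich text (the clipboard step itself is left out). It assumes, hypothetically, that a proxied link takes the form `{PROXY_BASE_URL}/{URL_PREFIX}/{slug}`; the function and parameter names are also illustrative.

```python
import html
from typing import Optional
from urllib.parse import quote


def text_directive(passage: str) -> str:
    """':~:text=' payload with ',', '&' and '-' percent-encoded."""
    return ":~:text=" + quote(passage, safe="").replace("-", "%2D")


def deepcite_html(passage: str, title: str, page_url: str,
                  proxy_base_url: Optional[str] = None,
                  url_prefix: str = "",
                  slug: Optional[str] = None) -> str:
    """Quoted passage plus a hyperlinked title, pointing at the live or archived page."""
    if proxy_base_url and slug:
        # Hypothetical proxy URL shape: {base}/{prefix}/{slug}
        parts = [p.strip("/") for p in (url_prefix, slug)]
        target = proxy_base_url.rstrip("/") + "/" + "/".join(p for p in parts if p)
    else:
        target = page_url.split("#", 1)[0]  # drop any existing fragment
    href = f"{target}#{text_directive(passage)}"
    return (f"\u201c{html.escape(passage)}\u201d "
            f'<a href="{html.escape(href)}">{html.escape(title)}</a>')
```

Keeping the text directive on the proxied link means the archived copy still scrolls to and highlights the same passage.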
2. Self-Hosted Backend (server)
The backend pairs a standard ArchiveBox instance with a FastAPI server that I wrote to act as a smart proxy, with basic URL-shortening and analytics functionality built in. When you create a deepcite, the backend tells ArchiveBox to save a 100% self-contained archive of the page using SingleFile.
When the link is visited, the proxy serves that file after injecting a minimalist banner at the top showing:
- the archival date;
- any delay between when the citation was made and when the page was archived (this can happen if ArchiveBox had a long job queue or was unresponsive);
- a link to an archived PDF of the page;
- a link to the original (live) page; and
- a QR code for the archival URL.
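A rough Python sketch of the banner step, assuming the proxy holds the archived page as an HTML string plus the citation and archive times as timezone-aware datetimes. The function names, the one-minute threshold for reporting a delay, and the banner markup are all hypothetical, and the QR code is omitted because generating one needs a third-party library.

```python
import html
import re
from datetime import datetime, timezone
from typing import Optional

BODY_OPEN = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def describe_delay(cited_at: datetime, archived_at: datetime) -> Optional[str]:
    """Human-readable gap between citing and archiving, or None if trivial."""
    seconds = int((archived_at - cited_at).total_seconds())
    if seconds < 60:  # hypothetical threshold
        return None
    hours, remainder = divmod(seconds, 3600)
    return f"archived {hours}h {remainder // 60}m after citation"


def inject_banner(page_html: str, cited_at: datetime, archived_at: datetime,
                  original_url: str, pdf_url: str) -> str:
    """Insert a small banner right after <body>, or at the very top if none exists."""
    stamp = archived_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    items = [f"Archived {stamp}"]
    delay = describe_delay(cited_at, archived_at)
    if delay:
        items.append(delay)
    items.append(f'<a href="{html.escape(pdf_url)}">PDF</a>')
    items.append(f'<a href="{html.escape(original_url)}">Live page</a>')
    banner = ('<div id="deepcite-banner" style="font:12px sans-serif;'
              'padding:4px 8px;background:#ffd;border-bottom:1px solid #cc9">'
              + " | ".join(items) + "</div>")
    match = BODY_OPEN.search(page_html)
    if match is None:
        return banner + page_html
    return page_html[:match.end()] + banner + page_html[match.end():]
```

Because SingleFile output is a single self-contained HTML document, a string insertion after the opening body tag is enough here; no separate assets need rewriting.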

Try First On My Demo Server
Setting up a self-hosted server can be a project. To help you decide whether this is a workflow you'd find useful, you can point the client script to my public demo instance. To do this, set the variable at the top of the JavaScript file:
PROXY_BASE_URL = 'https://cit.is'
Getting Your Own Setup Running
If you're as excited about this as I am, and want your very own permanent private archival deepciter, head over to my open-source code repository to get started:
The README.md file in that repository provides the canonical step-by-step instructions. The setup process should be familiar to anyone who has dabbled in self-hosting: you use a standard .env file to configure the ArchiveBox Docker container, and a config.yaml file to tell the proxy script where to find your ArchiveBox instance and how to behave. Once everything is configured, start the services with docker compose and run the proxy script with Python.
Next Steps
This toolkit is already a core part of my own workflow, but I am considering several future improvements and welcome feedback. I'm currently mulling over adding the Internet Archive as an alternative to ArchiveBox, finding a creative way to bypass the need for a server script (perhaps by combining the Internet Archive with an API call to a link-shortening service), integrating deepcite functionality directly into ArchiveBox (i.e., by forking that project), and building browser extensions for a more polished UX than bookmarklets.
The web's citation problems aren't going away—if anything, the recent wave of government data disappearing has made clear how fragile our digital references really are. Deepcite won't solve every corner case, and setting up your own archive does require some technical effort. But for researchers, writers, and lawyers who depend on precise, durable evidence, the investment in a system you control is, I believe, a necessary one.
UPDATES
2025.06.20
I've added support for using SingleFile directly and bypassing ArchiveBox. In my testing so far SingleFile is faster, more reliable, simpler, and uses a lot less space. I.e., a win/win/win/win. SingleFile is therefore now the default mode.
2025.06.22
I'm excited to share that I'm busy building this out as a subscription service at cit.is. Stay tuned for announcements about a public beta soon. In the meantime, please note that I'm moving the demo deepcites from https://sij.law/cite/ to https://cit.is/.
Hey folks, in this post I'm sharing and explaining a custom script I wrote for DEVONthink that greatly assists me when I'm reviewing large batches of Bates-stamped documents (e.g., administrative records or discovery dumps).
Specifically, this script will:
- copy selected text in a PDF to the clipboard;
- determine the Bates number of the page the selected text is on;
- generate a Markdown-formatted Bates cite for that page linking back to the document, page, and specific passage of text that was selected; and
- append this Bates 'deeplink' to the clipboard.
ⓘ About DEVONthink
DEVONthink is comprehensive document management and productivity software exclusive to macOS that has seen continuous updates, improvements, and new features added for over two decades. The developer calls it "your paperless office." That certainly rings true for me—it's been my go-to work and study app since I first started using it in law school 7 years ago.
Example usage.
Suppose I'm reviewing the file FWS 065382–065397.pdf near the end of a 75,000-page administrative record (true story).
Suppose I find something really incriminating on the sixth page of that document—a candid email exchange where a staff biologist wrote: "If we do this, Franklin's Bumble Bee will go extinct." (fictitious example).
If I select that sentence in the PDF and use the keyboard shortcut I've assigned to my script, ⇧ + ⌥ + C, the following is placed in my system clipboard:
"If we do this, Franklin's Bumble Bee will go extinct." [FWS 65387](x-devonthink-item://883C7BE0-A328-4818-A4B5-3AF7E5504135?page=6&start=534&length=53&search=If%20we%20do%20this%2C%20Franklin%27s%20Bumble%20Bee%20will%20go%20extinct.).
With Markdown rendering, that becomes, "If we do this, Franklin's Bumble Bee will go extinct." FWS 65387.
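The arithmetic and link-building behind that example can be sketched in Python (the actual tool is an AppleScript; this is only an illustration). It assumes, as the example suggests, that `page` is the 1-based page number, so page 1 carries the document's first Bates number, and that the item's UUID and the selection's character offset are already known. The function names are hypothetical.

```python
from urllib.parse import quote


def bates_number_for_page(first_bates: int, page: int) -> int:
    """Page 1 carries the first Bates number; each later page adds one."""
    return first_bates + page - 1


def devonthink_deeplink(item_uuid: str, page: int, start: int, text: str) -> str:
    """Rebuild a link with the same query parameters as the example above."""
    # len() counts Unicode code points; how DEVONthink counts non-ASCII
    # selections is not covered by this sketch.
    return (f"x-devonthink-item://{item_uuid}?page={page}&start={start}"
            f"&length={len(text)}&search={quote(text, safe='')}")


def markdown_bates_cite(text: str, prefix: str, first_bates: int, page: int,
                        item_uuid: str, start: int) -> str:
    """Quoted selection followed by a Markdown Bates cite and deeplink."""
    label = f"{prefix} {bates_number_for_page(first_bates, page)}"
    return f'"{text}" [{label}]({devonthink_deeplink(item_uuid, page, start, text)}).'
```

Called with the selected sentence, prefix FWS, first Bates number 65382, page 6, the UUID from the example, and offset 534, this reproduces the clipboard line above: 65382 + 6 - 1 = 65387, and the 53-character selection yields length=53.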
Now, if I open my Markdown editor of choice (Ulysses, Obsidian, or DEVONthink itself, depending on the task at hand; more on that in a future post) and paste (i.e., ⌘ V), the text I selected, plus the correct Bates cite with a deeplink back to the source sentence in the PDF, is inserted.
I can then at any time simply click the Bates cite and get right back to the exact point in that specific document where the incriminating statement is found.
Deeplinks are neat, right? Let's set it up.
Prerequisites.
- This tutorial assumes you have a macOS computer with Perl and DEVONthink installed. The standard version of DEVONthink will work but I do recommend buying the Pro version, among other reasons, for easier OCR. The agencies I sue often transmit non-OCR'd documents and DEVONthink Pro is a godsend when that occurs.
- The documents you're looking at should be Bates-numbered (though my script has a fall-back mode for non-Bates numbered documents—more on that below).
- The documents should be named according to their Bates starting number or their Bates range. The script I wrote can handle any of these filename conventions:
FWS 000533.pdf
FWS-000533.pdf
FWS_000533.pdf
NMFS 002561-002985.pdf
BLM 45.pdf
AR_45-62.pdf
As you can see:
- the agency prefix doesn't matter;
- the separator between the agency prefix and the page number(s) can be a space, "-", or "_"; and
- the filename can include either the Bates number of the document's first page or the Bates range of the complete document, separated by "-".
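Those naming rules translate fairly directly into a regular expression. The sketch below is a hypothetical Python re-implementation, not the author's code. It adds two assumptions the post doesn't state: the prefix is letters only (which keeps author-year names like Cayan et al 2008.pdf from being misread as Bates numbers), and an en dash is tolerated in the range, as in the FWS 065382–065397.pdf example.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: letters-only prefix, then a space, "-" or "_",
# then the first Bates number and an optional "-<last Bates number>"
# (an en dash is also accepted between the two numbers).
BATES_FILENAME = re.compile(
    r"^(?P<prefix>[A-Za-z]+)[ _-](?P<first>\d+)(?:[-\u2013](?P<last>\d+))?\.pdf$",
    re.IGNORECASE,
)


class BatesName(NamedTuple):
    prefix: str
    first: int
    last: Optional[int]  # None when the filename gives only the first number


def parse_bates_filename(filename: str) -> Optional[BatesName]:
    """Return the parsed Bates info, or None for a nonconforming name."""
    m = BATES_FILENAME.match(filename)
    if m is None:
        return None
    last = int(m.group("last")) if m.group("last") else None
    return BatesName(m.group("prefix"), int(m.group("first")), last)


if __name__ == "__main__":
    for name in ["FWS 000533.pdf", "FWS-000533.pdf", "FWS_000533.pdf",
                 "NMFS 002561-002985.pdf", "BLM 45.pdf", "AR_45-62.pdf",
                 "Cayan et al 2008.pdf"]:
        print(name, "->", parse_bates_filename(name))
```

Even with the letters-only guard, a single-word author-year name such as Smith 2008.pdf would still parse as Bates number 2008, which is one reason renaming files to a strict convention is the safer route.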
This makes the script flexible enough to cover the file naming conventions I see most often in my legal practice. But I have seen others that my script couldn't feasibly be made to accommodate, for example:
20170912 1813 Redacted.pdf
20170922 0000 CSERC Scoping comment letter transmittal email.pdf
20200828 County of Tuolumne transmittal submission.pdf
Butler and Wooster 2003.pdf
California Resources 2020.pdf
Cayan et al 2008.pdf
Crozier et al 2006.pdf
If you're working with documents that are Bates-stamped but use a different file naming convention, like the examples above, the script won't work until you rename the files to a supported convention. But when you're wrangling a 2,215-document administrative record (AR) comprising over 75,000 pages (true story), renaming them manually is infeasible. That's why I wrote...
A nifty helper script to handle Bates documents with nonconforming filenames.
This helper script, bates, will batch-rename folders full of documents while preserving their original filenames as metadata. The Python helper script is quite complex in its own right: it can handle PDFs with multiple text layers, non-OCR'd PDFs, DRM-protected PDFs, PDFs where Bates stamps are inserted as annotations, and a variety of Bates stamp formatting conventions.
I've written complete documentation on installing and using bates here, but the basic usage goes like this:
bates "$HOME/Cases/My Big Case/Administrative Record" \
--prefix "BLM AR " \
--digits 6 \
--name-prefix "BLM " \
--log INFO
In this example, the script will take every PDF file in the folder ~/Cases/My Big Case/Administrative Record and search the first and last pages for Bates stamps formatted like BLM AR ######. Once it finds them, it will stash the original filename in the file's Finder comment metadata field, then rename the file BLM {first page Bates number}-{last page Bates number}.
With this naming convention the DEVONthink script will work.
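The core of that renaming step might look something like the following Python sketch, which works on text already extracted from a document's first and last pages. It is hypothetical throughout: the real bates tool also handles OCR, annotations, and DRM, and it stores the original filename in the Finder comment, which this sketch does not attempt. It also assumes the new name keeps the stamp's zero-padding and ends in .pdf.

```python
import re
from typing import Optional


def find_stamp(page_text: str, stamp_prefix: str, digits: int,
               last: bool = False) -> Optional[str]:
    """Find a stamp like 'BLM AR 000123' and return just its digits."""
    # Exactly `digits` digits, not followed by another digit.
    pattern = re.escape(stamp_prefix) + r"(\d{%d})(?!\d)" % digits
    hits = re.findall(pattern, page_text)
    if not hits:
        return None
    return hits[-1] if last else hits[0]


def proposed_name(first_page_text: str, last_page_text: str,
                  stamp_prefix: str, digits: int, name_prefix: str) -> Optional[str]:
    """Build '<name_prefix><first>-<last>.pdf', or None if a stamp is missing."""
    first = find_stamp(first_page_text, stamp_prefix, digits)
    last = find_stamp(last_page_text, stamp_prefix, digits, last=True)
    if first is None or last is None:
        return None
    return f"{name_prefix}{first}-{last}.pdf"
```

With the values from the command above (stamp prefix "BLM AR ", six digits, name prefix "BLM "), a document stamped BLM AR 000001 through BLM AR 000045 would become BLM 000001-000045.pdf, a name that fits the conventions the DEVONthink script accepts.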
The Bates cite deeplink AppleScript.
Alright, with the prerequisites in place and our documents abiding by a compatible filename convention, we can turn to the script this post is about. I've written detailed documentation for it here, but it's quite simple to set up:
- Copy the script from here.
- Open /Applications/Utilities/Script Editor.app.
- Paste in the script and save it (e.g., on your Desktop) as Bates Source Link.scpt.
- Open DEVONthink 3.app and click the script icon in the menu bar (it looks sort of like §) → Open Scripts Folder, or alternatively open Finder, click Go in the menu bar → Go to Folder..., and enter ~/Library/Application Scripts/com.devon-technologies.think3.
- Move Bates Source Link.scpt to the Menu subfolder you should now see in the Finder window.
That's it.
You can use the script now by clicking the script icon (§) in the menu bar and then Bates Source Link.
Assign a keyboard shortcut.
... but clicking the script menu and finding the right script each time becomes a bit cumbersome, right? I thought so, too, so let's configure a keyboard shortcut for it:
- In your menu bar, click the Apple menu → System Settings → Keyboard → Keyboard Shortcuts → App Shortcuts.
- Click +, then select DEVONthink 3.app as the application.
- Enter the exact script name as it appears in the DEVONthink script menu, i.e., Bates Source Link.
- Assign your desired shortcut. A good option that doesn't conflict with default shortcuts is ⇧ ⌥ B (Shift + Option + B).
Hyperkey
I've assigned ⇪ B (Caps Lock + B) to this script.
Handling non-Bates documents.
Not all documents I work with are Bates stamped, so I made the script handle other documents too.
When a document doesn't follow one of the Bates naming conventions detailed above, the script will still work. But instead of determining the Bates number for the active page and formatting a deeplinked Bates cite, it will format a deeplinked generic cite like this:
"If we do this, Franklin's Bumble Bee will go extinct." [FWS email thread at 6](x-devonthink-item://883C7BE0-A328-4818-A4B5-3AF7E5504135?page=6&start=534&length=53&search=If%20we%20do%20this%2C%20Franklin%27s%20Bumble%20Bee%20will%20go%20extinct.).
With Markdown rendering, that becomes, "If we do this, Franklin's Bumble Bee will go extinct." FWS email thread at 6.
Conclusion
That's it for this tip, folks. Let me know in the comments if you use DEVONthink and decide to give my script(s) a whirl. And as always, feel free to expand their functionality on my repo at sij.ai. I'd be particularly interested if anyone knows how to handle rich text links in AppleScript, which would make this compatible with apps other than Markdown editors.
Cheers!