Stop Dropbox syncing node_modules with `find` and Hammerspoon

This shouldn't be necessary in 2023, but here we are

I’ve been a Dropbox customer for basically a decade. They’ve never lost my data! It’s been a good service, despite their growth and the frustration of dealing with their ever-noisier (and nosier) desktop client.

However, my main frustration with Dropbox is that there’s no way to tell it to ignore any file or folder that matches a particular pattern.

tl;dr: You can see my Hammerspoon config on my GitHub if you wanna skip to the good stuff.

There’s no such thing as a .dropboxignore file where you can tell it not to sync any files named, for instance, node_modules.

This is a problem: node_modules directories are usually wide and deep, full of thousands of small text files. It takes Dropbox forever to analyze the files and decide what to sync. While it’s working, your CPU heats up and your battery is depleted and you often end up with what Dropbox thinks are file conflicts.

If you’re working on a project and Dropbox decides you’ve got conflicting files, it can actually break a project.

I’ve even had Dropbox break an entire git repository! Git also stores many small files in deep directory structures, and if you’ve got a big repo or a flaky connection, Dropbox can mark a file inside the repository as conflicting, resulting in a corrupted repository and lost data.

They do provide a fix…but it requires manually modifying files’ extended attributes, one at a time, and their docs on the subject aren’t even accurate.

So, how can we get Dropbox to ignore all files of a certain name, a la .gitignore?

Edit extended attributes of the files you already have

First, let’s figure out what it takes to force Dropbox to ignore folders individually.

In order to tell Dropbox to stop syncing a file, you have to set an extended attribute on the file with your handy-dandy terminal emulator. (Extended attributes are really interesting if you want to know how they work!)

If you’re on macOS 12.3 or later, you can set the attribute using the xattr command (note that <this> denotes a placeholder; replace it with your file path, sans angle brackets):

xattr -w 'com.apple.fileprovider.ignore#P' 1 <path/to/your/file>

This writes (-w) an extended attribute for the file at path/to/your/file. It sets the attribute’s key to com.apple.fileprovider.ignore#P, and its value to 1.

You can verify the attribute was written by running:

xattr -p 'com.apple.fileprovider.ignore#P' <path/to/your/file>

The -p option prints the value of the attribute at the given key. It should print 1.
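If you end up doing this a lot, the write-then-verify dance can be folded into one small shell function. This is only a sketch: dropbox_ignore and DROPBOX_IGNORE_ATTR are names I made up for illustration, and it assumes the macOS xattr command is on your PATH.

```shell
# Attribute name taken from the commands above.
DROPBOX_IGNORE_ATTR='com.apple.fileprovider.ignore#P'

# Hypothetical helper: write the attribute on the given path, then read it
# back and succeed only if the stored value is 1.
dropbox_ignore() {
  xattr -w "$DROPBOX_IGNORE_ATTR" 1 "$1" &&
    [ "$(xattr -p "$DROPBOX_IGNORE_ATTR" "$1" 2>/dev/null)" = "1" ]
}
```

You'd call it as dropbox_ignore <path/to/your/file>; it exits non-zero if the attribute didn't stick.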

Using find to render many files un-syncable as the Titanic

If you have a lot of projects in your Dropbox folder, it’d be a pain (maybe even impossible!) to do that on every folder individually. Who has time for that? Not me.

Instead, let’s use find to add that attribute to every folder that matches a certain set of criteria.

Side note

Consider setting these attributes for files elsewhere in your filesystem, before copying the files to Dropbox. That way, you can drag-and-drop your folders into Dropbox without your computer bursting into flames while it tries to sync 500 megs of node modules.

If you’re like me and you love overcomplicating things, you can use rsync:

rsync -aqE ~/Code/ ~/Dropbox/Code/

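Here, -a (archive mode) preserves permissions, timestamps, and symlinks; -q suppresses everything except error messages; and -E preserves extended attributes on the rsync that ships with macOS. (On rsync 3.x, -E means something else and extended attributes are -X, so check man rsync if you’ve installed a newer version.)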

(You could also just turn off Dropbox syncing while you fiddle with your files.)

Getting started

You can use find to look for every folder that matches a given pattern, and set the attributes there. Like this:

find -E <path> -regex '.*node_modules|.*\.git' -type d -print -prune

Replace <path> with your own projects directory (minus the angle brackets). Here’s what it does:

  • find -E tells find to use “extended” regular expression syntax
  • <path> sets the path you’re searching through
  • -regex '.*node_modules|.*\.git' matches any path that ends in node_modules or .git, with any amount of preceding text. -regex tests the whole path rather than just the file name, so projectname.git will match as well as a bare .git
  • -type d limits the results to directories
  • -print prints the path it finds
  • -prune tells find not to descend into subdirectories. Without prune, find would print all matches of e.g. node_modules — of which there will be many, since node_modules nest module dependencies in more node_modules folders. prune tells find, “stop descending the directory tree after you find the first match.”
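If you'd like to see what -prune changes before pointing find at your real projects, here's a small sandbox sketch. It builds a throwaway tree with mktemp (the directory names are made up), and it uses a plain -name test in place of the -E regex so that it should behave the same on BSD and GNU find.

```shell
# Build a scratch tree with a nested node_modules (hypothetical layout).
demo=$(mktemp -d)
mkdir -p "$demo/app/node_modules/left-pad/node_modules/inner"
mkdir -p "$demo/site/node_modules"

# Without -prune, find keeps descending and also reports the nested copy.
without_prune=$(find "$demo" -name node_modules -type d -print | wc -l | tr -d ' ')

# With -prune, find stops descending at the first node_modules on each branch.
with_prune=$(find "$demo" -name node_modules -type d -print -prune | wc -l | tr -d ' ')

echo "without -prune: $without_prune, with -prune: $with_prune"
rm -rf "$demo"
```

The first count includes the node_modules nested inside left-pad; the second doesn't, which is what we want, since marking the outermost folder is enough.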

Side note: find is really powerful and cool. Julia Evans even wrote a zine with a nice cheatsheet for it. The GNU findutils docs are also great. Long, but great.

If you’re happy with the list of files that appears, you can tell find to execute a command for every result with -exec:

find -E <path> -regex '.*node_modules|.*\.git' -type d -print -prune \
  -exec xattr -w 'com.apple.fileprovider.ignore#P' 1 {} \;

We’re setting the extended attribute like before, but now we’re doing it on every single find result. Couple details:

  • {} is a placeholder for whatever file find is currently looking at
  • \; tells find that this is the end of the expression after -exec; we escape it with a backslash so that it’s not misinterpreted by the shell

There ya go! Now all the .git and node_modules directories are ignored by Dropbox. If you were doing this in your Dropbox directory, you should now see a little cloud with a line going through it next to the file names in Finder. Like this:

Screenshot of Finder showing the node_modules folder with a little cloud icon to its right. The cloud icon has a slash through it, indicating Dropbox isn't syncing the node_modules folder.

Finding files that meet complicated criteria

That regex is sort of a blunt instrument and doesn’t handle more complex conditions.

For instance, what if there’s a project where we’re actually writing code inside node_modules? (I’ve never done it but who knows! Maybe you have!)

Also, what if there are more folders we’d like to find? Maybe they’re different names. Or what if we need to check for other conditions?

find -exec has you covered!

To proceed, there are three things to know about -exec (and its sibling, -execdir):

  1. They execute any script you give them on each result find discovers. (You can change this behavior but we don’t need to do that here!)
  2. If the script executed by -exec returns a 0 exit code, it is considered “true” and the result find is operating on is considered a match. Anything other than a 0 exit code is “false”, and find moves on to the next result.
  3. You can string multiple -exec (and other!) primaries together, and find stops executing other primaries for the current result on the first “false” (that is, non-zero) -exec result.

This means you can string together complex conditions, and find will only operate on the results that match all conditions!
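Here's a sandbox sketch of -exec acting as a filter. The tree and the autoload.php marker file are made up for illustration; the point is only that -print runs for the directory whose -exec command exits 0.

```shell
# Scratch tree: two vendor directories, only one containing a marker file.
demo=$(mktemp -d)
mkdir -p "$demo/old-site/vendor" "$demo/php-app/vendor"
touch "$demo/php-app/vendor/autoload.php"   # hypothetical marker file

# The -exec primary acts as a test: sh exits 0 only when the marker exists,
# so -print is reached only for directories that pass it.
matches=$(find "$demo" -name vendor -type d -prune \
  -exec sh -c 'test -f "$1/autoload.php"' sh {} \; -print)

echo "$matches"
rm -rf "$demo"
```

Only php-app/vendor is printed. old-site/vendor is still visited, but its -exec returns non-zero, so find skips the -print and moves on.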

For instance, you can tell find, “Find every directory in this folder that’s named vendor and that is ignored by Git, and set an extended attribute on them.”

In fact, I’d like to do just that.

I’ve got lots of old projects where I used a directory named vendor to contain copy-pasted JavaScript dependencies. (This was before NPM was a thing!) I want to sync those folders.

But I also have lots of projects that use Composer, the package manager for PHP projects. It stores dependencies in a vendor directory automatically. I do not want those in my Dropbox.

A good heuristic for this is: is vendor in my Git repository?

Here’s what I want to accomplish: if a vendor directory is considered part of my source tree, sync it to Dropbox; if not, then don’t.

Let’s string together a find command to take care of this for us.

find <path> -name 'vendor' -type d -prune \
    -exec git -C {} check-ignore -q {} 2>/dev/null \; -print

Here we’re telling find to look for directories named vendor and then, for each result, execute git -C <the found vendor directory> check-ignore -q <the found vendor directory> 2>/dev/null. Here’s what that command does:

  • git -C {} executes the git binary from within the provided path. {} is a placeholder used by find that, when it executes the command after -exec, is substituted with the path to the file find is currently looking at (in this case, a directory called vendor somewhere inside <path>). Because git changes into that directory first, the second {} only resolves correctly when <path> is absolute.
  • check-ignore -q <path> is a Git command that tells us if the <path> is currently ignored. -q tells git to silence the output; normally, git prints the path if it’s ignored, and exits with a 0 exit code. A 1 exit code indicates that the path is not ignored and is therefore in the project’s source tree.
  • 2>/dev/null redirects stderr output to /dev/null. This is useful in case a found vendor directory is not in a Git repository (in which case git prints an error and exits with a non-zero exit code). Strictly speaking, the shell handles this redirect before find ever runs, so it silences stderr for find and for every command it executes; without it, git’s errors would land in the terminal alongside the results.

I think this is what happens when we run that command:

  • find looks at every single file in <path>, one by one. For every file…
  • Test whether it matches the condition -name vendor. If it does not, move to the next file
  • If its name is vendor, is it a directory? If not, move to the next file
  • If it is a directory named vendor, do not descend further into the directory tree
  • Execute the script between -exec and \;, substituting {} for the current path. Did it return 0? Then this is a match
  • If all the previous steps succeeded, print the path

find moves on to the next file after the first failure of any primary.
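The whole chain hinges on the exit code of git check-ignore, so it can help to poke at that on its own. This sketch assumes git is installed and uses a throwaway repository whose layout (vendor/ ignored, lib/ not) is made up for the demo.

```shell
# Scratch repository (hypothetical layout): vendor/ is gitignored, lib/ is not.
repo=$(mktemp -d)
git init -q "$repo" >/dev/null 2>&1
mkdir -p "$repo/vendor" "$repo/lib"
echo 'vendor/' > "$repo/.gitignore"

# check-ignore -q prints nothing; the exit code carries the answer.
ignored_status=0
git -C "$repo" check-ignore -q vendor || ignored_status=$?
tracked_status=0
git -C "$repo" check-ignore -q lib || tracked_status=$?

# Outside any repository, git fails with a non-zero code as well.
outside=$(mktemp -d)
outside_status=0
git -C "$outside" check-ignore -q . 2>/dev/null || outside_status=$?

echo "ignored=$ignored_status not-ignored=$tracked_status no-repo=$outside_status"
rm -rf "$repo" "$outside"
```

An ignored path gives 0, so find carries on to the next primary; anything else stops the chain for that directory, whether the path is tracked or there's no repository at all.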

The final puzzle piece: you can string multiple -execs together! So we can check if a file is gitignored, and if so, run subsequent commands (like modifying extended attributes).

find <path> -name 'vendor' -type d -prune \
    -exec git -C {} check-ignore -q {} 2>/dev/null \; \
    -exec xattr -w 'com.apple.fileprovider.ignore#P' 1 {} \; \
    -print

There! Now every directory named vendor that is gitignored should not be synced to Dropbox. Nice!

We could add other exec calls or conditions above that last exec that actually sets the extended attribute; if any of them fail, the final exec is never called.

Alternation

Finally, find has logical operators, so you can combine multiple conditions to write one big ol’ find script:

find <yourpath> \
	-name 'vendor' -type d -prune \
	-exec /usr/bin/env git -C {} check-ignore -q {} 2>/dev/null \; -print \
	-exec xattr -w 'com.apple.fileprovider.ignore#P' 1 {} \; \
	-or -name 'node_modules' -type d -prune \
	-exec /usr/bin/env git -C {} check-ignore -q {} 2>/dev/null \; -print \
	-exec xattr -w 'com.apple.fileprovider.ignore#P' 1 {} \; \
	-or -name "__pycache__" -type d -prune -print \
	-exec xattr -w 'com.apple.fileprovider.ignore#P' 1 {} \; \
	-or -name ".sass-cache" -type d -prune -print \
	-exec xattr -w 'com.apple.fileprovider.ignore#P' 1 {} \;

This finds any vendor directories ignored by git, any node_modules directories ignored by git, any Python or Sass cache directories, writes the magical Dropbox extended attribute, and prints the found path on success.
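When several names share exactly the same treatment (like the two cache directories, which skip the Git check), escaped parentheses can group them so the rest of the expression is written once. A sandbox sketch with a made-up tree; it only prints, and on macOS you'd swap -print for the -exec xattr primary.

```shell
# Scratch tree with two kinds of cache directory and one ordinary directory.
demo=$(mktemp -d)
mkdir -p "$demo/api/__pycache__" "$demo/web/.sass-cache" "$demo/web/src"

# Escaped parentheses group the alternatives, so -type d, -prune and the
# action apply to whichever name matched.
found=$(find "$demo" \( -name '__pycache__' -o -name '.sass-cache' \) \
  -type d -prune -print | wc -l | tr -d ' ')

echo "cache directories found: $found"
rm -rf "$demo"
```

Without the parentheses, -o binds more loosely than the implicit "and" between primaries, which is why the long command repeats -type d -prune and the action on every branch.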

Ignoring parents

You could also use execdir to ignore parents of certain files.

For instance, if you have Rust projects and you don’t want to sync Rust’s cache directories, you can tell find to take care of it like so:

find <path> -type f -name 'CACHEDIR.TAG' -execdir xattr -w 'com.apple.fileprovider.ignore#P' 1 . \; -print

-execdir executes the provided command within the parent directory of the found file.

So this command:

  • Looks in <path> for any file (-type f) named ‘CACHEDIR.TAG’
  • If the current file matches that name, we execute the provided command in the parent directory of that file. So we can use the . placeholder for the current directory, instead of {}, to execute the xattr command with . as the target.

Neat!
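To convince yourself that -execdir really runs from the parent directory, here's a sandbox sketch. The project/target layout is made up to resemble a Rust build cache, and the command just reports the name of the directory it was run in.

```shell
# Scratch tree mimicking a Rust build cache (hypothetical layout).
demo=$(mktemp -d)
mkdir -p "$demo/project/target"
touch "$demo/project/target/CACHEDIR.TAG"

# -execdir runs the command from the directory holding the match, so "."
# there is project/target. Print that directory's name to show it.
ran_in=$(find "$demo" -type f -name 'CACHEDIR.TAG' \
  -execdir sh -c 'basename "$(pwd -P)"' \;)

echo "$ran_in"
rm -rf "$demo"
```

It reports target, the directory that contains CACHEDIR.TAG, which is exactly the directory the xattr command marks when it's given . as its argument.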

Deleting the attribute

If you change your mind and would like Dropbox to sync a folder, you can do so with xattr:

xattr -d 'com.apple.fileprovider.ignore#P' <yourfile>

-d here deletes an attribute and its value. You can pass that command to find as above if you’d like Dropbox to sync all the files you just un-synced.
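As a sketch of what passing that to find might look like, here's a small function. dropbox_unignore_all is a name I made up, and it assumes the macOS xattr command is on your PATH.

```shell
# Hypothetical helper: remove the ignore attribute from every directory
# named $2 under the root $1, printing each path where removal succeeded.
dropbox_unignore_all() {
  find "$1" -name "$2" -type d -prune \
    -exec xattr -d 'com.apple.fileprovider.ignore#P' {} \; -print
}
```

You'd run it as dropbox_unignore_all <path> node_modules. Because -print comes after -exec, a directory is only listed when xattr exits 0, so folders that never had the attribute are left out of the output.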

Automator

You can even use the macOS Automator to execute xattr as a Quick Action, so you can change the attributes one-by-one via Finder. A good tutorial for that is here, but you’d need to swap the attr command in that article for the xattr command above, as the article is outdated.

I hate this. Why do I have to do this every time.

Well…you don’t!

You can use Hammerspoon to automate it.

Hammerspoon is a multitool for macOS. You can use it for scripting and automations and even GUI interactions. You write scripts in Lua, and it executes ’em.

Hammerspoon has a file watcher utility that recursively detects file changes and can do something in response. That’s exactly what we need here.

Why not Automator?

Automator has a file watcher utility, but it doesn’t watch files recursively. So, for instance, if you’re in a directory that’s inside the folder Automator watches, and you npm install, Automator won’t see the new node_modules folder being created. Which means you can’t set the extended attribute on the file. Which means Dropbox will still sync its contents and ruin your life.

You can use Automator to add actions to Finder to sync or ignore files one-by-one, though, which I also did (see above).

Why not something else?

Why not indeed! I haven’t used them, but I’m sure ScriptKit, Mjolnir, Phoenix, and whatever other automation tools have similar functionality, if you’re more comfortable in another tool.

I like Hammerspoon because it’s lightweight, mature, and under active development. Lua is a nice and simple language that I also use in NeoVim plugins, so that’s nice too.

Once you’ve installed Hammerspoon, you can add the below scripts to your .hammerspoon directory and then reload your config.

The script

Here’s where my Dropbox script is currently. I’ve added comments to explain a bit about what’s going on. This script sets up a file watcher that recursively watches the path you provide it. Whenever you add or delete files (e.g. by running npm install or git init), the file watcher will see the changes, and check them against a list of files you provide. If any of the created files match any of the names in your list, it’ll set the extended attribute on the file, blocking it from syncing to Dropbox!

In ~/.hammerspoon/dropbox.lua:

--- Watches a folder for changes. If a changed file or directory is named in
--- `ignore`, or is a marker file named in `ignoreParent`, writes the extended
--- attribute on the changed file (or on its parent) to instruct Dropbox to
--- ignore it.
---@class Dropbox
---@field target string The target directory to watch
---@field ignore { [string]: boolean|fun(path: string): boolean }? Files to ignore. These files will be ignored by Dropbox as soon as they're created if their value is true, or the value is a function that returns true. False does nothing.
---@field ignoreParent { [string]: boolean|fun(path: string): boolean }? Same as `ignore`, but instead sets the *parent* of a file to be ignored. Useful for cache indicators in e.g. Rust.
local Dropbox = {}
-- By default, the module watches your Dropbox/Code directory.
Dropbox.target = os.getenv('HOME') .. '/Library/CloudStorage/Dropbox/Code'
-- The files we want to mark as ignored when they're created. Set the value to true,
-- or to a function that takes the file path and returns a boolean.
Dropbox.ignore = {
	['node_modules'] = true,
	['.git'] = true,
	['__pycache__'] = true,
	['.sass-cache'] = true,
	['vendor'] = function(path)
		return Dropbox.isGitIgnored(path)
	end,
}
-- The list of files whose *parents* should be marked as ignored when they're created.
Dropbox.ignoreParent = {
	['CACHEDIR.TAG'] = true
}

--- Construct a new dropbox directory watcher.
---@param target string? The directory to watch for changes.
---@param ignore { [string]: boolean|fun(path: string): boolean }? Files to ignore. These files will be ignored by Dropbox as soon as they're created if their value is true, or the value is a function that returns true. False does nothing.
---@param ignoreParent { [string]: boolean|fun(path: string): boolean }? Same as `ignore`, but instead sets the *parent* of a file to be ignored. Useful for cache indicators in e.g. Rust.
---@return self
function Dropbox:new(target, ignore, ignoreParent)
	-- This is some funky lua stuff to emulate object inheritance
	local instance = {}
	setmetatable(instance, self)
	self.__index = self

	instance.target = target or self.target
	instance.ignore = ignore or self.ignore
	instance.ignoreParent = ignoreParent or self.ignoreParent

	return instance
end

-- Starts the Hammerspoon path watcher
function Dropbox:start()
	Dropbox.watcher = hs.pathwatcher.new(
		self.target,
		function(paths, flagTables)
			self:changeHandler(paths, flagTables)
		end
	):start()

	return self
end

-- When the path watcher detects a change, we handle the change event here.
function Dropbox:changeHandler(paths, flagTables)
	local function shouldHandleChange(changes, localPath)
		return Dropbox.fileExists(localPath) -- Does the file exist?
			and (changes['itemCreated'] or changes['itemRenamed']) -- Was it created or renamed?
			and not changes['itemXattrMod'] -- Was it an xattribute modification? (This prevents an endless loop)
	end

	for i, path in ipairs(paths) do
		local changes = flagTables[i]

		if shouldHandleChange(changes, path) then
			-- shouldBeIgnored returns a tuple of booleans.
			local ignoreFile, ignoreParent = self:shouldBeIgnored(path)

			if ignoreFile then
				hs.fs.xattr.set(path, 'com.apple.fileprovider.ignore#P', '1')
			end

			if ignoreParent then
				hs.fs.xattr.set(Dropbox.getParent(path), 'com.apple.fileprovider.ignore#P', '1')
			end
		end
	end
end

---@param path string File to check against the `ignore` and `ignoreParent` instance tables.
---@return boolean ignoreFile, boolean ignoreParent True if file should be ignored by Dropbox; false if not.
function Dropbox:shouldBeIgnored(path)
	local ignore = {}
	-- hs.fs.displayName returns the filename or nil if the file does not exist.
	-- We grab the value at the key matching the filename in the ignore and ignoreParent tables.
	local file = self.ignore[hs.fs.displayName(path)] or false
	local parent = self.ignoreParent[hs.fs.displayName(path)] or false
	-- If the value is a function, execute it with the path as its sole argument.
	if type(file) == 'function' then
		ignore.file = file(path)
	else
		ignore.file = file
	end

	if type(parent) == 'function' then
		ignore.parent = parent(path)
	else
		ignore.parent = parent
	end

	return ignore.file, ignore.parent
end

--- Checks if a file exists at a given path. Uses hs.fs.displayName, which
--- simply returns `nil` if a file does not exist, rather than erroring.
--- @param path string
--- @return boolean exists
function Dropbox.fileExists(path)
	return not not hs.fs.displayName(path)
end

--- Runs `git status` inside a file (or its parent) to determine if the path
--- is in a git repository. `git status` errors if no repo is found in the
--- directory tree.
---@param path string The path to check.
---@return boolean isInGitRepo
function Dropbox.isInGitRepo(path)
	local _, status = hs.execute(string.format(
		[[ /usr/bin/env git -C %q status --porcelain > /dev/null 2>&1 ]],
		Dropbox.getContext(path)
	))

	return status or false
end

--- Runs `git check-ignore` inside a directory (or inside the parent of a file)
--- to determine if the path is gitignored. Returns true if the file is
--- gitignored. Should return false if file is not in git repo.
--- @param path string The path to check.
--- @return boolean ignored True if the file is ignored; false if not.
function Dropbox.isGitIgnored(path)
	local _, status = hs.execute(string.format(
		[[ /usr/bin/env git -C %q check-ignore -q %q > /dev/null 2>&1 ]],
		Dropbox.getContext(path), path
	))

	return status or false
end

---@param path string Path to a file.
---@return string path The path if it is a directory, or its parent if not.
function Dropbox.getContext(path)
	if Dropbox.isDirectory(path) then
		return path
	end

	return Dropbox.getParent(path)
end

---@param path string Path for which to get parent.
---@return string parent The parent of the given path.
function Dropbox.getParent(path)
	-- Path string up to the last forward slash
	return string.match(path, '^(.+)/')
end

---@param path string The path to check.
---@return boolean isDirectory True if the file is a directory; false if not.
function Dropbox.isDirectory(path)
	return hs.fs.attributes(path, 'mode') == 'directory'
end

return Dropbox

Finally, in our ~/.hammerspoon/init.lua, we can use the module like this:

local dropbox = require('dropbox')
dropbox:new():start()

Alternatively, you can pass in some different options if you’d like to override the defaults:

local dropbox = require('dropbox')
dropbox:new(
	'~/Dropbox/src', -- A different target to watch
	{
		['some_directory'] = true,
		['another_directory'] = function () return true end
	}, -- A different set of files to ignore on creation
	{ ['PRIVATE.DIR'] = true } -- A different set of file parents to ignore on creation
):start()

And that’s it!
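If you want to double-check that the watcher is doing its job, one option is to look for matching directories that are still missing the attribute. This is a sketch: dropbox_find_unmarked is a made-up name, and it assumes the macOS xattr command is on your PATH.

```shell
# Hypothetical helper: list directories named $2 under the root $1 that do
# NOT carry the ignore attribute yet. xattr -p exits non-zero when the
# attribute is missing, and "!" inverts that result for find.
dropbox_find_unmarked() {
  find "$1" -name "$2" -type d -prune \
    ! -exec sh -c 'xattr -p "com.apple.fileprovider.ignore#P" "$1" >/dev/null 2>&1' sh {} \; \
    -print
}
```

Run something like dropbox_find_unmarked <path> node_modules after an npm install. If the watcher caught the new folder, it shouldn't show up in the list.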


Fin

Couple new tools in my ole toolbelt:

  • find to recursively search for files that meet certain criteria, and change them so that Dropbox ignores them
  • Hammerspoon to watch a directory and proactively change the files in a list as soon as they’re created
  • Better familiarity with find, Lua, and Hammerspoon

Maybe you find this useful. Or maybe you find it buggy or inaccurate! Please tell me if you do. I’d love to hear it.