Home AI Server

This post is an excerpt/cross-post from a blog post on Zettelwirtschaft

April 28th, 2025 I finally gave in to my long coveting of the cylindrical Mac Pro. I bought one on eBay for $69, upgraded it to 12 cores and 128gb of ram and spun up a home AI server. I named it Jai. You can interact with him just like ChatGPT or Claude or whatever via a chat interface.

shiny steel cylinder mac pro
image from ebay somewhere

I am slightly obsessed with this computer; it’s so elegant. A silent steel canister, breathing warm air when he’s thinking. It feels like the egg of Hal 9000 or the warp core of a spaceship.

Local AI

A lot of the ills of the AI boom can be remedied by running an AI server at home. The energy consumption is way lower and during the winter it heats our home. There’s no data harvesting, all our queries stay in-house. We can run uncensored models, not limited by what CEOs think it’s ok to talk to AI about. And other than the hardware and electricity cost, it’s free! I love how many different models there are to play around with and how easy it is to swap them in. I still use Claude and Cursor but any query I think Jai can handle I send to him.

The world could do with more tech decentralization, including AI.

Vanitas

Jai lives atop a kinda gothic dresser with various objects arranged around him. My roommate-at-the-time Claire Bates remarked that it looked like a vanitas. So I searched “vanitas” which Wikipedia informed me,

Vanitas is a genre of memento mori symbolizing the transience of life, the futility of pleasure, and the certainty of death, and thus the vanity of ambition and all worldly desires. The paintings involved still life imagery of transitory items. The genre began in the 16th century and continued into the 17th century. Vanitas art is a type of allegorical art representing a higher ideal.

Wow, how true that is. So I’ve been leaning into it, developing dramatic analogies ad nauseum.

Technically

It’s funny that the 2013 Mac Pro has dual graphics cards and AI graphics cards are driving the mushrooming of the worlds most valuable company right now and… I’m not using either of them in my AI server! They’re too old. But the 12 core CPU is so far totally sufficient for inference.

The setup

Jai is running Debian. For some reason, for some models, the pipwrapper of llama.cpp runs faster than pure llama.cpp. I keep 3 models loaded in ram via screen and llama.cpp:

gpt-oss-120b
This is the larger of the two models released by Open AI this past summer. I think of this as the main Jai. I run it with these flags:

./build/bin/llama-server \
--model ../jai/gpt-oss-120b-F16.gguf   --host 0.0.0.0   --port 8000   --ctx-size 131072   --threads 22   --threads-batch 22   --batch-size 256   --ubatch-size 64   --cache-type-k q8_0   --cache-type-v f16   --mlock   --no-mmap --jinja 

I get about 4 tokens per second with this which feels totally adequate for many requests. I’ve found this model to be quite capable to take over a good chunk of queries I would have sent to Claude. I give Jai coding tasks (such as the tribunal randomizer), SVG generation, rubber ducking whatever’s on my mind, generating spreadsheet formulas, bash scripts, being a conversational thesaurus, etc.

OLMo-2-0425-1B-Instruct-Q4_0
Olmo is a family of models released by Allen AI (started by Paul Allen formerly of Microsoft). I like that it is not only open weights but they released the training data and code as well. They have up to a 32B parameter model but I’m running a Q4 quant of the 1B version because it replies nearly instantly (25-30t/s) and takes up less than a gb of ram.
I refer to this one as “Jai wrong answers only”. I love the abstract shapes it renders when you ask for an SVG. And sometimes it gives me a legitimately decent answer!

Qwen3-30B-A3B-abliterated
This is an abliterated version of the Qwen3 model from Alibaba.
I like keeping a model around that will answer any question. Even gpt-oss wont answer questions about weed sometimes even when you tell it weed’s legal in Vermont. This model is not as fast or smart as I would expect for a 30B model but it’s good enough.

Other models
I’ve played with GLM Air and it was really powerful but I got upset when I hit up against the limitations of Chinese censorship. I felt I had to stop using it on principal. But I might see if I can finagle the new Kimi 1 trillion parameter model for coding and keep other models around when I want to talk about Tianamen Square.

Open Web UI

Each of these models is connected to Open Web UI also running on Jai via screen. I have Open Web UI linked to SSO so anyone with a Bindle Institute login can access Jai. I didn’t want to deal with the security implications of opening this service on the public internet so it’s only accessible within my home network at http://web.jai.

Open Web UI is a remarkably powerful Open Source web client for LLMs; I haven’t even scratched the surface of all the features it has, like tool-use and audio interaction. In terms of UI I don’t feel I’m missing anything compared to the big AI platforms.

Email responder

The initial premise was for Jai to reply to emails. I vibe coded a python script with Claude and now Jai has his own email address. Interacting with AI via the familiar email interface feels more human than a web chat gui. Having to wait hours or days for a response to an email is a cadence we accept from other humans, so why not AI? Being able to CC Jai into an email thread with multiple people is a unique AI interaction mode.
During the art opening for Jai I had my giant CRT media cart display just the <thinking> output from Jai when he receives an email. I think it’s interesting to reveal so publicly the thoughts someone is having while writing an email.

And that’s all for now! More updates about Jai will appear on Zettelwirtschaft and maybe also here.

 

Weekly Public Hours Every Thursday Evening!

screenshot of an image on wikipedia of a few guys standing around a table with ice cream on it with the caption 'people eating ice cream at a lan party'

We realized we hadn’t updated the site to note that open hours have resumed!  Come hang out with us every Thursday from 7pm-9pm (and probably later) at our space at 12 North st. (up the stairs at the side of the building.

Folks will be there working on personal projects, fixing things, chatting, eating, playing games and just hanging out. Come through!!

February 2023 Activities Update

We set up a Mastodon account. Toot @ us here.

Regular Activities:
  • Public Hours happening every Thursday from 7pm – (at least) 9pm in the space.
  • Repair Cafe 3rd Saturday of the Month from 11am-3pm. Volunteers and broken things welcome!
  • Monthly Member Meeting 2nd Saturday of the Month at 7pm, sometimes a potluck starting at 6pm. Drop a line about attending if you are interested in learning about joining the Lab.
  • Our most frequently active public online space is our Discord Server.
Possible Additional Activities for the Spring
  • Revival of the bike glow-up and ride (late April)
  • Some folks are talking about a monthly textile night

Free and or Open Source Game Night

Join Lab B every Thursday night at 7pm for FOSS (free + open source software) (or sometimes closed source but free) game nights. Meet up on our discord and we’ll go from there. If we’re planning on a specific game it will be listed on our events page

 

EDIT: regular (weekly, unstructured, in-person) open hours have resumed!

Third Old North End Repair Cafe this Saturday August 15th!

The next Old North End Repair Cafe is this Saturday, August 15th from 11am-3pm. Come on down and volunteers will do our best to repair anything you can carry in! And if you’d like to help out, please fill out our volunteer interest form (free snacks for volunteers!). This is an outdoor, socially distant event and masks are required.

Also, going on concurrently will be Degrowth Fest which Repair Cafe will be a part of. We’ll have a poster illustrating the merits of repair 🙂

More info on our Repair Cafe page.

 

Second Old North End Repair Cafe this Saturday!

Hello everyone, here are the results for our first Repair Cafe last month. 70% success rate is pretty good! Especially for our first repair cafe. It looks like 29% bicycle but that’s actually sewing!

The next Old North End Repair Cafe is coming up THIS SATURDAY, July 18th! Come on down to 12-22 North st between 11am and 3pm (we can only accept new repairs until 2:15pm)

More info on our Repair Cafe page

Repair Cafe June 20th

Laboratory B will finally hold our first Repair Cafe this weekend on Saturday June 20th. Assuming the weather is clear (we will decide Thursday night whether to postpone) we’ll have tables socially distanced apart for volunteers to repair whatever people show up with! Whatever you can carry with you, electronics, bicycles, furniture, small appliances, etc,  we’ll do our best to repair it for you. All for free!

If you’d like to volunteer please fill out our volunteer interest form.

  • Saturday, June 20th
  • 11am-3pm
  • 12-22 North st

The main Lab B Repair Cafe page is here

Warming up the Foundry!

When Laboratory B got started we were excited about the possibility of other hacker/maker/community workshop spaces starting up and sustaining in Vermont. That’s why we created Vermont Hackerspaces Inc as non non-profit designed to help others do great things. The Foundry is a community workshop getting started in the Northeast Kingdom. Building on the grit, and hard work they are going to bring together a community of creators, tinkers, crafter, artist and entrepreneurs. This great community is going to build a great new space for creation and innovation!

On July 7th, Vermont Hackerspaces Inc, agreed to become the Foundry’s fiscal sponsor while they get started. The Foundry is looking to develop it’s own 501(c)3 but it’s a long when from getting going to handling your own books. Check out the website or  Foundry’s facebook page and Foundry Info Pack for more info.