// Route specification

Squarespace → Astro

A crossing for a Squarespace blog: posts, pages, images, and categories mapped to Astro content collections via the WordPress XML export — with a crawl fallback for the pages the export leaves on the dock.

Beta · portage v0.9.2
Source: Squarespace 7.0 / 7.1
Method: WordPress XML (WXR) + crawl
Output: Markdown · MDX
Target: Content collections
Spec rev: 2026.06 · r2

01 Overview

This route carries a Squarespace blog into an Astro project as content collections backed by Markdown or MDX. The primary source is Squarespace's WordPress-format XML (WXR) export — a single, parseable file — supplemented by an optional crawl of the live site for content the export omits.

Squarespace's export is deliberately narrow: it covers the primary blog and basic pages, and little else. Portage treats the WXR as the spine of the crossing and the crawl as connective tissue, then reconciles both into one manifest before writing Astro-native files.

Moves intact

Published blog posts, basic pages, inline images, categories and tags, authors, publish dates, and the blog permalink structure.

02 Export & crawl

A · WordPress XML export

From Settings → Import & Export → Export → WordPress, Squarespace produces a .xml (WXR) file. Portage parses each <item>: title, wp:post_name (slug), content:encoded (body), excerpt:encoded, pubDate, category (tags & categories), dc:creator (author), and wp:status.

extract · wxr (bash)
$ npx portage extract --from squarespace \
    --export ./Squarespace-Wordpress-Export.xml --to ./astro-project
→ 96 posts · 5 pages · 22 categories · 3 authors · 204 images referenced
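
To make the field list above concrete, here is a minimal sketch of pulling those fields out of a single `<item>` string. It is not Portage's parser: the `WxrItem` shape and the `unwrap`, `field`, and `parseWxrItem` helpers are hypothetical names, and the regex approach assumes well-formed, non-nested elements. A real crossing would more likely use a proper XML parser.

```typescript
// Hypothetical record shape for one WXR <item>; not Portage's internal type.
interface WxrItem {
  title: string;
  slug: string;
  body: string;
  excerpt: string;
  pubDate: string;
  author: string;
  status: string;
  postType: string;
  categories: { domain: string; label: string }[];
}

// Remove a CDATA wrapper if one is present, then trim whitespace.
function unwrap(text: string): string {
  const m = text.match(/^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/);
  return (m ? m[1] : text).trim();
}

// Text of the first <name>...</name> element, or "" when it is absent.
function field(xml: string, name: string): string {
  const re = new RegExp(`<${name}(?:\\s[^>]*)?>([\\s\\S]*?)</${name}>`);
  const m = xml.match(re);
  return m ? unwrap(m[1]) : "";
}

function parseWxrItem(xml: string): WxrItem {
  // Tags and categories both arrive as <category>; keep the domain attribute
  // so they can be told apart later.
  const categories: { domain: string; label: string }[] = [];
  const re = /<category\s+domain="([^"]*)"[^>]*>([\s\S]*?)<\/category>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) {
    categories.push({ domain: m[1], label: unwrap(m[2]) });
  }
  return {
    title: field(xml, "title"),
    slug: field(xml, "wp:post_name"),
    body: field(xml, "content:encoded"),
    excerpt: field(xml, "excerpt:encoded"),
    pubDate: field(xml, "pubDate"),
    author: field(xml, "dc:creator"),
    status: field(xml, "wp:status"),
    postType: field(xml, "wp:post_type"),
    categories,
  };
}
```

Keeping the `domain` attribute on each category at parse time means the tag/category split can happen later without re-reading the XML.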

B · Crawl fallback

Because the export omits index, gallery, portfolio, album, event, and commerce collections, Portage can crawl the live site from its sitemap.xml to recover pages the WXR leaves out, parsing rendered block HTML and SEO tags.

extract · wxr + crawl (bash)
$ npx portage extract --from squarespace --export ./export.xml \
    --crawl https://www.example.com --to ./astro-project

One blog per export

Each WXR export carries a single blog page. If the site runs multiple blogs, Squarespace asks which one to treat as primary at export time; repeat the export once per blog, or lean on the crawl to recover the rest.
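
The crawl fallback boils down to a set difference: pages listed in sitemap.xml minus pages already present in the export. The sketch below illustrates that idea only. `sitemapPaths`, `normalizePath`, and `crawlCandidates` are hypothetical helpers, and the real crawler presumably also fetches and parses each candidate page.

```typescript
// Reduce a URL or path to a bare path without a trailing slash, so entries
// from the sitemap and from the export compare equal.
function normalizePath(url: string): string {
  const path = url.replace(/^https?:\/\/[^/]+/, "").replace(/[?#].*$/, "");
  const trimmed = path.replace(/\/+$/, "");
  return trimmed === "" ? "/" : trimmed;
}

// Collect the path of every <loc> entry in a sitemap.xml document.
function sitemapPaths(sitemapXml: string): string[] {
  const paths: string[] = [];
  const re = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(sitemapXml)) !== null) {
    paths.push(normalizePath(m[1]));
  }
  return paths;
}

// Pages the sitemap lists but the WXR export did not carry.
function crawlCandidates(sitemapXml: string, exportedPaths: string[]): string[] {
  const exported = new Set(exportedPaths.map(normalizePath));
  return sitemapPaths(sitemapXml).filter((p) => !exported.has(p));
}
```

Normalizing both sides first avoids false gaps caused by nothing more than a trailing slash or a query string.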

03 Content mapping

Each WXR <item> becomes one Markdown/MDX file. Posts and pages split into separate collections by wp:post_type.

| WXR field | Type | Astro frontmatter | Notes |
| --- | --- | --- | --- |
| title | string | title | Required. |
| wp:post_name | string | (filename) | Slug & entry id. |
| excerpt:encoded | html | description | Falls back to a generated excerpt. |
| content:encoded | html | (body) | Converted to Markdown/MDX; see §04. |
| pubDate · wp:post_date | date | pubDate | Coerced by zod. |
| category[domain=post_tag] | string[] | tags | |
| category[domain=category] | string[] | categories | Disambiguated by domain attribute. |
| dc:creator | string | authors | |
| wp:status | enum | draft | A draft status sets draft: true. |
| wp:post_type | enum | (collection) | post → blog; page → pages. |
| (first inline image) | url | heroImage | Derived: WXR has no feature-image field; --hero controls it. |
| (crawled <title> · meta) | string | seo | Only when --crawl is on; SEO meta is not in the WXR. |
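
Read as code, the mapping table might look like the sketch below. The `WxrRecord` and `Entry` types and the `mapItem` function are illustrative assumptions, not Portage's API. The excerpt is passed through as-is here, whereas a real transform would also strip its HTML.

```typescript
interface WxrCategory {
  domain: string;
  label: string;
}

// Hypothetical parsed <item>; field names are assumptions for this sketch.
interface WxrRecord {
  title: string;
  excerpt: string;
  pubDate: string;
  author: string;
  status: string;
  postType: string;
  categories: WxrCategory[];
}

interface Entry {
  collection: "blog" | "pages";
  data: {
    title: string;
    description: string;
    pubDate: string;
    tags: string[];
    categories: string[];
    authors: string[];
    draft: boolean;
  };
}

function mapItem(item: WxrRecord, generatedExcerpt: string): Entry {
  // Tags and categories share the <category> element; the domain attribute
  // decides which frontmatter array a label lands in.
  const labelsFor = (domain: string): string[] =>
    item.categories.filter((c) => c.domain === domain).map((c) => c.label);

  return {
    // wp:post_type picks the collection: page -> pages, anything else -> blog.
    collection: item.postType === "page" ? "pages" : "blog",
    data: {
      title: item.title,
      description: item.excerpt !== "" ? item.excerpt : generatedExcerpt,
      pubDate: item.pubDate,
      tags: labelsFor("post_tag"),
      categories: labelsFor("category"),
      authors: item.author !== "" ? [item.author] : [],
      draft: item.status === "draft",
    },
  };
}
```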

04 Content transforms

Squarespace stores body content as content:encoded HTML — rendered blocks wrapped in sqs-block containers with inline styles. Portage strips the wrapper markup and inline style attributes, keeping the semantic content, then converts to Markdown. Common blocks map as follows.

| Squarespace block | Output (Markdown) | Output (MDX) |
| --- | --- | --- |
| text / markdown | Paragraphs & lists | Same |
| image + caption | ![alt](src) + caption | <Figure> |
| gallery / summary | Image sequence | <Gallery> |
| code | Fenced block | Fenced block |
| quote | Blockquote | Blockquote |
| button | Markdown link | <Button> |
| embed (YouTube, …) | Link fallback | <Embed> |
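
As a rough picture of the clean-up step that runs before conversion, the sketch below removes inline style attributes and unwraps container divs. It assumes every `<div>` in the body is a presentational wrapper (sqs-block and friends), which is a simplification. `cleanBody` is a hypothetical name, and a production transform would walk a parsed DOM rather than use regular expressions.

```typescript
// Strip inline styles and wrapper <div>s, keeping the semantic markup inside.
// Assumption for this sketch: all <div> elements are presentational wrappers.
function cleanBody(html: string): string {
  return html
    .replace(/\sstyle\s*=\s*("[^"]*"|'[^']*')/gi, "") // drop style="..." attributes
    .replace(/<\/?div\b[^>]*>/gi, "") // unwrap div containers, keep their children
    .replace(/\n{3,}/g, "\n\n") // collapse blank-line runs left behind
    .trim();
}
```

What remains is plain semantic HTML (paragraphs, lists, images, blockquotes) that an HTML-to-Markdown converter can handle block by block.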

05 Assets

Squarespace serves images from images.squarespace-cdn.com with a size query (?format=750w, ?format=original). Portage resolves every reference, fetches the original, and rewrites the path.

  • Strip the format query — the original is requested with ?format=original and Astro regenerates responsive sizes.
  • Download & dedupe — assets are content-hashed; duplicates collapse to one file.
  • Rewrite references — feature and in-body images point at local paths.
  • External images stay remote unless --images localize-external.

Placement

--images assets (default) writes to src/assets/blog/ for full Astro optimization. --images public writes stable URLs to public/images/.
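
The size-variant handling can be sketched as URL normalization: every Squarespace CDN reference is pointed at `?format=original`, so variants of one image collapse to a single download. The helper names are hypothetical and the CDN path in the usage note is made up. This covers only URL-level deduplication; content hashing of the downloaded bytes is a separate step and is not shown.

```typescript
const SQS_CDN_HOST = "images.squarespace-cdn.com";

// Point a Squarespace CDN URL at the original rendition.
// Non-Squarespace (external) URLs are returned untouched.
function toOriginal(src: string): string {
  const url = new URL(src);
  if (url.hostname !== SQS_CDN_HOST) return src;
  url.searchParams.set("format", "original");
  return url.toString();
}

// Collapse size variants of the same image to one download target.
function uniqueOriginals(srcs: string[]): string[] {
  return Array.from(new Set(srcs.map(toOriginal)));
}
```

For example, `?format=750w` and `?format=1500w` references to `https://images.squarespace-cdn.com/content/v1/abc/photo.jpg` both resolve to the same `?format=original` URL and are fetched once.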

06 Routes & redirects

Squarespace namespaces blog posts under a collection prefix (often /blog/). Portage preserves the structure by default and emits a redirect map for anything that changes.

| Squarespace route | Astro route | Handled by |
| --- | --- | --- |
| /blog/{slug} | /blog/{slug}/ | Blog collection (--route-base) |
| /{page-slug} | /{page-slug}/ | Pages collection |
| /tag/{slug} · /category/{slug} | /tag/ · /category/ | Generated taxonomy pages |
| ?format=rss | /rss.xml | @astrojs/rss + redirect |

  • Slug prefix — preserved by default; flatten with --route-base /.
  • Existing 301s — Squarespace's built-in URL mappings can be exported and merged into the redirect map.
  • Trailing slashes & sitemap — trailingSlash: 'always' and @astrojs/sitemap configured to the new base URL.
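
A redirect map of this kind could be planned as in the sketch below, which emits lines in Netlify's `_redirects` format (`from to status`). The function names are hypothetical. The sketch also assumes that a path differing only by its trailing slash needs no explicit rule, which depends on the host.

```typescript
interface Redirect {
  from: string;
  to: string;
  status: number;
}

// Ensure a path starts and ends with exactly one slash ("/" stays "/").
function withSlashes(path: string): string {
  const core = path.replace(/^\/+|\/+$/g, "");
  return core === "" ? "/" : `/${core}/`;
}

// Emit a 301 for every post whose path changes between the old and new base.
function planRedirects(slugs: string[], oldBase: string, newBase: string): Redirect[] {
  const redirects: Redirect[] = [];
  for (const slug of slugs) {
    const from = withSlashes(oldBase) + slug; // Squarespace form: /blog/{slug}
    const to = withSlashes(withSlashes(newBase) + slug); // Astro form, trailing slash
    if (withSlashes(from) !== to) redirects.push({ from, to, status: 301 });
  }
  return redirects;
}

// Serialize to the plain-text format Netlify reads from public/_redirects.
function toNetlifyRedirects(redirects: Redirect[]): string {
  return redirects.map((r) => `${r.from} ${r.to} ${r.status}`).join("\n");
}
```

With the prefix preserved (`/blog` to `/blog`) the plan is empty. Flattening to `/` yields one rule per post, such as `/blog/harbor-notes /harbor-notes/ 301`.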

07 Out of scope

The WXR export is narrow by design, and much of Squarespace has no static equivalent. Portage migrates content and reports — explicitly — what it could not carry.

  • Commerce — products, checkout, inventory, and orders.
  • Member areas & paywalls — gated content and the auth around it.
  • Index, portfolio, gallery, album, event & audio collections — absent from the WXR; partially recoverable via crawl.
  • Site styles — design, custom CSS, fonts, and layout (rebuilt in Astro).
  • Forms, scheduling (Acuity), donations, and other dynamic features.
  • Secondary blogs — only the primary blog exports.

Set expectations

Treat the export as the spine and the crawl as connective tissue. Expect to hand-place a few pages the export and crawl can't fully reconstruct — Portage lists every one of them in the manifest.

08 Edge cases

  • Pages missing from the export — recovered from the crawl when --crawl is set; otherwise listed as gaps.
  • No feature-image field — hero is derived from the first inline image; disable with --hero none.
  • SEO meta absent from WXR — recovered via crawl, else left to Astro defaults.
  • Drafts — Squarespace doesn't export unpublished posts; only published items arrive.
  • Tags vs. categories — both arrive as <category>; split by the domain attribute.
  • Inline styles & sqs- wrappers — stripped to leave clean semantic markup.
  • CDN size variants — collapsed to the downloaded original.
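
The hero rule from the list above (first inline image, or nothing when `--hero none` is set) is small enough to sketch directly. `deriveHero` and `HeroMode` are illustrative names, and the regex assumes the image URL sits in a quoted `src` attribute.

```typescript
type HeroMode = "first-image" | "none";

// Return the src of the first <img> in the body, or undefined when the mode
// is "none" or the body contains no image.
function deriveHero(bodyHtml: string, mode: HeroMode): string | undefined {
  if (mode === "none") return undefined;
  const m = bodyHtml.match(/<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']/i);
  return m ? m[1] : undefined;
}
```

Returning `undefined` rather than an empty string lets the frontmatter omit `heroImage`, which matches the optional field in the collection schema.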

09 Output

A predictable, buildable Astro project, with a manifest ledger at the root.

project tree (output)
astro-project/
├── src/
│   ├── content/
│   │   ├── blog/             ← 96 posts (WXR)
│   │   └── pages/            ← 5 pages (WXR + crawl)
│   ├── assets/blog/          ← 171 localized images
│   ├── components/portage/   ← MDX block stubs (if --content mdx)
│   └── content.config.ts
├── public/_redirects        ← 18 generated redirects
├── portage.manifest.json    ← extract / transform / load ledger
└── astro.config.mjs         ← trailingSlash · sitemap · redirects

Content collection schema

src/content.config.ts (typescript)
import { defineCollection, z } from 'astro:content';
import { glob } from 'astro/loaders';

const blog = defineCollection({
  loader: glob({ pattern: '**/*.{md,mdx}', base: './src/content/blog' }),
  schema: ({ image }) => z.object({
    title: z.string(),
    description: z.string(),
    pubDate: z.coerce.date(),
    heroImage: image().optional(),
    tags: z.array(z.string()).default([]),
    categories: z.array(z.string()).default([]),
    authors: z.array(z.string()).default([]),
    draft: z.boolean().default(false),
    seo: z.object({ title: z.string().optional(), description: z.string().optional() }).optional(),
  }),
});

export const collections = { blog };

Sample migrated post

src/content/blog/harbor-notes.md (markdown)
---
title: "Harbor Notes, Week One"
description: "First impressions after leaving the builder."
pubDate: 2025-09-30
heroImage: ../../assets/blog/harbor-notes.jpg
tags: ["field-notes"]
categories: ["Journal"]
authors: ["Dana Reyes"]
---

The chart table is clearer than the dashboard ever was…
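
One simple way to produce a file shaped like the sample above is to serialize each frontmatter value with `JSON.stringify`, since JSON strings, booleans, and arrays are also valid YAML. This is a sketch of that idea with a hypothetical `renderPost` helper, not a description of how Portage writes files. Dates come out as quoted strings, which `z.coerce.date()` in the schema should still accept.

```typescript
type FrontmatterValue = string | boolean | string[];

// Render frontmatter plus body as one Markdown document.
// Each value is written as JSON, which YAML parsers read as flow syntax.
function renderPost(data: Record<string, FrontmatterValue>, body: string): string {
  const lines = Object.entries(data).map(
    ([key, value]) => `${key}: ${JSON.stringify(value)}`
  );
  return ["---", ...lines, "---", "", body.trim(), ""].join("\n");
}
```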

10 CLI

Three stages, run in order. Add --crawl at extract time to fill the export's gaps.

portage · squarespace → astro (bash)
$ npx portage extract --from squarespace --export ./export.xml --crawl $URL --to ./astro-project
$ npx portage transform --schema content-collections --content mdx
$ npx portage load --images assets --redirects netlify

| Flag | Values | Default | Purpose |
| --- | --- | --- | --- |
| --export | path | | WordPress XML (WXR) file. Required. |
| --crawl | url | | Recover pages & SEO the WXR omits. |
| --route-base | string | /blog | Preserve or flatten the blog prefix. |
| --hero | first-image · none | first-image | How heroImage is derived. |
| --content | markdown · mdx | markdown | Body format & block handling. |
| --images | assets · public · localize-external | assets | Image placement. |
| --dry-run | flag | off | Plan & diff only. |

11 Verification

Every crossing is auditable. The dry-run reconciles the export against the crawl so you can see exactly what each source contributed.

  • Dry-run first — --dry-run prints the plan and a diff; nothing is written.
  • Source attribution — the manifest records whether each entry came from the WXR, the crawl, or both.
  • Counted on, counted off — gaps (collections missing from the export, missing SEO) are listed explicitly, never hidden.

portage load --dry-run (bash)
$ npx portage load --dry-run
→ 96 posts (wxr) → 96 files     ✓ reconciled
→ 5 pages (wxr+crawl) → 5 files ✓
→ 204 images → 171 unique       ✓ deduped
→ 18 redirects mapped           ✓
→ 2 gallery collections         ⚠ not in export — crawl or hand-place
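
Source attribution of the kind the manifest records can be sketched as a comparison of two path lists. The `Source` type and both functions are hypothetical, and the real ledger presumably stores more per entry than a single label.

```typescript
type Source = "wxr" | "crawl" | "both";

// Label every known path by where it was found.
function attributeSources(wxrPaths: string[], crawlPaths: string[]): Map<string, Source> {
  const inWxr = new Set(wxrPaths);
  const inCrawl = new Set(crawlPaths);
  const ledger = new Map<string, Source>();
  for (const p of wxrPaths) ledger.set(p, inCrawl.has(p) ? "both" : "wxr");
  for (const p of crawlPaths) {
    if (!inWxr.has(p)) ledger.set(p, "crawl");
  }
  return ledger;
}

// Paths the export left out: the ones a reviewer should check by hand.
function crawlOnly(ledger: Map<string, Source>): string[] {
  const paths: string[] = [];
  ledger.forEach((source, path) => {
    if (source === "crawl") paths.push(path);
  });
  return paths;
}
```

Entries labeled `crawl` are the ones reconstructed from rendered HTML alone, so they are the natural place to start a manual review.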