This is the first post in a series about the Webpage Text API that I recently launched.

I started building the Webpage Text API with these design goals:

Scalability: I need to be able to support a large number of clients and a large number of articles.

Caching: If an article is read by one client, it will likely be read by other clients. When feasible I want to retrieve the webpage and generate its webpage text result just once.

The core of the Webpage Text API contains two types of servers:

Retriever: A retriever is a server that knows how to take a single article URL and generate its webpage text result.

Frontend Server: A frontend server accepts requests from Unread and from other clients. A request can contain up to 100 article URLs. The frontend server is responsible for getting results for the article URLs and combining them into a single response.

Retriever

The Retriever runs a Ruby on Rails app. That app is heavily dependent on curb to make HTTP and HTTPS requests, and on my webpage text generation library.

The Retriever app is run by Passenger. NGINX receives the HTTPS requests from the frontend servers and sends them to Passenger.

Requests to the Retrievers have URL paths that incorporate the URL for which webpage text is needed. For example:

/1.0/articles/https%3A%2F%2Fwww.goldenhillsoftware.com%2Funread%2F
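
As a rough illustration of how such a path can be built, here is a minimal Python sketch. The /1.0/articles/ prefix comes from the example above; the function names are hypothetical and are not part of the actual API or its codebase.

```python
from urllib.parse import quote, unquote

API_PREFIX = "/1.0/articles/"  # prefix taken from the example path above


def retriever_path(article_url: str) -> str:
    """Percent-encode the whole article URL so it fits in one path segment."""
    # safe="" makes quote() escape "/" as well, which it leaves alone by default.
    return API_PREFIX + quote(article_url, safe="")


def article_url_from_path(path: str) -> str:
    """Recover the original article URL from a Retriever request path."""
    return unquote(path[len(API_PREFIX):])


print(retriever_path("https://www.goldenhillsoftware.com/unread/"))
```

Encoding the slashes matters here: it keeps the article URL inside a single path segment, so the path does not look like a deeper hierarchy to the server or to a cache keyed on the request path.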

NGINX is configured to cache responses locally. If a Retriever has a cached response, it can return it without even sending the request to Passenger or the Ruby on Rails app.

Frontend Server

Each Frontend Server is configured with an ordered list of Retrievers. A Frontend Server determines which Retriever should process a request for a specific article URL by generating a numeric hash of that article URL, and using that hash to select a Retriever from the list. This ensures that all requests for webpage text for a specific article URL are processed by a single Retriever as long as the set of Retrievers does not change.
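
The selection step can be sketched in a few lines of Python. This is only an illustration under assumptions: the post does not say which hash function is used or how the hash is mapped to a server, so SHA-256 and a simple modulo are my choices here, and the hostnames are made up.

```python
import hashlib


def pick_retriever(article_url: str, retrievers: list[str]) -> str:
    """Map an article URL to one entry in an ordered list of Retrievers."""
    # Assumption: SHA-256 as the hash. Python's built-in hash() is salted per
    # process for strings, so it would not be stable across Frontend Servers.
    digest = hashlib.sha256(article_url.encode("utf-8")).digest()
    numeric_hash = int.from_bytes(digest[:8], "big")
    # Assumption: modulo over the list length picks the server.
    return retrievers[numeric_hash % len(retrievers)]


# Hypothetical hostnames, for illustration only.
RETRIEVERS = ["retriever-1.example.com", "retriever-2.example.com", "retriever-3.example.com"]
print(pick_retriever("https://www.goldenhillsoftware.com/unread/", RETRIEVERS))
```

With a modulo mapping like this one, every Frontend Server that has the same ordered list picks the same Retriever for a given URL, which is what lets each Retriever's local cache be effective. Changing the list remaps many URLs, which matches the caveat above about the set of Retrievers not changing.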

Like Retrievers, Frontend Servers are based on NGINX, Passenger, and a Ruby on Rails app. Each Frontend Server also has a memcached process that provides a second tier of caching for webpage text results for individual article URLs.
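
To make the flow concrete, here is a hedged Python sketch of how a Frontend Server might combine the per-URL cache with calls to Retrievers. It is not the actual Rails code: a plain dict stands in for memcached, and handle_request and fetch_from_retriever are hypothetical names.

```python
from typing import Callable

MAX_URLS_PER_REQUEST = 100  # limit described earlier in this post


def handle_request(
    article_urls: list[str],
    cache: dict[str, str],
    fetch_from_retriever: Callable[[str], str],
) -> dict[str, str]:
    """Return one combined response mapping each article URL to its result."""
    if len(article_urls) > MAX_URLS_PER_REQUEST:
        raise ValueError("a request can contain up to 100 article URLs")
    results: dict[str, str] = {}
    for url in article_urls:
        if url not in cache:
            # Cache miss: ask the Retriever, then remember the result.
            cache[url] = fetch_from_retriever(url)
        results[url] = cache[url]
    return results
```

In this sketch a repeated URL, whether in the same request or a later one, is served from the cache and never reaches a Retriever a second time.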

There is no specific limit to either the number of Retrievers or the number of Frontend Servers that can be deployed at any one time.


Please write to me at sales@goldenhillsoftware.com if you are considering using the Webpage Text API.

The Webpage Text API is on Product Hunt. Please consider upvoting it or participating in the conversation there.

In case you missed it, yesterday I announced the availability of the Webpage Text API: a cloud service that lets you easily retrieve the HTML for the content of a webpage without the junk (chrome, navigation, ads, and scripts) that tends to clutter modern webpages.

I am excited to announce the availability of the Webpage Text API, a cloud service that lets an app or web service request the HTML for the content of a webpage without the junk (chrome, navigation, ads, and scripts) that tends to clutter modern webpages.

The Webpage Text API has been powering the webpage text feature of Unread since I released Unread 2 in February 2020. It is perfect for RSS readers, read later services, browser extensions, newsbots, and other applications where the user wants the content of the webpage without the cruft.

I started developing the Webpage Text API for Unread in 2018, before Mercury Parser went open source. At the time Unread had webpage text retrieval capabilities powered by Readability.js. That worked well, but I needed the ability to cache webpage text and associated images ahead of time. It was impractical to generate webpage text for thousands of articles at a time on-device, so I researched server-based options.

At that time Mercury Reader provided an API and generously made it available for free. However their terms of service would not allow Unread to aggressively cache webpage text for articles ahead of time. The Mercury Parser source code had not yet been made public.

I looked into commercial options, but none fit my needs. So I started writing my own server-based system. I started by incorporating the heuristics used by Readability.js. I then added test cases from hundreds of different websites to improve the webpage text quality.

After Mercury Parser went open source, I evaluated whether it would be more suitable for generating webpage text for Unread. I discovered that I got higher quality results from my own Webpage Text API than I would from Mercury Parser. This inspired me to continue improving the Webpage Text API, and to now offer it as a commercial product.

Check out this demo to get a sense for the quality of results. To get started with the Webpage Text API, contact me at sales@goldenhillsoftware.com.

Unread 2.7.1 is available from the App Store. This update incorporates these improvements:

  • When invoked on Hacker News linked list articles, the share sheet and article actions now use the linked article URL instead of the Hacker News URL.
  • This update fixes a bug that would prevent Unread from showing webpage text for articles from Google Alert feeds.
  • This update fixes a bug that could cause Unread to crash when refreshing in the background.

If you are happy about this update or if you are excited about the recent linked list articles feature, please consider leaving a review.

Unread 2.7 is available from the App Store. This update improves the experience of reading linked list articles and adds other improvements.

Improved Handling of Linked List Articles

A linked list article is an article consisting of a link to another article and a quote from or comment on that article. Linked list articles are often found on Daring Fireball, The Loop, MacStories, Pixel Envy, Six Colors, and TidBITS.

Each subscription now has three different webpage text setting values that can be applied to traditional articles and to linked list articles:

  • Feed Text
  • Webpage Text
  • Both (Feed Text and Webpage Text)

When showing both, Unread will show the article content from the feed and the content from the linked article. When showing just feed text or webpage text for a linked list article, the swipe left menu will include a Feed & Webpage Text option.

Most articles from Hacker News are linked list articles. When Unread shows both feed and webpage text for a Hacker News article, you will see a link to the Hacker News comments page and the full article on one screen.

Screenshots of Unread showing linked list articles

This update also incorporates these improvements around linked list articles:

  • For new subscriptions and on new installations of Unread, Unread will now default to showing both feed and webpage text for linked list articles from some feeds.
  • This update improves Unread’s heuristics for determining which articles are linked list articles. It also adds heuristics for determining the URLs of the original article and the linked article when they are not explicitly specified by the feed or relayed by the feed syncing service.
  • When Unread can determine both the original article URL and the linked article URL, Unread will always use the URL of the original article for the share sheet and for article actions such as Instapaper. For example, when Unread determines that it is displaying a Daring Fireball article that links to another article, it will use the daringfireball.net URL for the share sheet and for article actions.
  • When showing both feed and webpage text for an article, the Mail Article action now incorporates both feed and webpage text into the outgoing message.
  • Unread will now remove images that Inoreader adds to the feed text of Hacker News articles. Such images will still be considered possible article list thumbnails.

Other Improvements

  • Context menus for individual subscriptions and folders now include a Mark All Read item. You can use that to mark all articles for that subscription or folder as read without opening the article list. This menu item is not available when syncing with a Fever account.
  • This update incorporates improvements around finding article list thumbnails.
  • Table cells in webpage text now have borders.
  • Context menu items for subscriptions and folders are now available as accessibility rotor actions.
  • This update adds minor improvements around the display of smaller images in articles.
  • This update fixes a bug that could cause a crash when attempting to save a broken image to the photo library.

I am submitting a link to this announcement to Hacker News. The improvements in this update make the experience of following Hacker News much better. If you are excited about this update, please consider upvoting it on Hacker News.

Last week I released Unread 2.6 with full-text search capabilities, a compact article list option for iPhone, and more. In this post I will describe Unread’s compact article list option in detail.

While many customers enjoy Unread’s expansive article list format, others find that it takes too long to scroll through long lists of articles. The compact article list format is a more traditional iOS list of items with an optional thumbnail and summary. It continues Unread’s tradition of not truncating long titles or summary text.

Compact article list screenshots
Unread’s new compact article lists: one showing thumbnails and summaries, one showing only thumbnails, and one showing neither

Unread’s expansive article list format with its large thumbnails provides more of a magazine layout.

Expansive article list screenshots
Unread’s expansive article lists: one showing thumbnails and summaries, one showing only thumbnails, and one showing neither

iPad

I believe that the iPad would also benefit from changes that make it easier to scroll through long lists of articles. I am seriously considering such changes for a future version of Unread. However the new compact article list format does not look good on the iPad’s large screen, so it is unavailable on iPad.

Other Article List Improvements

These article list improvements are available on the iPhone and iPad:

  • There is now an option to show author names in article lists.
  • When an article list summary contains text from a block quote, Unread will now wrap the quoted text in quotation marks. This makes summaries of linked list articles from Daring Fireball clearer.
  • This update improves the heuristics around determining whether an article’s author name is meaningful versus whether it is just the name of the website. Unread does not show the author name if it determines that the author name is just the website name or a placeholder.
  • Unread removes unnecessary prepositions from author name strings, and normalizes what appear to be lists of multiple author names. For example if a feed reports an author name of “by Moe, Larry, and Curly” Unread will now report the author name as “Moe, Larry, Curly”. The consistent formatting of author names makes the article list look much nicer when author names are shown.
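
The author name cleanup in the last item can be approximated with a short Python sketch. Unread's real heuristics are not published, so the rules below (strip a leading "by", split on commas, "and", and "&") are assumptions chosen only to reproduce the example above.

```python
import re


def normalize_author_names(raw: str) -> str:
    """Strip a leading "by" and join multiple names with ", "."""
    # Remove a leading "by " (any case); "Byron" is untouched because
    # whitespace is required after "by".
    name = re.sub(r"^\s*by\s+", "", raw, flags=re.IGNORECASE)
    # Split on commas (optionally followed by "and"), on a standalone "and",
    # or on "&". These delimiters are an assumption, not Unread's actual rules.
    parts = re.split(r"\s*,\s*(?:and\s+)?|\s+and\s+|\s*&\s*", name)
    return ", ".join(p.strip() for p in parts if p.strip())


print(normalize_author_names("by Moe, Larry, and Curly"))  # Moe, Larry, Curly
```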

If you have not done so already, you can get the newest version of Unread from the App Store.

Last week I released Unread 2.6 with full-text search capabilities, a compact article list option for iPhone, and more. In this post I will describe Unread’s search capabilities in detail.

Search Syntax

Unread will look for articles containing all of the words in a specified search. For example if you search for cats dogs, Unread will find an article with the text cats and dogs. Each word is treated as a prefix, so Unread will also find an article with that text if the search criteria are ca do.

If you put quotes around a set of words, Unread will find an article that contains the exact phrase. The last word in the phrase can be the prefix of a word. For example if you search for “cats and do”, Unread will find an article with the text cats and dogs.
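
These matching rules can be expressed as a small Python sketch. It models only the behavior described above, not how Unread actually indexes or queries articles, and all function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def parse_query(query: str) -> list[list[str]]:
    """Split a query into terms; a quoted phrase becomes one multi-word term."""
    query = query.replace("“", '"').replace("”", '"')  # accept curly quotes
    terms = []
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query):
        words = tokenize(phrase or word)
        if words:
            terms.append(words)
    return terms


def term_matches(term: list[str], words: list[str]) -> bool:
    """All but the last word must match exactly; the last may be a prefix."""
    n = len(term)
    for start in range(len(words) - n + 1):
        window = words[start:start + n]
        if window[:-1] == term[:-1] and window[-1].startswith(term[-1]):
            return True
    return False


def matches(query: str, article_text: str) -> bool:
    """An article matches when every term in the query matches."""
    words = tokenize(article_text)
    return all(term_matches(term, words) for term in parse_query(query))
```

With this sketch, matches("ca do", "cats and dogs") and matches("“cats and do”", "cats and dogs") are both true, while matches("“cats do”", "cats and dogs") is false because the phrase does not appear contiguously.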

Comfortable Navigation via Touch

Unread’s search capabilities fit well within its comfortable gesture-based navigation.

To initiate a search from an article list, swipe the screen to the left and select Search Articles from the resulting menu. Then start typing your search criteria.

Before entering search criteria, you can dismiss the search box by simply tapping the content area of the screen. After starting to enter search criteria, you can exit the search by swiping the screen to the left and selecting Cancel Search from the resulting menu. There is also a Cancel button next to the search box, but you do not need to use it when tapping the top of the screen requires an uncomfortable reach.

You can expand the scope of a search to all articles, as opposed to those for the article list you are looking at, by swiping the screen to the left and selecting Search All from the resulting menu. If you are on an article list to which the Hide Read Items setting applies, you can also toggle Hide Read Items from the swipe left menu.

If you scroll up just past the top search result, the text field will get focus and the software keyboard will appear again so that you can edit your search criteria.

Hardware Keyboard Navigation

If you use an iPad with a hardware keyboard, you can take full advantage of Unread’s search capabilities without ever touching the screen.

You can invoke a search from an article list by typing command-f or command-option-f. You can cancel a search at any time by typing escape or command-period. Extend a search to all articles by typing command-shift-a. Toggle Hide Read Items from a list of search results by typing command-shift-h.

While the keyboard focus is on the search box, you can select the first search result row by typing down arrow. Similarly you can move the keyboard focus back to the search box by typing up arrow when the first search result is selected.

Searching for Subscriptions

When on the list of subscriptions and folders, you can swipe left and select Search Subscriptions from the resulting menu. You can also use the command-f or command-option-f hardware keyboard shortcuts. Categories, folders, smart streams, and tags matching the search criteria will also be included in search results.

Webpage Text

When webpage text of an article is cached, that article is searchable based on both its webpage text and its feed text.

Honoring Low Power Mode

While indexing articles for search is done efficiently, it requires some amount of additional power. When a device is in low power mode, Unread will only index each article’s title, subscription title, and article list summary. This is faster and uses even less battery power, but is sufficient for most searches. After low power mode is turned off, Unread will index the full text of articles that arrived while the device was in low power mode.

If you have not done so already, you can download the newest version from the App Store.

A few days ago I released Unread 2.6 with full-text search capabilities, a compact article list option for iPhone, and more. I just released Unread 2.6.1 with these improvements:

  • This update fixes a bug that caused empty article lists for some customers. A database update was required to make articles searchable, and that database update failed on some devices. This release fixes the failure and will perform the database update again if necessary.
  • When using the compact article list format, custom article list text sizes now scale more linearly between Medium and Galactic.

Unread 2.6 is available from the App Store. This update adds:

  • Full-text search capabilities
  • A compact article list option for iPhone
  • More flexible caching settings
  • A variety of additional improvements

I am particularly excited about this update. It is the most substantial feature update since version 2.0, and it delivers capabilities that customers have been requesting for some time.

Full-Text Search

This update adds the ability to search for articles. You can initiate a search by opening an article list, swiping left, and selecting Search Articles from the resulting menu. By default search results are limited to articles that are included in a particular article list, i.e., articles from the current subscription or folder. While on a list of search results you can toggle Hide Read Items or expand the scope of the search to all articles from the swipe left menu.

You can find an article based on words and phrases in its title, author name, or text. If webpage text for an article is cached, you can find it based on words and phrases in the webpage text that may not be in the feed text.

Subscriptions and folders are also searchable. Rather than scrolling all the way down to a specific subscription, you can find it quickly by selecting Search Subscriptions from the swipe left menu and typing the subscription name.

Unread will index your existing articles to make them searchable the first time you refresh your account after installing this update.

Search for articles with the word “desert”

Compact Article List Option for iPhone

On iPhone there is a new Format setting under Article Lists that can be set to Compact or Expansive. The compact format is designed for scrolling through long lists of articles quickly. The compact format uses less whitespace. It does not divide article lists into sections by date or by feed. When Show Thumbnails is enabled, the compact article list uses a smaller thumbnail image and places it to the right of the article title.

The expansive article list format also has minor spacing tweaks. The compact article list format is not available on iPad.

Compact article list without summaries
Compact article list with summaries

Caching Settings

The Caching setting is now split into two different settings, one for caching webpage text and another for caching images. You can now choose to always cache webpage text but to only cache images when on Wi-Fi, for example.

Additional Improvements

  • Recent Articles widgets can now be configured to show saved articles. This feature requires a subscription.
  • On iPad, article list images are now vertically aligned to the top of their respective article list entries.
  • When incorporating text from a blockquote into an article summary, Unread will now add quotation marks around the blockquoted text. This change will apply to new articles downloaded after installing this update.
  • A new Show Authors setting lets you show author names in article lists.
  • This update improves the heuristics around determining whether an article’s author name is meaningful versus whether it is just the name of the website. Unread tries to avoid showing an author name if that author name is the same as the name of the website or looks like a placeholder.
  • This update adds heuristics to clean up author name strings. If an author name starts with “by ”, that prefix will be removed. Unread tries to normalize author name strings representing multiple author names. Multiple author names are consistently delimited by commas (,).
  • When displaying an article list that excludes read articles, an article will no longer be removed after it has been read. Similarly when displaying an article list that contains only saved articles, an article will no longer be removed after being unsaved.
  • This update contains many minor improvements to the sets of menu items available in context menus and in swipe left menus, particularly around menu items that pertain to subscription management.
  • Pressing command-i on a hardware keyboard will now bring up the Edit screen for a subscription or folder if one is selected.
  • Pressing delete on a hardware keyboard will now delete the currently selected subscription, folder, or account after prompting for confirmation.
  • The Keep Read setting now defaults to One Month on new installations of Unread.
  • The Hide Read Items setting now defaults to off on new installations of Unread.
  • Mark Read on Scroll now works when scrolling with VoiceOver and with Voice Control.
  • This update removes the Premium Settings section from the Subscription screen.
  • This update fixes a bug that prevented animations from working when expanding a GIF inside an article.
  • This update fixes a bug that prevented feed title colors on the article screen from being random when using the Panic theme.
  • The previous feature update removed the ability to enable article actions without a subscription. With this update, article actions are not available without a subscription even if they were enabled with an older version of Unread or if the subscription has since expired.
  • Mark All Read Below now marks articles read even if they are not loaded into the article list because you have not scrolled down that far.
  • This update fixes bugs around sharing articles to Tweetbot.
  • This update adds performance improvements around updating the widgets.
  • This update adds minor text alignment tweaks to images in article lists.
  • This update fixes a bug that resulted in bad thumbnail images for articles from routinehub.co under some circumstances.
  • Unread now uses Stacksift for crash reporting. I updated the privacy policy to reflect this.

Unread 2.6 requires iOS/iPadOS 14.5 or later. It is available from the App Store. If you are excited about this update please consider leaving a review in the App Store, writing about it on your blog, or telling your friends on social media.

Unread 2.5.4

John Brayton

June 9, 2021

Unread 2.5.4 is available from the App Store. This update fixes a bug that would prevent the swipe left menu from working from the article screen when using the developer beta of iOS/iPadOS 15.

Unread 2.5.3

John Brayton

May 19, 2021

Unread 2.5.3 is available with these improvements:

  • A new Send Mail In setting lets you choose whether to send email through an in-app Mail view or through the default email app on the device. This setting applies to the Mail Link article action and to contacting customer support. It does not apply to the Mail Article action because sending rich text requires the in-app Mail view.
  • This update adds VoiceOver improvements to the tutorial and adds a VoiceOver document to Technical Notes.
  • This update adds modest improvements to the algorithm that determines the best article list images for articles, and fixes some bugs around images with URLs that cannot be parsed.
  • Navigating via trackpad is smoother.
  • This update fixes bugs around sharing to DEVONthink.
  • This update fixes some layout bugs around Split View and Slide Over on iPad.

I am working on a much bigger update with some long-awaited capabilities, and had planned to incorporate these improvements into that update. I look forward to sharing that update soon, but in the meantime I wanted to get these fixes out.

I just released Unread 2.5.2 with some bug fixes:

  • This update fixes some bugs around articles and unread counts from feeds that were recently added to an account from another device.
  • This update fixes a bug that caused layout issues after opening Unread from one of its widgets under some circumstances.
  • This update fixes a bug that could cause crashes while Unread is refreshing in the background.
  • This update fixes a syncing bug that would occur on beta versions of iOS 14.5.

Last week I released an Unread update, version 2.5, with a significantly expanded free tier. Today I am releasing Unread 2.5.1 with these improvements:

  • The Unread 2.5 improvements around formatting tweets in articles from Substack feeds did not work when syncing with a NewsBlur account. This update fixes that.
  • This update fixes a bug that prevented the Priority Sources setting on medium and large Unread Counts widgets from working as expected under some circumstances.

If you enjoy using Unread please consider leaving a review in the App Store, writing a blog post about it, or telling your friends on social media.

Unread 2.5 is available. This update significantly expands Unread’s free tier and adds other improvements.

Expanded Free Tier

This update removes the reading limits that were in place before purchasing a subscription. Until now, Unread without a subscription was essentially a free trial that became less functional after a certain amount of use. That was as I intended, but now I want more people to use Unread.

Adjusting some settings on home screen widgets now requires purchasing a subscription:

  • You now need a subscription to configure a Recent Articles widget to show articles from a specific feed or folder.
  • You now need a subscription to configure a medium or large Unread Counts widget to show unread counts by folder, or to show unread counts in a specific folder by feed.
  • You now need a subscription to configure a small Unread Count widget to show the count for a specific feed or folder.

Article actions such as Instapaper and Mail Link now require a subscription. If you enabled article actions in a prior version of Unread but have not purchased a subscription, those article actions will continue to work without a subscription until the next feature update.

Settings that require a subscription are marked as such on the Settings screen before a subscription is purchased.

Default Email App Support

If Apple Mail is not configured with any accounts and iOS is configured to use an alternative email app as the default, Mail Link will use that alternative email app.

The Mail Article action still requires Apple Mail. There is no way to send rich text to an alternative email app. The Article Actions settings screen no longer lets you enable Mail Article if Apple Mail is not set up with an account.

Additional Improvements

  • Unread now renders iframe elements from most feeds. This change will not apply to articles already downloaded to the device before installing this update.
  • This update improves the formatting of tweets quoted in articles from Substack feeds. This change will not apply to articles already downloaded to the device before installing this update.
  • This update fixes compatibility issues with the Display Zoom accessibility setting.
  • This update improves haptic feedback around Unread context menus for links and images in articles.
  • This update fixes a bug that prevented the ordering and grouping settings from working on lists of articles from Feed Wrangler Smart Streams, Feedbin Saved Searches, and Inoreader Active Searches under some circumstances.
  • This update fixes bugs that prevented Unread from working well with mice that have scroll wheels.
  • This update adds fixes for some formatting issues that affected MacRumors articles. This change will not apply to articles already downloaded to the device before installing this update.
  • This update improves the algorithm for finding article list thumbnails for articles with YouTube videos.
  • This update adds performance improvements around drawing images in article lists.
  • The About screen and the Unread Subscription screen have been reorganized.
  • Since iOS 14 lets you choose a default system browser other than Safari, I removed the Chrome and Firefox options from the Open Links In setting. I also removed the Open in Chrome and Open in Firefox share sheet actions.
  • Showing an Unread context menu now requires a slightly longer long press. This should avoid Unread context menus appearing when not expected.
  • Unread now consistently ignores articles (“a”, “an”, “the”) at the start of a subscription title when sorting those subscriptions.
  • Unread can now be launched with a feed: URL. Unread will respond by letting you subscribe to the feed specified by that URL. If you have other apps on your device that can be opened with feed: URLs, iOS might open Unread or it might open one of the other apps.
  • There is no longer an optional Article Actions menu item in the swipe left menu of an article. Article Actions are set up from the Settings screen.
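
One item in the list above describes ignoring a leading “a”, “an”, or “the” when sorting subscriptions. A minimal Python sketch of that kind of sort key follows; it is an assumption about the general technique, not Unread's code, and the last title is a made-up example.

```python
import re

# Matches a leading "a", "an", or "the" followed by whitespace, in any case.
_LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)


def sort_key(title: str) -> str:
    """Sort key that skips a leading article and ignores case."""
    return _LEADING_ARTICLE.sub("", title.strip()).casefold()


titles = ["The Loop", "Daring Fireball", "Six Colors", "An Example Feed"]
print(sorted(titles, key=sort_key))
# ['Daring Fireball', 'An Example Feed', 'The Loop', 'Six Colors']
```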

Unread 2.5 requires iOS 14 or later.

2020 was a big year for Unread. I released version 2.0 in February. Unread 2 added the automatic webpage text feature with caching, article actions, subscription management, hardware keyboard navigation, multiple window support, and more.

Since then I have released these updates:

  • Version 2.1 added trackpad and mouse support for iPadOS 13.4 and the Magic Keyboard.
  • Version 2.2 added new article actions.
  • Version 2.3 added home screen widgets.
  • Version 2.4 added widget improvements, layout improvements, and Unread context menus as an option over system context menus.

The subscription pricing model is helping me to deliver frequent improvements. Plans are likely to change over time, but I anticipate at least three more 2.x updates before shipping Unread 3.0.

I have one big area of focus for Unread 3 and have started work on it. I am spending about half of my development time on that Unread 3 feature set, and about half of my development time on Unread 2.x updates. I want to continue delivering frequent updates while also making progress on Unread 3.

I want to thank all of my customers for using Unread. I always enjoy interacting with customers. Your suggestions, feature requests, and bug reports are always helpful.

Happy new year.