In the Unread 4.8 release announcement, I described the reliability improvements in Unread’s webpage text generation as follows:
“When the webpage text cloud service is unable to generate webpage text for a specific article, Unread now falls back to the on-device webpage text generator that it uses for articles associated with website accounts. This makes webpage text generation much more reliable.”
In this post I describe this set of improvements in more detail.
Two Webpage Text Generators
Unread has a cloud service that generates webpage text for most webpages. Unread asks it to generate webpage text for a URL, and the cloud service returns webpage text for that URL. Using this cloud service to generate webpage text reduces the amount of data and battery power that Unread needs from your device.
When I added the website accounts feature for paywalled articles, I added an on-device webpage text generator. Despite the benefits of using a cloud service, I did not want customer website credentials or account-protected information going through my servers. Unread’s servers never see information pertaining to website accounts, and have no part in generating webpage text when using a website account. Until this release, Unread only used the on-device webpage text generator when there was an applicable website account.
Locked Down Websites
I am seeing an increasing number of websites refuse webpage requests that come from data centers and cloud platforms. I believe the rise of artificial intelligence has motivated websites to block webpage requests that look like they come from an automated agent.
As of now, Unread’s webpage text cloud service is unable to generate webpage text for just under 2% of requested article URLs. Typically in those situations, my servers are getting a 403 Forbidden response from the website. The website appears to detect that the request is coming from a data center or cloud platform.
The on-device webpage text generator does not have this issue because it makes requests directly from your device. So now when Unread’s webpage text cloud service cannot generate webpage text for an article, Unread falls back to its on-device webpage text generator.
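To make the fallback concrete, here is a minimal sketch in Python. It is not Unread's actual code: the function names, the `CloudGenerationError` type, and the idea of passing the two generators in as callables are all assumptions made for illustration.

```python
from typing import Callable


class CloudGenerationError(Exception):
    """Hypothetical error raised when the cloud service cannot produce text."""


def generate_webpage_text(
    url: str,
    cloud_generate: Callable[[str], str],
    on_device_generate: Callable[[str], str],
) -> str:
    """Try the cloud service first; fall back to the device if it fails."""
    try:
        return cloud_generate(url)
    except CloudGenerationError:
        # For example, the website answered the cloud service with
        # 403 Forbidden. The device makes its own request instead.
        return on_device_generate(url)
```

The caller only sees webpage text coming back. Which generator produced it is a detail hidden inside the function.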
Pattern Detection
The webpage text cloud service tracks when it succeeds and when it fails at generating webpage text from each website. It uses that data to maintain a list of websites from which it is consistently unable to generate webpage text. Unread maintains a synced copy of that list of websites. When Unread needs to generate webpage text from a website in that list, it uses the on-device webpage text generator without first asking the cloud service.
This avoids the delay of Unread first issuing a request to the cloud service when that will likely result in an error.
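One way such tracking could work is sketched below in Python. This is an illustration under stated assumptions, not the service's real implementation: the class name, the per-host grouping, and both thresholds (`min_attempts`, `failure_ratio`) are hypothetical. The post does not say how "consistently unable" is measured.

```python
from collections import defaultdict
from typing import Set
from urllib.parse import urlsplit


class HostFailureTracker:
    """Counts cloud-side successes and failures per website host."""

    def __init__(self, min_attempts: int = 20, failure_ratio: float = 0.9):
        # Both thresholds are assumed values, chosen only for illustration.
        self.min_attempts = min_attempts
        self.failure_ratio = failure_ratio
        self._attempts = defaultdict(int)
        self._failures = defaultdict(int)

    def record(self, url: str, succeeded: bool) -> None:
        """Record the outcome of one cloud generation attempt."""
        host = urlsplit(url).hostname or ""
        self._attempts[host] += 1
        if not succeeded:
            self._failures[host] += 1

    def consistently_failing_hosts(self) -> Set[str]:
        """Hosts with enough attempts and a high enough failure ratio."""
        return {
            host
            for host, attempts in self._attempts.items()
            if attempts >= self.min_attempts
            and self._failures[host] / attempts >= self.failure_ratio
        }


def should_skip_cloud(url: str, synced_hosts: Set[str]) -> bool:
    """Client-side check against the synced copy of the list."""
    return (urlsplit(url).hostname or "") in synced_hosts
```

Requiring a minimum number of attempts keeps a single failed request from putting a website on the list.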
The End Result
In summary, this determines which webpage text generator Unread uses:
- If retrieving a webpage requires website account credentials, Unread uses its on-device webpage text generator. Unread’s servers have no part in generating webpage text when using a website account.
- If the webpage is from a website where the cloud service is consistently unable to generate webpage text, Unread uses the on-device webpage text generator.
- Otherwise, Unread asks the cloud service to generate webpage text.
- In the rare circumstance in which the cloud service tries to generate webpage text and encounters an error, Unread will fall back on its on-device webpage text generator.
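The rules above can be sketched as a small function that returns the generators to try, in order. This is an illustrative Python sketch rather than Unread's code. The `Generator` enum, the function name, and its parameters are assumptions.

```python
from enum import Enum
from typing import List, Set
from urllib.parse import urlsplit


class Generator(Enum):
    ON_DEVICE = "on-device"
    CLOUD = "cloud"


def generators_to_try(
    url: str,
    needs_website_account: bool,
    cloud_blocked_hosts: Set[str],
) -> List[Generator]:
    """Return the webpage text generators to try for a URL, in order."""
    # Rule 1: website account content never goes through the servers.
    if needs_website_account:
        return [Generator.ON_DEVICE]
    # Rule 2: skip the cloud for websites it consistently fails on.
    host = urlsplit(url).hostname or ""
    if host in cloud_blocked_hosts:
        return [Generator.ON_DEVICE]
    # Rules 3 and 4: ask the cloud first, then fall back to the device.
    return [Generator.CLOUD, Generator.ON_DEVICE]
```

For example, `generators_to_try("https://example.com/post", False, set())` returns the cloud generator followed by the on-device generator, while any URL that needs a website account returns only the on-device generator.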
The update is available now from the App Store.