Enhance Dataset API For Faster Rendering
The Challenge: Inefficient Data Serving
Have you ever found yourself waiting for a web page to load, especially when dealing with large datasets? We've all been there. Our current system for serving data to one of our dataset views, unfortunately, suffers from significant inefficiencies. It's like trying to pour a whole lake through a tiny straw – it just doesn't scale well, especially when you're working with massive amounts of information. The core of the problem lies in a single, overloaded endpoint, which we'll call api/[table]/map.ts for simplicity. This endpoint is the gatekeeper for all data requests for this view, and it's doing way too much heavy lifting. Every time someone visits a dataset view page, this endpoint gets called, and it embarks on a multi-step journey:
First, it fetches all records from the database. Imagine asking for a specific book in a library and the librarian bringing you every single book ever written before searching for yours; that's essentially what's happening here. Once it has this overwhelming collection, it applies filtering based on the view's configuration parameters, then transforms columns and values to make them more human-readable (think of it as translating a complex technical manual into everyday language). If the data is intended for a map, it goes through a further transformation to convert the result into GeoJSON. Finally, the fully processed dataset is returned as response.data.
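To make the problem concrete, here is a simplified sketch of what that handler roughly does today, written as a Nuxt server route. The helper names (fetchAllRecords, loadViewConfig, applyConfigFilters, humanizeColumns, toGeoJSON) are placeholders invented for illustration, not the actual implementation.

```ts
// server/api/[table]/map.ts -- simplified sketch of the current, monolithic flow.
import { defineEventHandler, getRouterParam } from 'h3' // auto-imported by Nuxt; shown for clarity

// Placeholder signatures for the existing data-access helpers (illustrative only).
declare function fetchAllRecords(table: string): Promise<any[]>
declare function loadViewConfig(table: string): Promise<any>
declare function applyConfigFilters(rows: any[], config: any): any[]
declare function humanizeColumns(rows: any[]): any[]
declare function toGeoJSON(rows: any[]): unknown

export default defineEventHandler(async (event) => {
  const table = getRouterParam(event, 'table')!

  // 1. Fetch every record in the table, regardless of what the view needs.
  const records = await fetchAllRecords(table)

  // 2. Apply filtering driven by the view's configuration.
  const filtered = applyConfigFilters(records, await loadViewConfig(table))

  // 3. Rewrite columns and values into human-readable form (a presentation concern).
  const readable = humanizeColumns(filtered)

  // 4. For map views, convert the whole result into GeoJSON before responding.
  return { data: toGeoJSON(readable) }
})
```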
This convoluted process, while perhaps functional for smaller datasets, creates a cascade of problems as the data grows. We're talking about large payloads and high latency. When datasets balloon in size, the API response becomes gargantuan. This doesn't just mean a longer wait time; it translates to several agonizing seconds of backend processing and additional frontend rendering delays. Mapbox, for instance, can really struggle when it has to paint a massive amount of data onto its canvas.
Beyond the wait time, we encounter browser memory pressure. For exceptionally large datasets, the browser simply can't hold the entire transformed dataset in memory, which can leave it unresponsive, frozen, or crashed entirely. Users are left frustrated, and the experience is far from ideal. Furthermore, the API logic is tightly coupled: the filtering and presentation-layer transformations are intertwined with the data access. This means that other functionalities, like the CSV, GeoJSON, or KML download buttons, are forced to work with this modified data rather than the original, raw records from the database. That's a significant drawback, because users typically want the original, canonical data for downloads, not a processed version.
The Solution: A Smarter, Modular Approach
Fortunately, there's a much better way to handle this! By refactoring our dataset view APIs, we can significantly improve performance, scalability, and user experience. The key is to split data access into purpose-specific endpoints. Instead of relying on one monolithic endpoint that does everything, we'll introduce smaller, focused APIs, each designed for a particular task. This modular approach will streamline processes and reduce the burden on individual endpoints.
Here's how we envision this working:
- GET api/[table]/map: This endpoint will be specifically designed to serve the minimal data required for rendering a map view: the geometry of features, a stable record identifier (so we can easily reference individual records), and any fields needed for styling or filtering, such as color codes or categorization columns. By sending only what's needed for the map, we dramatically reduce payload size and backend processing time for this common view (a minimal sketch of these handlers follows this list).
- GET api/[table]/[recordId]: This endpoint will be dedicated to retrieving the full, raw record for a single feature. This is incredibly useful when a user needs detailed information about a specific data point: for example, when a user clicks on a point on the map, this endpoint will be called on demand to fetch all the original attributes of that record, without any transformations applied.
- POST api/[table]/records (getMany): For views that display details for multiple records simultaneously, like our Gallery view, we'll introduce a getMany endpoint. This will allow fetching multiple raw records in a single, efficient request rather than making numerous individual calls, which is particularly beneficial for the loading speed and responsiveness of list-based or gallery-style views.
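Here is a minimal sketch of what the first two endpoints could look like as Nuxt server routes. The data-layer helpers (queryTable, fetchRecordById) and the column names (id, geometry, color) are assumptions made purely for illustration.

```ts
// server/api/[table]/map.get.ts -- sketch of the slimmed-down map endpoint.
import { defineEventHandler, getRouterParam } from 'h3' // auto-imported by Nuxt

// Placeholder for a data-layer helper that selects only the listed columns.
declare function queryTable(
  table: string,
  options: { columns: string[] },
): Promise<Array<{ id: string; geometry: unknown; color: string }>>

export default defineEventHandler(async (event) => {
  const table = getRouterParam(event, 'table')!

  // Only the fields the map actually needs: geometry, a stable id,
  // and whatever drives styling or filtering.
  const rows = await queryTable(table, { columns: ['id', 'geometry', 'color'] })

  return {
    type: 'FeatureCollection',
    features: rows.map((row) => ({
      type: 'Feature',
      id: row.id,
      geometry: row.geometry,
      properties: { color: row.color },
    })),
  }
})
```

```ts
// server/api/[table]/[recordId].get.ts -- sketch of the single-record endpoint.
import { createError, defineEventHandler, getRouterParam } from 'h3' // auto-imported by Nuxt

// Placeholder for a data-layer helper that returns one raw record, or null.
declare function fetchRecordById(table: string, id: string): Promise<Record<string, unknown> | null>

export default defineEventHandler(async (event) => {
  const table = getRouterParam(event, 'table')!
  const recordId = getRouterParam(event, 'recordId')!

  const record = await fetchRecordById(table, recordId)
  if (!record) {
    throw createError({ statusCode: 404, statusMessage: 'Record not found' })
  }

  return record // the raw record, with no filtering or presentation transforms applied
})
```

The getMany endpoint would follow the same pattern, accepting an array of record ids in the POST body and returning the matching raw records in one response.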
To further enhance performance and prevent redundant work, we'll implement client-side caching and request de-duping. Imagine clicking on several map points in quick succession. Without caching, each click might trigger a new API request for the same data. With client-side caching, we can store fetched records (mapping recordId to the record data) directly in the browser's memory. This means subsequent requests for the same record will be served instantly from the cache. Furthermore, request de-duping ensures that if multiple components on the page happen to request the same record(s) concurrently, they will share a single network call and response, rather than initiating duplicate requests. This is a powerful technique for optimizing resource usage and speeding up perceived load times.
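As a rough sketch, the cache and de-duping could live in a small client-side composable, assuming the api/[table]/[recordId] endpoint proposed above; the file name and key scheme are invented for the example.

```ts
// composables/useRecordCache.ts -- minimal sketch of client-side caching plus request de-duping.
// $fetch is provided globally by Nuxt (ofetch).

// Module-level maps persist for the lifetime of the page.
const recordCache = new Map<string, unknown>()       // "table:recordId" -> record data
const inFlight = new Map<string, Promise<unknown>>() // "table:recordId" -> pending request

export async function getRecord(table: string, recordId: string): Promise<unknown> {
  const key = `${table}:${recordId}`

  // 1. Serve instantly from the cache if this record was already fetched.
  if (recordCache.has(key)) return recordCache.get(key)

  // 2. De-dupe: if another component is already fetching it, share that call.
  const pending = inFlight.get(key)
  if (pending) return pending

  // 3. Otherwise fetch, cache the result, and clear the in-flight entry.
  const request = $fetch(`/api/${table}/${recordId}`)
    .then((record) => {
      recordCache.set(key, record)
      return record
    })
    .finally(() => inFlight.delete(key))

  inFlight.set(key, request)
  return request
}
```

In a Nuxt app, useFetch/useAsyncData keyed on the record id can provide much of this behaviour out of the box; the sketch just makes the cache and de-dupe logic explicit.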
Crucially, we will move filtering and transformation client-side. By shifting these presentation-layer operations from the backend to the frontend (perhaps within components like DataFeature), we decouple data access from presentation logic. This significantly reduces backend latency and allows for more dynamic and interactive filtering directly in the user's browser. It also means the API primarily serves raw, unadulterated data, making it more versatile for different use cases.
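As a rough illustration, the presentation-layer work over raw records could look something like this in the browser (for instance inside a component like DataFeature); the FilterConfig shape and the humanizeValue helper are invented for the example.

```ts
// Sketch of client-side filtering and value transformation over raw API records.
interface FilterConfig {
  column: string
  equals: string | number
}

// Keep only the rows that match every active filter.
export function applyFilters<T extends Record<string, unknown>>(
  rows: T[],
  filters: FilterConfig[],
): T[] {
  return rows.filter((row) => filters.every((f) => row[f.column] === f.equals))
}

// Example presentation transform: render empty and boolean values as labels.
export function humanizeValue(value: unknown): string {
  if (value === null || value === undefined) return 'N/A'
  if (typeof value === 'boolean') return value ? 'Yes' : 'No'
  return String(value)
}
```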
For data export functionalities, we'll introduce dedicated export endpoints that return raw, untransformed records. Rather than loading the entire result set into memory before responding, they will be designed to stream results directly to disk. This prevents memory pressure during exports and ensures users get the exact data they expect from the database.
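Here is a sketch of what a streaming CSV export route could look like, assuming a cursor-based streamRecords helper in the data layer and illustrative column names; whether the stream is written to a temporary file on disk first or piped straight to the response is an implementation detail.

```ts
// server/api/[table]/export.get.ts -- sketch of a streaming CSV export.
import { Readable } from 'node:stream'
import { defineEventHandler, getRouterParam, sendStream, setResponseHeader } from 'h3' // auto-imported by Nuxt

// Placeholder for a cursor-based read that yields one raw record at a time.
declare function streamRecords(table: string): AsyncIterable<{ id: string; name: string; value: string }>

export default defineEventHandler(async (event) => {
  const table = getRouterParam(event, 'table')!

  setResponseHeader(event, 'Content-Type', 'text/csv')
  setResponseHeader(event, 'Content-Disposition', `attachment; filename="${table}.csv"`)

  // Rows flow through one at a time instead of being buffered in memory.
  async function* rows() {
    yield 'id,name,value\n' // header row (columns are illustrative)
    for await (const record of streamRecords(table)) {
      // Raw, untransformed values go straight into the export
      // (real code would also escape commas and quotes).
      yield `${record.id},${record.name},${record.value}\n`
    }
  }

  return sendStream(event, Readable.from(rows()))
})
```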
To further shrink the data traveling over the network, we will enable response compression, likely through a tool like @nuxtjs/precompress. Enabling gzip encoding for API responses will drastically reduce payload sizes, leading to faster downloads and a snappier user experience, especially for users with slower internet connections.
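A compression module would handle this at the framework level; purely to show the mechanics, here is a hand-rolled sketch that gzips a JSON response with Node's built-in zlib when the client advertises support for it.

```ts
// Hand-rolled illustration of gzip-encoding an API response; in practice the
// module mentioned above (or the hosting layer) would do this for every route.
import { gzipSync } from 'node:zlib'
import { defineEventHandler, getRequestHeader, setResponseHeader } from 'h3' // auto-imported by Nuxt

export default defineEventHandler((event) => {
  const payload = { features: [] } // stand-in for whatever the endpoint normally returns

  // Fall back to plain JSON if the client did not ask for gzip.
  const accepts = getRequestHeader(event, 'accept-encoding') ?? ''
  if (!accepts.includes('gzip')) return payload

  setResponseHeader(event, 'Content-Type', 'application/json')
  setResponseHeader(event, 'Content-Encoding', 'gzip')

  // Browsers decompress this transparently before handing it to the app.
  return gzipSync(JSON.stringify(payload))
})
```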
Looking ahead, we can also explore spatial querying. For map views, this means we could eventually return only those features that physically intersect the current map viewport. This is an advanced optimization that would further reduce the amount of data transferred and processed, making map interactions incredibly fluid, even with extremely large geographic datasets.
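As a rough sketch of the idea, the map endpoint could accept the current viewport as a bbox query parameter and push the intersection test down to the database; the db helper and the PostGIS-style SQL below are assumptions about the data layer, not the actual schema.

```ts
// server/api/[table]/map.get.ts -- sketch of viewport-based spatial querying.
// Expects a query string like ?bbox=minLng,minLat,maxLng,maxLat
import { defineEventHandler, getQuery, getRouterParam } from 'h3' // auto-imported by Nuxt

// Placeholder for a parameterized query helper against a PostGIS-enabled database.
declare const db: { query(sql: string, params: unknown[]): Promise<unknown[]> }

export default defineEventHandler(async (event) => {
  // NOTE: in real code the table name must be validated against an allow-list
  // before being interpolated into SQL.
  const table = getRouterParam(event, 'table')!
  const { bbox } = getQuery(event)
  const [minLng, minLat, maxLng, maxLat] = String(bbox).split(',').map(Number)

  // Only return features that intersect the current viewport.
  return db.query(
    `SELECT id, geometry, color
       FROM ${table}
      WHERE ST_Intersects(geometry, ST_MakeEnvelope($1, $2, $3, $4, 4326))`,
    [minLng, minLat, maxLng, maxLat],
  )
})
```

On the client, the map component would re-request whenever the viewport settles (for example on Mapbox's moveend event), so only visible features ever cross the wire.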
If any of these steps require more detailed tracking, we can certainly create sub-issues to manage the work. We propose starting a dedicated branch, perhaps named dev/api-refactor, where we can submit individual, manageable pull requests. This iterative approach will allow us to build and test the changes incrementally, ensuring a robust and successful refactor.
For more information on API design best practices, you can refer to resources like Google's API Design Guide and REST API Best Practices. These guides offer valuable insights into creating efficient, scalable, and maintainable APIs. For deeper dives into client-side caching strategies, exploring topics like HTTP caching and browser storage APIs can be very beneficial. Understanding these concepts will not only help in implementing the proposed changes but also in designing future features with performance in mind.