I’ve seen lots of clones of the Philips Ambilight system that extend a TV picture out onto the surrounding walls by shining coloured LEDs out from the back of the TV. I wanted to do the same with my TV but all the clones out there seemed to be software based, requiring a PC to be producing the TV picture which they then grab frames from, analyse to get the light colours and drive the lights. My TV picture comes from MythTV, so I already have the PC running permanently, but I was worried that the extra load might cause problems when watching blu ray rips. I also wanted to have a lot of LEDs and I didn’t want the lag that is often seen between the TV picture and the LED output.
So I thought I’d try making a completely hardware based Ambilight clone that directly takes in a 1080p HDMI signal.
UPDATE: Version 2 is here
Here’s what I wanted it to do:
- Direct HDMI input
- Full frame rate 1920×1080 @ 60 Hz
- Up to 8 strips of WS2811 LEDs
- Up to a total of 256 LEDs, each able to set its colour independently from any arbitrary area of the picture
Multiplying these numbers showed that this would need to do a lot of work to keep up. That’s just over 124 million pixels per second, and with 256 independently controlled LEDs it will need to decide whether a pixel contributes to a particular LED’s colour nearly 32 billion times per second, and that’s before it does the maths to calculate the actual colour.
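The multiplication behind those two figures is short enough to write out. This is just a back-of-envelope check using the numbers from the list above; the variable names are my own.

```python
# Back-of-envelope throughput for 1080p60 with 256 lights.
WIDTH, HEIGHT, FPS = 1920, 1080, 60
LIGHTS = 256

# Active pixels arriving per second (blanking intervals ignored).
pixels_per_second = WIDTH * HEIGHT * FPS

# Every pixel has to be tested against every light definition.
tests_per_second = pixels_per_second * LIGHTS

print(f"{pixels_per_second:,} pixels per second")          # 124,416,000
print(f"{tests_per_second:,} pixel/light tests per second")  # 31,850,496,000
```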
As that was way beyond any microcontroller’s capability and beyond most microprocessors I decided to use an FPGA, specifically a Xilinx Spartan-3E in the form of a Papilio One 250K board.
HDMI Receiver Board
The first step was to create an HDMI receiver board for the Papilio and after a bit of research the ADV7611 from Analog Devices seemed to be a good (if quite expensive) chip to base it on. It’s a one chip solution to take in HDMI and output RGB pixel data on a 24-bit bus. Apparently it will even handle HDCP protected content using built-in keys.
To make things easier Analog Devices have a Circuit Note titled ‘CN0282: USB Powered DVI/HDMI-to-VGA Converter (HDMI2VGA) with Audio Extraction’ which is a complete design of an HDMI to VGA converter using this chip. I then just had to rip out all the parts of the design that I didn’t need and lay out a board in Eagle, changing the ESD protection to a one chip solution stocked by Farnell and swapping the power supplies for linear regulators to save board space. I also added an eight channel 3.3V to 5V level shifter to drive the LED strips.
Once the board was laid out, I sent it off to Seeed Studio to be manufactured. One month and $25 later I received eleven copies of my board, ready to be populated.
A couple of evenings with the soldering iron later and the board was ready for testing.
Testing The Board
Testing began the same way it does with everything else: set the bench supply to a low current limit, power up the board and check voltages at various points to make sure everything is behaving as expected and that no magic blue smoke is leaking out. This initial check passed, and as the ADV7611 is controlled via I2C, the next step was to connect it to a Bus Pirate and attempt to query the chip’s ID and version.
This second test didn’t go so well. I couldn’t get a reliable answer from the chip, and when it did respond it wasn’t with the values I expected. My first suspicion was that the pad on the bottom of the chip wasn’t soldered properly. As it’s the only ground for the whole chip it’s a critical connection, and trying to solder it through a 2mm hole in the PCB is not easy. Luckily, after applying more heat and more solder, my suspicion was confirmed and the chip started talking.
With the chip responding it was time to try and get it to produce some output, this thing has a lot of registers to configure it, so this part involved a lot of time trawling through the datasheet, but eventually I figured out how to get it into ‘free run’ mode which is where it outputs a blue screen with internally generated video timing when it has no input. With this mode enabled I was able to see the horizontal and vertical video sync output signals and that the blue pins on the data bus were active.
The final and hardest part of testing the board was getting the HDMI input working. This involves setting a whole load more registers and setting the contents of the EDID EEPROM. Eventually after wading through the datasheet and various application notes I got it working, and with a Raspberry Pi connected as the video source I was able to change the colour of the screen on the Raspberry Pi and see the changes reflected on the output bus of the ADV7611.
The top level of the design for the FPGA contains three main components: an AVR compatible CPU and a UART, both from the CPU_LECTURE project at opencores.org, and an ambilight component, which is where the real work happens. I won’t spend any time describing the AVR and UART as they’re really well documented by their author. I’m using them to allow the ambilight component to be configured via the USB serial port on the Papilio and to configure the ADV7611 chip via I2C (which needs around 300 I2C writes to set up).
The block diagram above gives an overview of the components and connections within the main ambilight component. On the far left are the inputs VIDCLK, R, G, B, HSYNC and VSYNC which all come directly from the pins of the ADV7611. Also on the left are CFGCLK, CFG_ADDR and CFG_DATA which connect the ambilight’s configuration RAM to the AVR CPU. On the far right is the OUTPUT bus that connects to the output pins that drive the LED strips.
There are two clock domains in this design. VIDCLK is the pixel clock from the ADV7611 (called LLC or Line Locked Clock in the datasheet) and at 1080p60 is running at 165MHz. This clock is used by the hscale4, scaler and light averager components as well as both ports of the line buffer RAM, one port of the config RAM and one port of the result RAM. CFGCLK is generated in the FPGA and runs at 16MHz; it is used by all of the other components. The config and result dual port RAMs connect the clock domains.
FPGA Design: hscale4
The incoming video data is connected to the hscale4 component. This simply divides the horizontal resolution of the video by four. This is done to give the following components more than one clock cycle to perform the work for each pixel. I wasn’t able to get the remaining components to operate in a single clock cycle and still meet timing for 165MHz.
The internals of hscale4 are pretty simple, the 8-bit R, G and B buses go to separate adder trees which accumulate four pixels of data, at which point the bottom two bits are discarded to divide the result by four, producing a value that is the average of the last four pixels.
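As a rough software model of that arithmetic (not the actual HDL, and the function name is made up for illustration), the averaging for one colour channel of one line looks something like this:

```python
def hscale4(line):
    """Average each group of four 8-bit samples from one colour channel.

    Software model only: the hardware does this with an adder tree per
    channel, one pixel per clock.
    """
    out = []
    usable = len(line) - len(line) % 4      # ignore any trailing partial group
    for i in range(0, usable, 4):
        total = sum(line[i:i + 4])          # 10-bit sum, at most 4 * 255 = 1020
        out.append(total >> 2)              # drop the bottom two bits = divide by 4
    return out


# A 1920 pixel line comes out 480 samples wide.
print(len(hscale4([128] * 1920)))
```

Dropping the bottom two bits truncates rather than rounds, so four pixels of 1, 2, 3 and 4 average to 2.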
Additionally, the HSYNC and VSYNC signals are delayed to account for the pipeline depth of hscale4, and two clock enable signals are generated, CE2 and CE4, which respectively enable on every 2nd or 4th clock. CE4 isn’t actually used anywhere any more.
FPGA Design: scaler
The next stop is the scaler component which divides both the horizontal and vertical resolution of the video, dividing the horizontal by 8 and the vertical by 32, which with the scaling already performed by hscale4, results in scaling both dimensions by 32. This reduces the incoming 1920×1080 video to 60×34.
The operation of the scaler is similar to that of hscale4, except that, because it’s scaling in two dimensions, it has to remember the accumulated count for each horizontal pixel so that it can keep adding to those counts across each of the 32 rows being averaged together. This is what the line buffer RAM is used for. It is sized to accommodate two lines of scaled video so that the scaler can be scaling incoming video into one line while the light averager component can be reading from the previously scaled line.
The scaler has four clock cycles of VIDCLK to process each incoming pixel from hscale4. It takes one clock cycle to read the previous accumulated value from the line buffer RAM and a second to pass that value through the adder trees and back into the RAM. Additionally, the scaler component is effectively running at half the frequency of VIDCLK by using the CE2 clock enable signal, which makes it much easier to get the design to meet timing.
As well as writing scaled video data to the line buffer RAM, the scaler also generates a LINE_READY signal to indicate that the next line is ready for processing and LINE_ADDR which is the line number that is ready.
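Here is a rough software model of the scaler’s accumulate-and-shift behaviour for one colour channel. It is only a sketch: the function name is invented, the double buffering and LINE_READY signalling are left out, and the way the final partial band is handled (1080 isn’t a multiple of 32, so the last band only has 24 rows) is my assumption for the model, not a description of the HDL.

```python
H_DIV, V_DIV = 8, 32    # scaler ratios, applied after hscale4
SHIFT = 8               # log2(8 * 32): 256 samples are summed per output pixel


def scale_frame(frame):
    """frame: list of rows of hscale4 output samples for one colour channel."""
    out_width = len(frame[0]) // H_DIV
    scaled = []
    line_buffer = [0] * out_width               # one accumulator per output pixel
    for y, row in enumerate(frame):
        for x in range(out_width * H_DIV):
            line_buffer[x // H_DIV] += row[x]   # read, add, write back
        band_done = (y % V_DIV == V_DIV - 1)
        last_row = (y == len(frame) - 1)
        if band_done or last_row:
            # Assumption: a short final band is still shifted by 8, so it
            # comes out darker than a full band would.
            scaled.append([acc >> SHIFT for acc in line_buffer])
            line_buffer = [0] * out_width
    return scaled


frame = [[200] * 480 for _ in range(1080)]      # flat grey frame after hscale4
scaled = scale_frame(frame)
print(len(scaled[0]), "x", len(scaled))         # 60 x 34
```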
FPGA Design: light averager
The light averager is the central component that does the real work, it reads light definitions from the config RAM, compares them against pixel data in the line buffer RAM and accumulates each light’s colour in the result RAM.
The config RAM is treated as an array of 32-bit values, one for each of the 256 light definitions. Those 32-bit values are broken down as follows:
- 6-bits: xmin (0 – 63)
- 6-bits: xmax (0 – 63)
- 6-bits: ymin (0 – 63)
- 6-bits: ymax (0 – 63)
- 4-bits: divisor shift count (0 – 15, divide by 1 – 32768)
- 3-bits: output pin (0 – 7)
- 1-bit: unused
So each light defines a rectangle of the scaled video that should be averaged to produce its final colour. To keep things simple the area of the rectangle must be a power of two so that the division that completes the average calculation can be performed by a right shift.
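To make the layout concrete, here’s a sketch of packing and unpacking a light definition, plus working out the shift count for a rectangle. Two things here are assumptions on my part: that the fields are packed in the order listed with the first field in the most significant bits, and that xmax and ymax are inclusive. The function names are made up for the example.

```python
FIELDS = [  # (name, width in bits), in the order listed above
    ("xmin", 6), ("xmax", 6), ("ymin", 6), ("ymax", 6),
    ("shift", 4), ("pin", 3), ("unused", 1),
]


def pack_light(**values):
    """Pack the fields into one 32-bit word (first field in the top bits: assumed)."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_light(word):
    """Inverse of pack_light."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


def shift_for_rect(xmin, xmax, ymin, ymax):
    """Shift count for a rectangle with inclusive bounds (assumed)."""
    area = (xmax - xmin + 1) * (ymax - ymin + 1)
    if area & (area - 1):
        raise ValueError(f"area {area} is not a power of two")
    return area.bit_length() - 1


# A 4 x 8 rectangle covers 32 scaled pixels, so the shift count is 5.
shift = shift_for_rect(0, 3, 0, 7)
word = pack_light(xmin=0, xmax=3, ymin=0, ymax=7, shift=shift, pin=2)
print(hex(word), unpack_light(word))
```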
The result RAM is 72 bits wide, with one 72-bit value for each light. These values are broken down as:
- 21-bits: R (0 – 0x1FFFFF)
- 21-bits: G (0 – 0x1FFFFF)
- 21-bits: B (0 – 0x1FFFFF)
- 3-bits: output pin (0 – 7)
- 6-bits: unused
The R, G and B values in the result RAM are large enough to accumulate the value of every pixel on the screen, however the final step performs the right shift to divide them and calculate the average, and the next component will only read the lower 8-bits of each value. The light averager also copies the output pin address from the config RAM to the result RAM so that there are only two components that need access to the config RAM (the CPU and the light averager).
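A quick check of that headroom, along with a model of the final shift. The `finish` helper is just a name I’ve picked for the example, and masking to 8 bits stands in for the next component only reading the lower 8 bits.

```python
SCALED_W, SCALED_H = 60, 34     # scaled frame size from the scaler section
ACC_BITS = 21                   # width of each colour accumulator

# Worst case: one light covering the whole scaled frame, every pixel at 255.
worst_case = SCALED_W * SCALED_H * 255
print(worst_case, "<", 1 << ACC_BITS)


def finish(acc, shift):
    """Divide an accumulated sum by 2**shift and keep the lower 8 bits."""
    return (acc >> shift) & 0xFF


# 32 pixels of value 200 with a shift count of 5 average back to 200.
print(finish(32 * 200, 5))
```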
The operation of the light averager is controlled by one 16-bit counter. The bits of the counter are then used to create the following control signals, from least significant bit upwards:
- 1-bit: WRITE_CYCLE
- 8-bits: LIGHT_ADDR
- 6-bits: XPOS
- 1-bit: OVERFLOW
These control signals are combined with the line address (or YPOS) that comes from the scaler so that for every pixel of video data, it iterates through every light definition, and for every light definition there is both a read and a write cycle.
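The counter layout can be sketched as a simple bit-field decode. This is a software illustration of the list above rather than the HDL, and `decode` is a made-up name.

```python
def decode(counter):
    """Split the 16-bit counter into its control signals, LSB upwards."""
    return {
        "WRITE_CYCLE": counter & 0x1,          # bit 0
        "LIGHT_ADDR": (counter >> 1) & 0xFF,   # bits 1 to 8
        "XPOS": (counter >> 9) & 0x3F,         # bits 9 to 14
        "OVERFLOW": (counter >> 15) & 0x1,     # bit 15
    }


# Counting up steps through read then write for light 0, then light 1, and
# so on; XPOS only advances once all 256 lights have been visited.
for value in (0, 1, 2, 511, 512):
    print(value, decode(value))
```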
The light averager always reads the current accumulated value for every pixel for every light, and it always passes this value and the new pixel value through the adder trees. It then generates a WRITE_ENABLE signal by comparing the current XPOS and YPOS against the light definition, and this WRITE_ENABLE signal controls whether the new value is actually written to the result RAM, and therefore whether the colour of the pixel contributes to the light’s final colour.
This is all performed in a three stage pipeline to keep the combinatorial delays low enough to meet timing.
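In software terms, processing one line of scaled video looks roughly like the sketch below. It ignores the pipelining entirely, uses plain Python lists in place of the RAMs, and assumes the rectangle bounds are inclusive; the function and variable names are invented for the example.

```python
def average_line(lights, results, line, ypos):
    """Accumulate one scaled line into the per-light results.

    lights:  list of dicts with xmin, xmax, ymin, ymax (inclusive: assumed)
    results: list of [r, g, b] accumulators, one per light
    line:    list of (r, g, b) tuples for the scaled line at row ypos
    """
    for xpos, (r, g, b) in enumerate(line):
        for i, light in enumerate(lights):
            acc = results[i]                                 # read cycle
            new = [acc[0] + r, acc[1] + g, acc[2] + b]       # always computed
            write_enable = (light["xmin"] <= xpos <= light["xmax"]
                            and light["ymin"] <= ypos <= light["ymax"])
            if write_enable:                                 # write cycle
                results[i] = new


lights = [{"xmin": 0, "xmax": 1, "ymin": 0, "ymax": 0}]
results = [[0, 0, 0]]
line = [(10, 20, 30), (30, 40, 50), (100, 100, 100)]
average_line(lights, results, line, ypos=0)
print(results)    # only the first two pixels fall inside the rectangle
```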
FPGA Design: result distributor
The result distributor waits for the vertical blanking period to start, at which point it iterates through each of the 256 entries in the result RAM. It uses the output pin address to control the output demux and passes the final RGB colour of each light to the WS2811 driver.
It must complete its iteration through the results before the vertical blanking period ends as the light averager will start overwriting the result data as soon as the next frame begins.
FPGA Design: WS2811 driver
The WS2811 driver takes the parallel RGB data from its input and converts it to WS2811 compatible serial output. It uses the 16MHz clock, taking four clocks to output each bit. Over the four clock period it outputs ‘1110’ for a ‘1’ and ‘1000’ for a ‘0’, which keeps things nice and simple and is close enough to the correct timing to work.
The firmware serves three purposes: first it initialises the ADV7611 chip via I2C, second it populates the config RAM with the light definitions, and finally it provides an interactive interface via a serial terminal to allow the user to modify the light definitions, query the resulting light colours, and read and write on the I2C bus to play around with the config of the ADV7611.
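The encoding itself is easy to model in software. The sketch below expands bytes into the level driven on each of the four clocks per bit, sending the most significant bit of each byte first as the WS2811 protocol expects. The function name is made up, and the order of the colour bytes is left to the caller.

```python
ONE = [1, 1, 1, 0]     # four-clock pattern for a '1' bit
ZERO = [1, 0, 0, 0]    # four-clock pattern for a '0' bit


def ws2811_encode(byte_values):
    """Expand bytes into per-clock output levels, most significant bit first."""
    levels = []
    for value in byte_values:
        for bit in range(7, -1, -1):
            levels.extend(ONE if (value >> bit) & 1 else ZERO)
    return levels


# Three colour bytes become 24 bits, or 96 clocks of output.
print(len(ws2811_encode([0xFF, 0x80, 0x00])))
```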
Currently the interactive interface supports the following commands:
- Set light definition:
  - S light_number x_min x_max y_min y_max divisor_shift_count
- Get light definition:
  - G light_number
- Get light result:
  - R light_number
- Initialise I2C:
  - I I
- Configure ADV7611 (from compiled in table of I2C address/values):
  - I C
- I2C Write:
  - I W address value
- I2C Read:
  - I R address
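If you wanted to script the light setup from a PC, a small helper to build the set command might look like the sketch below. This isn’t part of the firmware or any existing tool: the function name is hypothetical, and writing the arguments as space separated decimal numbers is an assumption, so check what number format the firmware actually expects before using anything like this.

```python
def set_light_command(light, x_min, x_max, y_min, y_max, shift):
    """Build an 'S' command line for the serial interface.

    Hypothetical helper. Decimal, space separated arguments are assumed.
    """
    limits = {
        "light": (light, 255), "x_min": (x_min, 63), "x_max": (x_max, 63),
        "y_min": (y_min, 63), "y_max": (y_max, 63), "shift": (shift, 15),
    }
    for name, (value, maximum) in limits.items():
        if not 0 <= value <= maximum:
            raise ValueError(f"{name}={value} is outside 0..{maximum}")
    return f"S {light} {x_min} {x_max} {y_min} {y_max} {shift}"


print(set_light_command(0, 0, 3, 0, 7, 5))
```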
Below are videos of the first time the whole system was tested. For this test my MythTV frontend PC is mirroring the display so that the same output goes to both the HDMI and DVI outputs. The HDMI connection goes to the TV, the DVI goes through a DVI to HDMI cable and into the HDMI Light. Both outputs are 1920×1080, the first test is 50Hz (UK TV), the second is at 24Hz (Blu Ray Rip).
For this test there is a single strip of LEDs draped over the back of the TV and there are 32 LEDs on the left edge, 57 LEDs on the top edge and 32 LEDs on the right edge, for a total of 121 LEDs.
As you can see the alignment between the LEDs and the TV is slightly off. Also, I believe the LEDs are updating just ahead of the TV, so I may have to introduce some artificial lag to match the TV’s internal processing delays.
For anyone attempting to recreate any of this, here’s a list of known issues:
- The power jack on the HDMI receiver board is wired incorrectly, so don’t populate it. Instead, add a link to bridge the 5V from the Papilio board to the HDMI receiver 5V rail.
- The 3.3V to 5V level shifter IC is not suitable for the job as it’s not capable of driving a signal down any significant length of wire.
- LLC (VIDCLK) isn’t connected to an FPGA pin that’s on a clock net, so the external pin mapping and board layout need rearranging to get more optimal routing within the FPGA. However, it does seem to work fine as it is with CLOCK_DEDICATED_ROUTE set to FALSE for that pin.
- Added 10/11/2014: The pull-up resistors for the DDC and CEC lines were incorrect and prevented the EDID from being retrieved. This has now been fixed in the Eagle files. The changes were:
  - R1: 27R -> 27K
  - R2: 47R -> 1K5
  - R3: 47R -> 1K5
  - R7: 47R -> 47K
  - R8: 47R -> 47K
  - R9: 100R -> 100K
UPDATE 14/12/2014: There’s now a version 2 so these links are out of date.
Download HDMI receiver board schematic: png
Download HDMI receiver board gerbers: zip