summaryrefslogtreecommitdiffstats
path: root/_posts/2019-09-23-RaptorCS-Blackbird-a-horror-story.md
blob: ff0cb63d397715f6bc08f8fa6d93dfbfbd3aca28 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
---
layout: post
title: "RaptorCS POWER9 Blackbird PC: An expensive mistake"
---

**November 2018**: Ordered [Basic Blackbird
Bundle](https://www.raptorcs.com/content/BK1B01/intro.html) w/32 GB RAM:
$1,935.64

**June 2019**

Order ships, and arrives without RAM. It had been long enough that I didn't
realize the order had only been partially fulfilled, so I order some RAM from
the [list of recommended chips][RAM] ($338.40), along with the other necessities
that I didn't purchase from Raptor: a case ($97.99) and a PSU ($68.49), and grab
some hard drives I have lying around. Total cost: about $2,440. Worth it to get
POWER9 builds working on builds.sr.ht!

[RAM]: https://wiki.raptorcs.com/wiki/POWER9_Hardware_Compatibility_List/Memory

I carefully put everything together, consulting the manual at each step, plug in
a display, and turn it on. Lights come on, things start whizzing, and the screen
comes to life - and promptly starts boot looping.

**June 27th**

Support ticket created. What's going on with my board?

**June 28th**

Support gets back to me the next day with a suggestion which is unrelated to the
problem, but no matter - I spoke with volunteers in the IRC channel a few hours
earlier and we found out that - whoops! - I hadn't connected the CPU power to
the motherboard. This is the end of the PEBKAC errors, but not the end of the
problems. The machine gets further ahead in the boot - almost to "petitboot",
and then the display dies and the machine reveals no further secrets.

I sent an update to the support team.

**July 1st**

> We have normally only seen this type of failure when there is a RAM-related
> fault, or if the PSU is underpowered enough that bringing the CPUs online at
> full power causes a power fault and immediate safety power off.
>
> Can you watch the internal lights while the system is booting, and see if the
> power LED cluster immediately changes from green to orange as the system stops
> responding over SSH?

The IRC channel suspects this is not related to the problem. Regardless, I reply
a few hours later with two videos showing the boot up process from power-out to
display death, with the internal LEDs and the display output clearly visible.

**July 4th**

"Any progress on this issue?", I ask.

**July 15th**

"Hi guys, I'm still experiencing this problem. If you're unsure of the issue I
would like to send the board back to you for diagnosis or a refund."

**July 25th**

> Sorry for the delay. Having senior support check out the videos.
>
> Thanks for writing back. We should have something for you by tomorrow during
> the day.

**July 31st**

> Hi Drew.
>
> The videos are being reviewed this week. Thank you for sending them.
>
> Please stay tuned.

**September 15th**

No reply from support. I have since bought a little more hardware for
self-diagnosis, namely the necessary pieces to connect to the two (or is it 3?)
serial ports. I manage to get a log, which points to several failures, but none
of them seem to be related to the problem at hand (they do indicate some network
failures, which would explain why I can't log into the BMC over SSH for further
diagnosis). And the getty is looping, so I can't log in on the serial console to
explore any further.

---

That was a week ago. Radio silence since.

So, 10 months after I placed an order for a POWER9 machine, 3 months after I
received it (without the RAM I purchased, no less), and over $2,500 invested...
it's clear that buying the Blackbird was an expensive mistake. Maybe someday
I'll get it working. If I do, I doubt the "support" team will have been
involved. Currently my best bet seems to be waiting for some apparent staff
member (the only apparent staff member) who idles in the IRC channel on Freenode
and allegedly comes online from time to time.

I'm not alone in these problems. Here are some (anonymized) quotes I've heard
from others while trying to troubleshoot this on IRC.

On support:

> Raptor has been burning a LOT of good will in the community. They really need
> a kick up the ass re: post-sales customer support.

> ugh, ddevault, yeah. [Blackbird ownership] has not been a smooth experience
> for me, either.

> my personal theory is that they have really bad ticket software that 'loses'
> tickets somehow

On reliability:

> I've found openbmc's networking to be... a bit unreliable... maybe 20% of the
> time it does not responed[sic]/does not respond fast enough to networking
> requests.

> yeah the vga handoff failing doesn't surprise me (other people here have
> reported it). but the BMC not getting a DHCP lease is odd. (well maybe not
> that odd if you look at the crumminess of the OpenBMC software stack...)

So, yeah, don't buy from Raptor Computer Systems. It's too large and unwieldly
to be an effective paper weight, either!

---

**Erratta**

*2019-09-24 @ 00:19 UTC*: Raptor has reached out and apologized for my support
experience. We are discussing these problems in more detail now. They have also
issued a refund for the unshipped RAM.

*2019-09-24 @ 00:51 UTC*: Raptor believes the CPU to be faulty and is shipping a
replacement. They attribute the delay to having to reach out to IBM about the
problem, but don't have a satisfactory answer to why the support process failed.
I understand it's being discussed internally.

*2019-09-24 @ 13:08 UTC*:

> After investigation, we are implementing new mechanisms to avoid support
> issues like the one you experienced. We now have a self-serve RMA generation
> system which would have significantly reduced your wait time, and are taking
> measures to ensure that tickets are no longer able to be ignored by front line
> support staff. We believe we have addressed the known failure modes at this
> time, and management will be keeping a close eye on the operation of the
> support system to ensure that new failure modes are handled rapidly.

They've tweeted this about their new self-service RMA system as well:

> We've made it easy to submit RMA requests for defective products on our Web
> site. Simply go to your account, select the "Submit RMA Request" link, and
> fill out the form.  Your product will be warranty checked and, if valid, you
> will receive an RMA number and shipping address!

— @RaptorCompSys via [Twitter](https://twitter.com/RaptorCompSys/status/1176432946670186498)

I agree that this shows positive improvements and a willingness to continue
making improvements in their support experience. Thanks to Raptor for taking
these concerns seriously. I hope to have a working Blackbird system soon, and
will publish a follow-up review when the time comes.